As organizations increasingly integrate AI into their applications, managing model usage, ensuring governance, and optimizing performance across diverse APIs has become critical. Azure API Management’s AI Gateway is evolving rapidly to meet these needs, introducing powerful new capabilities that simplify integration, improve observability, and enhance control over AI workloads. In this update, we’re excited to share several key enhancements, including expanded support for the Responses API and AWS Bedrock APIs, advanced token tracking and logging, session-aware load balancing, and streamlined onboarding for custom models. Let’s dive into what’s new and how you can take advantage of these features today.
Model logging and token tracking dashboard
Understanding how your AI models are being used is critical for governance, cost management, and performance tuning. AI Gateway now enables comprehensive model logging and token tracking, giving you visibility into:
- Prompts and completions
- Token usage
You can configure diagnostic settings to export this data to destinations such as Azure Monitor, Azure Storage, or Azure Event Hubs for long-term retention and custom analysis.
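For a concrete starting point, here’s a minimal sketch of wiring up that export programmatically with the azure-mgmt-monitor SDK. The resource IDs are placeholders, and the GatewayLlmLogs category name is an assumption; check the diagnostic settings available on your instance for the exact category.

```python
# Minimal sketch: route AI Gateway model logs to a Log Analytics workspace.
# Assumes azure-identity and azure-mgmt-monitor are installed. The resource
# IDs are placeholders and "GatewayLlmLogs" is an assumed category name --
# verify against the categories listed on your API Management instance.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DiagnosticSettingsResource, LogSettings

subscription_id = "<subscription-id>"
apim_resource_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.ApiManagement/service/<apim-name>"
)
workspace_resource_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)
client.diagnostic_settings.create_or_update(
    resource_uri=apim_resource_id,
    name="ai-gateway-llm-logs",
    parameters=DiagnosticSettingsResource(
        workspace_id=workspace_resource_id,
        logs=[LogSettings(category="GatewayLlmLogs", enabled=True)],
    ),
)
```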
Importantly, this logging feature is fully compatible with streaming responses, allowing you to capture detailed insights without compromising the real-time experience for users.
A built-in dashboard in the Azure portal provides an at-a-glance view of token usage trends, model performance across teams, and cost drivers, empowering organizations to make data-driven decisions around AI consumption and policy.
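If you’ve exported the logs to a Log Analytics workspace, you can run the same kind of analysis yourself. Below is a rough sketch using the azure-monitor-query library; the ApiManagementGatewayLlmLog table and its column names are assumptions, so adjust the query to match your workspace schema.

```python
# Rough sketch: summarize token usage per model from exported gateway logs.
# The table and column names below are assumptions -- inspect your Log
# Analytics workspace for the actual schema before running this.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
query = """
ApiManagementGatewayLlmLog
| summarize PromptTokens = sum(PromptTokens),
            CompletionTokens = sum(CompletionTokens),
            TotalTokens = sum(TotalTokens) by ModelName
| order by TotalTokens desc
"""
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=7),
)
for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```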
Learn more about model logging.
Responses API Support (Preview)
The Responses API is a new stateful API from Azure OpenAI that unifies the capabilities of the Chat Completions API and the Assistants API into a single, streamlined interface. This makes it easier to build multi-turn conversational experiences, maintain session context, and handle tool calling, all within one API.
With AI Gateway support for the Responses API, you now get:
- Token limiting to manage usage quotas
- Token and request tracking for auditing and monitoring
- Semantic caching to reduce latency and optimize compute
- Content filtering and safety controls
This support enables organizations to confidently use the Responses API at scale with built-in observability and governance.
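To give a feel for the client side, here’s a hedged sketch of a multi-turn Responses API exchange routed through the gateway using the openai Python SDK (1.66 or later, which includes client.responses). The gateway URL, route, and subscription-key header are illustrative; they depend on how the API is imported into your instance.

```python
# Sketch: two chained Responses API calls through an APIM-fronted endpoint.
# The base_url, route, and Ocp-Apim-Subscription-Key header are assumptions
# about your gateway configuration, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<apim-name>.azure-api.net/openai/v1",  # assumed route
    api_key="unused",  # authentication happens via the APIM key below
    default_headers={"Ocp-Apim-Subscription-Key": "<apim-subscription-key>"},
)

# First turn: the Responses API is stateful, so the response id can be chained.
first = client.responses.create(model="gpt-4o", input="What is an AI gateway?")

# Second turn: previous_response_id carries the session context forward.
follow_up = client.responses.create(
    model="gpt-4o",
    input="How does token limiting fit into that picture?",
    previous_response_id=first.id,
)
print(follow_up.output_text)
```

Because the gateway sits in the middle, token limits, tracking, semantic caching, and content filtering apply to both turns without any client-side changes.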
AWS Bedrock API Support
In our continued effort to support multi-cloud AI strategies, we’re thrilled to announce native support for the AWS Bedrock API in AI Gateway.
This means you can now:
- Apply token limits to Bedrock-based models
- Use semantic caching to minimize redundant requests
- Enforce content safety and responsible AI policies
- Log prompts and completions just as you would with Azure-hosted models
Whether you’re running Anthropic Claude or other Bedrock-hosted models, you can bring them under the same centralized AI Gateway, streamlining operations, compliance, and user experience.
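As a rough illustration, a Converse call to a Bedrock model through the gateway could look like the sketch below, using plain HTTP in Python. The /bedrock route prefix and the subscription-key header are assumptions about how you expose the API in APIM; the request and response shapes follow the Bedrock Converse API.

```python
# Sketch: invoke a Bedrock-hosted Claude model via the AI Gateway.
# The gateway route and subscription-key header are assumptions; the
# payload and response shapes follow the AWS Bedrock Converse API.
import requests

url = (
    "https://<apim-name>.azure-api.net/bedrock"  # assumed route prefix
    "/model/anthropic.claude-3-sonnet-20240229-v1:0/converse"
)
payload = {
    "messages": [
        {"role": "user", "content": [{"text": "Summarize our usage policy."}]}
    ]
}
resp = requests.post(
    url,
    json=payload,
    headers={"Ocp-Apim-Subscription-Key": "<apim-subscription-key>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["output"]["message"]["content"][0]["text"])
```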
Simplified Onboarding: AI Foundry and OpenAI-Compatible APIs
With the introduction of LLM policies that support Azure AI Model Inference and third-party OpenAI-compatible APIs, we wanted to simplify the process of onboarding those APIs to Azure API Management.
We’re happy to announce two new experiences in the Azure API Management portal: Import from Azure AI Foundry and Create OpenAI API. These new gestures allow you to easily expose your model endpoints via AI Gateway and to configure token limiting, token tracking, semantic caching, and content safety policies directly from the Azure portal.
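Once imported, clients don’t need anything APIM-specific: any OpenAI-compatible SDK can simply point at the gateway. Here’s a minimal sketch; the route suffix and the way the subscription key is passed are assumptions that depend on your import settings.

```python
# Minimal sketch: consume an imported AI Foundry or OpenAI-compatible API
# through the gateway with the standard openai SDK. The route suffix and
# key handling are assumptions tied to how the API was imported.
from openai import OpenAI

client = OpenAI(
    base_url="https://<apim-name>.azure-api.net/<api-suffix>",  # assumed route
    api_key="unused",
    default_headers={"Ocp-Apim-Subscription-Key": "<apim-subscription-key>"},
)
chat = client.chat.completions.create(
    model="<deployment-or-model-name>",
    messages=[{"role": "user", "content": "Hello from behind the AI Gateway!"}],
)
print(chat.choices[0].message.content)
```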
Session-aware load balancing
Modern LLM applications (especially chatbots, agents, and batch inference workloads) often require stateful processing, where a user’s requests must consistently hit the same backend to preserve context.
We’re introducing session-aware load balancing in Azure API Management to meet this need.
With this feature, you can:
- Enable cookie-based session affinity for load-balanced backends
- Ensure that requests from the same session are routed to the same Azure OpenAI or third-party endpoint
- Support APIs like Assistants or the new Responses API that rely on consistent backend state
Session-aware load balancing ensures your multi-turn conversations or batched tool-calling experiences remain consistent, reliable, and scalable while still benefiting from Azure API Management’s AI Gateway capabilities.
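Because the affinity is cookie-based, clients only need to echo the cookie the gateway sets on the first response, which most HTTP clients do automatically. Here’s a small sketch in Python; the route and payloads are illustrative, and requests.Session handles the cookie round-trip for you.

```python
# Sketch: cookie-based session affinity from the client's perspective.
# requests.Session stores the affinity cookie set by the gateway on the
# first response and replays it, so follow-up turns reach the same backend.
# The route and payload shapes are illustrative assumptions.
import requests

session = requests.Session()  # persists cookies across requests
session.headers["Ocp-Apim-Subscription-Key"] = "<apim-subscription-key>"

url = "https://<apim-name>.azure-api.net/openai/v1/responses"  # assumed route

first = session.post(url, json={"model": "gpt-4o", "input": "Start a session."})
first.raise_for_status()

# This request carries the affinity cookie, so the load balancer routes it
# to the same backend and the server-side response id resolves correctly.
second = session.post(
    url,
    json={
        "model": "gpt-4o",
        "input": "Continue where we left off.",
        "previous_response_id": first.json()["id"],
    },
)
print(second.json()["output"])
```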
Learn more about session-aware load balancing.
Get started
These new capabilities are being gradually rolled out across all Azure regions where API Management is available.
Want early access to the latest AI Gateway features? You can now configure your Azure API Management instance to join the AI Gateway Early (GenAI Release) update group. This gives you access to new features before they are made generally available to all customers.
To configure this, navigate to the Service Update Settings blade in the Azure portal and select the appropriate update track.
Learn more about update groups.
Published May 19, 2025