@adamdougal commented Feb 10, 2026

  • Add BaseRealtimeClient abstract base and RealtimeClientProtocol with connect, disconnect, send_audio, send_text, send_tool_result, update_session, and events methods
  • Add AzureOpenAIRealtimeClient using the OpenAI SDK beta realtime API for Azure OpenAI
  • Add OpenAIRealtimeClient using the OpenAI SDK GA realtime API
  • Add AzureVoiceLiveClient using the Azure Voice Live SDK (azure-ai-voicelive)
  • Add RealtimeAgent for high-level voice agent orchestration with tool execution
  • Add RealtimeSessionConfig, RealtimeEvent, and related dataclass types
  • Add update_session() for changing session config (instructions, tools, voice) without reconnecting
  • Add public tool_to_schema() and execute_tool() helper functions
  • Add multi-agent sample demonstrating agent transfers over a single connection
  • Add microphone, tools, FastAPI WebSocket, and WebSocket client samples
  • Add agent-framework-azure-voice-live package
  • Add comprehensive test coverage for all three clients and the realtime agent

Fixes #728

Motivation and Context

Agent Framework currently supports text-based chat clients but has no support for realtime bidirectional voice streaming. Several LLM providers now offer WebSocket-based realtime APIs (OpenAI Realtime, Azure OpenAI Realtime, Azure Voice Live) that enable natural voice conversations with function calling, VAD, and barge-in support. This PR adds first-class support for these APIs so developers can build voice agents using the same patterns they already use for chat agents.

Description

Protocol & Base (packages/core)

RealtimeClientProtocol defines the interface all realtime clients must implement. BaseRealtimeClient provides shared logic including session config translation, a convenience as_agent() factory, and serialization support. RealtimeSessionConfig and RealtimeEvent are simple dataclasses that normalize configuration and events across providers.
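To make the shape of that interface concrete, here is a minimal sketch of what a protocol with the seven listed methods could look like, along with a toy in-memory client that satisfies it. The method names come from this PR; the parameter names, the dataclass fields, and EchoClient are assumptions made for illustration and are not taken from the actual code.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class RealtimeSessionConfig:
    """Session settings. Field names here are assumed, not the PR's."""
    instructions: str | None = None
    voice: str | None = None
    tools: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class RealtimeEvent:
    """Normalized event: a type tag plus a provider-independent payload."""
    type: str
    data: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class RealtimeClientProtocol(Protocol):
    """Structural interface; signatures are illustrative guesses."""
    async def connect(self, config: RealtimeSessionConfig) -> None: ...
    async def disconnect(self) -> None: ...
    async def send_audio(self, audio: bytes) -> None: ...
    async def send_text(self, text: str) -> None: ...
    async def send_tool_result(self, tool_call_id: str, result: str) -> None: ...
    async def update_session(self, config: RealtimeSessionConfig) -> None: ...
    def events(self) -> AsyncIterator[RealtimeEvent]: ...


class EchoClient:
    """Hypothetical client that echoes every input back as an event."""

    def __init__(self) -> None:
        self.config: RealtimeSessionConfig | None = None
        self._queue: asyncio.Queue[RealtimeEvent | None] = asyncio.Queue()

    async def connect(self, config: RealtimeSessionConfig) -> None:
        self.config = config

    async def disconnect(self) -> None:
        await self._queue.put(None)  # sentinel that ends events()

    async def send_audio(self, audio: bytes) -> None:
        await self._queue.put(RealtimeEvent("audio", {"audio": audio}))

    async def send_text(self, text: str) -> None:
        await self._queue.put(RealtimeEvent("text", {"text": text}))

    async def send_tool_result(self, tool_call_id: str, result: str) -> None:
        await self._queue.put(
            RealtimeEvent("tool_result", {"id": tool_call_id, "result": result})
        )

    async def update_session(self, config: RealtimeSessionConfig) -> None:
        self.config = config

    async def events(self) -> AsyncIterator[RealtimeEvent]:
        while (event := await self._queue.get()) is not None:
            yield event
```

Because the protocol is structural, EchoClient does not need to inherit from anything; any object with these methods can be passed where a RealtimeClientProtocol is expected.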

Client Implementations

  • OpenAIRealtimeClient — connects via openai.OpenAI().realtime (GA API). Translates framework events to/from the OpenAI event format. Reads credentials from OPENAI_API_KEY.
  • AzureOpenAIRealtimeClient — connects via openai.AzureOpenAI().realtime (beta API). Supports both API key and azure-identity credential auth. Reads settings from AzureOpenAISettings.
  • AzureVoiceLiveClient — connects via the azure-ai-voicelive SDK using typed model objects (UserMessageItem, InputTextContentPart, etc.). Packaged as the separate agent-framework-azure-voice-live package since it depends on the Voice Live SDK.

All three clients implement update_session(), which allows changing instructions, tools, and (where supported) voice on an active connection without dropping the conversation. OpenAI and Azure OpenAI reject voice changes once assistant audio exists; AzureVoiceLiveClient supports voice changes at any time. The method docstrings document this difference.
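The voice restriction can be pictured with a small client-side model. This is only a sketch of the rule as described here: in the real clients the provider enforces it, and SessionState, VoiceLockedError, apply_update, and the voice names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SessionState:
    """Hypothetical snapshot of an active session."""
    instructions: str
    voice: str
    assistant_audio_emitted: bool = False


class VoiceLockedError(ValueError):
    """Raised when a voice change is attempted after audio was produced."""


def apply_update(
    state: SessionState,
    *,
    instructions: str | None = None,
    voice: str | None = None,
    voice_locked_after_audio: bool = True,
) -> SessionState:
    """Return a new state with the requested changes applied.

    voice_locked_after_audio=True models the OpenAI / Azure OpenAI rule;
    False models a provider that accepts voice changes at any time.
    """
    changing_voice = voice is not None and voice != state.voice
    if changing_voice and voice_locked_after_audio and state.assistant_audio_emitted:
        raise VoiceLockedError("voice cannot change once assistant audio exists")
    return replace(
        state,
        instructions=state.instructions if instructions is None else instructions,
        voice=state.voice if voice is None else voice,
    )
```

Under this model, changing only the instructions always succeeds, which is the case the multi-agent transfer relies on.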

RealtimeAgent

RealtimeAgent wraps a BaseRealtimeClient and adds automatic tool execution. Given an audio stream, it connects the client, forwards audio, dispatches tool calls, sends results, and yields normalized RealtimeEvent objects. Public tool_to_schema() and execute_tool() functions are exported for use in custom orchestration (e.g., the multi-agent sample).
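The dispatch step can be sketched roughly as follows. This is not the PR's implementation: the two helpers share their names with the exported functions but work on plain Python callables rather than the framework's FunctionTool, and the event dictionaries, dispatch(), and get_weather() are assumptions for illustration.

```python
import asyncio
import inspect
import json
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import Any, get_type_hints

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_to_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Describe a plain function as a JSON-schema-style tool definition."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    return {
        "type": "function",
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": JSON_TYPES.get(hints.get(name), "string")}
                for name in params
            },
            "required": [
                name for name, p in params.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }


async def execute_tool(
    tools: dict[str, Callable[..., Any]], name: str, arguments_json: str
) -> str:
    """Run the named tool and return its result (or an error) as text."""
    func = tools.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        result = func(**json.loads(arguments_json or "{}"))
        if inspect.isawaitable(result):
            result = await result
        return str(result)
    except Exception as exc:  # report the failure to the model instead of crashing
        return f"Error: {exc}"


async def dispatch(
    events: AsyncIterator[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    send_tool_result: Callable[[str, str], Awaitable[None]],
) -> AsyncIterator[dict[str, Any]]:
    """Forward every event; for tool calls, execute and send the result back."""
    async for event in events:
        yield event
        if event["type"] != "tool_call":
            continue
        call_id = event.get("id")
        if call_id is None:
            # Without an id the result cannot be correlated, so skip sending.
            yield {"type": "error", "message": "tool call is missing an id"}
            continue
        result = await execute_tool(tools, event["name"], event.get("arguments", "{}"))
        await send_tool_result(call_id, result)
        yield {"type": "tool_result", "id": call_id, "result": result}


def get_weather(city: str) -> str:
    """Hypothetical demo tool."""
    return f"Sunny in {city}"
```

Returning errors as strings keeps the conversation going: the model receives the failure text as the tool result and can respond to it.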

Samples (samples/getting_started/realtime/)

| Sample | What it demonstrates |
|---|---|
| realtime_with_microphone.py | Basic voice conversation with mic/speaker |
| realtime_with_tools.py | Voice conversation with @tool function calling |
| realtime_with_multiple_agents.py | Multiple agents (greeter, support, assistant) transferring via a single connection using update_session() |
| realtime_fastapi_websocket.py | WebSocket API server for browser clients |
| websocket_audio_client.py | CLI client for the FastAPI endpoint |
| audio_utils.py | Shared mic capture and speaker playback utilities |

All samples support --client-type (or REALTIME_CLIENT_TYPE env var) to switch between openai, azure_openai, and azure_voice_live.
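One way such a switch could be wired up is sketched below. The flag name, environment variable, and the three values come from this PR; the precedence (flag, then environment, then a default of openai) and the resolve_client_type() helper are assumptions, and the actual samples may behave differently.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Mapping, Sequence

CLIENT_TYPES = ("openai", "azure_openai", "azure_voice_live")


def resolve_client_type(
    argv: Sequence[str] | None = None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Pick the client type: CLI flag first, then env var, then a default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Realtime voice sample")
    parser.add_argument("--client-type", choices=CLIENT_TYPES, default=None)
    args = parser.parse_args(argv)

    # Assumed precedence and default; not confirmed by the PR text.
    value = args.client_type or env.get("REALTIME_CLIENT_TYPE") or "openai"
    if value not in CLIENT_TYPES:
        raise ValueError(
            f"Unsupported client type {value!r}; expected one of {CLIENT_TYPES}"
        )
    return value


if __name__ == "__main__":
    print(resolve_client_type())
```

Validating the environment value separately matters because argparse only checks choices for values passed on the command line.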

Tests

  • Unit tests for all three client implementations with mocked provider SDKs
  • Tests for RealtimeAgent tool dispatch, event forwarding, and error handling
  • Tests for RealtimeSessionConfig and RealtimeEvent types
  • Tests for AzureVoiceLiveSettings configuration
  • Tests for Azure OpenAI realtime settings integration

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No — all additions are new APIs.

Copilot AI review requested due to automatic review settings February 10, 2026 19:55
@markwallace-microsoft added the documentation and python labels Feb 10, 2026
@adamdougal changed the title from "feat(realtime): add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" to "Python: add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" Feb 10, 2026
@adamdougal changed the title from "Python: add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" to "Python: Add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" Feb 10, 2026
@github-actions bot changed the title from "Python: Add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" to "Python: feat(realtime): add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" Feb 10, 2026
@adamdougal changed the title from "Python: feat(realtime): add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" to "Python: Add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients" Feb 10, 2026
@markwallace-microsoft (Member) commented Feb 10, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_realtime_agent.py | 101 | 6 | 94% | 83–84, 170, 176–177, 220 |
| packages/core/agent_framework/_realtime_client.py | 56 | 1 | 98% | 118 |
| packages/core/agent_framework/_realtime_types.py | 16 | 0 | 100% | |
| packages/core/agent_framework/azure/_realtime_client.py | 121 | 70 | 42% | 154–155, 168, 226–232, 240–241, 253–254, 262, 270–271, 279, 287–288, 290–293, 304, 306–312, 319–320, 327–338, 347–348, 355–360, 367–373, 377–381, 385–386, 388–389 |
| packages/core/agent_framework/azure/_shared.py | 81 | 3 | 96% | 144–145, 236 |
| packages/core/agent_framework/openai/_realtime_client.py | 124 | 54 | 56% | 91–92, 251–252, 254–257, 268, 270–276, 283–284, 291–302, 311–312, 319–324, 331–337, 341–345, 349–350, 352–353 |
| TOTAL | 17206 | 2235 | 87% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 4136 | 225 💤 | 0 ❌ | 0 🔥 | 1m 12s ⏱️ |

@adamdougal (Author)

Hello!

Appreciate this is a large PR so not expecting a detailed review quickly. However, I'd love an indication of how likely this is to be merged and, if so, what the timelines are.

I've based the implementation on the draft ADR here.

Also, it looks like the samples check is failing. Is it not possible to add samples in this way until the framework code is merged?

Thanks

Copilot AI (Contributor) left a comment

Pull request overview

Adds first-class realtime bidirectional voice support to the Python Agent Framework, including normalized realtime types, a base/protocol abstraction, three provider clients (OpenAI, Azure OpenAI, Azure Voice Live), a high-level RealtimeAgent, plus samples and tests.

Changes:

  • Introduces realtime core abstractions (RealtimeClientProtocol, BaseRealtimeClient, RealtimeEvent, RealtimeSessionConfig) and RealtimeAgent orchestration.
  • Adds provider implementations: OpenAIRealtimeClient, AzureOpenAIRealtimeClient, and new package agent-framework-azure-voice-live (Voice Live SDK).
  • Adds realtime getting-started samples (mic, tools, multi-agent transfer, FastAPI websocket bridge, websocket client) and accompanying tests.

Reviewed changes

Copilot reviewed 32 out of 35 changed files in this pull request and generated 20 comments.

| File | Description |
|---|---|
| python/uv.lock | Adds workspace package + Voice Live SDK dependency lock entries. |
| python/pyproject.toml | Registers agent-framework-azure-voice-live as a workspace member dependency. |
| python/samples/getting_started/realtime/__init__.py | Marks realtime samples folder as a package. |
| python/samples/getting_started/realtime/README.md | Documentation for running realtime voice samples and configuration. |
| python/samples/getting_started/realtime/audio_utils.py | Shared mic capture / speaker playback utilities for samples. |
| python/samples/getting_started/realtime/realtime_fastapi_websocket.py | FastAPI WebSocket bridge sample for browser/clients. |
| python/samples/getting_started/realtime/realtime_with_microphone.py | Basic mic-to-realtime-agent sample. |
| python/samples/getting_started/realtime/realtime_with_multiple_agents.py | Demonstrates agent transfer via update_session() on one connection. |
| python/samples/getting_started/realtime/realtime_with_tools.py | Demonstrates realtime tool-calling during voice conversation. |
| python/samples/getting_started/realtime/websocket_audio_client.py | CLI websocket client sample that streams mic audio and plays responses. |
| python/packages/core/agent_framework/__init__.py | Exports realtime APIs from the top-level agent_framework package. |
| python/packages/core/agent_framework/_realtime_agent.py | Implements RealtimeAgent, plus tool_to_schema() / execute_tool() helpers. |
| python/packages/core/agent_framework/_realtime_client.py | Defines RealtimeClientProtocol and BaseRealtimeClient. |
| python/packages/core/agent_framework/_realtime_types.py | Adds normalized dataclasses for session config and events. |
| python/packages/core/agent_framework/openai/__init__.py | Exposes OpenAIRealtimeClient from the OpenAI module. |
| python/packages/core/agent_framework/openai/_realtime_client.py | OpenAI GA realtime implementation + event normalization. |
| python/packages/core/agent_framework/azure/__init__.py | Adds lazy exports for Azure realtime + Voice Live package types. |
| python/packages/core/agent_framework/azure/_realtime_client.py | Azure OpenAI realtime implementation + event normalization. |
| python/packages/core/agent_framework/azure/_shared.py | Adds realtime_deployment_name to AzureOpenAISettings. |
| python/packages/core/tests/core/test_realtime_agent.py | Validates tool execution, event forwarding, and transcript/thread behavior. |
| python/packages/core/tests/core/test_realtime_client.py | Protocol/base tests and config translation tests. |
| python/packages/core/tests/core/test_realtime_types.py | Tests for realtime dataclasses and exports. |
| python/packages/core/tests/openai/test_openai_realtime_client.py | Tests OpenAI realtime client behavior with mocked SDK. |
| python/packages/core/tests/azure/test_azure_realtime_client.py | Tests Azure OpenAI realtime client behavior with mocked SDK. |
| python/packages/core/tests/azure/test_azure_openai_settings_realtime.py | Tests new realtime_deployment_name setting integration. |
| python/packages/azure-ai-voice-live/pyproject.toml | New distributable package for Voice Live integration. |
| python/packages/azure-ai-voice-live/README.md | Usage docs for the Voice Live integration package. |
| python/packages/azure-ai-voice-live/LICENSE | Package license. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/__init__.py | Package exports and version discovery. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/_client.py | Voice Live realtime client implementation. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/_settings.py | Settings model for Voice Live client. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/py.typed | Marks package as typed. |
| python/packages/azure-ai-voice-live/tests/test_client.py | Unit tests for Voice Live client normalization + connect flows. |
| python/packages/azure-ai-voice-live/tests/test_settings.py | Unit tests for Voice Live settings env/constructor behavior. |
| python/packages/azure-ai-voice-live/tests/__init__.py | Test package marker. |

…and Voice Live clients

- Add BaseRealtimeClient protocol with connect, disconnect, send_audio, send_text,
  send_tool_result, update_session, and events methods
- Add AzureOpenAIRealtimeClient using the OpenAI SDK beta realtime API
- Add OpenAIRealtimeClient using the OpenAI SDK GA realtime API
- Add AzureVoiceLiveClient using the Azure Voice Live SDK
- Add RealtimeAgent for high-level voice agent orchestration
- Add RealtimeSessionConfig, RealtimeEvent, and related types
- Add update_session() for changing session config without reconnecting
- Add public tool_to_schema() and execute_tool() helper functions
- Add multi-agent sample demonstrating agent transfers via single connection
- Add single-agent and bidirectional audio samples
- Add comprehensive test coverage for all clients and agent
- Set _connected = True after successful connect()
- Set _connected = False at the start of disconnect()
- Applied to both OpenAIRealtimeClient and AzureOpenAIRealtimeClient
- AzureVoiceLiveClient already tracked this correctly
- Clear _pending_function_names in disconnect() for both OpenAI and
  Azure OpenAI clients to avoid leaking per-connection state across
  reconnects
- Two remaining tools (get_weather, get_time) are sufficient
- Removes eval()-based code that was a copy/paste risk
- Collect all messages in a single list instead of grouping by role
- Store messages in transcript order (user, assistant, user, assistant)
- Update test assertions to match chronological ordering
- Removes eval()-based calculate tool (same as realtime_with_tools)
- Two remaining tools (get_weather, get_time) are sufficient
…owth

- Set maxsize=100 on asyncio.Queue to apply backpressure
- Prevents memory growth if client sends audio faster than consumed
- Thread stored _api_version through to vl_connect()
- The SDK supports it; previously the setting was accepted but ignored
- Update existing connect tests to assert default api_version
- Add test for custom api_version forwarding
- Add input_transcript, response_transcript, and tool_result
- Reflects the full normalized event surface produced by clients
- Non-FunctionTool entries now return an explicit error message
- Previously fell back to str(tool), sending misleading results to model
- Add test covering non-FunctionTool execution path
- Rename to tool_name in tool_call branch to avoid shadowing builtin
- Remove unused name assignment in tool_result branch
- Use single 'from contextlib import' instead of mixing import styles
- Replace contextlib.suppress with suppress throughout
- Add logger to _realtime_agent.py using project get_logger convention
- Log cancellation at debug level instead of silent pass
- Add logging to realtime_fastapi_websocket, realtime_with_multiple_agents,
  and websocket_audio_client samples
- Log cancellation/disconnect at debug level instead of silent pass
- Declare external dependencies (pyaudio, fastapi, uvicorn, websockets)
- Makes samples self-contained and runnable with uv run
- Follows SAMPLE_GUIDELINES.md conventions
- Use get("id") instead of ["id"] to avoid KeyError
- Log error and emit error event when id is missing
- Skip send_tool_result when no id is available
…atMessage → Message)

- Replace ToolProtocol with FunctionTool in _realtime_agent.py and _realtime_client.py
- Replace ChatMessage with Message in _realtime_agent.py

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 32 out of 35 changed files in this pull request and generated 3 comments.

adamdougal and others added 3 commits February 11, 2026 11:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Use indented single-line format consistent with realtime_with_multiple_agents
Labels

documentation, python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Single Agent: RealtimeAgent and client implementations for integration with realtime APIs

2 participants