Python: Add realtime voice agents with OpenAI, Azure OpenAI and Voice Live clients #3821
Python Test Coverage Report •
Python Unit Test Overview
Hello! I appreciate this is a large PR, so I'm not expecting a detailed review quickly. However, I'd love an indication of how likely this is to be merged and, if so, what the timelines are. I've based the implementation on the draft ADR here. Also, it looks like the samples check is failing. Is it not possible to add samples in this way until the framework code is merged? Thanks
Pull request overview
Adds first-class realtime bidirectional voice support to the Python Agent Framework, including normalized realtime types, a base/protocol abstraction, three provider clients (OpenAI, Azure OpenAI, Azure Voice Live), a high-level RealtimeAgent, plus samples and tests.
Changes:
- Introduces realtime core abstractions (`RealtimeClientProtocol`, `BaseRealtimeClient`, `RealtimeEvent`, `RealtimeSessionConfig`) and `RealtimeAgent` orchestration.
- Adds provider implementations: `OpenAIRealtimeClient`, `AzureOpenAIRealtimeClient`, and new package `agent-framework-azure-voice-live` (Voice Live SDK).
- Adds realtime getting-started samples (mic, tools, multi-agent transfer, FastAPI websocket bridge, websocket client) and accompanying tests.
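To make the shape of these abstractions more concrete, here is a minimal sketch of how a client protocol and a normalized event type might fit together. The method names come from this PR's summary; the signatures, the `RealtimeEvent` fields, the `session_created` event type, and the `EchoClient` class are assumptions made for illustration and are not the framework's actual definitions.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class RealtimeEvent:
    """Hypothetical normalized event: a type tag plus a free-form payload."""

    type: str
    data: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class RealtimeClientProtocol(Protocol):
    """Method names follow the PR summary; the signatures are assumed."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def send_audio(self, audio: bytes) -> None: ...
    async def send_text(self, text: str) -> None: ...
    async def send_tool_result(self, call_id: str, result: str) -> None: ...
    async def update_session(self, **changes: Any) -> None: ...
    def events(self) -> AsyncIterator[RealtimeEvent]: ...


class EchoClient:
    """Toy in-memory client that echoes text back as a transcript event."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[RealtimeEvent | None] = asyncio.Queue()

    async def connect(self) -> None:
        await self._queue.put(RealtimeEvent("session_created"))

    async def disconnect(self) -> None:
        await self._queue.put(None)  # sentinel that ends the event stream

    async def send_audio(self, audio: bytes) -> None:
        pass  # a real client would forward the chunk to the provider

    async def send_text(self, text: str) -> None:
        await self._queue.put(RealtimeEvent("response_transcript", {"text": text}))

    async def send_tool_result(self, call_id: str, result: str) -> None:
        pass

    async def update_session(self, **changes: Any) -> None:
        pass

    async def events(self) -> AsyncIterator[RealtimeEvent]:
        # Yield queued events until the disconnect sentinel arrives.
        while (event := await self._queue.get()) is not None:
            yield event


async def demo() -> list[str]:
    client = EchoClient()
    await client.connect()
    await client.send_text("hello")
    await client.disconnect()
    return [event.type async for event in client.events()]


event_types = asyncio.run(demo())
```

Because the protocol is structural, `EchoClient` satisfies it without inheriting from anything, which is one plausible reason to offer both a protocol and a base class.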
Reviewed changes
Copilot reviewed 32 out of 35 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| python/uv.lock | Adds workspace package + Voice Live SDK dependency lock entries. |
| python/pyproject.toml | Registers agent-framework-azure-voice-live as a workspace member dependency. |
| python/samples/getting_started/realtime/\_\_init\_\_.py | Marks realtime samples folder as a package. |
| python/samples/getting_started/realtime/README.md | Documentation for running realtime voice samples and configuration. |
| python/samples/getting_started/realtime/audio_utils.py | Shared mic capture / speaker playback utilities for samples. |
| python/samples/getting_started/realtime/realtime_fastapi_websocket.py | FastAPI WebSocket bridge sample for browser/clients. |
| python/samples/getting_started/realtime/realtime_with_microphone.py | Basic mic-to-realtime-agent sample. |
| python/samples/getting_started/realtime/realtime_with_multiple_agents.py | Demonstrates agent transfer via update_session() on one connection. |
| python/samples/getting_started/realtime/realtime_with_tools.py | Demonstrates realtime tool-calling during voice conversation. |
| python/samples/getting_started/realtime/websocket_audio_client.py | CLI websocket client sample that streams mic audio and plays responses. |
| python/packages/core/agent_framework/\_\_init\_\_.py | Exports realtime APIs from the top-level agent_framework package. |
| python/packages/core/agent_framework/_realtime_agent.py | Implements RealtimeAgent, plus tool_to_schema() / execute_tool() helpers. |
| python/packages/core/agent_framework/_realtime_client.py | Defines RealtimeClientProtocol and BaseRealtimeClient. |
| python/packages/core/agent_framework/_realtime_types.py | Adds normalized dataclasses for session config and events. |
| python/packages/core/agent_framework/openai/\_\_init\_\_.py | Exposes OpenAIRealtimeClient from the OpenAI module. |
| python/packages/core/agent_framework/openai/_realtime_client.py | OpenAI GA realtime implementation + event normalization. |
| python/packages/core/agent_framework/azure/\_\_init\_\_.py | Adds lazy exports for Azure realtime + Voice Live package types. |
| python/packages/core/agent_framework/azure/_realtime_client.py | Azure OpenAI realtime implementation + event normalization. |
| python/packages/core/agent_framework/azure/_shared.py | Adds realtime_deployment_name to AzureOpenAISettings. |
| python/packages/core/tests/core/test_realtime_agent.py | Validates tool execution, event forwarding, and transcript/thread behavior. |
| python/packages/core/tests/core/test_realtime_client.py | Protocol/base tests and config translation tests. |
| python/packages/core/tests/core/test_realtime_types.py | Tests for realtime dataclasses and exports. |
| python/packages/core/tests/openai/test_openai_realtime_client.py | Tests OpenAI realtime client behavior with mocked SDK. |
| python/packages/core/tests/azure/test_azure_realtime_client.py | Tests Azure OpenAI realtime client behavior with mocked SDK. |
| python/packages/core/tests/azure/test_azure_openai_settings_realtime.py | Tests new realtime_deployment_name setting integration. |
| python/packages/azure-ai-voice-live/pyproject.toml | New distributable package for Voice Live integration. |
| python/packages/azure-ai-voice-live/README.md | Usage docs for the Voice Live integration package. |
| python/packages/azure-ai-voice-live/LICENSE | Package license. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/\_\_init\_\_.py | Package exports and version discovery. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/_client.py | Voice Live realtime client implementation. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/_settings.py | Settings model for Voice Live client. |
| python/packages/azure-ai-voice-live/agent_framework_azure_voice_live/py.typed | Marks package as typed. |
| python/packages/azure-ai-voice-live/tests/test_client.py | Unit tests for Voice Live client normalization + connect flows. |
| python/packages/azure-ai-voice-live/tests/test_settings.py | Unit tests for Voice Live settings env/constructor behavior. |
| python/packages/azure-ai-voice-live/tests/\_\_init\_\_.py | Test package marker. |
…and Voice Live clients
- Add BaseRealtimeClient protocol with connect, disconnect, send_audio, send_text, send_tool_result, update_session, and events methods
- Add AzureOpenAIRealtimeClient using the OpenAI SDK beta realtime API
- Add OpenAIRealtimeClient using the OpenAI SDK GA realtime API
- Add AzureVoiceLiveClient using the Azure Voice Live SDK
- Add RealtimeAgent for high-level voice agent orchestration
- Add RealtimeSessionConfig, RealtimeEvent, and related types
- Add update_session() for changing session config without reconnecting
- Add public tool_to_schema() and execute_tool() helper functions
- Add multi-agent sample demonstrating agent transfers via single connection
- Add single-agent and bidirectional audio samples
- Add comprehensive test coverage for all clients and agent
- Set _connected = True after successful connect()
- Set _connected = False at the start of disconnect()
- Applied to both OpenAIRealtimeClient and AzureOpenAIRealtimeClient
- AzureVoiceLiveClient already tracked this correctly
- Clear _pending_function_names in disconnect() for both OpenAI and Azure OpenAI clients to avoid leaking per-connection state across reconnects
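The connection-state bookkeeping described in these two commits could look roughly like the sketch below. The attribute names `_connected` and `_pending_function_names` come from the commit messages; the class itself, the `_open_transport()` / `_close_transport()` placeholders, and the shape of the pending-names mapping are hypothetical.

```python
import asyncio


class SketchRealtimeClient:
    """Hypothetical client showing only the connection-state bookkeeping."""

    def __init__(self) -> None:
        self._connected = False
        # Assumed shape: provider call id -> function name announced for it.
        self._pending_function_names: dict[str, str] = {}

    async def _open_transport(self) -> None:
        """Placeholder for the provider-specific handshake."""

    async def _close_transport(self) -> None:
        """Placeholder for closing the provider connection."""

    async def connect(self) -> None:
        await self._open_transport()
        # Flip the flag only after the handshake has succeeded.
        self._connected = True

    async def disconnect(self) -> None:
        # Flip the flag first, so the client reports "not connected"
        # even if closing the transport raises.
        self._connected = False
        # Drop per-connection state so it cannot leak into a reconnect.
        self._pending_function_names.clear()
        await self._close_transport()


async def demo():
    client = SketchRealtimeClient()
    await client.connect()
    client._pending_function_names["call_1"] = "get_weather"
    connected_after_connect = client._connected
    await client.disconnect()
    return connected_after_connect, client._connected, client._pending_function_names


state = asyncio.run(demo())
```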
- Two remaining tools (get_weather, get_time) are sufficient
- Removes eval()-based code that was a copy/paste risk
- Collect all messages in a single list instead of grouping by role
- Store messages in transcript order (user, assistant, user, assistant)
- Update test assertions to match chronological ordering
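The difference between grouping by role and keeping transcript order can be illustrated with a small sketch. The `Message` dataclass here is a stand-in with assumed fields, not the framework's `Message` type, and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for the framework's Message type; the fields are assumed."""

    role: str
    text: str


# Transcript turns in the order they occurred.
turns = [
    ("user", "Hi"),
    ("assistant", "Hello! How can I help?"),
    ("user", "What's the weather?"),
    ("assistant", "It is sunny."),
]


def grouped_by_role(turns):
    """Old behavior (as described): all user turns, then all assistant turns."""
    users = [Message(role, text) for role, text in turns if role == "user"]
    assistants = [Message(role, text) for role, text in turns if role == "assistant"]
    return users + assistants


def chronological(turns):
    """New behavior (as described): a single list in transcript order."""
    return [Message(role, text) for role, text in turns]


grouped_roles = [m.role for m in grouped_by_role(turns)]
ordered_roles = [m.role for m in chronological(turns)]
```

Grouping loses which answer followed which question, whereas the single list keeps the alternating order intact.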
- Removes eval()-based calculate tool (same as realtime_with_tools)
- Two remaining tools (get_weather, get_time) are sufficient
…owth
- Set maxsize=100 on asyncio.Queue to apply backpressure
- Prevents memory growth if client sends audio faster than consumed
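The backpressure effect of a bounded `asyncio.Queue` can be demonstrated in isolation. In this sketch the producer stands in for a client sending audio and the consumer for whatever forwards it onward; the chunk size and item count are arbitrary, and only `maxsize=100` is taken from the commit message.

```python
from __future__ import annotations

import asyncio

CHUNKS = 1000


async def main() -> tuple[int, int]:
    # Bounded queue: put() suspends the producer once 100 chunks are waiting.
    queue: asyncio.Queue[bytes] = asyncio.Queue(maxsize=100)
    peak = 0
    received = 0

    async def producer() -> None:
        nonlocal peak
        for _ in range(CHUNKS):
            await queue.put(b"\x00" * 320)  # waits here while the queue is full
            peak = max(peak, queue.qsize())

    async def consumer() -> None:
        nonlocal received
        while received < CHUNKS:
            await queue.get()
            received += 1
            await asyncio.sleep(0)  # yield, simulating a slower consumer

    await asyncio.gather(producer(), consumer())
    return peak, received


peak, received = asyncio.run(main())
```

With an unbounded queue the producer would never suspend, so the number of buffered chunks would be limited only by available memory.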
- Thread stored _api_version through to vl_connect()
- The SDK supports it; previously the setting was accepted but ignored
- Update existing connect tests to assert default api_version
- Add test for custom api_version forwarding
- Add input_transcript, response_transcript, and tool_result
- Reflects the full normalized event surface produced by clients
- Non-FunctionTool entries now return an explicit error message
- Previously fell back to str(tool), sending misleading results to model
- Add test covering non-FunctionTool execution path
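A rough sketch of the non-`FunctionTool` handling described in the last of these commits follows. It is not the framework's `execute_tool()`: the `FunctionTool` stand-in, its fields, the synchronous signature, and the error wording are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FunctionTool:
    """Stand-in for the framework's FunctionTool; the fields are assumed."""

    name: str
    func: Callable[..., Any]


@dataclass
class HostedTool:
    """Hypothetical tool kind that cannot be executed locally."""

    name: str


def execute_tool_sketch(tools: list[Any], tool_name: str, arguments: dict[str, Any]) -> str:
    """Run the named tool, or return an explicit error string for the model."""
    for tool in tools:
        if getattr(tool, "name", None) != tool_name:
            continue
        if not isinstance(tool, FunctionTool):
            # Explicit error instead of falling back to str(tool).
            return f"Error: tool '{tool_name}' is not a FunctionTool and cannot be executed."
        return str(tool.func(**arguments))
    return f"Error: unknown tool '{tool_name}'."


tools = [
    FunctionTool("get_weather", lambda city: f"Sunny in {city}"),
    HostedTool("web_search"),
]

ok_result = execute_tool_sketch(tools, "get_weather", {"city": "Oslo"})
hosted_result = execute_tool_sketch(tools, "web_search", {})
```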
- Rename to tool_name in tool_call branch to avoid shadowing builtin
- Remove unused name assignment in tool_result branch
- Use single 'from contextlib import' instead of mixing import styles
- Replace contextlib.suppress with suppress throughout
- Add logger to _realtime_agent.py using project get_logger convention
- Log cancellation at debug level instead of silent pass
- Add logging to realtime_fastapi_websocket, realtime_with_multiple_agents, and websocket_audio_client samples
- Log cancellation/disconnect at debug level instead of silent pass
- Declare external dependencies (pyaudio, fastapi, uvicorn, websockets)
- Makes samples self-contained and runnable with uv run
- Follows SAMPLE_GUIDELINES.md conventions
- Use get("id") instead of ["id"] to avoid KeyError
- Log error and emit error event when id is missing
- Skip send_tool_result when no id is available
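The missing-id handling in this commit might be sketched as follows. The function name, the event payload shape, and the two lists used to record outcomes are hypothetical; only the three behaviors (use `get("id")`, log and emit an error, skip `send_tool_result`) come from the commit message.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("realtime_sample")


def handle_tool_call(
    data: dict[str, Any],
    sent_results: list[tuple[str, str]],
    emitted_events: list[dict[str, Any]],
) -> None:
    """Handle one tool-call payload without raising KeyError on a missing id."""
    call_id = data.get("id")  # get() returns None instead of raising
    if call_id is None:
        logger.error("Tool call event is missing an id: %r", data)
        emitted_events.append({"type": "error", "message": "tool call is missing an id"})
        return  # nothing to correlate a result with, so skip send_tool_result
    # Stand-in for executing the tool and calling send_tool_result(call_id, ...).
    sent_results.append((call_id, "ok"))


sent: list[tuple[str, str]] = []
emitted: list[dict[str, Any]] = []
handle_tool_call({"id": "call_1", "name": "get_time"}, sent, emitted)
handle_tool_call({"name": "get_time"}, sent, emitted)
```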
…atMessage → Message)
- Replace ToolProtocol with FunctionTool in _realtime_agent.py and _realtime_client.py
- Replace ChatMessage with Message in _realtime_agent.py
f5cafd1 to c6700a6
Pull request overview
Copilot reviewed 32 out of 35 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Use indented single-line format consistent with realtime_with_multiple_agents
- `BaseRealtimeClient` abstract base and `RealtimeClientProtocol` with `connect`, `disconnect`, `send_audio`, `send_text`, `send_tool_result`, `update_session`, and `events` methods
- `AzureOpenAIRealtimeClient` using the OpenAI SDK beta realtime API for Azure OpenAI
- `OpenAIRealtimeClient` using the OpenAI SDK GA realtime API
- `AzureVoiceLiveClient` using the Azure Voice Live SDK (`azure-ai-voicelive`)
- `RealtimeAgent` for high-level voice agent orchestration with tool execution
- `RealtimeSessionConfig`, `RealtimeEvent`, and related dataclass types
- `update_session()` for changing session config (instructions, tools, voice) without reconnecting
- `tool_to_schema()` and `execute_tool()` helper functions
- `agent-framework-azure-voice-live` package

Fixes #728
Motivation and Context
Agent Framework currently supports text-based chat clients but has no support for realtime bidirectional voice streaming. Several LLM providers now offer WebSocket-based realtime APIs (OpenAI Realtime, Azure OpenAI Realtime, Azure Voice Live) that enable natural voice conversations with function calling, VAD, and barge-in support. This PR adds first-class support for these APIs so developers can build voice agents using the same patterns they already use for chat agents.
Description
Protocol & Base (`packages/core`)
- `RealtimeClientProtocol` defines the interface all realtime clients must implement.
- `BaseRealtimeClient` provides shared logic including session config translation, a convenience `as_agent()` factory, and serialization support.
- `RealtimeSessionConfig` and `RealtimeEvent` are simple dataclasses that normalize configuration and events across providers.

Client Implementations
- `OpenAIRealtimeClient`: connects via `openai.OpenAI().realtime` (GA API). Translates framework events to/from the OpenAI event format. Reads credentials from `OPENAI_API_KEY`.
- `AzureOpenAIRealtimeClient`: connects via `openai.AzureOpenAI().realtime` (beta API). Supports both API key and `azure-identity` credential auth. Reads settings from `AzureOpenAISettings`.
- `AzureVoiceLiveClient`: connects via the `azure-ai-voicelive` SDK using typed model objects (`UserMessageItem`, `InputTextContentPart`, etc.). Packaged as the separate `agent-framework-azure-voice-live` package since it depends on the Voice Live SDK.

All three clients implement `update_session()`, which allows changing instructions, tools, and (where supported) voice on an active connection without dropping the conversation. OpenAI and Azure OpenAI reject voice changes once assistant audio exists; `AzureVoiceLiveClient` supports voice changes at any time. This limitation is documented in the method docstrings.

RealtimeAgent
`RealtimeAgent` wraps a `BaseRealtimeClient` and adds automatic tool execution. Given an audio stream, it connects the client, forwards audio, dispatches tool calls, sends results, and yields normalized `RealtimeEvent` objects. Public `tool_to_schema()` and `execute_tool()` functions are exported for use in custom orchestration (e.g., the multi-agent sample).

Samples (`samples/getting_started/realtime/`)
- `realtime_with_microphone.py`
- `realtime_with_tools.py`: `@tool` function calling
- `realtime_with_multiple_agents.py`: `update_session()`
- `realtime_fastapi_websocket.py`
- `websocket_audio_client.py`
- `audio_utils.py`

All samples support `--client-type` (or `REALTIME_CLIENT_TYPE` env var) to switch between `openai`, `azure_openai`, and `azure_voice_live`.

Tests
- `RealtimeAgent` tool dispatch, event forwarding, and error handling
- `RealtimeSessionConfig` and `RealtimeEvent` types
- `AzureVoiceLiveSettings` configuration

Contribution Checklist