feat: add event emission support for OpenAI Agents instrumentation #3645
LingduoKong wants to merge 7 commits into traceloop:main
Conversation
…trumentation

Add support for emitting OpenTelemetry events following GenAI semantic conventions. This includes:

- New event models (MessageEvent, ChoiceEvent, ToolStartEvent, ToolEndEvent)
- Event emitter implementation with proper semantic convention compliance
- Config support for event_logger and use_legacy_attributes flag
- Integration with hooks to emit events for messages, choices, and tool calls
- Respects TRACELOOP_TRACE_CONTENT setting for content redaction

The implementation maintains backward compatibility through the use_legacy_attributes flag (default: True), which uses span attributes when enabled and events when disabled.

Closes traceloop#3441
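For context, a minimal usage sketch of the flag described above; the instrumentor class name and the placement of the keyword argument are assumptions for illustration, not verified against the package's public API:

```python
# Hypothetical sketch of opting into event emission; the instrumentor
# class name and instrument() signature are assumed, not confirmed.
from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor

# Default behavior (use_legacy_attributes=True): prompts and completions
# are recorded as span attributes, preserving backward compatibility.
OpenAIAgentsInstrumentor().instrument()

# Opt in to GenAI semantic-convention events instead of span attributes:
OpenAIAgentsInstrumentor().instrument(use_legacy_attributes=False)
```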
📝 Walkthrough

Adds event-emission support to the OpenAI Agents instrumentation: new Config class and instrumentor options, event models and emitter, hooks updated to emit GenAI semantic-convention events (or fall back to legacy span attributes), utilities and tests plus VCR cassettes for event scenarios.
Sequence Diagram

```mermaid
sequenceDiagram
participant App as Application
participant Instr as OpenAI Agents Instrumentor
participant Hook as Hooks (_hooks.py)
participant Emitter as Event Emitter
participant Logger as EventLogger
App->>Instr: Initialize instrumentor(use_legacy_attributes=False)
Instr->>Instr: Set Config.use_legacy_attributes=False\nConfig.event_logger via EventLoggerProvider
App->>Hook: Agent processes input
Hook->>Hook: should_emit_events()?
alt Event mode
Hook->>Emitter: emit_event(MessageEvent)
Emitter->>Emitter: format per GenAI semantic conventions
Emitter->>Logger: emit(gen_ai.user.message)
else Legacy mode
Hook->>Hook: attach prompt as span attributes
end
App->>Hook: Agent receives model response
Hook->>Hook: should_emit_events()?
alt Event mode
Hook->>Emitter: emit_event(ChoiceEvent)
Emitter->>Logger: emit(gen_ai.choice)
else Legacy mode
Hook->>Hook: attach completion attributes to span
end
Note right of Hook: Tool/function spans\n-> emit ToolStart/ToolEnd or set attributes
```
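To make the branching concrete, here is a minimal sketch of the hook-side logic the diagram describes; the function and event names follow the diagram, everything else (the handler name, the attribute keys) is illustrative:

```python
# Illustrative only: mirrors the alt/else branches in the diagram above.
def on_user_message(span, content: str) -> None:
    if should_emit_events():
        # Event mode: emit a GenAI semantic-convention event.
        emit_event(MessageEvent(content=content, role="user"))
    else:
        # Legacy mode: attach the prompt as span attributes.
        span.set_attribute("gen_ai.prompt.0.role", "user")
        span.set_attribute("gen_ai.prompt.0.content", content)
```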
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
…tering

Bug fixes:

- Fix messages with tool_calls but no content being silently skipped (now emits events when role is set AND either content or tool_calls exist)
- Fix tool events not being emitted when content tracing is disabled (now emits tool events when events mode is enabled, with an empty message when content tracing is disabled)
- Add docstring to Config class explaining the intentional singleton pattern and warning about multiple instrumentor instances
- Add comments in _emit_message_event clarifying that unknown roles are kept in the body per semantic conventions (role is required when it differs from the event name)
Important
Looks good to me! 👍
Reviewed everything up to fb27b90 in 32 seconds.

- Reviewed 1824 lines of code in 12 files
- Skipped 1 file when reviewing
- Skipped posting 0 draft comments
Workflow ID: wflow_ThxQEFUs2B2D4itG
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/config.py`:
- Around line 1-4: The module currently imports the private EventLogger symbol
(EventLogger) from opentelemetry._events; replace that with the public Logs API
by importing the public logger factory (e.g., get_logger) from the OpenTelemetry
logs API and update any uses of EventLogger in this module to obtain and use a
logger via get_logger (or the appropriate public API), ensuring instrumentation
uses only public opentelemetry API surfaces.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_events/test_agent_with_function_tool_events.yaml`:
- Around line 17-19: The cassette in
tests/cassettes/test_events/test_agent_with_function_tool_events.yaml contains
sensitive/volatile cookies and project identifiers (e.g., cookie entries
"__cf_bm" and "_cfuvid" and OpenAI project IDs) that must be redacted; replace
the real values with deterministic placeholders (e.g., "<REDACTED_COOKIE>" /
"<REDACTED_PROJECT_ID>") or configure the VCR filter to scrub these keys so
replays are deterministic and identifiers are not leaked, and apply the same
redaction to the other occurrences noted (around the other listed line ranges).
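A sketch of one way to implement this, assuming the standard vcr_config fixture pattern used by vcrpy / pytest-recording; the header names mirror the review comment, and the fixture body is illustrative rather than the project's actual configuration:

```python
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # Scrub volatile cookies and OpenAI project/org identifiers so cassette
    # replays are deterministic and nothing sensitive is committed.
    def scrub_response(response):
        headers = response["headers"]
        for name in ("Set-Cookie", "set-cookie"):
            if name in headers:
                headers[name] = ["<REDACTED_COOKIE>"]
        for name in ("openai-project", "openai-organization"):
            if name in headers:
                headers[name] = ["<REDACTED_PROJECT_ID>"]
        return response

    return {
        "filter_headers": ["authorization", "cookie"],
        "before_record_response": scrub_response,
    }
```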
In `@packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py`:
- Around line 304-307: The session-scoped fixture span_exporter
(InMemorySpanExporter) can retain spans between tests; update the
function-scoped fixtures instrument_with_content and instrument_with_no_content
to call span_exporter.clear() at the start (or before yielding) so each test
starts with an empty exporter; locate the fixtures instrument_with_content and
instrument_with_no_content and add a span_exporter.clear() invocation (using the
InMemorySpanExporter.clear method) to prevent span leakage across tests.
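A sketch of the suggested change; the fixture name comes from the review comment, while the instrumentation setup it wraps is elided:

```python
import pytest


@pytest.fixture
def instrument_with_content(span_exporter):
    # Clear spans retained by the session-scoped InMemorySpanExporter so
    # each test starts from an empty exporter.
    span_exporter.clear()
    # ... existing instrumentation setup from the real fixture ...
    yield
    # ... existing teardown ...
```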
In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_events.py`:
- Around line 52-55: The test fails because SpanAttributes.LLM_PROMPTS is
referenced but not defined; add a new constant named LLM_PROMPTS to the
SpanAttributes class (opentelemetry.semconv_ai.SpanAttributes) with the
canonical attribute key (e.g., "llm.prompts") and ensure it is
exported/available from the module so tests can access
SpanAttributes.LLM_PROMPTS; keep the name and casing consistent with other
constants in the class and update any module exports or __all__ if needed.
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/config.py (1)

22-24: Consider adding a type annotation for exception_logger.

For consistency with the other class attributes, consider adding a type annotation to exception_logger.

Suggested change:

```diff
- exception_logger = None
+ exception_logger: Optional[Any] = None  # or a more specific type if known
```

Note: You'll need to import Any from typing if the specific type is unknown.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)
57-111: Consider consolidating duplicate tool_call parsing logic.

The _parse_tool_calls_for_event function (lines 57-111) duplicates much of the tool_call normalization logic that also exists in _extract_prompt_attributes (lines 221-261). Both handle:

- Converting objects to dicts via hasattr checks
- Extracting nested function fields
- Parsing JSON arguments

This duplication increases maintenance burden and risk of divergence.

♻️ Suggested approach

Extract a shared _normalize_tool_call(tool_call) -> dict helper that both the event path and legacy path can use:

```python
def _normalize_tool_call(tool_call) -> dict:
    """Normalize tool_call from various formats to a consistent dict."""
    if not isinstance(tool_call, dict):
        tc_dict = {}
        if hasattr(tool_call, "id"):
            tc_dict["id"] = tool_call.id
        if hasattr(tool_call, "function"):
            func = tool_call.function
            tc_dict["name"] = getattr(func, "name", None)
            tc_dict["arguments"] = getattr(func, "arguments", None)
        elif hasattr(tool_call, "name"):
            tc_dict["name"] = tool_call.name
            if hasattr(tool_call, "arguments"):
                tc_dict["arguments"] = tool_call.arguments
        tool_call = tc_dict
    if "function" in tool_call:
        function = tool_call["function"]
        tool_call = {
            "id": tool_call.get("id"),
            "name": function.get("name"),
            "arguments": function.get("arguments"),
        }
    return tool_call
```
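With that helper in place, both call sites would reduce to something like `tool_call = _normalize_tool_call(raw_tool_call)` before building their respective event or attribute payloads, which keeps the two paths from drifting apart.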
746-752: Comment is misleading - content filtering happens here, not in the event emitter.

The comment on line 747 says "Event emitter handles content filtering internally," but the filtering actually happens right here with input_str if should_send_prompts() else "". The _emit_tool_start_event and _emit_tool_end_event functions don't perform any content filtering (they just emit the event as-is).

Consider updating the comment for accuracy:

```diff
 if should_emit_events():
-    # Event emitter handles content filtering internally
+    # Filter content here based on TRACELOOP_TRACE_CONTENT setting
     emit_event(ToolStartEvent(message=input_str if should_send_prompts() else ""))
```

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/event_emitter.py (1)
48-49: Redundant should_emit_events() check.

The emit_event function checks should_emit_events() at line 48, but all callers in _hooks.py already guard with if should_emit_events(): before calling emit_event. This double-check is defensive but adds unnecessary overhead on the hot path.

Consider removing the check here if you want to reduce overhead, or document it as intentional defensive programming.
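For reference, a minimal sketch of the pattern under discussion; Config.event_logger comes from this PR, while the serialization helper is hypothetical:

```python
def emit_event(event) -> None:
    # Defensive re-check: callers in _hooks.py already guard on this,
    # so the cost here is one extra predicate call on the hot path.
    if not should_emit_events():
        return
    Config.event_logger.emit(_to_log_record(event))  # _to_log_record is hypothetical
```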
```python
from typing import Optional

from opentelemetry._events import EventLogger
```
🧩 Analysis chain
🌐 Web query:
OpenTelemetry Python EventLogger import documentation official API
💡 Result:
As of the current official OpenTelemetry Python API docs, there isn’t an EventLogger class to import. “Events” are emitted via the Logs API (a Logger.emit(..., event_name=...) style API), using opentelemetry._logs. [1]
Official Python Logs API imports (docs)
from opentelemetry._logs import get_logger

get_logger(...) returns a Logger from the global LoggerProvider. [1]
Official Python Logs SDK imports (docs)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler

This is the SDK LoggerProvider / handler used to collect/export log records. [2][3]
Where “EventLogger” is defined (spec, not Python API docs)
EventLogger is defined by the OpenTelemetry Event API specification, but that spec does not imply there is a corresponding Python EventLogger import available today. [4]
Sources: [1] [2] [3] [4]
🌐 Web query:
OpenTelemetry Python _events module private API stability
💡 Result:
opentelemetry._events in OpenTelemetry Python should be treated as private/unstable API:
- In OpenTelemetry Python, Logs are still "Development" and explicitly expected to involve deprecations and breaking changes while being stabilized, and the Events API is tied to that same "logs/events" area (for example, the logging API docs describe generating "logs and events"). [2][4]
- Per the OpenTelemetry project’s versioning & stability policy, only APIs that have reached the appropriate stability level get compatibility guarantees; unstable/experimental areas do not provide the same stability guarantees. [1]
Practical implication: don’t build a library/public integration that depends on opentelemetry._events remaining stable across minor releases. If you must use it, pin exact versions (and plan for breakage when upgrading) until the corresponding signal/API is declared stable. [1][2]
Sources
[1] OpenTelemetry “Versioning and stability” spec: https://opentelemetry.io/docs/specs/otel/versioning-and-stability/
[2] OpenTelemetry Python repo README (Logs “Development*”, breaking changes while stabilizing): https://github.com/open-telemetry/opentelemetry-python
[4] OpenTelemetry Python docs for opentelemetry._logs (“generate logs and events”): https://opentelemetry-python.readthedocs.io/en/stable/api/_logs.html
🌐 Web query:
OpenTelemetry Python public API vs internal modules underscore convention
💡 Result:
In OpenTelemetry Python, “public API” means the stable, supported import surface intended for application and library code—primarily what’s exposed by the opentelemetry-api package (for example opentelemetry.trace, opentelemetry.metrics, opentelemetry.context). The opentelemetry-sdk package provides the reference implementation and is something applications (not reusable libraries) may choose to depend on. [1]
Practical rules (Python underscore convention applied to OTel)
- Avoid importing anything with a leading underscore (module or symbol), e.g. opentelemetry.context._something or from ... import _FOO. In OTel this is explicitly used for internal-only hooks/keys that are not part of the public API and may change without notice. [2]
- For reusable libraries/instrumentation, depend on and import from opentelemetry-api only (e.g. from opentelemetry import trace / trace.get_tracer(...)). This keeps your library compatible regardless of which SDK (or even no SDK) the application installs. [1]
- For applications/agents, it's normal to import from the SDK (e.g. opentelemetry.sdk.trace) to configure providers/processors/exporters, but treat SDK internals (especially underscore modules) as non-public. [3]
Quick examples
- Prefer (public API): from opentelemetry import trace / from opentelemetry.context import attach, detach [1]
- App config (SDK): from opentelemetry.sdk.trace import TracerProvider [3]
- Avoid (internal): from opentelemetry.context import _SUPPRESS_INSTRUMENTATION_KEY (called out as intentionally non-public) [2]
Sources:
[1] OpenTelemetry Python repo README (API vs SDK guidance) (github.com)
[2] opentelemetry-python issue discussing underscore internal key not being public API (github.com)
[3] OpenTelemetry Python SDK trace docs (opentelemetry-python.readthedocs.io)
Citations:
- 1: https://github.com/open-telemetry/opentelemetry-python?utm_source=openai
- 2: Remove SDK's dependency on opentelemetry-instrumentation open-telemetry/opentelemetry-python#2184
- 3: https://opentelemetry-python.readthedocs.io/en/latest/sdk/trace.html?utm_source=openai
Use the public OpenTelemetry Logs API instead of the private opentelemetry._events module.
The EventLogger is imported from opentelemetry._events, which is a private/unstable module. OpenTelemetry's versioning policy explicitly does not provide stability guarantees for internal APIs, and the Events API is currently in development with expected breaking changes. For reusable instrumentation libraries, imports must use only the public API surface (e.g., opentelemetry-api). Replace this with the public Logs API: from opentelemetry._logs import get_logger (or a higher-level public API equivalent) to avoid breaking changes in future releases.
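Based on the query results above, the replacement would look roughly like this; the emit call shape is left as a comment because the event-name API is still stabilizing:

```python
# Public Logs API (per query result [1]) instead of the private Events module:
from opentelemetry._logs import get_logger

logger = get_logger(__name__)
# Events would then be emitted as log records carrying an event name, e.g.
# logger.emit(LogRecord(..., event_name="gen_ai.user.message")).
```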
🤖 Prompt for AI Agents
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/config.py`
around lines 1-4: The module currently imports the private EventLogger symbol
(EventLogger) from opentelemetry._events; replace that with the public Logs API
by importing the public logger factory (e.g., get_logger) from the OpenTelemetry
logs API and update any uses of EventLogger in this module to obtain and use a
logger via get_logger (or the appropriate public API), ensuring instrumentation
uses only public opentelemetry API surfaces.
```python
assert response_span.attributes.get(f"{SpanAttributes.LLM_PROMPTS}.0.role") == "user"
assert (
    response_span.attributes.get(f"{SpanAttributes.LLM_PROMPTS}.0.content") == query
)
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for LLM_PROMPTS definition in the semantic conventions package
echo "=== Searching for LLM_PROMPTS in semantic conventions ==="
rg -n "LLM_PROMPTS" packages/opentelemetry-semantic-conventions-ai/
echo -e "\n=== Searching for LLM_PROMPTS in entire repo ==="
rg -n "LLM_PROMPTS" --type py
echo -e "\n=== Searching for SpanAttributes class definition ==="
rg -n "class SpanAttributes" packages/ -A 20
```

Repository: traceloop/openllmetry
Length of output: 25138
🏁 Script executed:
```bash
#!/bin/bash
# Get the full SpanAttributes class definition
echo "=== Full SpanAttributes class definition ==="
sed -n '64,200p' packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
```

Repository: traceloop/openllmetry
Length of output: 7151
🏁 Script executed:
```bash
#!/bin/bash
# Search for LLM_PROMPTS definition line in the semantic conventions file
grep -n "LLM_PROMPTS" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | head -5
# Get the full content around that area
line_num=$(grep -n "LLM_PROMPTS" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | head -1 | cut -d: -f1)
if [ ! -z "$line_num" ]; then
  echo "=== LLM_PROMPTS definition around line $line_num ==="
  sed -n "$((line_num - 5)),$((line_num + 5))p" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
fi
```

Repository: traceloop/openllmetry
Length of output: 47
🏁 Script executed:
```bash
#!/bin/bash
# Check the complete SpanAttributes class and its inheritance
echo "=== SpanAttributes class definition (full) ==="
sed -n '64,500p' packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | grep -n "class SpanAttributes\|LLM_PROMPTS\|def __"
# Also check if there's a parent class or imports
echo -e "\n=== File header and imports ==="
head -70 packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | tail -20
# Check how the test file imports SpanAttributes
echo -e "\n=== test_events.py imports ==="
head -20 packages/opentelemetry-instrumentation-openai-agents/tests/test_events.py
```

Repository: traceloop/openllmetry
Length of output: 2028
🏁 Script executed:
```bash
#!/bin/bash
# Get file size first
wc -l packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
# Search for LLM_PROMPTS in the entire file
echo "=== Searching for LLM_PROMPTS in entire file ==="
grep -c "LLM_PROMPTS" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py || echo "Not found (exit code: $?)"
# Get complete file and search manually
echo -e "\n=== Full semconv_ai file checking for all LLM_ attributes ==="
cat packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | grep "LLM_" | head -20
```

Repository: traceloop/openllmetry
Length of output: 1642
🏁 Script executed:
```bash
#!/bin/bash
# Get the entire file content and extract just the SpanAttributes class
python3 << 'EOF'
with open('packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py', 'r') as f:
    content = f.read()
# Find where SpanAttributes class starts
start = content.find('class SpanAttributes:')
if start != -1:
    # Get the rest of the file from that point
    class_content = content[start:]
    # Count attributes
    lines = class_content.split('\n')
    attributes = [line.strip() for line in lines if '=' in line and not line.strip().startswith('#')]
    print(f"SpanAttributes class found at character position: {start}")
    print(f"Total attribute assignments in class: {len(attributes)}")
    print("\nSearching for LLM_PROMPTS in attributes:")
    found = False
    for i, attr in enumerate(attributes):
        if 'PROMPTS' in attr.upper():
            print(f"  {i}: {attr}")
            found = True
    if not found:
        print("  LLM_PROMPTS attribute NOT FOUND")
    print("\nFirst 10 attributes:")
    for attr in attributes[:10]:
        print(f"  {attr}")
    print("\nLast 10 attributes:")
    for attr in attributes[-10:]:
        print(f"  {attr}")
EOF
```

Repository: traceloop/openllmetry
Length of output: 1038
🏁 Script executed:
```bash
#!/bin/bash
# Check if SpanAttributes has __getattr__ or other special methods
echo "=== Checking for special methods in SpanAttributes class ==="
grep -A 300 "class SpanAttributes:" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | grep "def __"
# Also check if there's a parent class
echo -e "\n=== SpanAttributes class declaration ==="
grep "class SpanAttributes" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
# Check if there are other files that might extend SpanAttributes
echo -e "\n=== Searching for other SpanAttributes definitions ==="
find packages -name "*.py" -type f -exec grep -l "class SpanAttributes" {} \;
```

Repository: traceloop/openllmetry
Length of output: 329
🏁 Script executed:
```bash
#!/bin/bash
# Get the complete test function
sed -n '38,60p' packages/opentelemetry-instrumentation-openai-agents/tests/test_events.py
```

Repository: traceloop/openllmetry
Length of output: 938
🏁 Script executed:
```bash
#!/bin/bash
# Check what value LLM_PROMPTS should have based on usage pattern
echo "=== Checking the value pattern used with LLM_PROMPTS ==="
grep -h "SpanAttributes.LLM_PROMPTS" packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py | head -3
# Check if there's a GEN_AI_PROMPT or similar that should be LLM_PROMPTS
echo -e "\n=== Checking semantic conventions for PROMPT attributes ==="
grep -E "(PROMPT|prompt)" packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py | grep -v "#"
```

Repository: traceloop/openllmetry
Length of output: 858
SpanAttributes.LLM_PROMPTS attribute does not exist and will cause AttributeError.
The test references SpanAttributes.LLM_PROMPTS (line 52), but this attribute is not defined in the SpanAttributes class in packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py. The class contains 177 attribute definitions but LLM_PROMPTS is not among them. This will raise an AttributeError at runtime when the test executes on line 52.
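In other words, the failing reference amounts to:

```python
# Module path taken from the test imports inspected in the scripts above.
from opentelemetry.semconv_ai import SpanAttributes

SpanAttributes.LLM_PROMPTS  # raises AttributeError: no such class attribute
```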
🤖 Prompt for AI Agents
In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_events.py`
around lines 52-55: The test fails because SpanAttributes.LLM_PROMPTS is
referenced but not defined; add a new constant named LLM_PROMPTS to the
SpanAttributes class (opentelemetry.semconv_ai.SpanAttributes) with the
canonical attribute key (e.g., "llm.prompts") and ensure it is
exported/available from the module so tests can access
SpanAttributes.LLM_PROMPTS; keep the name and casing consistent with other
constants in the class and update any module exports or __all__ if needed.
- Add comment explaining opentelemetry._events is the official incubating Events API (not private)
- Update vcr_config to scrub sensitive cookies and project IDs from cassette recordings
- Add span_exporter.clear() to test fixtures to prevent span leakage between tests
- Fix test_events.py to use GenAIAttributes.GEN_AI_PROMPT instead of the non-existent SpanAttributes.LLM_PROMPTS (see the sketch below)
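A sketch of what that last test change plausibly looks like; the GenAIAttributes import path is an assumption based on the incubating semconv package layout, not copied from the commit:

```python
# Assumed import path for the incubating GenAI attribute constants:
from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

# The assertions then key off gen_ai.prompt instead of the missing constant:
assert (
    response_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role") == "user"
)
```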
- Add type annotation for exception_logger (Optional[Any])
- Fix misleading comments: content filtering happens at the call site, not in the event emitter functions
- Replace cookies with <REDACTED_COOKIE>
- Replace project IDs with <REDACTED_PROJECT_ID>
- Replace organization names with <REDACTED_ORGANIZATION>
Summary
Add event emission support for OpenAI Agents instrumentation following OpenTelemetry GenAI semantic conventions.
Closes #3441
Changes
Backward Compatibility
- use_legacy_attributes=True preserves existing span attribute behavior
- Set use_legacy_attributes=False to enable event emission mode
- Respects the TRACELOOP_TRACE_CONTENT setting for content redaction

Important
Adds event emission support for OpenAI Agents instrumentation with backward compatibility and comprehensive testing.
- New event models MessageEvent, ChoiceEvent, ToolStartEvent, and ToolEndEvent in event_models.py.
- Event emission integrated into _hooks.py with backward compatibility.
- Config class in config.py with use_legacy_attributes and event_logger (see the sketch below).
- use_legacy_attributes=True maintains existing span attribute behavior.
- Set use_legacy_attributes=False to enable event emission mode.
- Respects TRACELOOP_TRACE_CONTENT for content redaction.
- event_emitter.py handles event emission based on event type.
- Uses should_emit_events() and should_send_prompts() from utils.py.
- Tests in test_events.py cover both legacy and event emission modes.
- VCR cassettes in tests/cassettes/test_events/.
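A hedged sketch of the Config singleton these bullets describe; the attribute names and defaults come from the PR text and reviews, while the class body is illustrative rather than the file's exact contents:

```python
from typing import Any, Optional

# Incubating Events API, per the maintainer's note in the commit above.
from opentelemetry._events import EventLogger


class Config:
    """Singleton-style configuration shared across the instrumentation
    (intentional, per the docstring added in this PR)."""

    use_legacy_attributes: bool = True          # span attributes by default
    event_logger: Optional[EventLogger] = None  # set when event mode is enabled
    exception_logger: Optional[Any] = None
```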
This description was created automatically for fb27b90 and will update as commits are pushed.
Summary by CodeRabbit
New Features

- Event emission support for the OpenAI Agents instrumentation, with a use_legacy_attributes option to fall back to span attributes.

Tests

- Event-scenario tests and VCR cassettes covering both legacy and event emission modes.