Open
Labels
bug (Something isn't working)
Description
Memory Leak in ApiServerSpanExporter
## 🔴 Required Information
**Describe the Bug:**
The `ApiServerSpanExporter` class in the `dev` module has a memory leak. The `allExportedSpans` list has no cleanup mechanism, so it retains every
exported span for the lifetime of the process. In a long-running application this eventually leads to memory exhaustion (OOM).
**Steps to Reproduce:**
1. Start the ADK application with OpenTelemetry tracing enabled
2. Run the application for an extended period (several hours or days)
3. Create multiple agent sessions or execute LLM calls continuously
4. Monitor memory usage - it will continuously increase
**Expected Behavior:**
The application should maintain stable memory usage over time, with old trace data being cleaned up when it's no longer needed.
**Observed Behavior:**
Memory usage grows continuously until the application runs out of memory and crashes (OOM). The `allExportedSpans` list contains all historical spans
without any limit or cleanup strategy.
**Environment Details:**
- ADK Library Version: Any
- OS: Any (macOS, Linux, Windows)
- Java Version: Any
**Model Information:**
- Which model is being used: N/A (affects all models)
---
## 🟡 Optional Information
**Regression:**
Not a regression in the strict sense: this issue exists in all versions that use `ApiServerSpanExporter`
**Root Cause Analysis:**
The `ApiServerSpanExporter` class stores all exported spans in a `List<SpanData>` called `allExportedSpans`. This list:
- Is appended to every time `export()` is called, so every exported span is retained
- Has no maximum size limit
- Has no time-based expiration
- Is never cleared, even when `shutdown()` is called
- Is accessed by the debug endpoint `/adk-dev/debug/trace/session/{sessionId}` to retrieve session traces
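The growth pattern in the list above can be illustrated without any OpenTelemetry dependency. The following is a minimal sketch, not the ADK code: `LeakyExporterModel` and `retainedSpanCount()` are hypothetical names, strings stand in for `SpanData`, and the synchronized list is an assumption about how the field might be declared.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the exporter's storage; strings replace SpanData. */
final class LeakyExporterModel {
  // Mirrors the reported pattern: append on every export, never evict.
  private final List<String> allExportedSpans =
      Collections.synchronizedList(new ArrayList<>());

  void export(Collection<String> spans) {
    allExportedSpans.addAll(spans);
  }

  void shutdown() {
    // Intentionally empty, matching the reported behavior: nothing is released.
  }

  int retainedSpanCount() {
    return allExportedSpans.size();
  }

  public static void main(String[] args) {
    LeakyExporterModel exporter = new LeakyExporterModel();
    for (int batch = 0; batch < 1_000; batch++) {
      exporter.export(List.of("span-" + batch + "-a", "span-" + batch + "-b"));
    }
    exporter.shutdown();
    // The count equals the total number of spans ever exported.
    System.out.println("retained spans: " + exporter.retainedSpanCount());
  }
}
```

In this model the retained count only ever increases, and `shutdown()` does not change it, which is the behavior the root cause analysis describes.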
**Impact:**
- Memory leak in both production and development environments
- Increased GC pressure
- Potential OOM crashes in long-running applications
- Performance degradation due to growing list size
**Proposed Fix:**
Add a capacity limit to `allExportedSpans` (e.g., keep only the most recent 10,000 spans) and clear the storage in the `shutdown()` method.
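One way to picture a capacity-limited store is a small ring-style buffer that evicts the oldest element when full. This is a sketch under assumptions, not the change made in the ADK: `BoundedSpanBuffer` and its methods are hypothetical, and the type parameter stands in for `SpanData`. It uses `ArrayDeque`, which evicts from the head in constant time but rejects `null` elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical bounded store; the type parameter stands in for SpanData. */
final class BoundedSpanBuffer<T> {
  private final int capacity;
  private final ArrayDeque<T> spans = new ArrayDeque<>();

  BoundedSpanBuffer(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Appends a batch, evicting the oldest entries once the capacity is reached. */
  synchronized void addAll(Collection<T> batch) {
    for (T span : batch) {
      if (spans.size() == capacity) {
        spans.pollFirst(); // constant-time removal of the oldest element
      }
      spans.addLast(span);
    }
  }

  /** Returns a copy so callers can iterate without holding the lock. */
  synchronized List<T> snapshot() {
    return new ArrayList<>(spans);
  }

  synchronized int size() {
    return spans.size();
  }

  /** Releases everything, e.g. from a shutdown hook. */
  synchronized void clear() {
    spans.clear();
  }

  public static void main(String[] args) {
    BoundedSpanBuffer<Integer> buffer = new BoundedSpanBuffer<>(3);
    buffer.addAll(List.of(1, 2, 3, 4, 5));
    System.out.println(buffer.snapshot()); // only the three newest remain
  }
}
```

The trade-off of any fixed cap is that traces older than the most recent N spans are no longer available to the debug endpoint.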
**Additional Context:**
The `sessionToTraceIdsMap` also has a similar issue, though less severe as it only stores session-to-trace-id mappings rather than full span data.
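A similar cap could be applied to the session mapping. The sketch below is an assumption-heavy illustration rather than ADK code: `BoundedSessionTraceIndex`, `record`, `traceIdsFor`, and the `Map<String, Set<String>>` shape are all hypothetical, since the issue does not state the actual type of `sessionToTraceIdsMap`. It relies on `LinkedHashMap.removeEldestEntry` to drop the oldest session once a limit is exceeded.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical bounded session-to-trace-id index. */
final class BoundedSessionTraceIndex {
  private final Map<String, Set<String>> sessionToTraceIds;

  BoundedSessionTraceIndex(int maxSessions) {
    // An insertion-ordered LinkedHashMap that evicts its eldest entry
    // whenever a put() pushes it past maxSessions.
    this.sessionToTraceIds =
        new LinkedHashMap<String, Set<String>>() {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, Set<String>> eldest) {
            return size() > maxSessions;
          }
        };
  }

  synchronized void record(String sessionId, String traceId) {
    Set<String> traceIds = sessionToTraceIds.get(sessionId);
    if (traceIds == null) {
      traceIds = new LinkedHashSet<>();
      sessionToTraceIds.put(sessionId, traceIds); // may evict the oldest session
    }
    traceIds.add(traceId);
  }

  /** Returns a copy of the trace ids, or an empty set for unknown sessions. */
  synchronized Set<String> traceIdsFor(String sessionId) {
    Set<String> traceIds = sessionToTraceIds.get(sessionId);
    return traceIds == null ? Collections.emptySet() : new LinkedHashSet<>(traceIds);
  }

  synchronized int sessionCount() {
    return sessionToTraceIds.size();
  }

  public static void main(String[] args) {
    BoundedSessionTraceIndex index = new BoundedSessionTraceIndex(2);
    index.record("s1", "t1");
    index.record("s2", "t2");
    index.record("s3", "t3"); // evicts s1, the oldest session
    System.out.println(index.sessionCount() + " sessions, s1 -> " + index.traceIdsFor("s1"));
  }
}
```

Evicting a session this way means its traces can no longer be looked up by session ID, so the limit would need to be chosen with the debug endpoint's use in mind.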
**Minimal Reproduction Code:**
```java
// Run any ADK application with tracing enabled for an extended period
// Monitor memory usage
```
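For the "monitor memory usage" step, heap usage can be sampled from inside the JVM with the standard `java.lang.management` API. This is only a sketch: `HeapSampler` is a hypothetical helper, and as written it samples its own process, so in practice the same call would have to run inside the ADK application (for example on a background thread), or external JDK tools such as `jstat` or `jcmd` could be used instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper that prints the used heap of the current JVM at intervals. */
final class HeapSampler {
  private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

  /** Returns the currently used heap, in bytes, as reported by the JVM. */
  static long usedHeapBytes() {
    return MEMORY.getHeapMemoryUsage().getUsed();
  }

  public static void main(String[] args) throws InterruptedException {
    // Optional arguments: number of samples and interval in milliseconds.
    int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
    long intervalMillis = args.length > 1 ? Long.parseLong(args[1]) : 1_000L;
    for (int i = 0; i < samples; i++) {
      System.out.printf("used heap: %d MiB%n", usedHeapBytes() / (1024 * 1024));
      Thread.sleep(intervalMillis);
    }
  }
}
```

A steadily rising baseline across samples taken after garbage collections would be consistent with the leak described here, although a single sample says little on its own.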
**How often has this issue occurred?:**
- Always (100%) - This is a deterministic memory leak that will occur in all long-running applications
---
## 🟢 Fix Details
**Files Modified:**
- `dev/src/main/java/com/google/adk/web/service/ApiServerSpanExporter.java`
**Changes Made:**
1. Added `MAX_SPANS_TO_KEEP` constant (10,000 spans)
2. Modified `export()` method to remove oldest spans when limit is exceeded
3. Modified `shutdown()` method to clear all storage
**Implementation:**
```java
private static final int MAX_SPANS_TO_KEEP = 10000;

@Override
public CompletableResultCode export(Collection<SpanData> spans) {
  exporterLog.debug("ApiServerSpanExporter received {} spans to export.", spans.size());
  List<SpanData> currentBatch = new ArrayList<>(spans);
  // Append and trim under one lock so concurrent exports cannot interleave
  // between the two steps.
  synchronized (allExportedSpans) {
    allExportedSpans.addAll(currentBatch);
    // Prevent memory leak by keeping only the most recent spans
    int excess = allExportedSpans.size() - MAX_SPANS_TO_KEEP;
    if (excess > 0) {
      // Drop the oldest entries in one pass instead of repeated remove(0) calls
      allExportedSpans.subList(0, excess).clear();
    }
  }
  // ... rest of method
}

@Override
public CompletableResultCode shutdown() {
  exporterLog.debug("Shutting down ApiServerSpanExporter.");
  // Clear all storage to prevent memory leaks
  synchronized (allExportedSpans) {
    allExportedSpans.clear();
  }
  eventIdTraceStorage.clear();
  sessionToTraceIdsMap.clear();
  return CompletableResultCode.ofSuccess();
}
```