-
Notifications
You must be signed in to change notification settings - Fork 214
wip agent activity interruption detector integration #991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lukas/barge-transport
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 157fcf5 The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThis PR releases version 1.0.38 by consolidating changeset entries into official changelogs and introducing significant updates: new interruption detection integration with Changes
Sequence DiagramsequenceDiagram
participant Agent as Agent
participant AgentActivity as AgentActivity
participant AudioRec as AudioRecognition
participant IntDetector as AdaptiveInterruptionDetector
participant IntStream as InterruptionStream
participant STT as STT/VAD
Agent->>AgentActivity: init with turnHandling config
AgentActivity->>IntDetector: resolveInterruptionDetector()
IntDetector-->>AgentActivity: detector instance
AgentActivity->>AudioRec: start() + interruptionDetection
AudioRec->>IntStream: create interruption channel
STT->>AudioRec: onInputSpeechStarted()
AudioRec->>IntDetector: agent speech event
IntDetector->>IntStream: emit agentSpeechStarted sentinel
IntStream-->>AudioRec: interruption signal
STT->>AudioRec: onInterimTranscript()
AudioRec->>AudioRec: shouldHoldSttEvent? (check ignoreUserTranscript)
alt Interruption Active
AudioRec->>AudioRec: buffer transcript
else Normal Flow
AudioRec->>AgentActivity: onInterimTranscript()
end
IntDetector->>IntStream: overlap speech detected
IntStream-->>AudioRec: OverlapSpeechStarted
AudioRec->>AgentActivity: onInterruption(event)
AgentActivity->>AgentActivity: adjust interruption state
STT->>AudioRec: onInputSpeechStopped()
AudioRec->>IntDetector: agent speech end
IntDetector->>IntStream: emit agentSpeechEnded sentinel
AudioRec->>AudioRec: flushHeldTranscripts()
AudioRec->>AgentActivity: emit buffered transcripts
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The PR introduces substantial logic across multiple interconnected systems: interruption detection integration in two key files (agent_activity.ts, audio_recognition.ts) with 381+ net new lines, public API changes (method renames, new optional parameters), constructor refactoring in agent.ts, Google realtime generation logic improvements, and version management across 20+ packages. The heterogeneity of concerns—interruption state machine, stream handling, configuration migration, and plugin-specific fixes—combined with moderate logic density in lifecycle event coordination requires careful cross-component reasoning. Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…imed out waiting for generation_created" event (parity with Python plugin) (#989)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@agents/src/inference/interruption/InterruptionStream.ts`:
- Around line 50-55: The overlap-speech-started factory may emit undefined
speechDurationInS which later leads to NaN in the
Math.round(chunk.speechDurationInS * this.options.sampleRate) calculation;
update overlapSpeechStarted (and/or the consumer before multiplication) to
ensure speechDurationInS is a finite number (e.g., coalesce undefined to 0 or
validate with Number.isFinite) so the Math.round call always receives a numeric
value; specifically, adjust the overlapSpeechStarted return value or the code
that reads chunk.speechDurationInS so that chunk.speechDurationInS is a safe
numeric value before calling Math.round in the consumer that references
this.options.sampleRate.
In `@agents/src/inference/interruption/types.ts`:
- Around line 67-71: The optional fields on OverlapSpeechStarted
(speechDurationInS and userSpeakingSpan) are used without guards in
InterruptionStream.ts causing NaN in calculations; update the places that
compute Math.round(chunk.speechDurationInS * this.options.sampleRate) and
Date.now() - frame.speechDurationInS * 1000 to first check that
chunk.speechDurationInS / frame.speechDurationInS is not null/undefined (or
coerce to a safe default like 0) and handle the absent-case (e.g., early-return,
skip the branch, or use 0 as a fallback) so arithmetic only runs on defined
numeric values; locate references to OverlapSpeechStarted,
chunk.speechDurationInS, frame.speechDurationInS and this.options.sampleRate to
apply the guard.
In `@agents/src/stream/stream_channel.ts`:
- Around line 28-41: The addStreamInput async IIFE currently swallows errors and
writes without checking channel state; update the loop in addStreamInput (the
async IIFE using newInputStream.getReader() and await writer.write(value)) to
guard writes with a channel-closed check (e.g., if (writer.isClosed || isClosed)
break/skip) and wrap the read/write body in a try/catch so any error from
writer.write(...) or reader.read() is handled: on error release the reader, call
writer.abort(error) or writer.close() as appropriate, and surface/log the error
to avoid unhandled promise rejections; keep the existing finally block that
calls reader.releaseLock() so the reader is always released.
In `@agents/src/voice/agent_activity.ts`:
- Around line 1673-1678: The call to audioRecognition.onEndOfAgentSpeech uses an
object but the method signature expects a number; change the invocation inside
the isInterruptionDetectionEnabled block so that onEndOfAgentSpeech receives a
numeric timestamp (e.g., Date.now()) rather than an object, and keep the
subsequent call to restoreInterruptionByAudioActivity unchanged; update the call
site where audioRecognition.onEndOfAgentSpeech is invoked to pass the number
directly.
- Around line 1374-1378: The call to audioRecognition.OnEndOfAgentSpeech is
using the wrong case and will throw at runtime; change the invocation to
audioRecognition.onEndOfAgentSpeech(Date.now()) where the audioRecognition
object is used (within the block guarded by isInterruptionDetectionEnabled) and
keep the surrounding logic (including restoreInterruptionByAudioActivity) intact
so the only change is the method name casing.
In `@agents/src/voice/agent.ts`:
- Around line 128-130: The code ignores the legacy turnDetection option because
you always read from this.turnHandling; fix by preferring an explicit legacy
option when present: in the constructor where this.turnHandling and
this.turnDetection are set, check the incoming options.turnDetection (or the
destructured turnDetection param) and assign this.turnDetection =
options.turnDetection ?? this.turnHandling?.turnDetection (or if you have a
turnConfig sentinel, only migrate legacy when turnConfig is unset), ensuring the
legacy value is applied when provided; update the assignment site around
this.turnHandling / this.turnDetection to perform this conditional migration.
- Around line 151-154: The code risks a TypeError when turnHandling exists but
interruption is undefined; change references to use optional chaining on
interruption (e.g., read this.turnHandling?.interruption?.mode into
this._interruptionDetection) and update the check/assignment for
this._allowInterruptions to use the same safe chain (e.g., set
this._allowInterruptions = !!this.turnHandling?.interruption?.mode) so you never
access interruption.mode directly; update both occurrences (the assignment to
this._interruptionDetection and the conditional that sets
this._allowInterruptions) to use interruption?.
In `@agents/src/voice/audio_recognition.ts`:
- Around line 279-280: The current condition for building eventsToEmit
incorrectly treats emitFromIndex === 0 as falsy; update the check in
audio_recognition.ts so eventsToEmit is populated when emitFromIndex is a valid
numeric index (including 0) and shouldFlush is true — e.g., replace the truthy
check with an explicit numeric/index validation (like typeof emitFromIndex ===
'number' && emitFromIndex >= 0 && shouldFlush) before calling
this.transcriptBuffer.slice(emitFromIndex).
- Around line 370-371: The condition else if (this.transcriptBuffer) is always
truthy because transcriptBuffer is an array; change the branch to check for
content by using this.transcriptBuffer.length > 0 (or a safe check like
Array.isArray(this.transcriptBuffer) && this.transcriptBuffer.length > 0) before
calling await this.flushHeldTranscripts(); update the condition in the method
that contains transcriptBuffer and flushHeldTranscripts to use this length check
so flushHeldTranscripts() only runs when there are held transcripts.
- Around line 218-227: The guard in onEndOfOverlapSpeech is inverted: change the
early return so the function only bails out when interruption detection is
disabled (i.e. replace the current if (this.isInterruptionEnabled) return; with
if (!this.isInterruptionEnabled) return;), so that
trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded()) is
invoked when interruptions are enabled; keep the existing user speaking span
handling (userSpeakingSpan.isRecording() /
setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false')) as-is so it still runs
before sending the sentinel.
🧹 Nitpick comments (1)
agents/src/voice/agent_activity.ts (1)
2453-2496: Complex negated condition inresolveInterruptionDetector.The condition at lines 2456-2465 is hard to follow due to the outer negation. Consider restructuring for clarity.
♻️ Suggested refactor for readability
+ private isInterruptionDetectionCompatible(): boolean { + return ( + !!this.stt && + this.stt.capabilities.alignedTranscript && + this.stt.capabilities.streaming && + !!this.vad && + this.turnDetection !== 'manual' && + this.turnDetection !== 'realtime_llm' && + !(this.llm instanceof RealtimeModel) + ); + } private resolveInterruptionDetector(): AdaptiveInterruptionDetector | undefined { const interruptionDetection = this.agent.interruptionDetection ?? this.agentSession.interruptionDetection; - if ( - !( - this.stt && - this.stt.capabilities.alignedTranscript && - this.stt.capabilities.streaming && - this.vad && - this.turnDetection !== 'manual' && - this.turnDetection !== 'realtime_llm' && - !(this.llm instanceof RealtimeModel) - ) - ) { + if (!this.isInterruptionDetectionCompatible()) {
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (44)
.changeset/large-cars-pull.md.changeset/lazy-spies-worry.md.changeset/shy-rockets-fry.md.changeset/tender-glasses-burn.mdagents/CHANGELOG.mdagents/package.jsonagents/src/inference/interruption/InterruptionStream.tsagents/src/inference/interruption/types.tsagents/src/stream/stream_channel.tsagents/src/voice/agent.tsagents/src/voice/agent_activity.tsagents/src/voice/audio_recognition.tsplugins/anam/CHANGELOG.mdplugins/anam/package.jsonplugins/baseten/package.jsonplugins/bey/CHANGELOG.mdplugins/bey/package.jsonplugins/cartesia/CHANGELOG.mdplugins/cartesia/package.jsonplugins/deepgram/CHANGELOG.mdplugins/deepgram/package.jsonplugins/elevenlabs/CHANGELOG.mdplugins/elevenlabs/package.jsonplugins/google/CHANGELOG.mdplugins/google/package.jsonplugins/google/src/beta/realtime/realtime_api.tsplugins/hedra/package.jsonplugins/inworld/package.jsonplugins/livekit/CHANGELOG.mdplugins/livekit/package.jsonplugins/neuphonic/CHANGELOG.mdplugins/neuphonic/package.jsonplugins/openai/CHANGELOG.mdplugins/openai/package.jsonplugins/resemble/CHANGELOG.mdplugins/resemble/package.jsonplugins/rime/CHANGELOG.mdplugins/rime/package.jsonplugins/silero/CHANGELOG.mdplugins/silero/package.jsonplugins/test/CHANGELOG.mdplugins/test/package.jsonplugins/xai/package.jsonturbo.json
💤 Files with no reviewable changes (3)
- .changeset/large-cars-pull.md
- .changeset/lazy-spies-worry.md
- .changeset/tender-glasses-burn.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Files:
agents/src/inference/interruption/types.tsagents/src/stream/stream_channel.tsplugins/google/src/beta/realtime/realtime_api.tsagents/src/voice/audio_recognition.tsagents/src/inference/interruption/InterruptionStream.tsagents/src/voice/agent_activity.tsagents/src/voice/agent.ts
**/*.{ts,tsx}?(test|example|spec)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
When testing inference LLM, always use full model names from
agents/src/inference/models.ts(e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Files:
agents/src/inference/interruption/types.tsagents/src/stream/stream_channel.tsplugins/google/src/beta/realtime/realtime_api.tsagents/src/voice/audio_recognition.tsagents/src/inference/interruption/InterruptionStream.tsagents/src/voice/agent_activity.tsagents/src/voice/agent.ts
**/*.{ts,tsx}?(test|example)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Initialize logger before using any LLM functionality with
initializeLogger({ pretty: true })from '@livekit/agents'
Files:
agents/src/inference/interruption/types.tsagents/src/stream/stream_channel.tsplugins/google/src/beta/realtime/realtime_api.tsagents/src/voice/audio_recognition.tsagents/src/inference/interruption/InterruptionStream.tsagents/src/voice/agent_activity.tsagents/src/voice/agent.ts
🧬 Code graph analysis (3)
agents/src/voice/audio_recognition.ts (6)
agents/src/inference/interruption/types.ts (2)
InterruptionEvent(17-28)InterruptionSentinel(84-89)agents/src/inference/interruption/AdaptiveInterruptionDetector.ts (1)
AdaptiveInterruptionDetector(26-192)agents/src/stt/stt.ts (1)
SpeechEvent(65-70)agents/src/stream/stream_channel.ts (2)
StreamChannel(7-14)createStreamChannel(16-60)agents/src/inference/interruption/InterruptionStream.ts (2)
InterruptionSentinel(36-36)InterruptionStreamSentinel(41-64)agents/src/stt/index.ts (1)
SpeechEventType(11-11)
agents/src/inference/interruption/InterruptionStream.ts (1)
agents/src/inference/interruption/types.ts (2)
AgentSpeechStarted(59-61)AgentSpeechEnded(63-65)
agents/src/voice/agent.ts (3)
agents/src/voice/agent_session.ts (1)
TurnDetectionMode(90-90)agents/src/voice/turn_config/turnHandling.ts (1)
TurnHandlingConfig(9-46)agents/src/voice/turn_config/interruption.ts (1)
InterruptionConfig(4-37)
🔇 Additional comments (47)
plugins/cartesia/CHANGELOG.md (1)
3-9: Changelog entry looks good.
Clear and consistent with prior version sections.plugins/hedra/package.json (1)
3-3: Version bump looks good.No concerns with this change.
plugins/rime/CHANGELOG.md (1)
3-8: Changelog entry looks good.Clear, consistent dependency bump entry for 1.0.38.
plugins/silero/CHANGELOG.md (1)
3-9: LGTM! Changelog entry follows standard format.The new version 1.0.38 entry is properly formatted and consistent with previous changelog entries. The dependency update to
@livekit/agents@1.0.38 aligns with the broader version bump across the monorepo.plugins/bey/CHANGELOG.md (1)
3-9: The changelog entry is correct and complete. The bey plugin has no code changes—only a dependency update to@livekit/agents@1.0.38. The format matches the historical pattern across all previous versions (1.0.37, 1.0.36, 1.0.35), which similarly document dependency-only updates. The interruption detection features mentioned in the PR title and AI summary are changes in the core@livekit/agentspackage itself, not the bey plugin, so the changelog correctly reflects only the dependency update.Likely an incorrect or invalid review comment.
plugins/rime/package.json (1)
3-3: LGTM!Version bump to 1.0.38 aligns with the coordinated patch release across the monorepo.
plugins/inworld/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the coordinated release.
agents/package.json (1)
3-3: LGTM!Core agents package version bump to 1.0.38 aligns with the new interruption detection integration and related features in this PR.
plugins/bey/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the coordinated release.
plugins/elevenlabs/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the coordinated release.
plugins/silero/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the coordinated release.
plugins/cartesia/package.json (1)
3-3: LGTM!Version bump to 1.0.38 aligns with the coordinated release across the monorepo.
plugins/baseten/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the release.
plugins/google/package.json (1)
3-3: LGTM!Version bump to 1.0.38 aligns with the release and accompanying changelog entry documenting the realtime generation sequencing fix.
plugins/xai/package.json (1)
3-3: LGTM!Version bump to 1.0.38 is consistent with the coordinated release.
plugins/livekit/CHANGELOG.md (1)
3-8: LGTM!Changelog entry for 1.0.38 follows the established format and properly documents the dependency update to
@livekit/agents@1.0.38 with linked commit references.plugins/resemble/CHANGELOG.md (1)
3-8: LGTM!Changelog entry for 1.0.38 is properly formatted and documents the dependency update to
@livekit/agents@1.0.38, consistent with other plugin changelog entries in this release.plugins/openai/package.json (1)
3-3: Version bump looks consistent. Line 3 cleanly updates the package version to 1.0.38.plugins/google/CHANGELOG.md (1)
3-11: Changelog entry is clear and release-ready. Lines 3-11 document the patch update and dependency bump concisely.plugins/resemble/package.json (1)
3-3: Version bump looks consistent. Line 3 cleanly updates the package version to 1.0.38.plugins/livekit/package.json (1)
3-3: Version bump looks consistent. Line 3 cleanly updates the package version to 1.0.38.plugins/deepgram/package.json (1)
3-3: Version bump looks consistent. Line 3 cleanly updates the package version to 1.0.38.plugins/neuphonic/package.json (1)
3-3: Version bump looks consistent. Line 3 cleanly updates the package version to 1.0.38.plugins/anam/package.json (1)
3-3: Version bump looks good.
No concerns with the patch release update.plugins/neuphonic/CHANGELOG.md (1)
3-9: Changelog entry is consistent.
The patch notes align with the dependency bump.agents/CHANGELOG.md (1)
3-11: Changelog update looks good.
Patch notes are clear and well-scoped.plugins/openai/CHANGELOG.md (1)
3-10: Changelog entry is aligned.
No issues with the patch notes or dependency update.plugins/deepgram/CHANGELOG.md (1)
3-10: Changelog entry looks good.
No concerns with this patch update.turbo.json (1)
56-56: Global env addition is fine.
No issues with addingLK_GOOGLE_DEBUGto the shared environment list.plugins/elevenlabs/CHANGELOG.md (1)
3-8: LGTM!Standard changeset-generated changelog entry for version 1.0.38 with dependency update to
@livekit/agents@1.0.38. Format is consistent with previous entries.plugins/test/CHANGELOG.md (1)
3-8: LGTM!Changelog entry for version 1.0.38 follows the established pattern with updated dependency references.
plugins/test/package.json (1)
3-3: LGTM!Version bump to 1.0.38 aligns with the changelog entry and the broader package release.
plugins/anam/CHANGELOG.md (1)
3-8: LGTM!Changelog entry for version 1.0.38 with dependency update follows the established format.
agents/src/stream/stream_channel.ts (1)
13-13: Interface addition looks good.The new
addStreamInputmethod enables dynamically piping additional input streams into the channel, which supports the interruption detection integration mentioned in the PR objectives.agents/src/inference/interruption/InterruptionStream.ts (1)
42-48: LGTM: Method rename improves clarity.Renaming
speechStarted()toagentSpeechStarted()andspeechEnded()toagentSpeechEnded()better conveys the intent and aligns with the agent-facing interruption API.plugins/google/src/beta/realtime/realtime_api.ts (4)
46-47: LGTM: Debug flag for verbose logging.The
LK_GOOGLE_DEBUGenvironment variable provides a clean way to enable verbose client/server logging without flooding production logs.
961-1000: Generation lifecycle refactoring looks correct.The new logic properly handles:
- Interrupted events without active generation
- Clearing empty server content
- Starting new generation based on
isNewGenerationhelperThe separation of concerns with
isNewGeneration()improves readability.
1370-1376: Verify interaction betweeninterruptedhandling andpendingGenerationFut.At line 1370,
handleInputSpeechStarted()is called whenserverContent.interruptedis true and there's nopendingGenerationFut. However, this same condition is also checked earlier at line 969-971 withinshouldStartNewGenerationblock. Confirm that callinghandleInputSpeechStarted()twice in certain edge cases is intentional/idempotent.
1531-1545: LGTM: Clean helper method for generation determination.The
isNewGenerationmethod properly checks fortoolCallor meaningfulserverContentfields to determine if a new generation should start.agents/src/voice/agent.ts (1)
195-201: LGTM: New getters for interruption configuration.The
interruptionDetectionandallowInterruptionsgetters provide clean read-only access to the interruption configuration state.agents/src/voice/audio_recognition.ts (2)
111-118: LGTM: Interruption detection state initialization.The new fields properly track interruption detection state including the stream channel, buffer, and agent speaking status.
546-550: LGTM: Interruption event routing.The
onInterruptionEventmethod correctly routes interruption events to the recognition hooks.agents/src/voice/agent_activity.ts (4)
110-113: LGTM: Interruption detection state flags.The new flags properly track interruption detection enablement and audio activity interruption state.
406-424: LGTM: ExtendedupdateOptionswithturnDetection.The method signature is appropriately extended to accept
turnDetection, and the default interruption behavior is updated accordingly.
753-759: LGTM:onInterruptionhandler properly chains behaviors.The handler correctly restores audio activity interruption, triggers interruption, and notifies audio recognition.
2171-2175: LGTM: Agent state transition after tool execution.The state properly transitions to 'thinking' when tool outputs exist, or 'listening' when speaking completes without tool calls.
.changeset/shy-rockets-fry.md (1)
1-5: LGTM: Changeset accurately describes the PR scope.The patch version bump is appropriate for the state transition fixes and new interruption support features.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| static overlapSpeechStarted( | ||
| speechDurationInS: number, | ||
| userSpeakingSpan: Span, | ||
| speechDurationInS?: number, | ||
| userSpeakingSpan?: Span, | ||
| ): OverlapSpeechStarted { | ||
| return { type: 'overlap-speech-started', speechDurationInS, userSpeakingSpan }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential NaN when speechDurationInS is undefined.
When speechDurationInS is undefined, the calculation at line 244 (Math.round(chunk.speechDurationInS * this.options.sampleRate)) will produce NaN, causing incorrect buffer handling.
🐛 Proposed fix
} else if (chunk.type === 'overlap-speech-started' && agentSpeechStarted) {
this.userSpeakingSpan = chunk.userSpeakingSpan;
this.logger.debug('overlap speech started, starting interruption inference');
overlapSpeechStarted = true;
accumulatedSamples = 0;
// Include both speech duration and audio prefix duration for context
+ const speechDuration = chunk.speechDurationInS ?? 0;
const shiftSize = Math.min(
startIdx,
- Math.round(chunk.speechDurationInS * this.options.sampleRate) +
+ Math.round(speechDuration * this.options.sampleRate) +
Math.round(this.options.audioPrefixDurationInS * this.options.sampleRate),
);🤖 Prompt for AI Agents
In `@agents/src/inference/interruption/InterruptionStream.ts` around lines 50 -
55, The overlap-speech-started factory may emit undefined speechDurationInS
which later leads to NaN in the Math.round(chunk.speechDurationInS *
this.options.sampleRate) calculation; update overlapSpeechStarted (and/or the
consumer before multiplication) to ensure speechDurationInS is a finite number
(e.g., coalesce undefined to 0 or validate with Number.isFinite) so the
Math.round call always receives a numeric value; specifically, adjust the
overlapSpeechStarted return value or the code that reads chunk.speechDurationInS
so that chunk.speechDurationInS is a safe numeric value before calling
Math.round in the consumer that references this.options.sampleRate.
| export interface OverlapSpeechStarted { | ||
| type: 'overlap-speech-started'; | ||
| speechDurationInS: number; | ||
| userSpeakingSpan: Span; | ||
| speechDurationInS?: number; | ||
| userSpeakingSpan?: Span; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for usages of OverlapSpeechStarted and check how speechDurationInS and userSpeakingSpan are accessed
rg -n -A5 -B2 'OverlapSpeechStarted' --type=ts
echo "---"
# Check for direct property access that might fail with undefined
rg -n '\.speechDurationInS|\.userSpeakingSpan' --type=ts -C3Repository: livekit/agents-js
Length of output: 11029
🏁 Script executed:
#!/bin/bash
# Check context around the problematic lines to confirm they lack null checks
sed -n '240,250p' agents/src/inference/interruption/InterruptionStream.ts
echo "---"
sed -n '343,350p' agents/src/inference/interruption/InterruptionStream.tsRepository: livekit/agents-js
Length of output: 1146
🏁 Script executed:
#!/bin/bash
# Check if there are guards before lines 244 and 346 that ensure speechDurationInS is defined
sed -n '236,246p' agents/src/inference/interruption/InterruptionStream.ts
echo "---"
# Check context for line 346
sed -n '340,350p' agents/src/inference/interruption/InterruptionStream.ts
echo "---"
# Search for any guards on overlapSpeechStarted or speechDurationInS
rg -B10 'chunk.speechDurationInS' agents/src/inference/interruption/InterruptionStream.ts | head -30Repository: livekit/agents-js
Length of output: 1774
Fix unsafe accesses to optional speechDurationInS field before using in calculations.
Making speechDurationInS and userSpeakingSpan optional in OverlapSpeechStarted introduces runtime bugs in the consumer code. Two places access speechDurationInS without null checks:
- Line 244 in
InterruptionStream.ts:Math.round(chunk.speechDurationInS * this.options.sampleRate)will returnNaNifspeechDurationInSis undefined - Line 346 in
InterruptionStream.ts:Date.now() - frame.speechDurationInS * 1000will returnNaNifspeechDurationInSis undefined
Both branches only check the message type ('overlap-speech-started') but not whether the optional field is defined. Add null checks before arithmetic operations or provide default values.
🤖 Prompt for AI Agents
In `@agents/src/inference/interruption/types.ts` around lines 67 - 71, The
optional fields on OverlapSpeechStarted (speechDurationInS and userSpeakingSpan)
are used without guards in InterruptionStream.ts causing NaN in calculations;
update the places that compute Math.round(chunk.speechDurationInS *
this.options.sampleRate) and Date.now() - frame.speechDurationInS * 1000 to
first check that chunk.speechDurationInS / frame.speechDurationInS is not
null/undefined (or coerce to a safe default like 0) and handle the absent-case
(e.g., early-return, skip the branch, or use 0 as a fallback) so arithmetic only
runs on defined numeric values; locate references to OverlapSpeechStarted,
chunk.speechDurationInS, frame.speechDurationInS and this.options.sampleRate to
apply the guard.
| addStreamInput: (newInputStream) => { | ||
| const reader = newInputStream.getReader(); | ||
| (async () => { | ||
| try { | ||
| while (true) { | ||
| const { done, value } = await reader.read(); | ||
| if (done) break; | ||
| await writer.write(value); | ||
| } | ||
| } finally { | ||
| reader.releaseLock(); | ||
| } | ||
| })(); | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for the async read loop.
The fire-and-forget async IIFE silently swallows errors. If writer.write() fails (e.g., channel aborted or closed), there's no error handling, which could lead to unhandled promise rejections or silent failures.
Additionally, there's no check for isClosed before writing, so attempts to write to a closed channel will throw.
🐛 Proposed fix with error handling
addStreamInput: (newInputStream) => {
const reader = newInputStream.getReader();
(async () => {
try {
while (true) {
+ if (isClosed) break;
const { done, value } = await reader.read();
if (done) break;
await writer.write(value);
}
+ } catch (e) {
+ // Stream was likely closed or aborted; suppress expected errors
+ if (!(e instanceof Error && e.name === 'TypeError')) {
+ console.error('Error in addStreamInput:', e);
+ }
} finally {
reader.releaseLock();
}
})();
},🤖 Prompt for AI Agents
In `@agents/src/stream/stream_channel.ts` around lines 28 - 41, The addStreamInput
async IIFE currently swallows errors and writes without checking channel state;
update the loop in addStreamInput (the async IIFE using
newInputStream.getReader() and await writer.write(value)) to guard writes with a
channel-closed check (e.g., if (writer.isClosed || isClosed) break/skip) and
wrap the read/write body in a try/catch so any error from writer.write(...) or
reader.read() is handled: on error release the reader, call writer.abort(error)
or writer.close() as appropriate, and surface/log the error to avoid unhandled
promise rejections; keep the existing finally block that calls
reader.releaseLock() so the reader is always released.
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | ||
| this.audioRecognition.OnEndOfAgentSpeech(Date.now()); | ||
| } | ||
| this.restoreInterruptionByAudioActivity(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in method call: OnEndOfAgentSpeech should be onEndOfAgentSpeech.
The method is called with a capital 'O' which will cause a runtime error since JavaScript is case-sensitive.
🐛 Proposed fix
this.agentSession._updateAgentState('listening');
if (this.isInterruptionDetectionEnabled && this.audioRecognition) {
- this.audioRecognition.OnEndOfAgentSpeech(Date.now());
+ this.audioRecognition.onEndOfAgentSpeech(Date.now());
}
this.restoreInterruptionByAudioActivity();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | |
| this.audioRecognition.OnEndOfAgentSpeech(Date.now()); | |
| } | |
| this.restoreInterruptionByAudioActivity(); | |
| } | |
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | |
| this.audioRecognition.onEndOfAgentSpeech(Date.now()); | |
| } | |
| this.restoreInterruptionByAudioActivity(); |
🤖 Prompt for AI Agents
In `@agents/src/voice/agent_activity.ts` around lines 1374 - 1378, The call to
audioRecognition.OnEndOfAgentSpeech is using the wrong case and will throw at
runtime; change the invocation to
audioRecognition.onEndOfAgentSpeech(Date.now()) where the audioRecognition
object is used (within the block guarded by isInterruptionDetectionEnabled) and
keep the surrounding logic (including restoreInterruptionByAudioActivity) intact
so the only change is the method name casing.
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | ||
| { | ||
| this.audioRecognition.onEndOfAgentSpeech({ ignoreUserTranscriptUntil: Date.now() }); | ||
| this.restoreInterruptionByAudioActivity(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent method signature: passes object instead of number.
At line 1675, onEndOfAgentSpeech is called with an object { ignoreUserTranscriptUntil: Date.now() }, but the method signature in audio_recognition.ts (line 181) expects a number: onEndOfAgentSpeech(ignoreUserTranscriptUntil: number).
🐛 Proposed fix
if (this.isInterruptionDetectionEnabled && this.audioRecognition) {
- {
- this.audioRecognition.onEndOfAgentSpeech({ ignoreUserTranscriptUntil: Date.now() });
- this.restoreInterruptionByAudioActivity();
- }
+ this.audioRecognition.onEndOfAgentSpeech(Date.now());
+ this.restoreInterruptionByAudioActivity();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | |
| { | |
| this.audioRecognition.onEndOfAgentSpeech({ ignoreUserTranscriptUntil: Date.now() }); | |
| this.restoreInterruptionByAudioActivity(); | |
| } | |
| } | |
| if (this.isInterruptionDetectionEnabled && this.audioRecognition) { | |
| this.audioRecognition.onEndOfAgentSpeech(Date.now()); | |
| this.restoreInterruptionByAudioActivity(); | |
| } |
🤖 Prompt for AI Agents
In `@agents/src/voice/agent_activity.ts` around lines 1673 - 1678, The call to
audioRecognition.onEndOfAgentSpeech uses an object but the method signature
expects a number; change the invocation inside the
isInterruptionDetectionEnabled block so that onEndOfAgentSpeech receives a
numeric timestamp (e.g., Date.now()) rather than an object, and keep the
subsequent call to restoreInterruptionByAudioActivity unchanged; update the call
site where audioRecognition.onEndOfAgentSpeech is invoked to pass the number
directly.
| this.turnHandling = turnHandling; // TODO migrate legacy options to new turn handling config when turnConfig is unset | ||
|
|
||
| this.turnDetection = this.turnHandling?.turnDetection; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Legacy turnDetection option is ignored.
The turnDetection parameter is destructured from options but never used. Line 130 only reads from turnHandling?.turnDetection, so the deprecated turnDetection option has no effect.
🐛 Proposed fix to migrate legacy option
this.turnHandling = turnHandling; // TODO migrate legacy options to new turn handling config when turnConfig is unset
- this.turnDetection = this.turnHandling?.turnDetection;
+ this.turnDetection = this.turnHandling?.turnDetection ?? turnDetection;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| this.turnHandling = turnHandling; // TODO migrate legacy options to new turn handling config when turnConfig is unset | |
| this.turnDetection = this.turnHandling?.turnDetection; | |
| this.turnHandling = turnHandling; // TODO migrate legacy options to new turn handling config when turnConfig is unset | |
| this.turnDetection = this.turnHandling?.turnDetection ?? turnDetection; |
🤖 Prompt for AI Agents
In `@agents/src/voice/agent.ts` around lines 128 - 130, The code ignores the
legacy turnDetection option because you always read from this.turnHandling; fix
by preferring an explicit legacy option when present: in the constructor where
this.turnHandling and this.turnDetection are set, check the incoming
options.turnDetection (or the destructured turnDetection param) and assign
this.turnDetection = options.turnDetection ?? this.turnHandling?.turnDetection
(or if you have a turnConfig sentinel, only migrate legacy when turnConfig is
unset), ensuring the legacy value is applied when provided; update the
assignment site around this.turnHandling / this.turnDetection to perform this
conditional migration.
| this._interruptionDetection = this.turnHandling?.interruption.mode; | ||
| if (this.turnHandling?.interruption.mode !== undefined) { | ||
| this._allowInterruptions = !!this.turnHandling.interruption.mode; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential runtime error when turnHandling exists but interruption is undefined.
Accessing turnHandling?.interruption.mode will throw a TypeError if turnHandling is defined but interruption is not. Use optional chaining on interruption as well.
🐛 Proposed fix
- this._interruptionDetection = this.turnHandling?.interruption.mode;
- if (this.turnHandling?.interruption.mode !== undefined) {
- this._allowInterruptions = !!this.turnHandling.interruption.mode;
+ this._interruptionDetection = this.turnHandling?.interruption?.mode;
+ if (this.turnHandling?.interruption?.mode !== undefined) {
+ this._allowInterruptions = !!this.turnHandling.interruption.mode;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| this._interruptionDetection = this.turnHandling?.interruption.mode; | |
| if (this.turnHandling?.interruption.mode !== undefined) { | |
| this._allowInterruptions = !!this.turnHandling.interruption.mode; | |
| } | |
| this._interruptionDetection = this.turnHandling?.interruption?.mode; | |
| if (this.turnHandling?.interruption?.mode !== undefined) { | |
| this._allowInterruptions = !!this.turnHandling.interruption?.mode; | |
| } |
🤖 Prompt for AI Agents
In `@agents/src/voice/agent.ts` around lines 151 - 154, The code risks a TypeError
when turnHandling exists but interruption is undefined; change references to use
optional chaining on interruption (e.g., read
this.turnHandling?.interruption?.mode into this._interruptionDetection) and
update the check/assignment for this._allowInterruptions to use the same safe
chain (e.g., set this._allowInterruptions =
!!this.turnHandling?.interruption?.mode) so you never access interruption.mode
directly; update both occurrences (the assignment to this._interruptionDetection
and the conditional that sets this._allowInterruptions) to use interruption?.
| async onEndOfOverlapSpeech(userSpeakingSpan?: Span) { | ||
| if (this.isInterruptionEnabled) { | ||
| return; | ||
| } | ||
| if (userSpeakingSpan && userSpeakingSpan.isRecording()) { | ||
| userSpeakingSpan.setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false'); | ||
| } | ||
|
|
||
| return this.trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic appears inverted: returns early when interruption IS enabled.
onEndOfOverlapSpeech returns early (line 220) when isInterruptionEnabled is true, meaning the sentinel is only sent when interruption is disabled. This seems backwards—the sentinel should be sent when interruption detection is enabled.
🐛 Proposed fix
async onEndOfOverlapSpeech(userSpeakingSpan?: Span) {
- if (this.isInterruptionEnabled) {
+ if (!this.isInterruptionEnabled) {
return;
}
if (userSpeakingSpan && userSpeakingSpan.isRecording()) {
userSpeakingSpan.setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false');
}
return this.trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async onEndOfOverlapSpeech(userSpeakingSpan?: Span) { | |
| if (this.isInterruptionEnabled) { | |
| return; | |
| } | |
| if (userSpeakingSpan && userSpeakingSpan.isRecording()) { | |
| userSpeakingSpan.setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false'); | |
| } | |
| return this.trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded()); | |
| } | |
| async onEndOfOverlapSpeech(userSpeakingSpan?: Span) { | |
| if (!this.isInterruptionEnabled) { | |
| return; | |
| } | |
| if (userSpeakingSpan && userSpeakingSpan.isRecording()) { | |
| userSpeakingSpan.setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false'); | |
| } | |
| return this.trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded()); | |
| } |
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 218 - 227, The guard in
onEndOfOverlapSpeech is inverted: change the early return so the function only
bails out when interruption detection is disabled (i.e. replace the current if
(this.isInterruptionEnabled) return; with if (!this.isInterruptionEnabled)
return;), so that
trySendInterruptionSentinel(InterruptionStreamSentinel.overlapSpeechEnded()) is
invoked when interruptions are enabled; keep the existing user speaking span
handling (userSpeakingSpan.isRecording() /
setAttribute(traceTypes.ATTR_IS_INTERRUPTION, 'false')) as-is so it still runs
before sending the sentinel.
| const eventsToEmit = | ||
| emitFromIndex && shouldFlush ? this.transcriptBuffer.slice(emitFromIndex) : []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index 0 is falsy: emitFromIndex && shouldFlush fails when emitFromIndex === 0.
When emitFromIndex is 0, the condition emitFromIndex && shouldFlush evaluates to false, incorrectly skipping events that should be emitted.
🐛 Proposed fix
const eventsToEmit =
- emitFromIndex && shouldFlush ? this.transcriptBuffer.slice(emitFromIndex) : [];
+ emitFromIndex !== null && shouldFlush ? this.transcriptBuffer.slice(emitFromIndex) : [];📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const eventsToEmit = | |
| emitFromIndex && shouldFlush ? this.transcriptBuffer.slice(emitFromIndex) : []; | |
| const eventsToEmit = | |
| emitFromIndex !== null && shouldFlush ? this.transcriptBuffer.slice(emitFromIndex) : []; |
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 279 - 280, The current
condition for building eventsToEmit incorrectly treats emitFromIndex === 0 as
falsy; update the check in audio_recognition.ts so eventsToEmit is populated
when emitFromIndex is a valid numeric index (including 0) and shouldFlush is
true — e.g., replace the truthy check with an explicit numeric/index validation
(like typeof emitFromIndex === 'number' && emitFromIndex >= 0 && shouldFlush)
before calling this.transcriptBuffer.slice(emitFromIndex).
| } else if (this.transcriptBuffer) { | ||
| await this.flushHeldTranscripts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condition always truthy: this.transcriptBuffer is an array.
The check else if (this.transcriptBuffer) is always truthy since transcriptBuffer is initialized as an array. This should check transcriptBuffer.length > 0.
🐛 Proposed fix
- } else if (this.transcriptBuffer) {
+ } else if (this.transcriptBuffer.length > 0) {
await this.flushHeldTranscripts();
// no return here to allow the new event to be processed normally
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } else if (this.transcriptBuffer) { | |
| await this.flushHeldTranscripts(); | |
| } else if (this.transcriptBuffer.length > 0) { | |
| await this.flushHeldTranscripts(); |
🤖 Prompt for AI Agents
In `@agents/src/voice/audio_recognition.ts` around lines 370 - 371, The condition
else if (this.transcriptBuffer) is always truthy because transcriptBuffer is an
array; change the branch to check for content by using
this.transcriptBuffer.length > 0 (or a safe check like
Array.isArray(this.transcriptBuffer) && this.transcriptBuffer.length > 0) before
calling await this.flushHeldTranscripts(); update the condition in the method
that contains transcriptBuffer and flushHeldTranscripts to use this length check
so flushHeldTranscripts() only runs when there are held transcripts.
Description
Changes Made
Pre-Review Checklist
Testing
restaurant_agent.tsandrealtime_agent.tswork properly (for major changes)Additional Notes
Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.
Summary by CodeRabbit
Release Notes v1.0.38
New Features
Improvements
Breaking Changes
✏️ Tip: You can customize this high-level summary in your review settings.