Conversation
…ia clock in webrtc. Signed-off-by: BuffMcBigHuge <marco@bymar.co>
…dio. Signed-off-by: BuffMcBigHuge <marco@bymar.co>
j0sh
left a comment
Thanks, this does seem somewhat simpler than the last iteration.
I don't want to block this for the sake of shipping something if it seems to be working alright for now, but there are a couple things I don't quite understand.
- Video and audio are being paced differently. Video effectively uses wall-clock while audio is using sample counts. Is there a reason for this?
- Using wall-clock makes things susceptible to pipeline jitter.
- If audio output is a little delayed, it doesn't get a chance for another 20ms. This accumulates and will lead to desync. Conversely, audio that might be a little bursty can be unnecessarily delayed.
- Is there a reason the audio queue is non-blocking? It seems preferable to block (up to a reasonable duration, then silence can be inserted) instead of sleeping. But maybe I'm missing something about why media pulls are intended to be non-blocking.
- In general, I'd consider using input timestamps or a "reference clock" to timestamp and pace the output, rather than depending on wall-clock after the pipeline. Pipelines may produce output at different rates, and this architecture generally doesn't account for that. More on this in Discord.
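For context on the pacing point above, deriving each frame's deadline from a running sample count (a monotonic reference, as aiortc-style timestamping does) rather than sleeping "now + 20ms" means a late frame shortens the next wait instead of pushing the whole schedule back. This is an illustrative sketch, not the PR's code; all names here are hypothetical:

```python
import time

SAMPLE_RATE = 48000
SAMPLES_PER_FRAME = 960  # 20 ms at 48 kHz

def frame_deadline(start: float, samples_sent: int) -> float:
    """Absolute time at which the next frame should be emitted.

    The deadline comes from the cumulative sample count, so a late
    frame does not shift the schedule for every subsequent frame.
    """
    return start + samples_sent / SAMPLE_RATE

start = time.monotonic()
samples_sent = 0
deadlines = []
for _ in range(3):
    deadlines.append(frame_deadline(start, samples_sent))
    samples_sent += SAMPLES_PER_FRAME
# Successive deadlines are exactly 20 ms apart relative to `start`,
# regardless of how long each pipeline iteration actually took.
```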
Thank you @j0sh, let's continue in Discord.
src/scope/server/tracks.py
```python
# Interleave into buffer: [L0, R0, L1, R1, ...]
for i in range(audio_np.shape[1]):
    for ch in range(self.channels):
        self._audio_buffer.append(audio_np[ch, i])

# Serve a 20ms frame from the buffer
samples_needed = self._samples_per_frame * self.channels
if len(self._audio_buffer) >= samples_needed:
    samples = [self._audio_buffer.popleft() for _ in range(samples_needed)]
```
Do these loop over individual samples? That is probably pretty expensive; best to work on complete 20ms frames if possible. There's probably some numpy wizardry for fast interleaving and effective frame chunking.
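The follow-up commit addresses this with `np.ravel(order="F")`. As a minimal sketch of why Fortran-order ravel produces the interleaved layout in one C-level pass (the array values here are purely illustrative):

```python
import numpy as np

# Planar audio: shape (channels, samples), one row per channel.
audio_np = np.array([[0.1, 0.2, 0.3],   # L0, L1, L2
                     [0.4, 0.5, 0.6]])  # R0, R1, R2

# Fortran order walks down each column first, so columns (one sample
# across all channels) come out consecutively: [L0, R0, L1, R1, L2, R2].
interleaved = audio_np.ravel(order="F")
```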
🚀 fal.ai Preview Deployment
Testing: Connect to this preview deployment by setting the fal endpoint in your client. 🧪 E2E tests will run automatically against this deployment.
✅ E2E Tests passed
Test Artifacts: Check the workflow run for screenshots.
…ioProcessingTrack Addresses review feedback on #534. The audio buffer interleaving and frame extraction used O(n) Python loops over individual samples, which is expensive for real-time audio. Now uses np.ravel(order="F") for interleaving and numpy slicing for frame extraction. Also adds 42 tests covering interleaving, buffering, resampling, channel conversion, frame construction, and adversarial inputs. Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
…perf Fix AudioProcessingTrack per-sample loop performance
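The extraction side of that fix can likewise replace the per-sample `popleft()` loop with a single slice. A hedged sketch of the idea only; the buffer layout, constants, and function name are assumptions, not the actual `tracks.py` code:

```python
import numpy as np

SAMPLE_RATE = 48000
CHANNELS = 2
SAMPLES_PER_FRAME = SAMPLE_RATE * 20 // 1000  # 960 samples per 20 ms

def take_frame(buffer: np.ndarray):
    """Pop one 20 ms interleaved frame off the front of a flat buffer.

    Returns (frame, remainder), or (None, buffer) if not enough samples
    are queued yet. Slicing is O(1) views plus one copy at most, versus
    O(n) Python-level iteration with a deque.
    """
    needed = SAMPLES_PER_FRAME * CHANNELS
    if buffer.size < needed:
        return None, buffer
    return buffer[:needed], buffer[needed:]

buf = np.zeros(2500 * CHANNELS, dtype=np.float32)  # 2500 queued stereo samples
frame, buf = take_frame(buf)                        # frame: 960 * 2 = 1920 samples
```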
Audio Support for Scope (Reworked)
Overall, this approach is simpler and appears to resolve the audio quality issues. The complex gating mechanics and resampling have been removed in favor of aiortc-style WebRTC timing.
Summary
Adds end-to-end audio support to Scope's WebRTC streaming pipeline. Pipelines can return audio alongside video in their output dict; the server streams audio over WebRTC. This is a simplified rewrite of the audio path that fixes clipping and audio quality issues reported in PR #480.
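A minimal sketch of the output-dict contract described here. Only the key names come from this PR description; the shapes, dtypes, and function name are illustrative assumptions:

```python
import numpy as np

def pipeline_step():
    """Illustrative pipeline output. 'audio' is shown planar,
    shape (channels, samples); 'audio_sample_rate' tells the server
    the rate of that audio. Both audio keys are optional, so
    video-only pipelines simply omit them and are unchanged."""
    video = np.zeros((512, 512, 3), dtype=np.uint8)  # one RGB frame (assumed size)
    audio = np.zeros((2, 960), dtype=np.float32)     # 20 ms of stereo at 48 kHz
    return {"video": video, "audio": audio, "audio_sample_rate": 48000}

out = pipeline_step()
```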
What's New
Backend
{"video": ..., "audio": ..., "audio_sample_rate": ...}. Audio keys are optional; pipelines that don't produce audio are unchanged.audio_callbackinstead of a queue. Only the last processor in a chain receives the callback.audio_queuefor raw(audio_tensor, sample_rate)tuples. No background drain thread, no video-gated release, no resampling.Frontend
- The received audio track is added to the playback `MediaStream`. Adds a recvonly audio transceiver so the SDP offer includes an audio m-line for the backend to attach its track.

WebRTC Handshake
The browser calls `addTransceiver("audio", { direction: "recvonly" })` so the offer includes an audio m-line. After `setRemoteDescription`, the backend finds the audio transceiver, attaches its `AudioProcessingTrack`, and sets direction to `sendonly`. The answer then indicates that the server will send audio.

Why This Version (audio-sync-2)
PR #480 received feedback about:
This branch is a simplified rewrite that:
Architecture
Trade-offs
Files Changed
- `src/scope/server/frame_processor.py` – Simplified audio path (~185 net lines removed)
- `src/scope/server/pipeline_processor.py` – Callback-based audio delivery
- `src/scope/server/tracks.py` – Stereo `AudioProcessingTrack` with per-channel resampling
- `src/scope/server/webrtc.py` – Audio track wiring (no `MediaClock`)

Related