fix: release lock before streaming to prevent orphan on client disconnect #234

beran-t wants to merge 1 commit into e2b-dev:main from
Conversation
…t disconnect

Narrow the lock scope in ContextWebSocket.execute() so that the lock is only held during the prepare+send phase, not during result streaming.

Previously, the lock was held for the entire generator lifetime, including the _wait_for_result() streaming loop. When a client disconnected (e.g. SDK timeout), Starlette abandoned the generator while it was blocked at `await queue.get()`. The lock stayed held until the kernel finished internally, blocking all subsequent executions on the same context and causing cascading timeouts.

The fix moves the streaming phase (Phase B) outside the `async with self._lock` block. This is safe because results are routed by unique message_id in _process_message(); no shared state is accessed during streaming. A try/finally ensures the execution entry is cleaned up even if the generator is abandoned.

Fixes e2b-dev#213
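The narrowed scope can be sketched roughly like this. This is a minimal illustration, not the actual e2b code: the class internals (`_executions`, the queue-per-execution, the `None` sentinel) are assumptions based on the description above.

```python
import asyncio


class ContextWebSocket:
    """Minimal sketch of the narrowed lock scope; not the real e2b class."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._executions: dict[str, asyncio.Queue] = {}
        self._counter = 0

    async def _send(self, message_id: str, code: str) -> None:
        # Stand-in for sending an execute_request over the websocket.
        await asyncio.sleep(0)

    async def execute(self, code: str):
        self._counter += 1
        message_id = f"msg-{self._counter}"
        queue: asyncio.Queue = asyncio.Queue()

        # Phase A: prepare + send, under the lock.
        async with self._lock:
            self._executions[message_id] = queue
            await self._send(message_id, code)
        # Lock is released here, BEFORE streaming starts.

        # Phase B: stream results without the lock. The try/finally cleans
        # up the _executions entry even if the caller abandons the generator.
        try:
            while True:
                result = await queue.get()
                if result is None:  # sentinel: execution finished
                    return
                yield result
        finally:
            self._executions.pop(message_id, None)  # defensive pop, not del
```

If the client disconnects during Phase B, the generator is abandoned while parked at `queue.get()`, but the lock is already free, so other executions on the same context proceed.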
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 15ad5ab02e
```python
# Clean up env vars in a separate request after the main code has run
if env_vars:
```
Serialize env-var cleanup before allowing next execute
_cleanup_task is now created after the lock is released, so a concurrent call can enter execute() first, observe no pending cleanup, and queue its own code while the previous call’s per-request env_vars are still active. In practice, if request A sets env_vars and request B arrives before this assignment runs, B can execute with A’s environment, which is a regression from the old behavior that always waited for cleanup before starting the next execution.
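One way to close that race is to schedule the cleanup task while still holding the lock, and have the next `execute()` await any pending cleanup before queueing its code. The sketch below illustrates the ordering only; the class and helper names are made up, with `_cleanup_task` borrowed from the discussion above.

```python
import asyncio
from typing import Optional


class EnvAwareExecutor:
    """Illustrative sketch of serializing env-var cleanup; not real e2b code."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._cleanup_task: Optional[asyncio.Task] = None
        self.active_env: dict = {}

    async def _cleanup_env(self, keys):
        await asyncio.sleep(0)  # stand-in for a kernel round-trip
        for key in keys:
            self.active_env.pop(key, None)

    async def execute(self, code: str, env_vars=None):
        async with self._lock:
            # Wait out cleanup scheduled by the previous call, so this
            # request never runs with the previous request's env vars.
            if self._cleanup_task is not None:
                await self._cleanup_task
                self._cleanup_task = None
            if env_vars:
                self.active_env.update(env_vars)
            snapshot = dict(self.active_env)  # env the code would run with
            # Schedule cleanup BEFORE releasing the lock, so the next
            # execute() is guaranteed to observe and await it.
            if env_vars:
                self._cleanup_task = asyncio.create_task(
                    self._cleanup_env(list(env_vars))
                )
        # Result streaming would happen here, outside the lock.
        return snapshot
```

Because the task is assigned under the lock, a concurrent `execute()` cannot slip in between "lock released" and "cleanup scheduled", which is exactly the window the review comment describes.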
Fixes #213: `asyncio.Lock` in `messaging.py` not released on client disconnect, causing cascading timeouts.

Changes:
- Narrow lock scope in `ContextWebSocket.execute()` to only cover the prepare+send phase (Phase A), releasing it before result streaming (Phase B). This prevents orphaned locks on client disconnect.
- Schedule the env var cleanup task under the lock (before release) to avoid the race condition flagged in PRs #234/#235.
- Add a POST /contexts/{id}/interrupt endpoint that calls Jupyter's kernel interrupt API, allowing clients to stop long-running code without restarting the kernel (preserves state).
- Add `interrupt_code_context`/`interruptCodeContext` to the Python and JS SDKs.

Co-Authored-By: vasek <vasek.mlejnsky@gmail.com>
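The interrupt endpoint could look roughly like the handler below. This is a sketch: the function shape, the `kernel_ids` mapping, and the base URL are assumptions. The Jupyter Server route it calls, `POST /api/kernels/{kernel_id}/interrupt` (which returns 204 on success), is real.

```python
import asyncio

# Assumed base URL for a local Jupyter server; illustrative only.
JUPYTER_BASE = "http://localhost:8888"


async def interrupt_context(context_id, kernel_ids, post):
    """Hypothetical handler body for POST /contexts/{id}/interrupt.

    `post` is an injected coroutine that performs the HTTP request,
    so this sketch stays independent of any particular HTTP client.
    """
    kernel_id = kernel_ids.get(context_id)
    if kernel_id is None:
        return {"status": 404, "error": "context not found"}
    # Jupyter's kernel interrupt API sends SIGINT to the kernel process;
    # the kernel keeps its variables, unlike a restart.
    status = await post(f"{JUPYTER_BASE}/api/kernels/{kernel_id}/interrupt")
    return {"status": status}
```

Interrupting rather than restarting is what preserves kernel state, which matters for long-lived contexts.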
Summary
Fixes #213: `asyncio.Lock` in `messaging.py` not released on client disconnect, causing cascading timeouts.

- Narrow the lock in `ContextWebSocket.execute()` so it only covers the prepare+send phase (Phase A), not result streaming (Phase B)
- Add `try/finally` around streaming to clean up the `_executions` entry even if the generator is abandoned
- Use `dict.pop()` instead of `del` for defensive cleanup

Root Cause
The lock was held for the entire async generator lifetime, including `_wait_for_result()`, which blocks on `await queue.get()`. When a client disconnects (SDK timeout), Starlette abandons the generator while it's stuck at `queue.get()`; `aclose()` is never called because Starlette only detects disconnects when trying to write. The lock stays held until the kernel finishes internally, blocking all subsequent executions.

Why This Fix Is Safe
Streaming results (Phase B) doesn't need the lock because:
- Results are routed by unique `message_id` in `_process_message()`; no shared state during streaming
- `_global_env_vars` (lazy init) and `_cleanup_task` (lifecycle management) are only accessed during Phase A

Reproduction & Verification
Tested on live E2B sandboxes with a custom `code-interpreter-dev` template built from this branch.

Before fix (stock `code-interpreter` template):

After fix (`code-interpreter-dev` template):

Test plan
- `print('hello')` works in <1s
- `run_code` calls
- `envs` parameter works correctly
- `ZeroDivisionError` returned properly
- `console.log` works
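The root cause described under "Root Cause" can also be demonstrated in isolation with a toy generator; the names here are illustrative, not the real `messaging.py` code.

```python
import asyncio


async def bad_stream(lock, queue):
    # The pre-fix pattern: the lock is held across the whole streaming loop.
    async with lock:
        while True:
            item = await queue.get()
            if item is None:
                return
            yield item


async def main():
    lock, queue = asyncio.Lock(), asyncio.Queue()
    gen = bad_stream(lock, queue)
    queue.put_nowait("chunk")
    assert await gen.__anext__() == "chunk"
    # The generator is now suspended at `yield`, inside `async with lock`.
    # If the client disconnects here, nothing releases the lock:
    assert lock.locked()
    try:
        # A second execution on the same context blocks until it times out.
        await asyncio.wait_for(lock.acquire(), timeout=0.1)
        raise AssertionError("unreachable: lock is stuck")
    except asyncio.TimeoutError:
        pass  # the cascading timeout from issue #213
    # Only explicit finalization releases the lock, and Starlette does not
    # call aclose() promptly on disconnect:
    await gen.aclose()
    assert not lock.locked()


asyncio.run(main())
```

Moving the `yield` loop outside `async with lock`, as this PR does, makes the abandoned-generator case harmless: the lock is already free before the first result is streamed.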