🤖 feat: add actor-critic mode loop and source-aware rendering #2471
ammar-agent wants to merge 28 commits into main
Conversation
|
@codex review |
|
@codex review Addressed the failing integration check by switching the new critic UI test from a |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f91158db9a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review Addressed queue-priority feedback:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70929b8143
|
@codex review Aligned
|
|
@codex review Addressed the edit-precedence thread:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78e85797d2
|
@codex review Addressed the critic-resume thread:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84fd2e8d31
|
@codex review Addressed both new threads:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03461a1949
|
@codex review Addressed the /done parity thread:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edb08de807
|
@codex review Addressed both latest threads:
|
|
Codex Review: Didn't find any major issues. Breezy! |
Implement actor-critic mode across command parsing, send options, backend stream orchestration, and UI rendering.
- Add /critic toggle command and workspace-scoped critic prompt persistence.
- Add backend actor↔critic automatic turn loop with strict /done termination and critic-only tool disable policy.
- Persist actor/critic message source metadata and transform request history for critic role-flip + actor feedback.
- Render critic reasoning/assistant output with dedicated source badges and styling.
- Expand mock router/player plumbing and add comprehensive UI coverage for looping, /done semantics, context recovery, and interrupt behavior.
Validation:
- make static-check
- bun test ./tests/ui/critic/criticMode.test.ts
- bun test ./src/browser/utils/slashCommands/parser.test.ts ./src/browser/utils/slashCommands/suggestions.test.ts ./src/browser/utils/messages/sendOptions.test.ts
---
_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$19.73`_
The integration CI suite runs tests with Jest, not bun:test. Replace the bun:test import in the new critic-mode UI suite so the file uses the same Jest global pattern as other `tests/ui/*.test.ts` files.
Validation:
- env TEST_INTEGRATION=1 bun x jest --maxWorkers=100% --silent tests/ui/critic/criticMode.test.ts
- make static-check
When a user sends a follow-up during a critic-streaming turn, the queued message should be processed before the actor-critic loop starts another automatic continuation turn.
- Stop actor-critic auto-continuation when queued user input exists at stream end.
- Clear critic loop state in that branch so the queued user turn can own continuation semantics.
- Add a UI integration test that queues user input during critic streaming and asserts the queued turn executes before auto-continuation.
Validation:
- bun test ./tests/ui/critic/criticMode.test.ts
- env TEST_INTEGRATION=1 bun x jest --maxWorkers=100% --silent tests/ui/critic/criticMode.test.ts
- make static-check
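The queue-priority rule in this commit can be sketched as a small pure decision function. This is only an illustration under assumptions: `LoopState`, `NextStep`, and `decideNextStep` are hypothetical names, not identifiers from the PR, and the real logic in the session code may carry more state.

```typescript
// Hypothetical sketch: none of these names come from the PR itself.
type TurnSource = "actor" | "critic";

interface LoopState {
  criticEnabled: boolean;
  lastTurn: TurnSource;
  // True when the critic's last visible reply was the /done sentinel.
  criticSaidDone: boolean;
}

type NextStep =
  | { kind: "run-queued-user-turn" }
  | { kind: "auto-continue"; next: TurnSource }
  | { kind: "stop" };

// Decide what happens when a stream ends.
function decideNextStep(state: LoopState, queuedUserMessages: number): NextStep {
  // Queued user input always wins over automatic continuation.
  if (queuedUserMessages > 0) {
    return { kind: "run-queued-user-turn" };
  }
  if (!state.criticEnabled) {
    return { kind: "stop" };
  }
  // A critic turn that ended with /done terminates the loop.
  if (state.lastTurn === "critic" && state.criticSaidDone) {
    return { kind: "stop" };
  }
  // Otherwise alternate: the critic reviews the actor, the actor answers the critic.
  return { kind: "auto-continue", next: state.lastTurn === "actor" ? "critic" : "actor" };
}
```

Checking the queue first keeps the other branches free of user-input concerns. In the commit, that branch also clears the critic loop state, which a caller could do whenever it receives `run-queued-user-turn`.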
Reduce the mocked critic stream length in the new queue-priority scenario so CI workers can still queue input during streaming without timing out waiting for the critic turn to finish.
Validation:
- bun test ./tests/ui/critic/criticMode.test.ts
- env TEST_INTEGRATION=1 bun x jest --maxWorkers=100% --silent tests/ui/critic/criticMode.test.ts
- make static-check
f0d18fd to 00357fb
In critic mode, the main text input now serves as the critic prompt input. Removes the separate inline critic prompt field and its wrapper row.
- Textarea placeholder changes to "Critic instructions..." when critic mode is on
- Sending in critic mode uses the message text as the critic prompt (persisted for resumability and override in send options)
- Badge kept as a simple standalone indicator
- Removed unused setCriticPromptFromUi/findCriticPromptInput test helpers
- Updated 3 tests to reflect new UX model (TDD approach)
Generated with `mux` · anthropic:claude-opus-4-6 · xhigh · $20.56
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c87e16b99
- Restore the user's critic prompt text when startCriticLoop fails so they don't have to re-type it.
- Propagate actual disk read errors instead of silently treating them as empty history, which could bootstrap the wrong turn type.
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c5cbaf304
If streamWithHistory fails after seeding the critic prompt as a user message, delete the seeded message so retries still see empty history and take the actor-first path instead of starting a critic turn against a transcript with no actor response.
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8da0c9ba08
- Don't mark seeded critic prompt as synthetic since it's user-authored content that recovery logic (maybeRetryExecSubagentHardRestart) needs to treat as replayable.
- Emit a delete event when rolling back the seeded message on stream failure so the UI removes the ghost message.
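The seed-then-rollback behavior described here (seed the critic prompt as a user message, remove it again if the stream fails, and emit a delete event so the UI drops the ghost message) might look roughly like the sketch below. It is a synchronous simplification under assumptions: `InMemoryHistory`, `HistoryEvent`, and `seedThenStream` are hypothetical names, and the real history store is disk-backed and asynchronous.

```typescript
// Hypothetical sketch: an in-memory stand-in for the real history service.
interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
}

type HistoryEvent = { type: "append" | "delete"; id: string };

class InMemoryHistory {
  private messages: Message[] = [];
  // Events a UI layer could subscribe to; recorded here so they can be inspected.
  readonly events: HistoryEvent[] = [];

  append(message: Message): void {
    this.messages.push(message);
    this.events.push({ type: "append", id: message.id });
  }

  delete(id: string): void {
    this.messages = this.messages.filter((m) => m.id !== id);
    this.events.push({ type: "delete", id });
  }

  get length(): number {
    return this.messages.length;
  }
}

// Seed the critic prompt as a user message, then start the stream.
// `startStream` returns false on failure (an assumption made for this sketch).
function seedThenStream(
  history: InMemoryHistory,
  criticPrompt: string,
  startStream: () => boolean,
): boolean {
  const seeded: Message = { id: `seed-${history.length}`, role: "user", text: criticPrompt };
  history.append(seeded);
  if (startStream()) {
    return true;
  }
  // Roll back so a retry still sees the pre-seed history,
  // and emit a delete event so the UI removes the ghost message.
  history.delete(seeded.id);
  return false;
}
```

After a failed stream, the history is back to its previous length and the recorded events read append, then delete, which is what lets a retry take the actor-first path again.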
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1761df42ed
When critic mode is active and the user clicks send without entering any text (e.g., with only attachments), show an explicit error toast instead of silently doing nothing.
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63024e6710
Extract experiment flag resolution into a shared helper so startCriticLoop runs with the same resolved experiment flags as sendMessage. This prevents critic loops from diverging on experiment assignments when remote evaluation is enabled.
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c5f4d5254
When startCriticLoop is called with non-empty history that ends in a user message (no actor response yet), start an actor turn first so the pending request gets answered before the critic evaluates.
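A minimal sketch of this bootstrap rule, combined with the empty-history actor-first path mentioned in an earlier commit. The names `HistoryEntry` and `firstTurnFor` are hypothetical, and the real check likely inspects richer message metadata than a bare role.

```typescript
// Hypothetical sketch: history reduced to roles only.
type Role = "user" | "assistant";

interface HistoryEntry {
  role: Role;
}

// Pick which side speaks first when the critic loop starts.
function firstTurnFor(history: readonly HistoryEntry[]): "actor" | "critic" {
  const last = history[history.length - 1];
  // Empty history, or a pending user request with no actor response yet:
  // the actor must answer before the critic has anything to evaluate.
  if (last === undefined || last.role === "user") {
    return "actor";
  }
  // The latest message is an actor response, so the critic can review it.
  return "critic";
}
```

For example, `firstTurnFor([{ role: "user" }])` selects the actor, while a history ending in an assistant message selects the critic.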
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf3ae42f03
When editingMessage is active, fall through to the normal edit-send flow instead of routing to startCriticLoop. This prevents message edits from being intercepted by critic mode.
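Taken together with the earlier empty-text commit, the send path now has a clear precedence: edits first, then critic-mode validation, then the critic loop. The sketch below illustrates that ordering only; `SendInput`, `SendRoute`, `routeSend`, and the error text are hypothetical and not taken from the PR.

```typescript
// Hypothetical sketch of the send-routing precedence.
interface SendInput {
  text: string;
  criticEnabled: boolean;
  editingMessage: boolean;
}

type SendRoute =
  | { kind: "edit" }
  | { kind: "normal-send" }
  | { kind: "error"; message: string }
  | { kind: "critic-loop"; criticPrompt: string };

function routeSend(input: SendInput): SendRoute {
  // Message edits always use the normal edit-send flow, even in critic mode.
  if (input.editingMessage) {
    return { kind: "edit" };
  }
  if (!input.criticEnabled) {
    return { kind: "normal-send" };
  }
  // In critic mode the textarea content is the critic prompt, so it must be non-empty.
  const criticPrompt = input.text.trim();
  if (criticPrompt.length === 0) {
    // Placeholder wording; the PR only says an explicit error toast is shown.
    return { kind: "error", message: "Critic mode requires critic instructions." };
  }
  return { kind: "critic-loop", criticPrompt };
}
```

Returning a tagged route instead of performing side effects keeps the precedence easy to unit-test, though the actual component may simply branch inline.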
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 873e7b3d2c
Mirror sendMessage's pre-send bookkeeping so critic-mode interactions update workspace ordering and persist model/thinking preferences.
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57a783e75f
|
Re: Re: one-shot model overrides — in critic mode the textarea is for critic instructions. Model one-shot prefixes like Resolving both as by-design. |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57a783e75f
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57a783e75f
|
Re: file attachments in critic mode — this is a feature gap in the broader critic mode implementation, not a regression from the empty-history change. Critic mode was designed as a text-prompt-only flow in v1. File attachment support would be a separate enhancement. Resolving as out-of-scope for this change. |
Summary
Adds an actor-critic mode loop where a critic LLM automatically reviews each actor turn and provides feedback until it signals /done. In critic mode the main text input becomes the critic prompt; there is no separate inline input field.

Background
Actor-critic is a workflow pattern where a "critic" LLM evaluates the "actor" LLM's output after each turn, providing iterative feedback. This enables autonomous self-improvement loops in which the actor refines its work based on critic feedback until the critic is satisfied.

Implementation

Critic mode UX
- /critic slash command toggles critic mode per workspace (persisted in localStorage)

Actor-critic loop orchestration (agentSession.ts)
- […] criticEnabled, a critic turn auto-fires
- […] buildCriticRequestHistory), disables all tools, and injects critic-specific system instructions
- […] /done to stop the loop

Critic message building (criticMessageBuilder.ts)
- buildCriticRequestHistory: role-flips actor ↔ user messages for critic perspective
- buildActorRequestHistoryWithCriticFeedback: transforms critic feedback into user-role messages for actor context, filtering out partial and /done messages
- isCriticDoneResponse: detects /done sentinel, allowing reasoning parts but checking only visible text
- buildCriticAdditionalInstructions: assembles critic system prompt from base instructions + user critic prompt

Source-aware rendering
- metadata.messageSource = "actor" | "critic"
- data-message-source attribute on assistant/reasoning message DOM elements
- messageSource threaded through stream events (stream-start, reasoning-delta, stream-end)

Resilience
- shouldResumeAsCriticTurn inspects latest partial message metadata

Validation
- […] /done semantics, role-flip, reasoning persistence, context-exceeded recovery, model parity, queue priority, interrupt, disabled baseline
- […] isCriticDoneResponse and buildActorRequestHistoryWithCriticFeedback
- make static-check passes

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $20.56
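
To make the role-flip and /done detection described for criticMessageBuilder.ts more concrete, here is a rough sketch. It is not the PR's code: `Part`, `ChatMessage`, `isDoneSketch`, and `flipRolesSketch` are hypothetical, and treating "done" as visible text that equals /done after trimming is an assumption about what "strict" termination means.

```typescript
// Hypothetical sketch: message shapes are simplified stand-ins.
type Part = { type: "reasoning" | "text"; text: string };

interface ChatMessage {
  role: "user" | "assistant";
  parts: Part[];
}

// Detect the /done sentinel. Reasoning parts are allowed and ignored;
// only the visible text is compared (assumed: exact match after trimming).
function isDoneSketch(parts: readonly Part[]): boolean {
  const visible = parts
    .filter((p) => p.type === "text")
    .map((p) => p.text)
    .join("")
    .trim();
  return visible === "/done";
}

// Flip roles so the critic sees the actor's output as incoming "user" messages
// and the original user requests as its own "assistant" side.
function flipRolesSketch(history: readonly ChatMessage[]): ChatMessage[] {
  return history.map((m): ChatMessage => ({
    ...m,
    role: m.role === "assistant" ? "user" : "assistant",
  }));
}
```

Under this sketch, a reply containing reasoning plus a visible " /done" counts as done, while "Almost /done" or a /done that appears only in reasoning does not.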