Add binary project parallel execution mode for run and daemon #50
TheConnMan wants to merge 1 commit into marcus:main from
Conversation
- Add `schedule.project_parallelism` config support and preflight mode display
- Execute all eligible projects concurrently in run mode and daemon scheduled runs
- Preserve per-project serial task execution, daily skip guard, and reporting behavior
- Ignore `max_projects` in project parallel mode
Hey @TheConnMan! Starling here (AI assistant on the project). 👋 This is a significant contribution — parallel project execution is something power users with large project lists have been wanting. Love the framing: "for people with lots of projects and lots of tokens." 😄 Quick first read:
This touches two critical paths (daemon + run mode) with ~885 lines changed across 7 files, so we'll want a careful review before merging. I'm flagging this for Marcus to give it proper attention — he'll want to look at the goroutine coordination, error handling across concurrent runs, and the interaction with the reporting path. I'll leave a more detailed code review shortly. Thanks for taking this on! ✦
Appreciated, Marcus Bot. FWIW I ran this overnight with the daemon and the tokens flew. Most of the changes are whitespace from refactoring — extracting the run loop so it can be parallelized accounts for maybe ~200 lines of actual changes.
Good to know — that context helps a lot. ~200 lines of actual logic extracted out of a larger refactor is much easier to review than the raw diff size suggests. The whitespace noise is understandable when pulling a run loop into a separate path. Did you see any edge cases overnight with the daemon? Error behavior when one project fails, or any race-y interaction between the goroutines and the daily skip guard? That's what we'll want to stress-test before merging. Flagging for @marcus to review the actual logic changes. ✦
marcus
left a comment
Review: Add binary project parallel execution mode for run and daemon
This is a solid, well-structured PR. The refactoring is clean — extracting executeProject / executeDaemonProject and the shared executeProjects abstraction is the right approach. I verified the key concurrency concerns; several are already handled correctly. A few things worth addressing before merge.
✅ What's done well
- `state.State` is thread-safe. All methods (`MarkAssigned`, `ClearAssigned`, `RecordTaskRun`, `RecordProjectRun`, `AddRunRecord`) are protected by `sync.RWMutex`. No data race from concurrent project execution.
- `runReport` mutex — correctly added in `run_reporting.go`. `addTask` and `finalize` are now safe for concurrent use.
- `withOutputLock` helper — clean design. Nil-safe, correctly handles both serial (nil mutex, pass-through) and parallel (locked) modes.
- `outputMu` as a pointer in `executeRunParams` — the params struct is passed by value, but all goroutines share the same mutex object via the pointer. Correct.
- `liveRenderer` in parallel mode — `interactive = isInteractive() && !p.projectParallel` ensures the live renderer is never created in parallel mode. No TTY interleaving issues.
- Backward compatible — `project_parallelism` defaults to `false`. Existing single-project and serial multi-project behavior is unchanged.
- Test coverage — `TestExecuteProjects_RunsProjectsInParallel` with a gate channel and `TestExecuteProjects_SerialWhenParallelismDisabled` with a sleep+counter are good tests. `TestBuildPreflight_ProjectParallelIgnoresMaxProjects` covers the maxProjects bypass cleanly.
🔴 Concern: No upper bound on parallelism
```go
// buildPreflight
if p.projectParallel {
	maxProjects = 0 // process all eligible projects
}
```

With `project_parallelism: true`, ALL eligible projects launch simultaneously — no cap. This means:
- 30 projects → 30 concurrent AI agent processes
- All hit the provider API at once, likely triggering rate limit errors (429s) immediately
- Token budget protection only runs in the preflight phase; by the time goroutines start, budget checks are already passed
- Noisy and hard to debug when half the agents fail due to rate limits
The "binary" naming in the PR title describes the current design, but a semaphore-based worker pool would be strictly safer. Even something simple like a max_parallel_projects config (default: 5, or unlimited when unset) would give operators a safety valve:
```go
// executeProjects parallel branch — example semaphore approach
sem := make(chan struct{}, maxParallel)
for _, pp := range projects {
	wg.Add(1)
	go func(project preflightProject) {
		defer wg.Done()
		sem <- struct{}{}
		defer func() { <-sem }()
		counts, err := runProject(ctx, project)
		resultCh <- projectResult{counts: counts, err: err}
	}(pp)
}
```

If a full semaphore isn't done now, at minimum document the "fire all" behavior clearly in the config comment and README so operators aren't surprised.
🟡 Error aggregation: only firstErr surfaces
```go
var firstErr error
for result := range resultCh {
	totals.tasksRun += result.counts.tasksRun
	// ...
	if firstErr == nil && result.err != nil {
		firstErr = result.err
	}
}
return totals, firstErr
```

In practice, `runProject` only returns `ctx.Err()` (both `executeProject` and `executeDaemonProject` handle task-level errors internally with `continue`). So in production, `firstErr` will only ever be a context cancellation — and "run cancelled" in the caller is accurate. But this implicit contract isn't obvious, and if `executeProject` ever returns a non-context error in the future, the other project errors will be silently swallowed. Worth a comment noting that only context cancellation propagates, or logging a count of non-nil errors at the summary level.
🟡 Test: strengthen the parallel assertion
```go
// TestExecuteProjects_RunsProjectsInParallel
if maxRunning < 2 {
	t.Fatalf("maxRunning = %d, want >= 2 for parallel execution", maxRunning)
}
```

The gate channel design guarantees all 3 goroutines start before any complete — `maxRunning` should be exactly 3. Use `maxRunning != len(projects)` or `maxRunning != 3` for a tighter assertion.
🟡 "run cancelled" log is misleading in daemon path
```go
// daemon.go
if err != nil {
	log.Info("run cancelled")
	return err
}
```

As noted above, this only triggers on context cancellation, so it's technically accurate. But "run cancelled" without the underlying context error could be clearer. `log.Info("run cancelled", ...)` with the error attached, or `log.Infof("run cancelled: %v", err)`, would be more debuggable.
🟡 Documentation gap
project_parallelism doesn't appear in any README or docs. Given the token-cost implications, a sentence in the config reference explaining the flag and its "fire all" semantics would save future operators some surprises.
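A hedged sketch of what the config-reference entry could look like — only `project_parallelism` exists in this PR; `max_parallel_projects` is the hypothetical safety valve suggested earlier in this review:

```yaml
schedule:
  # Run all eligible projects concurrently instead of serially.
  # WARNING: fire-all semantics — N eligible projects means N concurrent
  # AI agent processes hitting the provider API at once. Token budget
  # checks run only during preflight, not per-goroutine at runtime.
  project_parallelism: true

  # Hypothetical cap (suggested, not implemented in this PR):
  # limit concurrent projects; unset or 0 = unlimited.
  # max_parallel_projects: 5
```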
Nit: displayRunSummaryColored gating
if isInteractive() && !plan.projectParallel {
displayRunSummaryColored(...)
} else {
fmt.Printf("\n=== Run Complete ===\n")
...
}Reasonable — parallel mode always uses the plain banner to avoid TTY concerns during execution. But since the summary runs after all goroutines complete, there's no interleaving risk there. Could optionally give the colored summary post-parallel. Not a blocker.
Summary
The concurrency mechanics are sound. State, reporting, and output are all properly protected. The main ask before merge is addressing the unbounded parallelism concern — even just documenting the "fire all" behavior and adding a config-level safety valve. The code quality and test coverage are good.
This is for people with lots of projects and lots of tokens.