
Add binary project parallel execution mode for run and daemon #50

Open
TheConnMan wants to merge 1 commit into marcus:main from TheConnMan:task/project-parallelism

Conversation

@TheConnMan

This is for people with lots of projects and lots of tokens.

  • Add schedule.project_parallelism config support and preflight mode display
  • Execute all eligible projects concurrently in run mode and daemon scheduled runs
  • Preserve per-project serial task execution, daily skip guard, and reporting behavior
  • Ignore max_projects in project parallel mode

@marcus
Owner

marcus commented Mar 5, 2026

Hey @TheConnMan! Starling here (AI assistant on the project). 👋

This is a significant contribution — parallel project execution is something power users with large project lists have been wanting. Love the framing: "for people with lots of projects and lots of tokens." 😄

Quick first read:

  • The separation between project-level parallelism and serial task execution within a project is the right design — good call on preserving that
  • Keeping daily skip guard and reporting per-project is also correct
  • The max_projects bypass in parallel mode makes sense (you're running all eligible anyway)

This touches two critical paths (daemon + run mode) with ~885 lines changed across 7 files, so we'll want a careful review before merging. I'm flagging this for Marcus to give it proper attention — he'll want to look at the goroutine coordination, error handling across concurrent runs, and the interaction with the reporting path.

I'll leave a more detailed code review shortly. Thanks for taking this on! ✦

@TheConnMan
Author

Appreciated, Marcus Bot. FWIW I ran this overnight with the daemon and the tokens flew. Most of the changes are whitespace due to refactoring. Should be maybe ~200 lines of actual changes to extract the run loop out so it can be parallelized.

@marcus
Owner

marcus commented Mar 5, 2026

Good to know — that context helps a lot. ~200 lines of actual logic extracted out of a larger refactor is much easier to review than the raw diff size suggests. The whitespace noise is understandable when pulling a run loop into a separate path.

Did you see any edge cases overnight with the daemon? Error behavior when one project fails, or any race-y interaction between the goroutines and the daily skip guard? That's what we'll want to stress-test before merging.

Flagging for @marcus to review the actual logic changes. ✦


@marcus marcus left a comment


Review: Add binary project parallel execution mode for run and daemon

This is a solid, well-structured PR. The refactoring is clean — extracting executeProject / executeDaemonProject and the shared executeProjects abstraction is the right approach. I verified the key concurrency concerns; several are already handled correctly. A few things worth addressing before merge.


✅ What's done well

  • state.State is thread-safe. All methods (MarkAssigned, ClearAssigned, RecordTaskRun, RecordProjectRun, AddRunRecord) are protected by sync.RWMutex. No data race from concurrent project execution.
  • runReport mutex — correctly added in run_reporting.go. addTask and finalize are now safe for concurrent use.
  • withOutputLock helper — clean design. Nil-safe, correctly handles both serial (nil mutex, pass-through) and parallel (locked) modes.
  • outputMu as a pointer in executeRunParams — passed by value, but all goroutines share the same mutex object via the pointer. Correct.
  • liveRenderer in parallel mode — interactive = isInteractive() && !p.projectParallel ensures the live renderer is never created in parallel mode. No TTY interleaving issues.
  • Backward compatible — project_parallelism defaults to false. Existing single-project and serial multi-project behavior is unchanged.
  • Test coverage — TestExecuteProjects_RunsProjectsInParallel with a gate channel and TestExecuteProjects_SerialWhenParallelismDisabled with a sleep+counter are good tests. TestBuildPreflight_ProjectParallelIgnoresMaxProjects covers the maxProjects bypass cleanly.

🔴 Concern: No upper bound on parallelism

// buildPreflight
if p.projectParallel {
    maxProjects = 0 // process all eligible projects
}

With project_parallelism: true, ALL eligible projects launch simultaneously — no cap. This means:

  • 30 projects → 30 concurrent AI agent processes
  • All hit the provider API at once, likely triggering rate limit errors (429s) immediately
  • Token budget protection only runs in the preflight phase; by the time goroutines start, budget checks are already passed
  • Noisy and hard to debug when half the agents fail due to rate limits

The "binary" naming in the PR title describes the current design, but a semaphore-based worker pool would be strictly safer. Even something simple like a max_parallel_projects config (default: 5, or unlimited when unset) would give operators a safety valve:

// executeProjects parallel branch — example semaphore approach
sem := make(chan struct{}, maxParallel)
for _, pp := range projects {
    wg.Add(1)
    go func(project preflightProject) {
        defer wg.Done()
        sem <- struct{}{}
        defer func() { <-sem }()
        counts, err := runProject(ctx, project)
        resultCh <- projectResult{counts: counts, err: err}
    }(pp)
}

If a full semaphore isn't done now, at minimum document the "fire all" behavior clearly in the config comment and README so operators aren't surprised.


🟡 Error aggregation: only firstErr surfaces

var firstErr error
for result := range resultCh {
    totals.tasksRun += result.counts.tasksRun
    // ...
    if firstErr == nil && result.err != nil {
        firstErr = result.err
    }
}
return totals, firstErr

In practice, runProject only returns ctx.Err() (both executeProject and executeDaemonProject handle task-level errors internally with continue). So in production, firstErr will only ever be a context cancellation — and "run cancelled" in the caller is accurate. But this implicit contract isn't obvious, and if executeProject ever returns a non-context error in the future, the other project errors will be silently swallowed. Worth a comment noting that only context cancellation propagates, or logging a count of non-nil errors at the summary level.


🟡 Test: strengthen the parallel assertion

// TestExecuteProjects_RunsProjectsInParallel
if maxRunning < 2 {
    t.Fatalf("maxRunning = %d, want >= 2 for parallel execution", maxRunning)
}

The gate channel design guarantees all 3 goroutines start before any complete — maxRunning should be exactly 3. Use maxRunning != len(projects) or maxRunning != 3 for a tighter assertion.


🟡 "run cancelled" log is misleading in daemon path

// daemon.go
if err != nil {
    log.Info("run cancelled")
    return err
}

As noted above, this only triggers on context cancellation, so it is technically accurate. But "run cancelled" without the underlying context error could be clearer. log.Info("run cancelled", ...) with the error attached, or log.Infof("run cancelled: %v", err), would be more debuggable.


🟡 Documentation gap

project_parallelism doesn't appear in any README or docs. Given the token-cost implications, a sentence in the config reference explaining the flag and its "fire all" semantics would save future operators some surprises.


Nit: displayRunSummaryColored gating

if isInteractive() && !plan.projectParallel {
    displayRunSummaryColored(...)
} else {
    fmt.Printf("\n=== Run Complete ===\n")
    ...
}

Reasonable — parallel mode always uses the plain banner to avoid TTY concerns during execution. But since the summary runs after all goroutines complete, there's no interleaving risk there. Could optionally give the colored summary post-parallel. Not a blocker.


Summary

The concurrency mechanics are sound. State, reporting, and output are all properly protected. The main ask before merge is addressing the unbounded parallelism concern — even just documenting the "fire all" behavior and adding a config-level safety valve. The code quality and test coverage are good.




@marcus marcus left a comment


Solid PR overall. The executeProjects abstraction is clean, state.State is already RWMutex-protected (verified in internal/state/state.go), runReport got a mutex, and withOutputLock handles nil gracefully. Backward compatible, good tests. Main issue: unbounded parallelism. Notes below.

@marcus
Owner

marcus commented Mar 10, 2026

Full review notes

Verified safe:

  • state.State — all write methods (MarkAssigned, ClearAssigned, RecordTaskRun, RecordProjectRun, AddRunRecord) guarded by sync.RWMutex. Concurrent goroutines won't race on state.
  • runReport — mutex added to addTask/finalize in run_reporting.go. Thread-safe.
  • withOutputLock — nil-safe helper, output correctly serialized in parallel mode, pass-through in serial. outputMu is a pointer in executeRunParams so value-copied goroutine params all share the same mutex instance.
  • liveRenderer — interactive flag = isInteractive() && !projectParallel, so renderer is never created in parallel mode. No TTY issues.
  • Goroutine lifecycle in executeProjects — buffered channel sized to len(projects), wg.Wait() closes the channel, for-range drains cleanly. No goroutine leaks.
  • Backward compatible — project_parallelism defaults false.

Issue: unbounded parallelism (main ask)

With project_parallelism: true, all eligible projects launch simultaneously. With 20+ projects this means 20+ concurrent AI agent processes hitting the same provider API at once — near-certain rate-limit cascade. Budget checks run in preflight before goroutines start so they don't provide runtime protection.

Suggested fix: add max_parallel_projects int to ScheduleConfig (0 = unlimited). In executeProjects, gate launches with a semaphore:

sem := make(chan struct{}, maxParallel)
for _, pp := range projects {
    wg.Add(1)
    go func(pp preflightProject) {
        defer wg.Done()
        sem <- struct{}{}
        defer func() { <-sem }()
        ...
    }(pp)
}

If out of scope, at minimum document the fire-all behavior in the config field comment + README so operators with large project lists aren't surprised.


Minor: implicit error contract

executeProject and executeDaemonProject only return ctx.Err() (task-level errors are handled internally with continue). So firstErr in executeProjects will only ever be a context cancellation — the "run cancelled" log is accurate. But this contract isn't obvious. If either function ever gains a non-context error return path, sibling errors will be silently swallowed. A comment noting the invariant (or logging a project-error count at summary time) would future-proof this.


Nit: test assertion

TestExecuteProjects_RunsProjectsInParallel: gate-channel design guarantees all 3 start before any finish, so maxRunning should be exactly 3. maxRunning != len(projects) is a tighter assertion than < 2.


Nit: docs

project_parallelism isn't in any README/config reference. Worth a line given the token-cost implications.
