feat: add agent coordination hooks to prevent duplicate work #703
geoffjay merged 2 commits into feature/autonomous-pipeline
Conversation
This change is part of the following stack: Change managed by git-spice.
geoffjay
left a comment
Review: feat(hooks): agent coordination checkpoint
Good concept and solid overall structure — this directly addresses the v0.9.0 duplication incident where 3 agents independently created 52 overlapping issues. The fail-open design, 8 s timeout, and lock/unlock protocol are all the right instincts. The agent YAML additions are clear and consistent.
Two blocking bugs need fixing before merge, plus a few non-blocking suggestions below.
🔴 Blocking: f-string missing on warn message in handle_issue_close
In handle_issue_close (check-coordination.py ~line 282), the first string literal in the warn() call has an f prefix but the continuation strings do not. Python does not automatically promote adjacent string literals to f-strings — only the first segment is interpolated. The second and third strings will render literally as #{kept_number} and #{issue_number} in the agent's context, making the warning completely useless.
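To see the behavior in isolation (a minimal standalone sketch, with `kept_number` standing in for the hook's real variable):

```python
kept_number = 42

# Adjacent string literals are concatenated at compile time,
# but only the f-prefixed segment is interpolated.
msg = (
    f"keeping #{kept_number}, "
    "duplicate of #{kept_number}"
)
print(msg)  # keeping #42, duplicate of #{kept_number}
```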
Fix — add f to every continuation string:
```python
warn(
    f"⚠️ Closing issue #{issue_number} as duplicate of #{kept_number}.\n\n"
    f"Before proceeding, verify that the issue being KEPT (#{kept_number}) "
    f"is at least as detailed as the one being closed (#{issue_number}). "
    "If the issue you are closing is more detailed, consider merging its "
    "content into the kept issue first."
)
```

🔴 Blocking: check_active_lock doesn't handle concurrent locks correctly
check_active_lock (check-coordination.py ~line 197) tracks a single active_lock variable. If Agent A posts LOCK, then Agent B posts LOCK, then Agent A posts UNLOCK — the function clears active_lock entirely (because any UNLOCK clears the single slot), even though Agent B's lock is still live. A third agent running gh issue create will see no active lock and proceed, violating Agent B's lock.
This is exactly the scenario that can arise during a busy pipeline run with planner and conductor both doing bulk ops simultaneously — the very situation this PR is meant to prevent.
Minimal fix — track per-agent locks by keying on the actor/sender field:
```python
def check_active_lock(messages: list[dict]) -> str | None:
    active_locks: dict[str, str] = {}  # actor -> lock message
    for msg in messages:
        content = msg.get("content") or msg.get("body") or msg.get("text") or ""
        actor = msg.get("sender") or msg.get("actor") or "unknown"
        if LOCK_MARKER_RE.search(content):
            active_locks[actor] = content.strip()
        elif UNLOCK_MARKER_RE.search(content):
            active_locks.pop(actor, None)
    # Return the first still-active lock, if any
    return next(iter(active_locks.values()), None)
```

This assumes each actor uses consistent sender metadata, which the communicate service provides. If actor identity in messages is unreliable, a counter-based approach (increment on LOCK, decrement on UNLOCK, block when > 0) is a safe fallback.
🟡 Non-blocking suggestions
1. gh issue reopen routed to handle_issue_close (check-coordination.py, routing in main())
The duplicate-check logic in handle_issue_close (looking for closes #N, duplicate of, etc.) is only meaningful for close operations. Routing reopen through the same handler runs that check against a command that will never contain those patterns — dead code that could confuse future maintainers. Consider a separate handle_issue_reopen that only does the lock check and then calls allow().
2. No unit tests
392 lines of Python with non-trivial branching and no unit tests is a gap, especially for a safety mechanism. At minimum check_active_lock, _extract_title, and the main() routing logic are worth covering. A tests/test_check_coordination.py with a few pytest cases would give confidence that the lock semantics are correct across concurrent scenarios.
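The concurrent-lock scenario from the blocking bug above is exactly the kind of case worth pinning down; a stdlib unittest sketch (with the per-actor logic reimplemented inline rather than imported from the hook):

```python
import unittest

def check_active_lock(messages):
    # Per-actor lock tracking (mirrors the fix proposed above)
    active = {}
    for msg in messages:
        content = msg.get("content", "")
        actor = msg.get("sender", "unknown")
        if "[UNLOCK]" in content:
            active.pop(actor, None)
        elif "[LOCK]" in content:
            active[actor] = content
    return next(iter(active.values()), None)

class ConcurrentLockTest(unittest.TestCase):
    def test_unlock_clears_only_own_lock(self):
        msgs = [
            {"sender": "A", "content": "[LOCK] planner: bulk issue creation"},
            {"sender": "B", "content": "[LOCK] triage: dedup sweep"},
            {"sender": "A", "content": "[UNLOCK]"},
        ]
        # B's lock must survive A's unlock
        self.assertEqual(check_active_lock(msgs), "[LOCK] triage: dedup sweep")

    def test_all_locks_released(self):
        msgs = [
            {"sender": "A", "content": "[LOCK] x"},
            {"sender": "A", "content": "[UNLOCK]"},
        ]
        self.assertIsNone(check_active_lock(msgs))

if __name__ == "__main__":
    unittest.main()
```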
3. get_room_id fetches only 100 rooms
_http_get("/rooms?limit=100&offset=0") — if the communicate service has more than 100 rooms this silently fails to find engineering and the hook fails open without any indication. A name-based query endpoint (if available) or a larger limit would be more robust.
4. Lock window undocumented as design decision
The 30-minute LOCK_WINDOW_MINUTES means a lock placed before a long conductor sync (> 30 min) will silently expire. This is probably intentional (stale locks shouldn't block forever) but it is not called out in the module docstring. A one-liner explaining the trade-off would help the next person who hits a mysterious bypass.
5. Unnecessary f prefix (trivial)
_http_get(f"/rooms?limit=100&offset=0") — no interpolation, the f is a no-op.
Stack note: This PR stacks directly on feature/autonomous-pipeline — no parent open PRs to worry about.
Blocking fixes:
- Fix check_active_lock() to track locks per-actor using a dict keyed on
the message sender field. Previously a single active_lock variable meant
Agent A's [UNLOCK] would clear Agent B's still-active lock, allowing a
third agent to bypass B's hold. Now each actor's lock/unlock pair is
tracked independently.
- Fix f-string missing on continuation strings in handle_issue_close():
the kept_number and issue_number variables were rendered literally as
#{kept_number} / #{issue_number} in the warn() message. Added f prefix
to all strings that reference those variables.
Non-blocking fixes:
- Add handle_issue_reopen() — a dedicated handler that runs only the lock
check (no duplicate-detection logic). Update main() routing so
gh issue reopen dispatches there instead of handle_issue_close().
- Remove spurious f prefix from _http_get("/rooms?limit=100&offset=0")
(no interpolation was happening).
- Document the 30-minute LOCK_WINDOW_MINUTES trade-off in the module
docstring: stale locks expire intentionally, agents doing extended work
should post periodic [LOCK] refreshes.
- Add .claude/tests/test_check_coordination.py with 17 unittest cases
covering check_active_lock() (concurrent locks, case insensitivity,
ordering), _extract_title(), and main() dispatch routing. Runnable
with stdlib unittest (no pytest required).
feat(hooks): add agent coordination hook to prevent duplicate work
Add .claude/hooks/check-coordination.py, a PreToolUse hook that intercepts
gh issue create, gh issue close/reopen, and agent communicate announcement
commands to enforce coordination between concurrent agents.
The hook:
- checks for an active [LOCK] before allowing issue creation or closure (blocks if a lock is active)
- inspects gh issue create and injects a warning if similar open issues exist
- warns before announcements when a message has already been posted on the same topic
Add the hook to .claude/settings.json as a PreToolUse/Bash hook with an 8 s
timeout.
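The settings entry might look roughly like this (a sketch of the PreToolUse/Bash registration; the exact schema depends on the Claude Code settings version in use):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/check-coordination.py",
            "timeout": 8
          }
        ]
      }
    ]
  }
}
```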
Add lock/unlock protocol documentation to the system prompts of the four
agents most likely to perform bulk mutations: planner, worker, conductor, and
triage. Each prompt now explains when to acquire a [LOCK], how to release it
with [UNLOCK], and that the hook enforces these signals automatically.
Closes #657
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>