From ac77ce6996e192f6a9d6921a00910ff6401bdc59 Mon Sep 17 00:00:00 2001
From: stack72 <public@paulstack.co.uk>
Date: Sat, 21 Mar 2026 00:41:36 +0000
Subject: [PATCH] fix: use sonnet for skill trigger evals instead of haiku
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

- Remove the `EVAL_MODEL: "claude-haiku-4-5-20251001"` override from the
  skill-trigger-eval CI job so the eval runs on Sonnet. Skill descriptions
  were tuned against Sonnet, and Haiku doesn't follow routing instructions
  (like "use this skill INSTEAD OF domain-specific skills") reliably enough:
  swamp-workflow dropped to 63% and swamp-troubleshooting to 65% on Haiku,
  versus 80%+ on Sonnet.
- Keep `EVAL_RUNS=1` and 25 concurrent workers for speed. Dispatching the
  185 Sonnet calls in parallel keeps the job fast.
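
For context, a minimal sketch of why deleting the override is enough. It
assumes the eval task reads `EVAL_MODEL` from the environment and otherwise
uses a built-in default. `resolveEvalModel` and `DEFAULT_EVAL_MODEL` are
hypothetical names, and the default value is a placeholder rather than a real
model ID; the actual eval-skill-triggers task may be structured differently.

```typescript
// Hypothetical sketch, not the actual eval-skill-triggers code.
// Placeholder value: the real default model ID is not shown in this patch.
const DEFAULT_EVAL_MODEL = "default-sonnet-model-placeholder";

// Return the model to evaluate against: the EVAL_MODEL override when it is
// set and non-empty, otherwise the built-in default.
function resolveEvalModel(env: Record<string, string | undefined>): string {
  const override = (env["EVAL_MODEL"] ?? "").trim();
  return override.length > 0 ? override : DEFAULT_EVAL_MODEL;
}
```

Under that assumption, a Deno task could call
`resolveEvalModel(Deno.env.toObject())`, and removing the `EVAL_MODEL` line
from the workflow would make the job use the default.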

## Test plan

- [x] Haiku results: 2 skills failing (workflow 63%, troubleshooting 65%)
- [x] Sonnet results: all skills passing (≥80%)
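
The pass/fail reading above can be sketched as a per-skill pass rate compared
against a threshold. This assumes the 80% bar implied by the results;
`TriggerResult`, `passRates`, and `failingSkills` are hypothetical names, not
the eval's actual code.

```typescript
// Hypothetical sketch of per-skill scoring; the real eval may differ.
type TriggerResult = { skill: string; passed: boolean };

// Compute the fraction of passing trigger cases for each skill.
function passRates(results: TriggerResult[]): Map<string, number> {
  const tally = new Map<string, { passed: number; total: number }>();
  for (const r of results) {
    const t = tally.get(r.skill) ?? { passed: 0, total: 0 };
    t.total += 1;
    if (r.passed) t.passed += 1;
    tally.set(r.skill, t);
  }
  const rates = new Map<string, number>();
  tally.forEach((t, skill) => rates.set(skill, t.passed / t.total));
  return rates;
}

// List skills whose pass rate is below the threshold (assumed to be 0.8).
function failingSkills(results: TriggerResult[], threshold = 0.8): string[] {
  const failing: string[] = [];
  passRates(results).forEach((rate, skill) => {
    if (rate < threshold) failing.push(skill);
  });
  return failing.sort();
}
```

With this reading, a skill at 63% or 65% appears in `failingSkills`, while a
skill at exactly 80% does not.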

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---
 .github/workflows/ci.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e6209f96..d48aaf20 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -137,7 +137,6 @@ jobs:
           EVAL_RUNS: "1"
           EVAL_WORKERS: "25"
           EVAL_TIMEOUT: "30"
-          EVAL_MODEL: "claude-haiku-4-5-20251001"
         run: deno run eval-skill-triggers
 
   claude-review: