From ac77ce6996e192f6a9d6921a00910ff6401bdc59 Mon Sep 17 00:00:00 2001
From: stack72
Date: Sat, 21 Mar 2026 00:41:36 +0000
Subject: [PATCH] fix: use sonnet for skill trigger evals instead of haiku
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

- Remove `EVAL_MODEL: "claude-haiku-4-5-20251001"` override from the skill-trigger-eval CI job. Skill descriptions were tuned against Sonnet, and Haiku doesn't follow routing instructions (like "use this skill INSTEAD OF domain-specific skills") reliably enough: swamp-workflow dropped to 63% and swamp-troubleshooting to 65% on Haiku vs 80%+ on Sonnet.
- Keep `EVAL_RUNS=1` and 25 concurrent workers for speed. 185 Sonnet calls dispatched in parallel is still fast.

## Test plan

- [x] Haiku results: 2 skills failing (workflow 63%, troubleshooting 65%)
- [x] Sonnet results: all skills passing (≥80%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---
 .github/workflows/ci.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e6209f96..d48aaf20 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -137,7 +137,6 @@ jobs:
           EVAL_RUNS: "1"
           EVAL_WORKERS: "25"
           EVAL_TIMEOUT: "30"
-          EVAL_MODEL: "claude-haiku-4-5-20251001"
         run: deno run eval-skill-triggers
 
   claude-review: