Merged
11 changes: 11 additions & 0 deletions .claude/CLAUDE.md
@@ -39,6 +39,17 @@ npm run validate # Fix all errors before committing. Warnings are acceptable
```
Validate runs in CI and blocks deployment on errors.

## Evaluations

Each skill should have an evaluation file at `evaluations/<skill-name>.json`. Run evaluations with:
```bash
node scripts/evaluate-skills.js <skill-name> # All evals
node scripts/evaluate-skills.js <skill-name> --eval "X" # Single eval by name
node scripts/evaluate-skills.js <skill-name> --no-baseline # Skip without-skill baseline
node scripts/evaluate-skills.js <skill-name> --triggers-only # Trigger evals only
```
Results are saved to `evaluations/results/` (gitignored). See `evaluations/icp-cli.json` for the format.
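The flags above could be parsed along these lines. This is a minimal sketch, not the actual contents of `scripts/evaluate-skills.js`: the helper name `parseEvalArgs` and the shape of the returned object are assumptions made for illustration, and only the flag names come from the commands listed above.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical helper (not taken from the real script): maps the documented
// CLI flags onto a plain options object.
function parseEvalArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      eval: { type: "string" },            // --eval "X": run a single eval by name
      "no-baseline": { type: "boolean" },  // --no-baseline: skip the without-skill run
      "triggers-only": { type: "boolean" } // --triggers-only: trigger evals only
    },
  });
  if (positionals.length !== 1) {
    throw new Error("expected exactly one <skill-name> argument");
  }
  return {
    skill: positionals[0],
    evalName: values.eval ?? null,
    baseline: !values["no-baseline"],
    triggersOnly: Boolean(values["triggers-only"]),
  };
}

console.log(parseEvalArgs(["icp-cli", "--eval", "Deploy to mainnet", "--no-baseline"]));
```

`parseArgs` runs in strict mode by default, so an unknown flag throws instead of being silently ignored.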

## Writing Guidelines

- **Write for agents, not humans.** Be explicit with canister IDs, function signatures, and error messages.
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,3 +9,5 @@ public/llms.txt
public/llms-full.txt
.astro
lighthouse-*
.eval-tmp
evaluations/results/
26 changes: 24 additions & 2 deletions CONTRIBUTING.md
@@ -123,13 +123,35 @@ npm run validate # Check frontmatter and sections

This runs automatically in CI and blocks deployment on errors.

-### 4. That's it — the website auto-discovers skills
+### 4. Add evaluation cases

Create `evaluations/<skill-name>.json` with test cases that verify the skill works. The eval file has two sections:

- **`output_evals`** — realistic prompts with expected behaviors a judge can check
- **`trigger_evals`** — queries that should/shouldn't activate the skill

See `evaluations/icp-cli.json` for a working example. Write prompts the way a developer would actually ask — vague and incomplete, not over-specified test questions.
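One way to think about `trigger_evals` is as a small classification test: every `should_trigger` query that fails to activate the skill is a miss, and every `should_not_trigger` query that does activate it is a false alarm. The sketch below assumes a hypothetical `activates(query)` predicate, replaced here by a toy keyword matcher; how the real script detects activation is not shown in this PR.

```javascript
// Hypothetical scorer: `activates` stands in for whatever mechanism decides
// whether the skill was loaded for a given query.
function scoreTriggers(triggerEvals, activates) {
  const missed = triggerEvals.should_trigger.filter((q) => !activates(q));
  const falseAlarms = triggerEvals.should_not_trigger.filter((q) => activates(q));
  const total =
    triggerEvals.should_trigger.length + triggerEvals.should_not_trigger.length;
  const correct = total - missed.length - falseAlarms.length;
  return { missed, falseAlarms, accuracy: total === 0 ? 1 : correct / total };
}

// Toy stand-in for skill activation: a keyword match, for illustration only.
const toyActivates = (query) => /\b(icp|dfx)\b/i.test(query);

const sample = {
  should_trigger: [
    "My icp deploy is failing with a build error",
    "How do I start the local replica?",
  ],
  should_not_trigger: ["Explain the IC consensus mechanism"],
};

console.log(scoreTriggers(sample, toyActivates));
```

The second `should_trigger` query mentions neither `icp` nor `dfx`, so the toy matcher misses it. Vague queries like that one are the cases a trigger eval is meant to catch.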

**Running evaluations** (optional, requires `claude` CLI):

```bash
node scripts/evaluate-skills.js <skill-name> # All evals, with + without skill
node scripts/evaluate-skills.js <skill-name> --eval "name" # Single eval
node scripts/evaluate-skills.js <skill-name> --no-baseline # Skip without-skill run
node scripts/evaluate-skills.js <skill-name> --triggers-only # Trigger evals only
```

This sends each prompt to Claude with and without the skill, then has a judge score the output. Results are saved to `evaluations/results/` (gitignored).

Including a summary of eval results in your PR description is recommended but not required — running evals needs `claude` CLI access and costs API credits.
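To make the with/without comparison concrete, here is a minimal sketch of how judge verdicts might be summarized for a PR description. The verdict shape (one boolean per expected behavior) and the function names are assumptions for illustration; the actual result format written to `evaluations/results/` is not shown in this PR.

```javascript
// Assumed verdict shape: one boolean per entry in `expected_behaviors`,
// true when the judge considers that behavior satisfied.
function passRate(verdicts) {
  if (verdicts.length === 0) return 0;
  return verdicts.filter(Boolean).length / verdicts.length;
}

// Hypothetical summary: compares a with-skill run against the baseline run.
function compareRuns(withSkill, baseline) {
  const skillRate = passRate(withSkill);
  const baseRate = passRate(baseline);
  return { withSkill: skillRate, baseline: baseRate, delta: skillRate - baseRate };
}

// Example: four expected behaviors, judged once per run.
const summary = compareRuns(
  [true, true, true, false],   // with the skill loaded
  [true, false, false, false]  // without-skill baseline
);
console.log(summary); // { withSkill: 0.75, baseline: 0.25, delta: 0.5 }
```

A positive `delta` suggests the skill changes the agent's output in the intended direction, while a value near zero suggests the baseline model already handles the prompt.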

### 5. That's it — the website auto-discovers skills

The website is automatically generated from the SKILL.md frontmatter at build time. You do **not** need to edit any source file. Astro reads all `skills/*/SKILL.md` files, parses their frontmatter, and generates the site pages, `llms.txt`, discovery endpoints, and other files.

Stats (skill count, categories) all update automatically.

-### 5. Submit a PR
+### 6. Submit a PR

- One skill per PR
- Include a brief description of what the skill covers and why it's needed
1 change: 1 addition & 0 deletions README.md
@@ -86,6 +86,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for how to add or update skills.
- **Hosting**: GitHub Pages via Actions
- **Skills**: Plain markdown files in `skills/*/SKILL.md`
- **Validation**: Structural linter for frontmatter and code blocks (`npm run validate`)
- **Evaluation**: Per-skill eval cases with LLM-as-judge scoring (`node scripts/evaluate-skills.js <skill>`)
- **Schema**: JSON Schema for frontmatter at `skills/skill.schema.json`
- **SEO**: Per-skill meta tags, JSON-LD (TechArticle), sitemap, canonical URLs
- **Skills Discovery**: `llms.txt`, `llms-full.txt`, `.well-known/skills/` ([Skills Discovery RFC](https://github.com/cloudflare/agent-skills-discovery-rfc))
70 changes: 70 additions & 0 deletions evaluations/icp-cli.json
@@ -0,0 +1,70 @@
{
  "skill": "icp-cli",
  "description": "Evaluation cases for the icp-cli skill. Tests whether agents produce correct icp-cli commands and configuration instead of legacy dfx equivalents.",

  "output_evals": [
    {
      "name": "New project setup",
      "prompt": "I want to build a dapp on ICP with a Rust backend and a React frontend. How do I set this up?",
      "expected_behaviors": [
        "Uses icp (not dfx) commands throughout",
        "Configuration file is icp.yaml, NOT dfx.json",
        "Canisters are a YAML array of objects (- name: ...), NOT a keyed map",
        "Rust canister uses a recipe with a version pin (e.g., @dfinity/rust@v3.2.0)",
        "Frontend/asset canister uses a recipe with a version pin",
        "Asset canister recipe includes explicit build commands",
        "Shows how to start the local network (icp network start -d)"
      ]
    },
    {
      "name": "Deploy to mainnet",
      "prompt": "My canisters work locally, how do I get them on mainnet?",
      "expected_behaviors": [
        "Uses 'icp deploy -e ic', NOT 'dfx deploy --network ic' or '--network ic'",
        "Mentions cycles are needed",
        "Mentions canister IDs are stored in .icp/data/ and should be committed to git",
        "Does NOT use --network ic flag for deployment"
      ]
    },
    {
      "name": "Migrate from dfx",
      "prompt": "I have an older IC project that still uses dfx and dfx.json. It has a Motoko backend and a frontend. I want to switch to the new CLI. I also have canisters running on mainnet already.",
      "expected_behaviors": [
        "Creates icp.yaml with recipe-based canister configuration",
        "Motoko canister uses @dfinity/motoko recipe with a version pin",
        "Asset canister uses @dfinity/asset-canister recipe with a version pin",
        "Explains identity migration (export from dfx, import into icp)",
        "Explains canister ID migration via .icp/data/mappings/ic.ids.json",
        "Uses correct icp identity commands ('icp identity default' not 'icp identity use')"
      ]
    }
  ],

  "trigger_evals": {
    "description": "Queries to test whether the skill activates correctly. 'should_trigger' queries should cause the skill to load; 'should_not_trigger' queries should NOT activate this skill.",
    "should_trigger": [
      "Set up a new Internet Computer project with Rust",
      "How do I deploy my canister to the local network?",
      "What's the icp.yaml config for a Motoko canister?",
      "I'm getting an error with dfx deploy, can you help?",
      "How do I start the local replica?",
      "Migrate my dfx.json project to the new CLI",
      "How do I create a new identity for mainnet deployment?",
      "What recipes are available for icp-cli?",
      "My icp deploy is failing with a build error",
      "How do I check my canister status on mainnet?"
    ],
    "should_not_trigger": [
      "Add access control to my Motoko canister",
      "How does stable memory work in Rust canisters?",
      "Implement ICRC-1 token transfer in my canister",
      "Write a unit test for my Motoko actor",
      "Set up inter-canister calls between two canisters",
      "How do I use certified variables?",
      "Explain the IC consensus mechanism",
      "Add Internet Identity login to my frontend",
      "How do I handle canister upgrades safely?",
      "What's the best way to store large data on-chain?"
    ]
  }
}
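When adding an eval file for another skill, a quick structural check can catch missing fields before any API credits are spent. The sketch below infers the required shape from `evaluations/icp-cli.json` above. `validateEvalFile` is a hypothetical helper, not part of this PR, and the rules it enforces are assumptions rather than a published schema.

```javascript
// Hypothetical structural check, inferred from the icp-cli example file.
// Returns a list of human-readable problems; an empty list means the shape looks right.
function validateEvalFile(data) {
  const errors = [];
  if (typeof data.skill !== "string" || data.skill === "") {
    errors.push("skill must be a non-empty string");
  }
  if (!Array.isArray(data.output_evals)) {
    errors.push("output_evals must be an array");
  } else {
    data.output_evals.forEach((ev, i) => {
      for (const key of ["name", "prompt"]) {
        if (typeof ev[key] !== "string") {
          errors.push(`output_evals[${i}].${key} must be a string`);
        }
      }
      if (!Array.isArray(ev.expected_behaviors) || ev.expected_behaviors.length === 0) {
        errors.push(`output_evals[${i}].expected_behaviors must be a non-empty array`);
      }
    });
  }
  for (const key of ["should_trigger", "should_not_trigger"]) {
    if (!data.trigger_evals || !Array.isArray(data.trigger_evals[key])) {
      errors.push(`trigger_evals.${key} must be an array`);
    }
  }
  return errors;
}

// Usage with a trimmed-down eval file; a real one would be read from disk and parsed.
const example = {
  skill: "icp-cli",
  output_evals: [
    {
      name: "Deploy to mainnet",
      prompt: "My canisters work locally, how do I get them on mainnet?",
      expected_behaviors: ["Mentions cycles are needed"],
    },
  ],
  trigger_evals: {
    should_trigger: ["How do I start the local replica?"],
    should_not_trigger: ["Explain the IC consensus mechanism"],
  },
};

console.log(validateEvalFile(example)); // []
```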