Katt is a lightweight testing framework for running AI Evals, inspired by Jest.
- Overview
- API Documentation
- Articles
- Hello World - Example
- Main Features
- Installation
- Basic Usage
- Specifying AI Models
- Development
- How It Works
- Execution Flow
- Architecture
- Requirements
- License
- Contributing
## Overview

Katt is designed to evaluate and validate the behavior of AI agents such as Claude Code, GitHub Copilot, OpenAI Codex, and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
## API Documentation

For a complete list of features and usage examples, see docs/api-documentation.md.
## Hello World - Example

```js
import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
```

It also supports the familiar `describe` and `it` syntax for organizing tests:
```js
import { describe, expect, it, prompt } from "katt";

describe("Greeting agent", () => {
  it("should say hello world", async () => {
    const result = await prompt("If you read this just say 'hello world'");
    expect(result).toContain("hello world");
  });
});
```

## Main Features

- Simple Testing API: Familiar `describe` and `it` syntax for organizing tests
- AI Interaction and Verification: Built-in `prompt()`, `promptFile()` and `promptCheck()` functions for running and analyzing prompts to AI agents
- Classification Matcher: Built-in `toBeClassifiedAs()` matcher to grade a response against a target label on a 1-5 scale
- Concurrent Execution: Runs eval files concurrently for faster test execution
- Model Selection: Support for specifying custom AI models
- Runtime Selection: Run prompts through GitHub Copilot (default) or Codex
- Configurable Timeouts: Override the prompt wait time per test or via `katt.json`
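As a rough illustration of what a configurable timeout does, waiting on a prompt with a deadline can be sketched like this (illustrative TypeScript only; `withTimeout` is not part of Katt's API):

```typescript
// Sketch: reject if a long-running promise does not settle within `timeoutMs`.
// Illustrative only; not Katt's actual implementation.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Prompt timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

The same idea is what a per-test timeout override would tune: a larger `timeoutMs` for prompts that legitimately take minutes to complete.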
## Installation

```
npm install -g katt
```

## Basic Usage

- Create a file with the `.eval.ts` or `.eval.js` extension and write your tests.

```js
import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
```

- Run Katt from your project directory:

```
npx katt
```

Load prompts from external files:
```js
// test.eval.js
import { describe, expect, it, promptFile } from "katt";

describe("Working with files", () => {
  it("should load the file and respond", async () => {
    const result = await promptFile("./myPrompt.md");
    expect(result).toContain("expected response");
  });
});
```

## Specifying AI Models

You can specify a custom model for your prompts:
```js
import { describe, expect, it, prompt } from "katt";

describe("Model selection", () => {
  it("should use a specific model", async () => {
    const promptString = "You are a helpful agent. Say hi and ask what you could help the user with.";
    const result = await prompt(promptString, { model: "gpt-5.2" });
    expect(result).promptCheck("It should be friendly and helpful");
  });
});
```

You can also set runtime defaults in `katt.json`.
GitHub Copilot (default runtime):

```json
{
  "agent": "gh-copilot",
  "agentOptions": {
    "model": "gpt-5-mini"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
```

Codex:

```json
{
  "agent": "codex",
  "agentOptions": {
    "model": "gpt-5-codex",
    "profile": "default",
    "sandbox": "workspace-write"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
```

When this file exists:
- Supported agents are `gh-copilot` (the default when `agent` is missing or unsupported) and `codex`
- `prompt("...")` and `promptFile("...")` merge `agentOptions` with call-time options
- `prompt("...", { model: "..." })` overrides the model from the config
- `prompt.timeoutMs` sets the default wait timeout for long-running prompts
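The precedence rules above amount to a shallow options merge where call-time options win. A minimal sketch, assuming illustrative option shapes (`resolvePromptOptions` and these interfaces are not Katt's actual internals):

```typescript
// Illustrative shapes for katt.json values; not Katt's real types.
interface AgentOptions {
  model?: string;
  profile?: string;
  sandbox?: string;
}

interface KattConfig {
  agent?: string;
  agentOptions?: AgentOptions;
  prompt?: { timeoutMs?: number };
}

// Config defaults first, then call-time options override field by field.
function resolvePromptOptions(
  config: KattConfig,
  callOptions: AgentOptions = {},
): AgentOptions {
  return { ...config.agentOptions, ...callOptions };
}
```

With the Copilot config above, `resolvePromptOptions(config)` would yield `gpt-5-mini`, while `resolvePromptOptions(config, { model: "gpt-5.2" })` would yield the call-time model.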
## Development

```
npm install
```

- `npm run dev` - Run the CLI in development mode
- `npm run build` - Build the project
- `npm run test` - Run tests
- `npm run typecheck` - Run TypeScript type checking
- `npm run format` - Format code using Biome
- `npm run lint` - Lint code using Biome
- `npm run test:build` - Test the built CLI
To verify your changes before opening a pull request, run:

```
npm test
npm run typecheck
npm run lint
npm run format
```
For more details, see the verification process section in CONTRIBUTING.md.
## How It Works

Katt runs eval files as executable test programs and coordinates collection, assertion failures, and reporting through its runtime context.

### Execution Flow
```mermaid
sequenceDiagram
    participant User as User/CI
    participant CLI as katt CLI
    participant FS as File Scanner
    participant Eval as Eval Runtime
    participant Report as Reporter
    User->>CLI: Run `npx katt`
    CLI->>FS: Discover `*.eval.js` and `*.eval.ts`
    FS-->>CLI: Return eval file list
    CLI->>Eval: Execute eval files
    Eval-->>CLI: Return pass/fail results
    CLI->>Report: Print per-test output + summary
    Report-->>User: Exit code (`0` pass, `1` fail)
```
- Katt searches the current directory recursively for `*.eval.js` and `*.eval.ts` files
- It skips the `.git` and `node_modules` directories
- Found eval files are imported and executed concurrently
- Tests registered with `describe()` and `it()` are collected and run
- Each test's duration is printed after execution
- A summary is displayed showing passed/failed tests and total duration
- Katt exits with code `0` on success or `1` on failure
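The discovery step described above can be sketched as a recursive directory walk (a sketch only; `findEvalFiles` is not Katt's actual implementation):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect *.eval.ts and *.eval.js files,
// skipping .git and node_modules as described above.
// Illustrative only; not Katt's real file scanner.
function findEvalFiles(dir: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      if (entry.name === ".git" || entry.name === "node_modules") continue;
      results.push(...findEvalFiles(path.join(dir, entry.name)));
    } else if (/\.eval\.(ts|js)$/.test(entry.name)) {
      results.push(path.join(dir, entry.name));
    }
  }
  return results;
}
```

Each discovered file would then be imported so its `describe()`/`it()` registrations run, and the collected tests executed concurrently.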
### Architecture

```mermaid
flowchart LR
    User["Developer"] --> CLI["katt CLI"]
    CLI --> EvalFiles["Eval files (*.eval.ts / *.eval.js)"]
    CLI --> Config["katt.json config"]
    EvalFiles --> Runtime["Test runtime (describe/it context)"]
    Config --> Runtime
    Runtime --> Assertions["Assertions + snapshots"]
    Runtime --> Prompts["prompt() / promptFile()"]
    Prompts --> AI["AI runtime (GitHub Copilot or Codex CLI)"]
    Assertions --> Report["Terminal report + exit code"]
    AI --> Report
```
## Requirements

- Node.js
- For the `gh-copilot` runtime: access to GitHub Copilot with a logged-in user
- For the `codex` runtime: Codex CLI installed and authenticated (`codex login`)
## License

MIT
## Contributing

We welcome contributions from the community! Please see our CONTRIBUTING.md guide for detailed information on how to contribute to Katt.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes
- Run the verification process
- Submit a pull request
For detailed guidelines, development setup, coding standards, and more, check out our contribution guide.
