Katt is a lightweight testing framework for running AI Evals, inspired by Jest.
- Overview
- API Documentation
- Articles
- Hello World - Example
- Main Features
- Installation
- Basic Usage
- Specifying AI Models
- Development
- How It Works
- Execution Flow
- Architecture
- Requirements
- License
- Contributing
## Overview

Katt is designed to evaluate and validate the behavior of AI agents such as Claude Code, GitHub Copilot, OpenAI Codex, and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.
## API Documentation

For a complete list of features and usage examples, see docs/api-documentation.md.
## Hello World - Example

```js
import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
```

It also supports the familiar `describe` and `it` syntax for organizing tests:
```js
import { describe, expect, it, prompt } from "katt";

describe("Greeting agent", () => {
  it("should say hello world", async () => {
    const result = await prompt("If you read this just say 'hello world'");
    expect(result).toContain("hello world");
  });
});
```

## Main Features

- Simple Testing API: Familiar `describe` and `it` syntax for organizing tests
- AI Interaction and Verification: Built-in `prompt()`, `promptFile()` and `promptCheck()` functions for running and analyzing prompts to AI agents
- Classification Matcher: Built-in `toBeClassifiedAs()` matcher to grade a response against a target label on a 1-5 scale
- Concurrent Execution: Runs eval files concurrently for faster test execution
- Model Selection: Support for specifying custom AI models
- Runtime Selection: Run prompts through GitHub Copilot (default) or Codex
- Configurable Timeouts: Override the prompt wait time per test or via `katt.json`
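As a rough illustration of what a configurable timeout does, waiting on a prompt with a deadline can be sketched like this (illustrative TypeScript only; `withTimeout` is not part of Katt's API):

```typescript
// Sketch: reject if a long-running promise does not settle within `timeoutMs`.
// Illustrative only; not Katt's actual implementation.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Prompt timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

The same idea is what a per-test timeout override would tune: a larger `timeoutMs` for prompts that legitimately take minutes to complete.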
## Installation

```
npm install -g katt
```

## Basic Usage

- Create a file with the `.eval.ts` or `.eval.js` extension and write your tests.

```js
import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");
expect(result).toContain("hello world");
```

- Run Katt from your project directory:

```
npx katt
```

Load prompts from external files:
```js
// test.eval.js
import { describe, expect, it, promptFile } from "katt";

describe("Working with files", () => {
  it("should load the file and respond", async () => {
    const result = await promptFile("./myPrompt.md");
    expect(result).toContain("expected response");
  });
});
```

## Specifying AI Models

You can specify a custom model for your prompts:
```js
import { describe, expect, it, prompt } from "katt";

describe("Model selection", () => {
  it("should use a specific model", async () => {
    const promptString = "You are a helpful agent. Say hi and ask what you could help the user with.";
    const result = await prompt(promptString, { model: "gpt-5.2" });
    expect(result).promptCheck("It should be friendly and helpful");
  });
});
```

You can also set runtime defaults in `katt.json`.
GitHub Copilot (default runtime):

```json
{
  "agent": "gh-copilot",
  "agentOptions": {
    "model": "gpt-5-mini"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
```

Codex:

```json
{
  "agent": "codex",
  "agentOptions": {
    "model": "gpt-5-codex",
    "profile": "default",
    "sandbox": "workspace-write"
  },
  "prompt": {
    "timeoutMs": 240000
  }
}
```

When this file exists:
- Supported agents are `gh-copilot` (the default when `agent` is missing or unsupported) and `codex`
- `prompt("...")` and `promptFile("...")` merge `agentOptions` with call-time options
- `prompt("...", { model: "..." })` overrides the model from the config
- `prompt.timeoutMs` sets the default wait timeout for long-running prompts
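The precedence rules above amount to a shallow options merge where call-time options win. A minimal sketch, assuming illustrative option shapes (`resolvePromptOptions` and these interfaces are not Katt's actual internals):

```typescript
// Illustrative shapes for katt.json values; not Katt's real types.
interface AgentOptions {
  model?: string;
  profile?: string;
  sandbox?: string;
}

interface KattConfig {
  agent?: string;
  agentOptions?: AgentOptions;
  prompt?: { timeoutMs?: number };
}

// Config defaults first, then call-time options override field by field.
function resolvePromptOptions(
  config: KattConfig,
  callOptions: AgentOptions = {},
): AgentOptions {
  return { ...config.agentOptions, ...callOptions };
}
```

With the Copilot config above, `resolvePromptOptions(config)` would yield `gpt-5-mini`, while `resolvePromptOptions(config, { model: "gpt-5.2" })` would yield the call-time model.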
## Development

```
npm install
```

- `npm run dev` - Run the CLI in development mode
- `npm run build` - Build the project
- `npm run test` - Run tests
- `npm run typecheck` - Run TypeScript type checking
- `npm run format` - Format code using Biome
- `npm run lint` - Lint code using Biome
- `npm run test:build` - Test the built CLI
To verify your changes before opening a pull request, run:

```
npm test
npm run typecheck
npm run lint
npm run format
```
For more details, see the verification process section in CONTRIBUTING.md.
## How It Works

Katt runs eval files as executable test programs and coordinates collection, assertion failures, and reporting through its runtime context.

### Execution Flow
```mermaid
sequenceDiagram
    participant User as User/CI
    participant CLI as katt CLI
    participant FS as File Scanner
    participant Eval as Eval Runtime
    participant Report as Reporter
    User->>CLI: Run `npx katt`
    CLI->>FS: Discover `*.eval.js` and `*.eval.ts`
    FS-->>CLI: Return eval file list
    CLI->>Eval: Execute eval files
    Eval-->>CLI: Return pass/fail results
    CLI->>Report: Print per-test output + summary
    Report-->>User: Exit code (`0` pass, `1` fail)
```
- Katt searches the current directory recursively for `*.eval.js` and `*.eval.ts` files
- It skips the `.git` and `node_modules` directories
- Found eval files are imported and executed concurrently
- Tests registered with `describe()` and `it()` are collected and run
- Each test's duration is printed after execution
- A summary is displayed showing passed/failed tests and total duration
- Katt exits with code `0` on success or `1` on failure
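The discovery step described above can be sketched as a recursive directory walk (a sketch only; `findEvalFiles` is not Katt's actual implementation):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect *.eval.ts and *.eval.js files,
// skipping .git and node_modules as described above.
// Illustrative only; not Katt's real file scanner.
function findEvalFiles(dir: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      if (entry.name === ".git" || entry.name === "node_modules") continue;
      results.push(...findEvalFiles(path.join(dir, entry.name)));
    } else if (/\.eval\.(ts|js)$/.test(entry.name)) {
      results.push(path.join(dir, entry.name));
    }
  }
  return results;
}
```

Each discovered file would then be imported so its `describe()`/`it()` registrations run, and the collected tests executed concurrently.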
### Architecture

```mermaid
flowchart LR
    User["Developer"] --> CLI["katt CLI"]
    CLI --> EvalFiles["Eval files (*.eval.ts / *.eval.js)"]
    CLI --> Config["katt.json config"]
    EvalFiles --> Runtime["Test runtime (describe/it context)"]
    Config --> Runtime
    Runtime --> Assertions["Assertions + snapshots"]
    Runtime --> Prompts["prompt() / promptFile()"]
    Prompts --> AI["AI runtime (GitHub Copilot or Codex CLI)"]
    Assertions --> Report["Terminal report + exit code"]
    AI --> Report
```
## Requirements

- Node.js
- For the `gh-copilot` runtime: access to GitHub Copilot with a logged-in user
- For the `codex` runtime: Codex CLI installed and authenticated (`codex login`)
## License

MIT
## Contributing

We welcome contributions from the community! Please see our CONTRIBUTING.md guide for detailed information on how to contribute to Katt.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes
- Run the verification process
- Submit a pull request
For detailed guidelines, development setup, coding standards, and more, check out our contribution guide.
