Skip to content

cpplain/claude-agent-harness

Repository files navigation

Agent Harness

A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements Anthropic's guide for effective agent harnesses, featuring phase-driven execution, configurable MCP tools, and SDK-native sandbox isolation.

Features

Agent Harness provides:

  • Phase-based workflows — declarative phase definitions with conditions and run-once semantics
  • TOML-based configuration — no code changes needed to customize behavior
  • SDK-native security — OS sandbox with network isolation, declarative permission rules (allow/deny), secure defaults
  • Progress tracking — JSON checklist, notes file, or none with automatic completion detection
  • Error recovery — exponential backoff and circuit breaker to prevent runaway costs
  • MCP server support — browser automation, databases, etc.
  • Session persistence — auto-continue across sessions with state tracking
  • Setup verification — check auth, tools, config before running

Prerequisites

  • Python 3.10+
  • uv package manager

Getting Started

→ For a detailed step-by-step guide, see docs/setup-guide.md

1. Install

git clone <repo-url>
cd claude-agent-harness
uv tool install .

2. Set up authentication

Export one of these environment variables:

  • ANTHROPIC_API_KEY — get one from console.anthropic.com
  • CLAUDE_CODE_OAUTH_TOKEN — via claude setup-token

See .env.example for all options.

Using 1Password CLI

If you manage secrets with 1Password CLI, create a .env file with an op:// reference:

ANTHROPIC_API_KEY="op://Vault/Item/api_key"

Then wrap any command with op run:

op run --env-file "./.env" -- agent-harness run --project-dir ./my-project

3. Run

# Scaffold a new project configuration
agent-harness init --project-dir ./my-project

# Edit the configuration
#    -> ./my-project/.agent-harness/config.toml

# Verify setup
agent-harness verify --project-dir ./my-project

# Run the agent
agent-harness run --project-dir ./my-project

CLI Reference

agent-harness init --project-dir <path>     # Scaffold new config
agent-harness verify --project-dir <path>   # Check setup
agent-harness run --project-dir <path>      # Run the agent

# Info commands (for skill/automation integration)
agent-harness info template [--name NAME] [--list] [--all] [--json]
agent-harness info schema [--json]
agent-harness info preset [--name NAME] [--list] [--json]
agent-harness info guide [--json]

# Options
--project-dir PATH      # Agent's working directory (default: .)
--max-iterations N      # Override max iterations (run only)
--model MODEL           # Override model (run only)
--version              # Show version

Info Commands

The info subcommand provides programmatic access to templates, configuration schema, presets, and documentation. These commands are designed for integration with Claude Code skills and other automation tools:

  • info template — Get template file contents (config.toml, spec.md, init.md, build.md)
    • Use --all to fetch all templates with content in one command
  • info schema — Get JSON schema describing all configuration options
  • info preset — Get preset configurations for common use cases (python, web-nodejs, read-only)
  • info guide — Get the complete setup guide as structured JSON or markdown

All commands support --json flag for machine-readable output.

Claude Code Skill

A Claude Code skill is available via the plugin system:

# Step 1: Add the marketplace
/plugin marketplace add <github-url-or-local-clone>

# Step 2: Install the plugin
/plugin install agent-harness@agent-harness-plugin

Once installed, invoke the skill in Claude Code:

/agent-harness                                  # Interactive mode selection
/agent-harness help me setup using ./spec.md    # Custom instruction

The skill provides:

  • Interactive interviews for project setup
  • Preset recommendations based on tech stack
  • Security-first configuration with clear trade-offs
  • Automated fixes for common configuration issues

How It Works

The harness executes agents in configurable phases with conditions and run-once semantics. Each phase gets a fresh Claude SDK session (no context carryover) with a configured prompt.

Phase execution:

  • Phases run sequentially based on conditions (exists:, not_exists: path checks)
  • run_once: true phases skip after first successful completion
  • State persists in .agent-harness/session.json

Session management:

  • Fresh context per session prevents context pollution
  • Progress preserved via tracking file (e.g., feature_list.json), session state, and git commits
  • Auto-continue after configured delay (default 3s)
  • Completion detection: Harness stops when tracker.is_complete() returns true (only json_checklist supports this; notes_file and none require manual stop via Ctrl+C)
  • Press Ctrl+C to pause; run same command to resume

Error recovery:

Prevents runaway API costs when sessions fail repeatedly:

  • Tracks consecutive errors across sessions
  • Exponential backoff: 5s → 10s → 20s → 40s → 80s (circuit breaker trips; max cap 120s)
  • Circuit breaker: Trips after 5 consecutive errors (configurable)
  • Successful session resets error counter
  • Error context forwarded to next session to help recovery
[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0

Security Model

This harness follows Anthropic's secure deployment recommendations by relying on the SDK's built-in sandbox and permission system as the primary defense, rather than custom application-layer validation.

SDK-Native Sandbox

The Claude SDK provides process-level isolation with:

  • Process isolation — Bash commands run in a sandboxed subprocess
  • Network restrictions — Configurable domain allowlist and Unix socket access
  • Filesystem boundaries — Commands are restricted to the project directory
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false  # secure default

[security.sandbox.network]
# Example allowed domains (configure for your project needs)
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

Declarative Permission Rules

Security is enforced through SDK permission rules, not runtime command parsing:

[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
    "Read(./**)", "Write(./**)", "Edit(./**)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

Permission rules are evaluated by the SDK before tool execution. The agent cannot bypass these rules through prompt injection or indirect command execution.

Secure Defaults

  • allow_unsandboxed_commands defaults to false
  • When sandbox is enabled, auto_allow_bash_if_sandboxed=true auto-allows Bash commands
  • When sandbox is disabled, explicit permissions.allow rules are required
  • Network access is denied by default

Git Protection Recommendations

For production deployments, protect critical branches using server-side git hooks or branch protection rules on your git hosting platform (GitHub, GitLab, Bitbucket), not client-side validation. This prevents destructive operations like git push --force at the source.

Configuration

Configuration lives in .agent-harness/config.toml.

Directory Layout

project_dir/
├── .agent-harness/
│   ├── logs/                  # Session logs (auto-created, gitignored)
│   ├── config.toml            # Main configuration (required)
│   ├── spec.md                # Project specification
│   ├── session.json           # Session number, completed phases (auto-created)
│   └── prompts/               # Prompt files (referenced by config)
│       ├── init.md
│       └── build.md
└── (generated code lives here)

Configuration Reference

For a complete, annotated configuration reference with detailed comments on all available options, see:

The init command creates a new config using the template:

agent-harness init --project-dir ./my-project

Config Loading Precedence

CLI flags > config.toml values > defaults

Project Structure

claude-agent-harness/
├── agent_harness/          # Python package
│   ├── __init__.py
│   ├── __main__.py         # Entry point
│   ├── cli.py              # Argument parsing, subcommands
│   ├── client_factory.py   # Builds ClaudeSDKClient from config
│   ├── config.py           # Config loading, validation, HarnessConfig
│   ├── runner.py           # Generic agent loop
│   ├── tracking.py         # Progress tracking implementations
│   └── verify.py           # Setup verification checks
├── examples/
│   ├── claude-ai-clone/
│   │   ├── .agent-harness/
│   │   │   ├── config.toml
│   │   │   ├── spec.md
│   │   │   └── prompts/
│   │   │       ├── init.md
│   │   │       └── build.md
│   │   └── README.md
│   └── simple-calculator/
│       ├── .agent-harness/
│       │   ├── config.toml
│       │   ├── spec.md
│       │   └── prompts/
│       │       ├── init.md
│       │       └── build.md
│       └── README.md
├── tests/
│   ├── test_cli.py
│   ├── test_client_factory.py
│   ├── test_config.py
│   ├── test_prompts.py
│   ├── test_runner.py
│   ├── test_tracking.py
│   └── test_verify.py
├── .env.example
└── pyproject.toml

Examples

Claude.ai Clone (Next.js)

See examples/claude-ai-clone/ for a complete example that:

  • Uses Next.js/React stack (npm, node commands)
  • Integrates Puppeteer MCP server for browser testing
  • Generates a production-quality chat interface
  • Tracks progress via feature_list.json
# Run the Claude.ai clone example
mkdir -p ./my-clone-output
cp -r examples/claude-ai-clone/.agent-harness ./my-clone-output/
agent-harness run --project-dir ./my-clone-output

Simple Calculator (Python)

See examples/simple-calculator/ for a minimal example that:

  • Uses Python stdlib only (no external dependencies)
  • Completes in ~5 minutes (good for demos)
  • Shows basic two-phase workflow (init + build)
  • Tracks progress via feature_list.json

Troubleshooting

"Configuration file not found"

The harness expects a .agent-harness/config.toml file in your project directory. If you see this error:

  1. Check that you're running from the correct directory
  2. Use agent-harness init --project-dir ./my-project to scaffold a new configuration

"Prompt file not found"

Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory:

[[phases]]
prompt = "file:prompts/coding_prompt.md"  # Must exist at .agent-harness/prompts/coding_prompt.md

"Neither ANTHROPIC_API_KEY nor CLAUDE_CODE_OAUTH_TOKEN is set"

You need authentication credentials to use the Claude API:

  • API Key: Get one from console.anthropic.com and set export ANTHROPIC_API_KEY="your-key"
  • OAuth Token: Run claude setup-token and the harness will use CLAUDE_CODE_OAUTH_TOKEN automatically
  • See .env.example for setting these via environment file

Agent is hanging on the first session

The first session can take 10-20+ minutes for complex projects as it reads the spec, plans features, creates project structure, and sets up git. This is expected behavior. Subsequent sessions are typically faster.

If a session truly hangs:

  1. Check .agent-harness/session.json for error messages
  2. Look for permission prompts or security blocks in the output
  3. Verify your progress file format matches the configuration (e.g., feature_list.json with "passes": false fields)

Development

# Install dependencies
uv sync

# Run all tests
uv run python -m unittest discover tests -v

# Run specific test modules
uv run python -m unittest tests.test_config -v         # Configuration loading
uv run python -m unittest tests.test_tracking -v       # Progress tracking
uv run python -m unittest tests.test_runner -v         # Session loop logic
uv run python -m unittest tests.test_client_factory -v # Client creation

Test coverage includes:

  • Security configuration: Sandbox settings, permission rules, network isolation
  • Configuration loading: TOML parsing, defaults, validation, error cases
  • Progress tracking: Completion detection, JSON parsing, print formatting
  • Prompt loading: File reading, file: resolution, error handling

License

MIT License. See LICENSE.