Skamiplan/Wreck-It-Ralph

Wreck-It Ralph

Autonomous web application security testing agent powered by Claude.

Wreck-It Ralph orchestrates Claude CLI with browser automation (Playwright MCP) to methodically test web applications for security vulnerabilities. It runs in iterations — each one a full Claude session that picks up where the last left off — with hook-based enforcement of scope, rate limits, and safety controls.

How It Works

flowchart TD
    A["@targets.md + SECURITY_BRIEF.md"] --> B["Wreck-It Ralph Orchestrator"]
    B --> C["Claude CLI + Playwright Browser"]

    C --> D{"Testing Phase"}
    D --> E["Reconnaissance"]
    D --> F["Auth Testing"]
    D --> G["Input Validation"]
    D --> H["Access Control"]
    D --> I["Business Logic"]
    D --> J["API Security"]

    E & F & G & H & I & J --> K["WRECK_STATUS + WRECK_FINDING + WRECK_LEARNED"]

    K --> L{"More phases?"}
    L -- Yes --> M["Next Iteration"]
    M --> C
    L -- No --> N["HTML + Markdown Reports"]

    subgraph Hooks ["Safety Hooks (enforce on every action)"]
        direction LR
        S1["Scope Enforcer"]
        S2["Rate Limiter"]
        S3["Payload Validator"]
        S4["Stop Validator"]
    end

    C -. "every tool call" .-> Hooks
    Hooks -. "block or allow" .-> C

    subgraph Memory ["Persisted Across Iterations"]
        direction LR
        M1["Learned Skills"]
        M2["Findings"]
        M3["Checkpoints"]
        M4["Scope Learning"]
    end

    K --> Memory
    Memory --> B

Features

Core Testing Loop

  • Phase-based testing — Reconnaissance, Authentication, Input Validation, Access Control, Business Logic, API Security
  • Iteration continuity — Context injected at each iteration start so Claude knows what was done, what's left, and what failed
  • Checkpoint recovery — Crash mid-run? Resume from the last completed iteration
  • Empty iteration detection — Exponential backoff when Claude gets stuck, auto-stops after prolonged stalling
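
The empty-iteration handling described above could look roughly like the following Python sketch. The base delay of 5 seconds mirrors the documented `--delay` default; the doubling factor, the 300-second cap, and the stop threshold of five consecutive empty iterations are assumptions chosen for illustration, not values taken from the tool.

```python
def backoff_delay(empty_streak: int, base_delay: float = 5.0,
                  factor: float = 2.0, max_delay: float = 300.0) -> float:
    """Delay (seconds) before the next iteration after `empty_streak` empty ones."""
    if empty_streak <= 0:
        return base_delay
    # Grow the delay exponentially with the streak, but never beyond the cap.
    return min(base_delay * factor ** empty_streak, max_delay)


def should_stop(empty_streak: int, max_empty: int = 5) -> bool:
    """Hypothetical stall rule: give up after `max_empty` empty iterations in a row."""
    return empty_streak >= max_empty


def simulate(progress_flags: list) -> list:
    """Return the delay chosen after each iteration, stopping on a prolonged stall."""
    delays = []
    streak = 0
    for made_progress in progress_flags:
        # Any progress resets the streak; an empty iteration extends it.
        streak = 0 if made_progress else streak + 1
        if should_stop(streak):
            break
        delays.append(backoff_delay(streak))
    return delays
```

A productive iteration resets the streak, so one good session after a stall brings the delay back to the base value.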

Multi-Target Support

  • Define multiple related targets (e.g., frontend + API) in one @targets.md
  • Each target has its own scope, auth config, and type (WebApplication, Api, SinglePageApp, MobileBackend)
  • Targets can declare dependencies (DependsOn) for cross-target testing (CORS, token leakage)
  • Scope patterns are combined across all targets for the enforcer hooks
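
As a rough illustration of how scope patterns combined across targets might be evaluated, consider the Python sketch below. The function names, the glob semantics (`**` spans path segments, `*` stays within one segment), and the rule that out-of-scope patterns take precedence are all assumptions; the real scope-enforcer.mjs hook is a Node.js script and may behave differently.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a scope glob into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # '**' matches across '/' boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # '*' matches within a single segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def combine_patterns(targets: list, key: str) -> list:
    """Merge one pattern list (e.g. 'in_scope') from every target."""
    return [p for target in targets for p in target.get(key, [])]


def is_in_scope(url: str, in_scope: list, out_of_scope: list) -> bool:
    """Assumed rule: an out-of-scope match always wins over an in-scope match."""
    if any(glob_to_regex(p).match(url) for p in out_of_scope):
        return False
    return any(glob_to_regex(p).match(url) for p in in_scope)


# Hypothetical targets mirroring the frontend + API example in this README.
targets = [
    {"in_scope": ["https://app.example.com/**"],
     "out_of_scope": ["https://app.example.com/admin/**"]},
    {"in_scope": ["https://api.example.com/**"]},
]
IN_SCOPE = combine_patterns(targets, "in_scope")
OUT_OF_SCOPE = combine_patterns(targets, "out_of_scope")
```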

Safety Hooks (Enforced, Not Suggested)

Hooks are Node.js scripts that block Claude's actions until requirements are met. They are not prompt instructions — they are enforcement mechanisms.

| Hook | What It Does |
| --- | --- |
| scope-enforcer.mjs | Blocks navigation to out-of-scope URLs |
| rate-limiter.mjs | Enforces requests-per-minute limit |
| payload-validator.mjs | Blocks destructive payloads (DROP TABLE, rm -rf, etc.) |
| stop-validator.mjs | Blocks output unless WRECK_STATUS block is present and valid |
| file-validator.mjs | Prevents writes to wrong files |
| session-start.mjs | Injects iteration context, skills, and blocked ops history |
| activity-tracker.mjs | Logs all tool use for audit trail |
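
To make the payload-validation idea concrete, here is a minimal Python sketch of a regex-based destructive-payload check. The pattern list and the `check_payload` helper are hypothetical; only DROP TABLE and rm -rf are named in this README, and the real payload-validator.mjs is a Node.js script whose rules are not shown here.

```python
import re
from typing import Optional, Tuple

# Illustrative deny-list; the TRUNCATE entry is an assumed addition.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\btruncate\s+table\b", re.IGNORECASE),
    re.compile(r"\brm\s+-(rf|fr)\b", re.IGNORECASE),
]


def check_payload(payload: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, matched_pattern); matched_pattern is None when allowed."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(payload):
            return False, pattern.pattern
    return True, None
```

A typical XSS probe such as `<script>alert(1)</script>` passes this check, while `'; DROP TABLE users;--` is rejected. Regex deny-lists like this are easy to evade with encoding tricks, which is consistent with the caveat later in this README that the hooks are best-effort.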

Learned Skills System

Claude accumulates knowledge across iterations:

  • Claude-reported skills — Claude emits WRECK_LEARNED blocks when it discovers target-specific patterns (WAF behavior, auth quirks, API conventions)
  • Auto-generated failure skills — Repeated blocked operations automatically become skills so Claude stops retrying the same mistakes
  • Confidence decay — Unused skills fade over time; frequently referenced skills get boosted
  • Deduplication — Existing skills are shown to Claude with content previews to prevent redundant reports
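
One way confidence decay and boosting could work is sketched below in Python. The README does not specify the formula, so the multiplicative decay of 0.9 per unused iteration, the additive boost of 0.1 per reference, the starting confidence of 0.5, and the pruning threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    confidence: float = 0.5  # assumed starting value


def update_confidence(skill: Skill, referenced: bool,
                      decay: float = 0.9, boost: float = 0.1) -> float:
    """Boost a skill that was referenced this iteration, otherwise let it fade."""
    if referenced:
        skill.confidence = min(1.0, skill.confidence + boost)
    else:
        skill.confidence *= decay
    return skill.confidence


def prune(skills: list, threshold: float = 0.2) -> list:
    """Drop skills whose confidence has decayed below the (assumed) threshold."""
    return [s for s in skills if s.confidence >= threshold]
```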

Finding Management

  • Deduplication — Hash-based (URL + param + category + payload) and normalized title matching
  • Verification — Optional re-test of high-severity findings for confirmation
  • Evidence capture — HTTP request/response pairs and screenshots stored per finding
  • OWASP/CWE/WSTG mapping — Findings tagged with industry-standard identifiers
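
The two deduplication strategies listed above can be sketched in Python as follows. The choice of SHA-256, the field separator, and the exact title normalization are assumptions; the README only states that the hash covers URL, parameter, category, and payload.

```python
import hashlib
import re


def finding_key(url: str, parameter: str, category: str, payload: str) -> str:
    """Hash the four identifying fields into a stable dedup key."""
    parts = [url.strip(), parameter.strip().lower(), category.strip().upper(), payload]
    # A unit-separator character keeps field boundaries unambiguous.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_duplicate(new: dict, existing: list) -> bool:
    """A finding is a duplicate if either its hash or its normalized title matches."""
    key = finding_key(new["url"], new["parameter"], new["category"], new["payload"])
    title = normalize_title(new["title"])
    for old in existing:
        old_key = finding_key(old["url"], old["parameter"], old["category"], old["payload"])
        if old_key == key or normalize_title(old["title"]) == title:
            return True
    return False
```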

Scope Learning

  • Tracks repeatedly blocked hosts and suggests scope additions
  • Classifies blocked URLs by type (API endpoints, CDN, third-party services)
  • Saves suggestions to logs/scope-learning/scope-suggestions.md
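
A simplified Python sketch of scope learning is shown below: it counts blocked hosts, classifies URLs with a few heuristics, and proposes scope patterns for hosts blocked repeatedly. The threshold of three blocks and the classification rules are hypothetical; the tool's actual heuristics are not documented here.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify(url: str) -> str:
    """Very rough URL classification (assumed heuristics)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("api.") or parts.path.startswith("/api/"):
        return "api"
    if host.startswith("cdn.") or ".cdn." in host:
        return "cdn"
    return "third-party"


def suggest_scope(blocked_urls: list, min_blocks: int = 3) -> list:
    """Suggest a '/**' pattern for every host blocked at least `min_blocks` times."""
    counts = Counter(urlsplit(u).hostname for u in blocked_urls)
    return [
        f"https://{host}/**"
        for host, n in counts.most_common()
        if host and n >= min_blocks
    ]
```

Suggestions produced this way still need human review before being added to @targets.md, since a frequently blocked host is not necessarily one you are authorized to test.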

Reporting

  • HTML report — Styled, self-contained report with finding details, severity breakdown, and evidence
  • Markdown report — Same content in plain text for version control or further processing
  • Generated automatically at session end (even on Ctrl+C)

Quality of Life

  • Interactive setup — Run with no arguments for a guided configuration wizard
  • System tray icon — Shows progress, current phase, finding count (Windows)
  • Audio notifications — Sounds for startup, iteration complete, finding discovered, errors
  • Toast notifications — Windows notifications for completion and errors
  • Headless mode — Run Playwright without a visible browser window
  • Temp email accounts — Auto-create test accounts via temporary email services for authenticated testing
  • Reconnaissance artifacts — Network captures, page snapshots, and screenshots preserved for review

Quick Start

# Build
dotnet build

# Run with no arguments for interactive setup
dotnet run --project src/WreckItRalph

# Or specify options directly
dotnet run --project src/WreckItRalph -- --targets @targets.md --brief SECURITY_BRIEF.md

# Validate configuration without running
dotnet run --project src/WreckItRalph -- --dry-run

# Publish self-contained binary
dotnet publish -c Release -r win-x64 --self-contained

CLI Options

wreck [options]

Options:
  -t, --targets <file>       Targets file (default: @targets.md)
  -b, --brief <file>         Security brief (default: SECURITY_BRIEF.md)
  -m, --max-iterations <n>   Max iterations (default: 50)
  -d, --delay <seconds>      Delay between iterations (default: 5)
  --timeout <minutes>        Timeout per iteration (default: 30)
  --rate-limit <rpm>         Requests per minute (default: 30)
  --no-verify                Skip finding verification
  --report-dir <dir>         Report output directory (default: reports)
  -c, --config <file>        Config file (default: wreck.json)
  -s, --safe-mode            Use cmd.exe without streaming output
  --model <name>             Claude model to use
  --api-key <key>            API key for the model provider
  -v, --verbose              Show detailed output
  --no-hooks                 Disable safety hooks
  --headless                 Run browser in headless mode
  --dry-run                  Validate config only
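
To illustrate what a requests-per-minute limit such as `--rate-limit` implies, the Python sketch below implements a sliding-window limiter. This is one common technique and an assumption on my part; the README does not say which algorithm rate-limiter.mjs uses. The caller passes the current time explicitly so the behavior is easy to test; in real use it would come from a clock such as `time.monotonic()`.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_per_minute` requests in any rolling 60-second window."""

    def __init__(self, max_per_minute: int = 30, window: float = 60.0):
        self.max_requests = max_per_minute
        self.window = window
        self.timestamps = deque()  # times of recently allowed requests

    def allow(self, now: float) -> bool:
        # Forget requests that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False  # over the limit: the caller should block or wait
        self.timestamps.append(now)
        return True
```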

Configuration

@targets.md

Defines testing scope, authentication, and phases.

Single target:

# Security Testing Scope

## Target
- Name: My Application
- Base URL: https://app.example.com
- Type: WebApplication

## Authentication
- Type: FormLogin
- Login URL: /login

## In-Scope
- https://app.example.com/**

## Out-of-Scope
- https://app.example.com/admin/**

## Testing Phases
- [ ] Reconnaissance
- [ ] Authentication Testing
- [ ] Input Validation (XSS, SQLi)
- [ ] Access Control (IDOR)
- [ ] Business Logic
- [ ] API Security

Multi-target:

# Security Testing Scope

## Target
- Name: Frontend
- Base URL: https://app.example.com
- Type: SinglePageApp
- Primary: true

## Target
- Name: API
- Base URL: https://api.example.com
- Type: Api
- DependsOn: Frontend

## In-Scope
- https://app.example.com/**
- https://api.example.com/**

## Testing Phases
- [ ] Reconnaissance
- [ ] Authentication Testing
- [ ] Input Validation (XSS, SQLi)
- [ ] Access Control (IDOR)
- [ ] Cross-Origin Testing

SECURITY_BRIEF.md

Testing instructions and methodology for Claude. Describes the target application, known features, areas of concern, and any special testing requirements.

wreck.json (optional)

JSON configuration file for hook settings and other options:

{
  "hooksConfig": {
    "scopeEnforcement": true,
    "rateLimiting": true,
    "blockDestructive": true,
    "activityTracking": true,
    "contextInjection": true
  }
}
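
Since wreck.json is optional, a loader needs defaults for missing keys. The Python sketch below merges a user-supplied `hooksConfig` over a default set and rejects unknown keys. The key names come from the example above; treating every hook as enabled by default and raising on unknown keys are assumptions, not documented behavior.

```python
import json

# Assumed defaults: every documented hook setting enabled.
DEFAULT_HOOKS = {
    "scopeEnforcement": True,
    "rateLimiting": True,
    "blockDestructive": True,
    "activityTracking": True,
    "contextInjection": True,
}


def load_hooks_config(text: str) -> dict:
    """Return hook settings from wreck.json text, falling back to defaults."""
    data = json.loads(text) if text.strip() else {}
    user = data.get("hooksConfig", {})
    unknown = set(user) - set(DEFAULT_HOOKS)
    if unknown:
        # Fail loudly so a typo cannot silently leave a safety hook unconfigured.
        raise ValueError(f"Unknown hooksConfig keys: {sorted(unknown)}")
    return {**DEFAULT_HOOKS, **user}
```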

Status Protocol

Claude reports status at the end of each iteration:

---WRECK_STATUS---
{"phase":"RECONNAISSANCE","status":"IN_PROGRESS","newFindings":0,"highestSeverity":"NONE","endpointsTested":5,"endpointsDiscovered":10,"exitSignal":false,"recommendation":"Continue scanning"}
---END_WRECK_STATUS---
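
A check in the spirit of the stop validator could be sketched in Python as below: find the status block, parse its JSON, and confirm a few fields have the expected types. Which fields are mandatory is an assumption here (the four in `REQUIRED_FIELDS` were picked from the example above), and the real stop-validator.mjs may apply different rules.

```python
import json
import re

STATUS_RE = re.compile(
    r"---WRECK_STATUS---\s*(\{.*?\})\s*---END_WRECK_STATUS---", re.DOTALL
)

# Assumed minimum set of fields and their JSON types.
REQUIRED_FIELDS = {"phase": str, "status": str, "newFindings": int, "exitSignal": bool}


def parse_status(output: str):
    """Return the status dict if a well-formed block is present, else None."""
    match = STATUS_RE.search(output)
    if match is None:
        return None
    try:
        status = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(status.get(field), expected_type):
            return None
    return status
```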

Findings are reported inline:

---WRECK_FINDING---
{"title":"Reflected XSS in Search","severity":"HIGH","category":"XSS","url":"https://target.com/search","parameter":"q","payload":"<script>alert(1)</script>","description":"User input reflected without encoding","evidence":"Response contains unescaped payload","reproduction":"Navigate to /search, enter payload","recommendation":"HTML-encode output","cwe":"CWE-79","owasp":"A03:2021","wstg":"WSTG-INPV-01","confidence":0.9}
---END_WRECK_FINDING---

Learned skills are reported when Claude discovers reusable target-specific knowledge:

---WRECK_LEARNED---
{"skillName":"waf-blocks-inline-scripts","skillDescription":"WAF blocks script tags but allows event handlers","skillContent":"Use onerror/onload event handlers instead of <script> tags for XSS testing"}
---END_WRECK_LEARNED---
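
Because all three block types share the same delimiter shape, a single regular expression with a backreference can pull every block out of one session's output. The Python sketch below is a hypothetical extractor that groups parsed JSON bodies by block type and skips malformed ones; it is not the tool's own status-parsing service.

```python
import json
import re
from collections import defaultdict

# \1 forces the END marker to match the opening block type.
BLOCK_RE = re.compile(
    r"---WRECK_(STATUS|FINDING|LEARNED)---\s*(.*?)\s*---END_WRECK_\1---",
    re.DOTALL,
)


def extract_blocks(output: str) -> dict:
    """Group the JSON bodies of all WRECK_* blocks by type, in order of appearance."""
    blocks = defaultdict(list)
    for kind, body in BLOCK_RE.findall(output):
        try:
            blocks[kind].append(json.loads(body))
        except json.JSONDecodeError:
            continue  # ignore blocks whose body is not valid JSON
    return dict(blocks)
```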

Runtime Files

When running, the tool creates:

  • wreck-hooks/ — Generated Node.js hook scripts
  • .claude/settings.local.json — Hook configuration for Claude CLI
  • logs/ — Iteration logs, context-input.json, blocked operations, learned skills
  • reports/ — Generated HTML and Markdown security reports
  • evidence/ — HTTP evidence and screenshots for findings
  • recon/ — Reconnaissance artifacts (network captures, snapshots)
  • attack-surface.md — Created by Claude during reconnaissance

Requirements

  • .NET 10.0 SDK
  • Claude CLI (claude.ai/code)
  • Node.js (for hook scripts and Playwright MCP server)

Important Notices

This tool is for authorized security testing only. You must have explicit written permission to test any target application. Unauthorized security testing is illegal in most jurisdictions.

Uses --dangerously-skip-permissions. Wreck-It Ralph runs Claude CLI with this flag to enable autonomous operation. This gives Claude unrestricted tool access within the session. The safety hooks provide guardrails, but they are not a security boundary — they are best-effort enforcement.

Scope enforcement is not airtight. Hooks validate URL patterns and payload regex, but edge cases exist. This tool assists authorized testing; it does not guarantee confinement.

Each iteration consumes Claude API credits. A typical 15-iteration run involves 15 full Claude sessions with browser automation. Monitor your usage.

Check Anthropic's acceptable use policy before using this tool for automated security testing via Claude CLI.

Project Structure

src/WreckItRalph/
├── Program.cs                    # CLI entry point + interactive setup
├── Config/                       # WreckOptions, HooksConfig
├── Models/                       # Target, Finding, WreckStatusBlock
├── Orchestration/                # Main testing loop
├── Services/                     # Status parsing, findings, logging, evidence
├── Hooks/
│   ├── SafetyHookManager.cs      # Hook script generation + context injection
│   ├── Scripts/                  # Embedded Node.js hook scripts
│   └── Skills/                   # Learned skills manager (CRUD, decay, usage)
├── Reporting/                    # HTML + Markdown report generation
├── Tray/                         # System tray icon + notifications
├── Setup/                        # Interactive setup wizard + templates
└── Output/                       # Console output formatting

tests/WreckItRalph.Tests/         # xUnit tests

License

MIT
