Skip to content

Voice interface for Claude Code over the phone via Twilio

License

Notifications You must be signed in to change notification settings

dnacenta/voice-echo

Repository files navigation

voice-echo

CI License: AGPL-3.0 Version Rust

Voice interface for Claude Code over the phone. Call in and talk to Claude, or trigger outbound calls from n8n / automation workflows.

Built in Rust. Uses Twilio for telephony, Groq Whisper for speech-to-text, Inworld for text-to-speech, and the Claude Code CLI for reasoning.

Architecture

Voice Pipeline

                         ┌─────────────────────────────────────┐
                         │          voice-echo (axum)        │
                         │                                     │
  Phone ◄──► Twilio ◄──►│  WebSocket ◄──► VAD ──► STT (Groq)  │
                         │                         │           │
                         │                    Claude CLI        │
                         │                         │           │
                         │                    TTS (Inworld)     │
                         │                     mulaw 8kHz       │
                         └─────────────────────────────────────┘

AI-Initiated Outbound Calls (n8n Bridge)

  ┌──────────────┐    trigger     ┌──────────────────┐
  │  Any n8n      │──────────────►│   Orchestrator    │
  │  workflow     │               │   reads registry, │
  │  (alerts,     │               │   routes to       │
  │   cron,       │               │   target module   │
  │   events...) │               └────────┬─────────┘
  └──────────────┘                        │
                                          ▼
                                 ┌──────────────────┐
                                 │   call-human      │
                                 │   builds request, │
                                 │   passes context  │
                                 └────────┬─────────┘
                                          │
                                          │ POST /api/call
                                          │ { to, context }
                                          ▼
                                 ┌──────────────────┐
                                 │  voice-echo     │
                                 │  stores context   │──► Twilio ──► Phone rings
                                 │  per call_sid     │
                                 └────────┬─────────┘
                                          │
                                          │ caller picks up
                                          ▼
                                 ┌──────────────────┐
                                 │  Claude CLI       │
                                 │  first prompt     │
                                 │  includes context │
                                 │  "I'm calling     │
                                 │   because..."     │
                                 └──────────────────┘

Full System

                     ┌──────────┐
  ┌─────────┐        │   n8n    │        ┌───────────────┐
  │ Triggers │──────►│ (Docker) │──────►│ voice-echo  │──► Claude CLI
  │ (cron,   │       │          │  API   │ (Rust, axum)  │
  │  webhook,│       │  orchest.│        └───────┬───────┘
  │  alerts, │       │  call-   │                │
  │  events) │       │  human   │                ▼
  └─────────┘        └──────────┘        ┌───────────────┐
                                         │    Twilio      │◄──► Phone
                                         └───────────────┘

Prerequisites

  • Rust (1.80+)
  • Claude Code CLI installed and authenticated
  • Twilio account with a phone number
  • Groq API key (free tier works)
  • Inworld API key (sign up at platform.inworld.ai)
  • A server with a public HTTPS URL (for Twilio webhooks)
  • nginx (recommended, for TLS termination and WebSocket proxying)

Installation

1. Clone and build

git clone https://github.com/dnacenta/voice-echo.git
cd voice-echo
cargo build --release

2. Run the setup wizard

./target/release/voice-echo --setup

The wizard walks you through the entire setup:

  • Checks that rustc, claude, and openssl are available
  • Prompts for Twilio, Groq, and Inworld credentials (masked input)
  • Asks for your server's external URL
  • Generates an API token for the outbound call endpoint
  • Writes ~/.voice-echo/config.toml
  • Optionally copies the binary to /usr/local/bin/, installs a systemd service, and generates an nginx reverse proxy config

If you skip the optional steps during the wizard, you can always set them up manually using the templates in deploy/.

3. Twilio webhook

In the Twilio Console, set your phone number's voice webhook to:

POST https://your-server.example.com/twilio/voice

4. Start

voice-echo

Or if you installed the systemd service:

sudo systemctl enable --now voice-echo

Manual configuration

If you prefer to skip the wizard and configure by hand:

mkdir -p ~/.voice-echo
cp config.example.toml ~/.voice-echo/config.toml
cp .env.example ~/.voice-echo/.env
chmod 600 ~/.voice-echo/.env

Edit .env with your API keys, and config.toml for your Twilio phone number and other settings. Secrets are loaded from .env, so leave them empty in the TOML. See deploy/nginx.conf and deploy/voice-echo.service for server setup templates.

You can override the config directory with ECHO_CONFIG=/path/to/config.toml.

Configuration Reference

config.toml

Section Field Default Description
server host -- Bind address (e.g. 0.0.0.0)
server port -- Bind port (e.g. 8443)
server external_url -- Public HTTPS URL (overridden by SERVER_EXTERNAL_URL env var)
twilio account_sid -- Twilio Account SID (overridden by env var)
twilio auth_token -- Twilio Auth Token (overridden by env var)
twilio phone_number -- Your Twilio phone number (E.164)
groq api_key -- Groq API key (overridden by env var)
groq model whisper-large-v3-turbo Whisper model to use
inworld api_key -- Inworld API key (overridden by env var)
inworld voice_id Olivia Inworld voice name
inworld model inworld-tts-1.5-max Inworld TTS model
claude session_timeout_secs 300 Conversation session timeout
claude greeting Hello, this is Echo Initial TTS greeting when a call connects
claude dangerously_skip_permissions false Allow Claude CLI to run tools without prompting (see Customizing Claude)
api token -- Bearer token for /api/* (overridden by env var)
vad silence_threshold_ms 1500 Silence duration before utterance ends
vad energy_threshold 50 Minimum RMS energy to detect speech
hold_music file -- Optional path to a WAV file for hold music
hold_music volume 0.3 Playback volume (0.0 to 1.0)

Environment variables

All secrets can be set via env vars (recommended) instead of config.toml:

Variable Overrides
TWILIO_ACCOUNT_SID twilio.account_sid
TWILIO_AUTH_TOKEN twilio.auth_token
GROQ_API_KEY groq.api_key
INWORLD_API_KEY inworld.api_key
ECHO_API_TOKEN api.token
SERVER_EXTERNAL_URL server.external_url
ECHO_CONFIG Config file path
RUST_LOG Log level filter (e.g. voice_echo=debug,tower_http=debug)

Customizing Claude

voice-echo spawns the claude CLI for each conversation. Claude Code reads a CLAUDE.md file from the working directory to set its behavior — this is how you turn generic Claude into your personalized voice assistant.

Create a CLAUDE.md in the directory where voice-echo runs (typically the project root or the home directory of the service user). This file should contain instructions tailored for a voice context:

  • Persona: Define who Claude is on the phone — name, tone, personality.
  • Voice-first rules: Tell Claude to never use markdown, bullet points, numbered lists, or any text formatting. Everything it outputs will be spoken aloud via TTS.
  • Brevity: Phone calls are not lectures. Two to four sentences per response is usually enough.
  • Language: If you want multilingual support, specify which languages and when to switch.
  • Capabilities: Define what Claude can and can't do — run commands, access APIs, check services, etc.
  • Boundaries: Set security rules, topics to avoid, or information to never disclose.

Without a CLAUDE.md, Claude will behave as its default self — functional but generic.

Permissions

Claude Code normally prompts for permission before running tools (shell commands, file edits, etc.). On a phone call there's no terminal to approve prompts, so you have two options:

  1. dangerously_skip_permissions = true in config.toml — Claude runs all tools without asking. Powerful but risky. Only use this if you trust the instructions in your CLAUDE.md and have locked down what Claude can access.
  2. Pre-approve tools via Claude Code's settings.json or allowedTools configuration. This gives you granular control over which tools are auto-approved without blanket permission.

See the Claude Code documentation for details on permission configuration.

Server Setup

TLS certificates

Twilio requires HTTPS for webhooks. If you're using nginx (recommended), get a free certificate with certbot:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d your-server.example.com

Certificates auto-renew via a systemd timer. The nginx template in deploy/nginx.conf is already configured for the Let's Encrypt certificate paths.

systemd

The included service file (deploy/voice-echo.service) runs as root for simplicity. For production, consider creating a dedicated user:

sudo useradd -r -s /usr/sbin/nologin voice-echo

Then update User=voice-echo in the service file and ensure the user has read access to ~/.voice-echo/ and the claude CLI.

Usage

Call in

Just call your Twilio number. You'll hear the configured greeting, then talk normally.

Trigger an outbound call

curl -X POST https://your-server.example.com/api/call \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "to": "+34612345678",
    "context": "Server CPU at 95% for the last 10 minutes. Top processes: n8n 45%, claude 30%."
  }'

The recipient picks up and Claude already knows why it called — context is injected into the first prompt.

POST /api/call

Requires Authorization: Bearer <token> header.

Field Type Required Description
to string yes Phone number in E.164 format (e.g. +34612345678)
context string no Injected into Claude's first prompt so it knows why it's calling
message string no Twilio <Say> greeting before the stream starts (usually not needed since Claude handles the greeting via TTS)

n8n Bridge

voice-echo integrates with n8n through a bridge architecture:

  • Orchestrator -- central webhook that routes triggers to registered modules
  • Modules -- individual workflows managed via a JSON registry
  • call-human -- module that triggers outbound calls with context

Trigger a call from any n8n workflow via the orchestrator:

curl -X POST http://localhost:5678/webhook/orchestrator \
  -H "Content-Type: application/json" \
  -H "X-Bridge-Secret: YOUR_BRIDGE_SECRET" \
  -d '{
    "action": "trigger",
    "module": "call-human",
    "data": {
      "reason": "Server CPU critical",
      "context": "CPU at 95% for 10 minutes. Load average 12.5.",
      "urgency": "high"
    }
  }'

The orchestrator reads the module registry, forwards the payload to the call-human webhook, which calls the voice-echo API with context. When the user picks up, Claude knows exactly what's happening.

Any n8n workflow can trigger calls by routing through the orchestrator. See specs/n8n-bridge-spec.md for the full specification.

Costs

Service Free tier Paid
Twilio Trial credit (~$15) ~$1.15/mo number + per-minute
Groq Free (rate-limited) Usage-based
Inworld TTS Free tier available ~$5/1M chars
Claude Code Included with Max plan Or API usage

For personal use with a few calls a day, the running cost is minimal beyond the Twilio number.

Troubleshooting

Twilio returns a 502 or "connection refused" Twilio can't reach your server. Verify nginx is running, your DNS points to the server, and the TLS certificate is valid. Test with curl -I https://your-server.example.com/health.

WebSocket closes immediately Check that nginx has WebSocket proxying enabled (the Upgrade and Connection headers in deploy/nginx.conf). Also check proxy_read_timeout — Twilio media streams are long-lived.

"Failed to load config" on startup The config file is missing or malformed. Run voice-echo --setup to generate it, or manually copy config.example.toml to ~/.voice-echo/config.toml.

Claude doesn't respond or times out Make sure the claude CLI is installed, in PATH, and authenticated. Run claude --version and claude "hello" manually to verify. If running as a systemd service, ensure the service user's PATH includes the Claude binary.

No audio / silence after speaking The VAD energy threshold may be too high for your microphone or phone quality. Lower vad.energy_threshold (try 30 or 20). Check RUST_LOG=voice_echo=debug for VAD activity logs.

TTS sounds robotic or uses the wrong voice Verify your inworld.voice_id is valid. Preview voices at the Inworld TTS Playground. You can also create custom voices in Inworld Studio.

Contributing

See CONTRIBUTING.md for branch naming, commit conventions, and workflow.

License

AGPL-3.0

Acknowledgments

Inspired by NetworkChuck's claude-phone. Rewritten from scratch in Rust with a different architecture -- no intermediate Node.js server, direct WebSocket pipeline, energy-based VAD, and an outbound call API for automation.

About

Voice interface for Claude Code over the phone via Twilio

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors