Give Claude Code a voice.
Hear spoken summaries after every response — zero friction, multiple TTS backends.
- 🔊 Automatic voice feedback — Claude speaks a summary after every response
- 🎯 Multi-backend TTS — Qwen3-TTS, Fish Speech, Chatterbox (GPU), Kokoro (CPU), pocket-tts (zero setup)
- 🔄 Auto-detection — Picks the best available backend automatically
- 🎛️ Slash commands — Control voice, backend, personality on the fly
- 🗣️ 9 voices — Cross-backend voice mapping between Kokoro and pocket-tts
- ⚡ Zero config fallback — pocket-tts auto-starts via uvx, nothing to install
- 🧠 Smart GPU awareness — Skips GPU backends when your GPU is busy
- 🎭 Voice personality — Set prompts like "be chill" or "be upbeat"
> [!NOTE]
> The entire pipeline is hands-free. Once installed, Claude automatically includes voice summaries — no prompting required.
```
┌─────────────────────────────────────────────────────────────────┐
│ $ claude                                                        │
│                                                                 │
│ You: refactor the auth module to use JWT tokens                 │
│                                                                 │
│ Claude: I've refactored the authentication module...            │
│ [... full response ...]                                         │
│                                                                 │
│ 📢 Done! I refactored auth to use JWT. Changed 3 files:         │
│ auth.py, middleware.py, and config.py. All tests pass.          │
│                                                                 │
│ 🔊 ████████████████████░░░░ Speaking...                         │
└─────────────────────────────────────────────────────────────────┘
```
The 📢 summary is extracted by the stop hook and spoken aloud through your chosen TTS backend.
In auto mode (default), cc-vox tries Qwen3-TTS → Fish Speech → Chatterbox → Kokoro → pocket-tts and uses the first available. GPU backends are skipped when GPU utilization exceeds the threshold (default 80%).
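The selection logic amounts to walking the preference order and skipping GPU backends when the GPU is busy. A hypothetical sketch — the function and helper names are illustrative, not cc-vox's actual API:

```python
# Illustrative sketch of the auto-detection order described above.
# select_backend() and its parameters are hypothetical, not cc-vox's real API.
import os

# Preference order from the README: best quality first, zero-setup last.
BACKENDS = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]
GPU_BACKENDS = {"qwen3-tts", "fish-speech", "chatterbox"}

def select_backend(available, gpu_utilization, threshold=None):
    """Return the first available backend, skipping GPU backends when
    GPU utilization exceeds the threshold (default 80%)."""
    if threshold is None:
        threshold = int(os.environ.get("GPU_THRESHOLD", "80"))
    for name in BACKENDS:
        if name in GPU_BACKENDS and gpu_utilization > threshold:
            continue  # GPU busy: fall through to CPU-friendly backends
        if name in available:
            return name
    return None

# GPU at 90%: qwen3-tts is skipped even though it is running.
print(select_backend({"qwen3-tts", "kokoro"}, gpu_utilization=90, threshold=80))  # kokoro
```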
```
claude plugin marketplace add BestSithInEU/cc-vox
claude plugin install voice
```

Option A: Zero setup — pocket-tts auto-starts via uvx, nothing to install
> [!TIP]
> Just use Claude Code — pocket-tts will auto-download and start on first speech. No Docker, no GPU needed.
Optionally pre-download the model:
```
hf download kyutai/pocket-tts
```

Option B: Kokoro ⭐ recommended — great quality, CPU-only Docker
```
docker run -d --name kokoro \
  -p 32612:8880 \
  ghcr.io/remsky/kokoro-fastapi-cpu:latest
```

> [!TIP]
> Kokoro offers the best balance of quality and simplicity. One command, CPU-only, great results.
Option C: Qwen3-TTS ⭐ — best quality, voice cloning, requires NVIDIA GPU
```
# Clone the server and start via Docker Compose
cd tools/tts && git clone https://github.com/ValyrianTech/Qwen3-TTS_server qwen3-tts
docker compose -f tts/docker-compose.yml --profile gpu up -d qwen3-tts
```

Supports voice cloning — upload a reference audio clip to create custom voices:
```
curl -X POST http://localhost:32614/upload_audio/ \
  -F "audio_file_label=my_voice" \
  -F "file=@reference.wav"
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 8GB+ VRAM. Supports 10 languages.
Option D: Fish Speech — high quality, requires NVIDIA GPU
```
# Download the model (0.5B params, 13 languages)
hf download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini

# Start the container
docker run -d --name fish-speech \
  --gpus all \
  -p 32611:7860 \
  -v ./checkpoints:/app/checkpoints \
  fishaudio/fish-speech:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with Docker GPU support configured. The openaudio-s1-mini model is licensed CC-BY-NC-SA-4.0.
Option E: Chatterbox — voice cloning, requires NVIDIA GPU
```
docker run -d --name chatterbox \
  --gpus all \
  -p 32613:4123 \
  travisvn/chatterbox-tts-api:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 4-8GB VRAM. OpenAI-compatible API.
```
claude  # Voice feedback is automatic!
```

Voice feedback is automatic. Claude speaks a summary after each response.
| Command | Effect |
|---|---|
| `/voice:speak` | Enable voice |
| `/voice:speak stop` | Disable voice |
| `/voice:speak af_bella` | Change voice |
| `/voice:speak prompt be chill` | Set voice personality |
| `/voice:speak prompt` | Clear personality |
| `/voice:speak backend kokoro` | Force backend |
| `/voice:speak backend auto` | Auto-detect (default) |
| `/voice:speak speed 1.3` | Adjust speech speed (kokoro) |
| `/voice:speak max_sentences 4` | Longer summaries |
| `/voice:speak fallback on` | Try other backends if the forced one is down |
Voice names work across all backends — cc-vox auto-maps between Kokoro and pocket-tts names.
| Kokoro name | pocket-tts alias | Gender | Accent |
|---|---|---|---|
| `af_heart` ★ | `alba` | F | American |
| `af_bella` | `azure` | F | American |
| `af_nicole` | `fantine` | F | American |
| `af_sarah` | `cosette` | F | American |
| `af_sky` | `eponine` | F | American |
| `am_adam` | `marius` | M | American |
| `am_michael` | `jean` | M | American |
| `bf_emma` | `azelma` | F | British |
| `bm_george` | — | M | British |

★ default voice
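The cross-backend mapping above is in essence a lookup table. A minimal sketch — the helper name is illustrative, not cc-vox's actual API (the plugin keeps its catalog in hooks/tts/voices.py):

```python
# Illustrative voice mapping taken from the table above;
# resolve_voice() is a hypothetical helper, not cc-vox's real API.
KOKORO_TO_POCKET = {
    "af_heart": "alba",
    "af_bella": "azure",
    "af_nicole": "fantine",
    "af_sarah": "cosette",
    "af_sky": "eponine",
    "am_adam": "marius",
    "am_michael": "jean",
    "bf_emma": "azelma",
    # bm_george has no pocket-tts equivalent
}
POCKET_TO_KOKORO = {v: k for k, v in KOKORO_TO_POCKET.items()}

def resolve_voice(name: str, backend: str) -> str:
    """Accept either naming scheme and return the backend-native name,
    falling back to the name unchanged when there is no mapping."""
    if backend == "pocket-tts":
        return KOKORO_TO_POCKET.get(name, name)
    return POCKET_TO_KOKORO.get(name, name)

print(resolve_voice("alba", "kokoro"))          # af_heart
print(resolve_voice("af_bella", "pocket-tts"))  # azure
```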
`~/.claude/cc-vox.toml`

```toml
[core]
enabled = true
voice = "af_heart"   # see voices below
backend = "auto"     # auto | kokoro | fish-speech | pocket-tts | chatterbox | qwen3-tts

[tuning]
speed = 1.0          # 0.5-2.0 (kokoro only)
max_sentences = 2    # max sentences in spoken summary (1-10)
fallback = true      # try other backends when the forced one is down

[style]
prompt = "be upbeat and encouraging"
```

| Setting | Default | Description |
|---|---|---|
| `tuning.speed` | `1.0` | Speech speed 0.5–2.0 (kokoro only) |
| `tuning.max_sentences` | `2` | Max sentences in spoken summary (1–10) |
| `tuning.fallback` | `true` | Try other backends when the forced one is down |
| Variable | Default | Description |
|---|---|---|
| `TTS_BACKEND` | `auto` | Override backend: `auto`, `qwen3-tts`, `fish-speech`, `chatterbox`, `kokoro`, `pocket-tts` |
| `KOKORO_PORT` | `32612` | Kokoro Docker port |
| `FISH_SPEECH_PORT` | `32611` | Fish Speech Docker port |
| `CHATTERBOX_PORT` | `32613` | Chatterbox Docker port |
| `QWEN3_TTS_PORT` | `32614` | Qwen3-TTS Docker port |
| `TTS_PORT` | `8000` | pocket-tts port |
| `GPU_THRESHOLD` | `80` | GPU % above which GPU backends are skipped |
```
cc-vox/
├── hooks/                          # Claude Code hook scripts
│   ├── hooks.json                  # Hook registration manifest
│   ├── user_prompt_submit_hook.py  # ① Injects 📢 reminder at turn start
│   ├── post_tool_use_hook.py       # ② Brief nudge after tool calls
│   ├── stop_hook.py                # ③ Extracts summary → calls say
│   ├── voice_common.py             # Config parsing (TOML) & reminders
│   ├── session.py                  # Session JSONL file I/O
│   ├── summarize.py                # Headless Claude fallback
│   └── tts/                        # TTS backend package
│       ├── __init__.py             # Registry + select_backend()
│       ├── _protocol.py            # TTSBackend Protocol
│       ├── voices.py               # Voice catalog (single source of truth)
│       ├── kokoro.py               # Kokoro backend
│       ├── fish_speech.py          # Fish Speech backend
│       ├── chatterbox.py           # Chatterbox backend
│       ├── qwen3_tts.py            # Qwen3-TTS backend
│       ├── pocket_tts.py           # pocket-tts backend
│       ├── _playback.py            # Audio playback + locking
│       └── _session_state.py       # Session sentinel files
├── commands/
│   └── speak.md                    # /voice:speak slash command definition
├── scripts/
│   └── say                         # Thin TTS CLI (uses tts package)
├── docs/                           # Zensical documentation
├── assets/                         # SVG diagrams & logos
│   ├── logo-dark.svg               # Animated logo (dark mode)
│   ├── logo-light.svg              # Animated logo (light mode)
│   ├── flow.svg                    # Pipeline flow diagram
│   ├── architecture.svg            # Component architecture diagram
│   ├── backends.svg                # Backend comparison cards
│   └── sequence.svg                # Sequence diagram
├── .claude-plugin/
│   ├── plugin.json                 # v2.0.0 plugin manifest
│   └── marketplace.json            # Distribution metadata
├── zensical.toml                   # Documentation config
├── LICENSE                         # MIT
└── README.md
```
| | cc-vox | Manual TTS | No voice |
|---|---|---|---|
| Automatic speech after every response | ✅ | ❌ manual | ❌ |
| Multiple TTS backends | ✅ 5 backends | — | — |
| Auto-detects best backend | ✅ | ❌ | — |
| Zero-setup option | ✅ pocket-tts | ❌ | — |
| GPU-aware routing | ✅ | ❌ | — |
| Voice personality prompts | ✅ | ❌ | — |
| Cross-backend voice mapping | ✅ | ❌ | — |
| Slash command control | ✅ | ❌ | — |
| Setup time | ~2 min | 30+ min | 0 min |
No audio output

- Check that voice is enabled: run `/voice:speak` in Claude Code
- Verify your TTS backend is running:

  ```
  # Kokoro
  curl http://localhost:32612/v1/audio/speech -X POST -d '{}' 2>/dev/null && echo "OK" || echo "Not running"

  # Fish Speech
  curl http://localhost:32611 2>/dev/null && echo "OK" || echo "Not running"
  ```

- Check system audio output device
- Try forcing a backend: `/voice:speak backend pocket-tts`
Docker container won't start
```
# Check if port is already in use
lsof -i :32612  # Kokoro
lsof -i :32611  # Fish Speech

# Check Docker logs
docker logs kokoro
docker logs fish-speech
```

Fish Speech skipped (GPU threshold)
cc-vox checks GPU utilization before using Fish Speech. If your GPU is busy (default >80%), it falls back to Kokoro or pocket-tts.
```
# Check current GPU usage
nvidia-smi

# Raise the threshold
export GPU_THRESHOLD=95
```

Voice sounds wrong or uses wrong backend
```
# Force a specific backend
/voice:speak backend kokoro

# Check which backend is being used (verbose mode)
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro directly"
```

Does it work offline?
Yes — if you run Kokoro or Fish Speech locally via Docker, everything stays on your machine. pocket-tts also runs locally via uvx.
Can I add custom voices?
The voice list is currently fixed to the 9 voices that map cleanly across backends. Custom voice support depends on the backend you're using — Fish Speech supports voice cloning natively.
Does it slow down Claude?
No. TTS runs asynchronously after Claude finishes responding. The only overhead is a small system prompt injection (~50 tokens) to remind Claude to include a voice summary. With fallback = true (default), if your forced backend goes down, cc-vox transparently tries the next available backend.
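That fallback path can be pictured as a loop over the backend preference order. A hypothetical sketch — the function and callable shapes are illustrative, not cc-vox's actual implementation:

```python
# Illustrative sketch of the fallback behavior described above; the
# helper names and ConnectionError convention are assumptions.
ORDER = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]

def speak_with_fallback(text, forced, backends, fallback=True):
    """Try the forced backend first; with fallback on, continue down
    the preference order until one succeeds."""
    candidates = ([forced] + [b for b in ORDER if b != forced]) if fallback else [forced]
    for name in candidates:
        fn = backends.get(name)
        if fn is None:
            continue  # backend not installed/configured
        try:
            return name, fn(text)
        except ConnectionError:
            continue  # backend down: try the next one
    return None, None

# kokoro is "down", so pocket-tts picks up the sentence.
def down(_):
    raise ConnectionError

used, _ = speak_with_fallback("hi", "kokoro",
                              {"kokoro": down, "pocket-tts": lambda t: t})
print(used)  # pocket-tts
```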
Can I use it with other AI coding tools?
cc-vox is built specifically for Claude Code's hook system. The say script can be used standalone, but the automatic hook integration is Claude Code-specific.
How do I uninstall?
```
claude plugin uninstall voice

# Optionally remove Docker containers
docker rm -f kokoro fish-speech
```

```
# Run with local plugin directory
claude --plugin-dir ~/Documents/Projects/cc-vox

# Test say script directly
./scripts/say --voice af_heart "Hello, testing voice output"

# Force a specific backend
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro"

# Test with custom speed
./scripts/say --voice af_heart --speed 1.3 "Testing faster speech"
```

Contributions are welcome! Here's how to get started:
- Fork the repository
- Clone your fork and set up the development environment:
  ```
  git clone https://github.com/<your-username>/cc-vox.git
  cd cc-vox
  claude --plugin-dir .
  ```
- Make your changes — follow the existing code style
- Test with at least one TTS backend running
- Submit a PR with a clear description of your changes
> [!NOTE]
> Adding a new backend = create one file in `hooks/tts/` + one registry line in `__init__.py`. See the Adding a Backend guide.
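A new backend only has to satisfy the shape defined in `_protocol.py`. A hypothetical sketch — the Protocol's method names are assumptions based on the repo layout, not the plugin's actual interface:

```python
# Hypothetical sketch of a backend module for hooks/tts/. The
# TTSBackend method names below are assumed for illustration; check
# _protocol.py for the plugin's real interface.
from typing import Protocol

class TTSBackend(Protocol):
    name: str
    def is_available(self) -> bool: ...
    def synthesize(self, text: str, voice: str) -> bytes: ...

class MyBackend:
    """Example backend satisfying the assumed TTSBackend shape."""
    name = "my-backend"

    def is_available(self) -> bool:
        return True  # e.g. probe the TTS server's health endpoint here

    def synthesize(self, text: str, voice: str) -> bytes:
        return b""   # e.g. POST text to the server and return audio bytes

# Structural typing: MyBackend needs no inheritance to satisfy the Protocol.
backend: TTSBackend = MyBackend()
print(backend.is_available())  # True
```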
Based on the original voice plugin by pchalasani, which pioneered the hook-based voice feedback architecture for Claude Code. cc-vox extends it with multi-backend TTS support and auto-detection.
MIT License · Made with 🔊 by BestSithInEU