Give Claude Code a voice.
Hear spoken summaries after every response — zero friction, multiple TTS backends.
- 🔊 Automatic voice feedback — Claude speaks a summary after every response
- 🎯 Multi-backend TTS — Qwen3-TTS, Fish Speech, Chatterbox (GPU), Kokoro (CPU), pocket-tts (zero setup)
- 🔄 Auto-detection — Picks the best available backend automatically
- 🎛️ Slash commands — Control voice, backend, personality on the fly
- 🗣️ 9 voices — Cross-backend voice mapping between Kokoro and pocket-tts
- ⚡ Zero config fallback — pocket-tts auto-starts via uvx, nothing to install
- 🧠 Smart GPU awareness — Skips GPU backends when your GPU is busy
- 🎭 Voice personality — Set prompts like "be chill" or "be upbeat"
> [!NOTE]
> The entire pipeline is hands-free. Once installed, Claude automatically includes voice summaries — no prompting required.
```
┌─────────────────────────────────────────────────────────────────┐
│ $ claude                                                        │
│                                                                 │
│ You: refactor the auth module to use JWT tokens                 │
│                                                                 │
│ Claude: I've refactored the authentication module...            │
│ [... full response ...]                                         │
│                                                                 │
│ 📢 Done! I refactored auth to use JWT. Changed 3 files:         │
│ auth.py, middleware.py, and config.py. All tests pass.          │
│                                                                 │
│ 🔊 ████████████████████░░░░ Speaking...                         │
└─────────────────────────────────────────────────────────────────┘
```
The 📢 summary is extracted by the stop hook and spoken aloud through your chosen TTS backend.
In auto mode (default), cc-vox tries Qwen3-TTS → Fish Speech → Chatterbox → Kokoro → pocket-tts and uses the first available. GPU backends are skipped when GPU utilization exceeds the threshold (default 80%).
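The selection logic amounts to walking the preference order and skipping GPU backends when the GPU is busy. A hypothetical sketch — the function and helper names are illustrative, not cc-vox's actual API:

```python
# Illustrative sketch of the auto-detection order described above.
# select_backend() and its parameters are hypothetical, not cc-vox's real API.
import os

# Preference order from the README: best quality first, zero-setup last.
BACKENDS = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]
GPU_BACKENDS = {"qwen3-tts", "fish-speech", "chatterbox"}

def select_backend(available, gpu_utilization, threshold=None):
    """Return the first available backend, skipping GPU backends when
    GPU utilization exceeds the threshold (default 80%)."""
    if threshold is None:
        threshold = int(os.environ.get("GPU_THRESHOLD", "80"))
    for name in BACKENDS:
        if name in GPU_BACKENDS and gpu_utilization > threshold:
            continue  # GPU busy: fall through to CPU-friendly backends
        if name in available:
            return name
    return None

# GPU at 90%: qwen3-tts is skipped even though it is running.
print(select_backend({"qwen3-tts", "kokoro"}, gpu_utilization=90, threshold=80))  # kokoro
```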
```
claude plugin marketplace add BestSithInEU/cc-vox
claude plugin install voice
```

Option A: Zero setup — pocket-tts auto-starts via uvx, nothing to install
> [!TIP]
> Just use Claude Code — pocket-tts will auto-download and start on first speech. No Docker, no GPU needed.
Optionally pre-download the model:
```
hf download kyutai/pocket-tts
```

Option B: Kokoro ⭐ recommended — great quality, CPU-only Docker
```
docker run -d --name kokoro \
  -p 32612:8880 \
  ghcr.io/remsky/kokoro-fastapi-cpu:latest
```

> [!TIP]
> Kokoro offers the best balance of quality and simplicity. One command, CPU-only, great results.
Option C: Qwen3-TTS ⭐ — best quality, voice cloning, requires NVIDIA GPU
```
# Clone the server and start via Docker Compose
cd tools/tts && git clone https://github.com/ValyrianTech/Qwen3-TTS_server qwen3-tts
docker compose -f tts/docker-compose.yml --profile gpu up -d qwen3-tts
```

Supports voice cloning — upload a reference audio clip to create custom voices:
```
curl -X POST http://localhost:32614/upload_audio/ \
  -F "audio_file_label=my_voice" \
  -F "file=@reference.wav"
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 8GB+ VRAM. Supports 10 languages.
Option D: Fish Speech — high quality, requires NVIDIA GPU
```
# Download the model (0.5B params, 13 languages)
hf download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini

# Start the container
docker run -d --name fish-speech \
  --gpus all \
  -p 32611:7860 \
  -v ./checkpoints:/app/checkpoints \
  fishaudio/fish-speech:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with Docker GPU support configured. The openaudio-s1-mini model is licensed CC-BY-NC-SA-4.0.
Option E: Chatterbox — voice cloning, requires NVIDIA GPU
```
docker run -d --name chatterbox \
  --gpus all \
  -p 32613:4123 \
  travisvn/chatterbox-tts-api:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 4-8GB VRAM. OpenAI-compatible API.
```
claude  # Voice feedback is automatic!
```

Voice feedback is automatic. Claude speaks a summary after each response.
| Command | Effect |
|---|---|
| `/voice:speak` | Enable voice |
| `/voice:speak stop` | Disable voice |
| `/voice:speak af_bella` | Change voice |
| `/voice:speak prompt be chill` | Set voice personality |
| `/voice:speak prompt` | Clear personality |
| `/voice:speak backend kokoro` | Force backend |
| `/voice:speak backend auto` | Auto-detect (default) |
| `/voice:speak speed 1.3` | Adjust speech speed (kokoro) |
| `/voice:speak max_sentences 4` | Longer summaries |
| `/voice:speak fallback on` | Try other backends if the forced one is down |
Voice names work across all backends — cc-vox auto-maps between Kokoro and pocket-tts names.
| Kokoro name | pocket-tts alias | Gender | Accent |
|---|---|---|---|
| `af_heart` ★ | `alba` | F | American |
| `af_bella` | `azure` | F | American |
| `af_nicole` | `fantine` | F | American |
| `af_sarah` | `cosette` | F | American |
| `af_sky` | `eponine` | F | American |
| `am_adam` | `marius` | M | American |
| `am_michael` | `jean` | M | American |
| `bf_emma` | `azelma` | F | British |
| `bm_george` | — | M | British |

★ default voice
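The cross-backend mapping above is in essence a lookup table. A minimal sketch — the helper name is illustrative, not cc-vox's actual API (the plugin keeps its catalog in hooks/tts/voices.py):

```python
# Illustrative voice mapping taken from the table above;
# resolve_voice() is a hypothetical helper, not cc-vox's real API.
KOKORO_TO_POCKET = {
    "af_heart": "alba",
    "af_bella": "azure",
    "af_nicole": "fantine",
    "af_sarah": "cosette",
    "af_sky": "eponine",
    "am_adam": "marius",
    "am_michael": "jean",
    "bf_emma": "azelma",
    # bm_george has no pocket-tts equivalent
}
POCKET_TO_KOKORO = {v: k for k, v in KOKORO_TO_POCKET.items()}

def resolve_voice(name: str, backend: str) -> str:
    """Accept either naming scheme and return the backend-native name,
    falling back to the name unchanged when there is no mapping."""
    if backend == "pocket-tts":
        return KOKORO_TO_POCKET.get(name, name)
    return POCKET_TO_KOKORO.get(name, name)

print(resolve_voice("alba", "kokoro"))          # af_heart
print(resolve_voice("af_bella", "pocket-tts"))  # azure
```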
`~/.claude/cc-vox.toml`

```toml
[core]
enabled = true
voice = "af_heart"   # see voices below
backend = "auto"     # auto | kokoro | fish-speech | pocket-tts | chatterbox | qwen3-tts

[tuning]
speed = 1.0          # 0.5-2.0 (kokoro only)
max_sentences = 2    # max sentences in spoken summary (1-10)
fallback = true      # try other backends when the forced one is down

[style]
prompt = "be upbeat and encouraging"
```

| Setting | Default | Description |
|---|---|---|
| `tuning.speed` | `1.0` | Speech speed 0.5–2.0 (kokoro only) |
| `tuning.max_sentences` | `2` | Max sentences in spoken summary (1–10) |
| `tuning.fallback` | `true` | Try other backends when the forced one is down |
| Variable | Default | Description |
|---|---|---|
| `TTS_BACKEND` | `auto` | Override backend: `auto`, `qwen3-tts`, `fish-speech`, `chatterbox`, `kokoro`, `pocket-tts` |
| `KOKORO_PORT` | `32612` | Kokoro Docker port |
| `FISH_SPEECH_PORT` | `32611` | Fish Speech Docker port |
| `CHATTERBOX_PORT` | `32613` | Chatterbox Docker port |
| `QWEN3_TTS_PORT` | `32614` | Qwen3-TTS Docker port |
| `TTS_PORT` | `8000` | pocket-tts port |
| `GPU_THRESHOLD` | `80` | GPU % above which GPU backends are skipped |
```
cc-vox/
├── hooks/                          # Claude Code hook scripts
│   ├── hooks.json                  # Hook registration manifest
│   ├── user_prompt_submit_hook.py  # ① Injects 📢 reminder at turn start
│   ├── post_tool_use_hook.py       # ② Brief nudge after tool calls
│   ├── stop_hook.py                # ③ Extracts summary → calls say
│   ├── voice_common.py             # Config parsing (TOML) & reminders
│   ├── session.py                  # Session JSONL file I/O
│   ├── summarize.py                # Headless Claude fallback
│   └── tts/                        # TTS backend package
│       ├── __init__.py             # Registry + select_backend()
│       ├── _protocol.py            # TTSBackend Protocol
│       ├── voices.py               # Voice catalog (single source of truth)
│       ├── kokoro.py               # Kokoro backend
│       ├── fish_speech.py          # Fish Speech backend
│       ├── chatterbox.py           # Chatterbox backend
│       ├── qwen3_tts.py            # Qwen3-TTS backend
│       ├── pocket_tts.py           # pocket-tts backend
│       ├── _playback.py            # Audio playback + locking
│       └── _session_state.py       # Session sentinel files
├── commands/
│   └── speak.md                    # /voice:speak slash command definition
├── scripts/
│   └── say                         # Thin TTS CLI (uses tts package)
├── docs/                           # Zensical documentation
├── assets/                         # SVG diagrams & logos
│   ├── logo-dark.svg               # Animated logo (dark mode)
│   ├── logo-light.svg              # Animated logo (light mode)
│   ├── flow.svg                    # Pipeline flow diagram
│   ├── architecture.svg            # Component architecture diagram
│   ├── backends.svg                # Backend comparison cards
│   └── sequence.svg                # Sequence diagram
├── .claude-plugin/
│   ├── plugin.json                 # v2.0.0 plugin manifest
│   └── marketplace.json            # Distribution metadata
├── zensical.toml                   # Documentation config
├── LICENSE                         # MIT
└── README.md
```
| | cc-vox | Manual TTS | No voice |
|---|---|---|---|
| Automatic speech after every response | ✅ | ❌ manual | ❌ |
| Multiple TTS backends | ✅ 5 backends | — | — |
| Auto-detects best backend | ✅ | ❌ | — |
| Zero-setup option | ✅ pocket-tts | ❌ | — |
| GPU-aware routing | ✅ | ❌ | — |
| Voice personality prompts | ✅ | ❌ | — |
| Cross-backend voice mapping | ✅ | ❌ | — |
| Slash command control | ✅ | ❌ | — |
| Setup time | ~2 min | 30+ min | 0 min |
No audio output

- Check that voice is enabled: run `/voice:speak` in Claude Code
- Verify your TTS backend is running:

  ```
  # Kokoro
  curl http://localhost:32612/v1/audio/speech -X POST -d '{}' 2>/dev/null && echo "OK" || echo "Not running"

  # Fish Speech
  curl http://localhost:32611 2>/dev/null && echo "OK" || echo "Not running"
  ```

- Check system audio output device
- Try forcing a backend: `/voice:speak backend pocket-tts`
Docker container won't start
```
# Check if port is already in use
lsof -i :32612  # Kokoro
lsof -i :32611  # Fish Speech

# Check Docker logs
docker logs kokoro
docker logs fish-speech
```

Fish Speech skipped (GPU threshold)
cc-vox checks GPU utilization before using Fish Speech. If your GPU is busy (default >80%), it falls back to Kokoro or pocket-tts.
```
# Check current GPU usage
nvidia-smi

# Raise the threshold
export GPU_THRESHOLD=95
```

Voice sounds wrong or uses wrong backend
```
# Force a specific backend
/voice:speak backend kokoro

# Check which backend is being used (verbose mode)
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro directly"
```

Does it work offline?
Yes — if you run Kokoro or Fish Speech locally via Docker, everything stays on your machine. pocket-tts also runs locally via uvx.
Can I add custom voices?
The voice list is currently fixed to the 9 voices that map cleanly across backends. Custom voice support depends on the backend you're using — Fish Speech supports voice cloning natively.
Does it slow down Claude?
No. TTS runs asynchronously after Claude finishes responding. The only overhead is a small system prompt injection (~50 tokens) to remind Claude to include a voice summary. With fallback = true (default), if your forced backend goes down, cc-vox transparently tries the next available backend.
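That fallback path can be pictured as a loop over the backend preference order. A hypothetical sketch — the function and callable shapes are illustrative, not cc-vox's actual implementation:

```python
# Illustrative sketch of the fallback behavior described above; the
# helper names and ConnectionError convention are assumptions.
ORDER = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]

def speak_with_fallback(text, forced, backends, fallback=True):
    """Try the forced backend first; with fallback on, continue down
    the preference order until one succeeds."""
    candidates = ([forced] + [b for b in ORDER if b != forced]) if fallback else [forced]
    for name in candidates:
        fn = backends.get(name)
        if fn is None:
            continue  # backend not installed/configured
        try:
            return name, fn(text)
        except ConnectionError:
            continue  # backend down: try the next one
    return None, None

# kokoro is "down", so pocket-tts picks up the sentence.
def down(_):
    raise ConnectionError

used, _ = speak_with_fallback("hi", "kokoro",
                              {"kokoro": down, "pocket-tts": lambda t: t})
print(used)  # pocket-tts
```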
Can I use it with other AI coding tools?
cc-vox is built specifically for Claude Code's hook system. The say script can be used standalone, but the automatic hook integration is Claude Code-specific.
How do I uninstall?
```
claude plugin uninstall voice

# Optionally remove Docker containers
docker rm -f kokoro fish-speech
```

```
# Run with local plugin directory
claude --plugin-dir ~/Documents/Projects/cc-vox

# Test say script directly
./scripts/say --voice af_heart "Hello, testing voice output"

# Force a specific backend
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro"

# Test with custom speed
./scripts/say --voice af_heart --speed 1.3 "Testing faster speech"
```

Contributions are welcome! Here's how to get started:
- Fork the repository
- Clone your fork and set up the development environment:
  ```
  git clone https://github.com/<your-username>/cc-vox.git
  cd cc-vox
  claude --plugin-dir .
  ```
- Make your changes — follow the existing code style
- Test with at least one TTS backend running
- Submit a PR with a clear description of your changes
> [!NOTE]
> Adding a new backend = create one file in `hooks/tts/` + one registry line in `__init__.py`. See the Adding a Backend guide.
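A new backend only has to satisfy the shape defined in `_protocol.py`. A hypothetical sketch — the Protocol's method names are assumptions based on the repo layout, not the plugin's actual interface:

```python
# Hypothetical sketch of a backend module for hooks/tts/. The
# TTSBackend method names below are assumed for illustration; check
# _protocol.py for the plugin's real interface.
from typing import Protocol

class TTSBackend(Protocol):
    name: str
    def is_available(self) -> bool: ...
    def synthesize(self, text: str, voice: str) -> bytes: ...

class MyBackend:
    """Example backend satisfying the assumed TTSBackend shape."""
    name = "my-backend"

    def is_available(self) -> bool:
        return True  # e.g. probe the TTS server's health endpoint here

    def synthesize(self, text: str, voice: str) -> bytes:
        return b""   # e.g. POST text to the server and return audio bytes

# Structural typing: MyBackend needs no inheritance to satisfy the Protocol.
backend: TTSBackend = MyBackend()
print(backend.is_available())  # True
```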
Based on the original voice plugin by pchalasani, which pioneered the hook-based voice feedback architecture for Claude Code. cc-vox extends it with multi-backend TTS support and auto-detection.
MIT License · Made with 🔊 by BestSithInEU