ComfyUI Extension: ComfyUI_ChatterBox
An unofficial ComfyUI custom node integration for Resemble AI's ChatterBox, a state-of-the-art open-source Text-to-Speech (TTS) model with voice cloning capabilities.
ComfyUI_ChatterBox_Voice
Unofficial high-quality Text-to-Speech and Voice Conversion nodes for ComfyUI, built on ResembleAI's ChatterboxTTS.
NEW: Audio capture node
Features
- ChatterBox TTS - Generate speech from text with optional voice cloning
- ChatterBox VC - Convert voice from one speaker to another
- ChatterBox Voice Capture - Record voice input with smart silence detection
- Fast & Quality - Production-grade TTS that Resemble AI reports is preferred over ElevenLabs in side-by-side evaluations
- Emotion Control - Unique exaggeration parameter for expressive speech
Note: There are multiple ChatterBox extensions available. This implementation focuses on simplicity and ComfyUI standards.
Installation
1. Install the Extension
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI_ChatterBox.git
2. Install ChatterboxTTS Package
Copy the included package folders to your Python site-packages:
Windows Portable ComfyUI:
cd D:\ComfyUI_windows\ComfyUI\custom_nodes\ComfyUI_ChatterBox
xcopy "put_contain_in_site_packages_folder\*" "..\..\..\python_embeded\Lib\site-packages\" /E /S
WSL/Linux ComfyUI:
cd ComfyUI/custom_nodes/ComfyUI_ChatterBox
cp -r put_contain_in_site_packages_folder/* ../../venv/lib/python3.11/site-packages/
Other Python setups:
# Find your site-packages location first:
python -c "import site; print(site.getsitepackages())"
# Then copy both folders:
cp -r put_contain_in_site_packages_folder/* /path/to/your/site-packages/
This copies both required folders:
- chatterbox/ - The actual TTS package code
- chatterbox_tts-0.1.1.dist-info/ - Package metadata for Python
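To confirm that the copy landed in the environment ComfyUI actually uses, a short check like the one below can help. This is only a sketch: `package_status` is a hypothetical helper, and the distribution name `chatterbox-tts` is inferred from the `chatterbox_tts-0.1.1.dist-info` folder name. Run it with the same Python interpreter that launches ComfyUI.

```python
import importlib.metadata
import importlib.util


def package_status(module_name="chatterbox", dist_name="chatterbox-tts"):
    """Report whether the module can be found and which version metadata is visible.

    module_name / dist_name defaults are assumptions based on the folder names
    shipped in put_contain_in_site_packages_folder/.
    """
    # find_spec locates the package without importing it
    importable = importlib.util.find_spec(module_name) is not None
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # the .dist-info folder is not visible to this interpreter
    return {"importable": importable, "version": version}


if __name__ == "__main__":
    print(package_status())
```

If `importable` is False, the `chatterbox/` folder is not on this interpreter's path; if `version` is None, the `.dist-info` folder is missing.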
3. Install Additional Dependencies
pip install -r requirements.txt
Note: torch, torchaudio, and numpy should already be available in ComfyUI.
Additional dependencies for voice recording:
pip install sounddevice
4. Download Models
Download the ChatterboxTTS models and place them in:
ComfyUI/models/TTS/chatterbox/
Required files:
- conds.pt (105 KB)
- s3gen.pt (~1 GB)
- t3_cfg.pt (~1 GB)
- tokenizer.json (25 KB)
- ve.pt (5.5 MB)
Download from: https://huggingface.co/ResembleAI/chatterbox/tree/main
Manual download steps:
- Visit https://huggingface.co/ResembleAI/chatterbox/tree/main
- Click each required file and download
- Save all files to ComfyUI/models/TTS/chatterbox/
- The folder should contain exactly the 5 files listed above
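A quick way to catch a missing or truncated download is to compare the folder against the file list above. The sketch below is illustrative only: `check_model_dir` is a hypothetical helper, the sizes are the approximate figures listed in this README, and the 50% tolerance is an arbitrary assumption.

```python
from pathlib import Path

# Approximate sizes in bytes, taken from the "Required files" list above.
EXPECTED_SIZES = {
    "conds.pt": 105 * 1024,
    "s3gen.pt": 1024 ** 3,
    "t3_cfg.pt": 1024 ** 3,
    "tokenizer.json": 25 * 1024,
    "ve.pt": int(5.5 * 1024 ** 2),
}


def check_model_dir(model_dir, min_fraction=0.5):
    """Return {filename: problem} for files that are missing or suspiciously small."""
    problems = {}
    root = Path(model_dir)
    for name, approx_size in EXPECTED_SIZES.items():
        path = root / name
        if not path.is_file():
            problems[name] = "missing"
        elif path.stat().st_size < approx_size * min_fraction:
            # e.g. an interrupted download or an HTML page saved by mistake
            problems[name] = "smaller than expected (incomplete download?)"
    return problems


if __name__ == "__main__":
    issues = check_model_dir("ComfyUI/models/TTS/chatterbox")
    print("All model files look fine" if not issues else issues)
```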
5. Restart ComfyUI
The ChatterBox nodes will appear in the "ChatterBox" category.
Usage
Voice Recording (New!)
- Add "ChatterBox Voice Capture" node
- Select your microphone from the dropdown
- Adjust recording settings:
  - Silence Threshold: How quiet to consider "silence" (0.001-0.1)
  - Silence Duration: How long to wait before stopping (0.5-5.0 seconds)
  - Sample Rate: Audio quality (8000-96000 Hz, default 44100)
- Change the Trigger value to start a new recording
- Connect output to TTS (for voice cloning) or VC nodes
Smart Recording Features:
- Auto-stop: Automatically stops when you finish speaking
- Noise filtering: Configurable silence detection
- Trigger-based: Change trigger number to record again
- Temp files: Automatically manages temporary audio files
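To make the interaction between Silence Threshold and Silence Duration concrete, here is a minimal sketch of one common way to implement auto-stop: measure the RMS level of each audio block and stop after enough consecutive quiet blocks. It is not taken from the node's source. The function name, the block size, and the choice to wait for speech before counting silence are all assumptions.

```python
import math


def find_stop_sample(samples, sample_rate, silence_threshold=0.01,
                     silence_duration=2.0, block_size=1024):
    """Return the sample index at which recording would stop, or None.

    Recording stops once speech has been heard and the RMS level of
    consecutive blocks stays below silence_threshold for at least
    silence_duration seconds. (Hypothetical logic, for illustration.)
    """
    silent_needed = silence_duration * sample_rate  # samples of silence required
    silent_run = 0
    heard_speech = False
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        if rms >= silence_threshold:
            heard_speech = True
            silent_run = 0  # any loud block resets the silence counter
        elif heard_speech:
            silent_run += len(block)
            if silent_run >= silent_needed:
                return start + len(block)
    return None


# 0.3 s of silence, 0.4 s of "speech", then 1 s of silence at 1000 Hz
signal = [0.0] * 300 + [0.5] * 400 + [0.0] * 1000
print(find_stop_sample(signal, 1000, silence_duration=0.5, block_size=100))  # 1200
```

Under this model, a higher threshold treats more background noise as silence, and a longer duration tolerates longer pauses between words.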
Text-to-Speech
- Add "ChatterBox Text-to-Speech" node
- Enter your text
- Optionally connect reference audio for voice cloning
- Adjust settings:
  - Exaggeration: Emotion intensity (0.25-2.0)
  - Temperature: Randomness (0.05-5.0)
  - CFG Weight: Guidance strength (0.0-1.0)
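If you generate settings from a script (for example, when driving ComfyUI through its API), it can be useful to clamp values into the ranges listed above before sending them. The helper below is a hypothetical sketch; only the ranges come from this README.

```python
# Ranges as listed in the node settings above.
TTS_RANGES = {
    "exaggeration": (0.25, 2.0),
    "temperature": (0.05, 5.0),
    "cfg_weight": (0.0, 1.0),
}


def clamp_tts_settings(**settings):
    """Clamp each known setting into its documented range; reject unknown names."""
    clamped = {}
    for name, value in settings.items():
        if name not in TTS_RANGES:
            raise KeyError(f"unknown TTS setting: {name}")
        low, high = TTS_RANGES[name]
        clamped[name] = min(max(float(value), low), high)
    return clamped


print(clamp_tts_settings(exaggeration=3.0, temperature=0.8, cfg_weight=-0.2))
# {'exaggeration': 2.0, 'temperature': 0.8, 'cfg_weight': 0.0}
```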
Voice Conversion
- Add "ChatterBox Voice Conversion" node
- Connect source audio (voice to convert)
- Connect target audio (voice style to copy)
Workflow Examples
Voice Cloning Workflow:
Voice Capture → ChatterBox TTS (reference_audio)
Voice Conversion Workflow:
Voice Capture (source) → ChatterBox VC ← Voice Capture (target)
Complete Pipeline:
Voice Capture → ChatterBox TTS → PreviewAudio
                      ↓
                ChatterBox VC ← Voice Capture (target voice)
Settings Guide
Voice Recording Settings
General Recording:
- silence_threshold=0.01, silence_duration=2.0 (default settings)
Noisy Environment:
- Higher silence_threshold (~0.05) to ignore background noise
- Longer silence_duration (~3.0) to avoid cutting off speech
Quiet Environment:
- Lower silence_threshold (~0.005) for sensitive detection
- Shorter silence_duration (~1.0) for quick stopping
TTS Settings
General Use:
- exaggeration=0.5, cfg_weight=0.5 (default settings work well)
Expressive Speech:
- Lower cfg_weight (~0.3) + higher exaggeration (~0.7)
- Higher exaggeration speeds up speech; lower CFG slows it down
ChatterBox TTS Text Limits
No Official Hard Limit: Unlike some TTS systems (for example, OpenAI's TTS, which has a 4096-character limit according to an OpenAI Developer Community thread), ChatterBox TTS doesn't appear to have a documented hard character or word limit.
Practical Implementation: For best results, split long text into shorter segments; the underlying model likely works best with them.
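One simple way to keep segments short is to split the text at sentence boundaries before sending it to the TTS node, then join the resulting audio clips. The sketch below shows the idea. `split_text` is a hypothetical helper, and the 300-character default is an arbitrary assumption, not a documented ChatterBox value.

```python
import re


def split_text(text, max_chars=300):
    """Split text into chunks of up to max_chars, breaking at sentence ends.

    A sentence longer than max_chars is split at word boundaries. A single
    word longer than max_chars is kept whole, so that chunk may exceed the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        # Rebuild the sentence word by word, starting new chunks as needed
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```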
Installation Summary
- Clone extension → git clone https://github.com/ShmuelRonen/ComfyUI_ChatterBox.git
- Copy package → Copy folders from put_contain_in_site_packages_folder/ to site-packages
- Install audio deps → pip install sounddevice (for voice recording)
- Download models → Get 5 files from HuggingFace to ComfyUI/models/TTS/chatterbox/
- Restart ComfyUI → Nodes appear in "ChatterBox" category
Why This Approach?
- No pip conflicts - Avoids dependency issues with ComfyUI
- Universal - Works on Windows portable, WSL, Linux, conda, etc.
- Offline - The chatterbox package is copied from this repository rather than downloaded by pip (model files are still downloaded separately)
- Simple - Just copy folders, no complex scripts
Why Two Folders?
- chatterbox/ - Contains the actual Python code for the TTS engine
- chatterbox_tts-0.1.1.dist-info/ - Contains package metadata (version, dependencies, etc.)
Copy both folders. Python imports the code from chatterbox/, while pip and importlib.metadata use the .dist-info folder to recognize the package as installed. Missing either folder can cause import errors or version conflicts.
Troubleshooting
General Issues
"ChatterboxTTS not available" → Copy the package folders:
# Check if both folders exist in your site-packages:
# chatterbox/
# chatterbox_tts-0.1.1.dist-info/
"No module named 'chatterbox'" → Verify both folders copied correctly:
# Windows Portable
dir "python_embeded\Lib\site-packages\chatterbox"
dir "python_embeded\Lib\site-packages\chatterbox_tts-0.1.1.dist-info"
# WSL/Linux
ls venv/lib/python3.11/site-packages/chatterbox
ls venv/lib/python3.11/site-packages/chatterbox_tts-0.1.1.dist-info
Voice Recording Issues
"No input devices found" → Install audio drivers and restart ComfyUI:
# Check if sounddevice can detect your microphone:
python -c "import sounddevice as sd; print(sd.query_devices())"
"Permission denied" (Linux/Mac) → Give microphone access:
# Linux: Install ALSA/PulseAudio dev packages
sudo apt-get install libasound2-dev portaudio19-dev
# Mac: Grant microphone permission in System Preferences
Recording not working → Check microphone settings:
- Try different microphones in the dropdown
- Adjust silence threshold if auto-stop isn't working
- Check system microphone permissions
- Restart ComfyUI after changing audio drivers
Duplicate microphones in list → This is normal: Windows shows the same device through multiple audio drivers
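If the duplicates are distracting, or you want to see which entries can actually record, the device list can be filtered. The sketch below works on dicts shaped like the entries returned by `sounddevice.query_devices()` (each has `name` and `max_input_channels`). `unique_input_devices` is a hypothetical helper, not part of this extension, and matching by exact name is a simplification, since some drivers report the same microphone under slightly different names.

```python
def unique_input_devices(devices):
    """Keep devices that can record, listing each device name only once.

    `devices` is a sequence of dicts with "name" and "max_input_channels",
    as in the entries returned by sounddevice.query_devices().
    Returns (index, name) pairs; index is the position in the input list.
    """
    seen = set()
    result = []
    for index, device in enumerate(devices):
        if device["max_input_channels"] < 1:
            continue  # output-only device
        if device["name"] in seen:
            continue  # same name already listed via another audio driver
        seen.add(device["name"])
        result.append((index, device["name"]))
    return result


sample = [
    {"name": "Microphone (USB Audio)", "max_input_channels": 2},
    {"name": "Speakers (Realtek)", "max_input_channels": 0},
    {"name": "Microphone (USB Audio)", "max_input_channels": 2},
]
print(unique_input_devices(sample))  # [(0, 'Microphone (USB Audio)')]
```

With sounddevice installed, pass `list(sounddevice.query_devices())` instead of the sample data.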
Model Issues
Models not found → Download manually to ComfyUI/models/TTS/chatterbox/
Wrong Python version → Make sure you're copying to the same Python environment that ComfyUI uses
Permission errors → Run terminal as administrator (Windows) or use sudo (Linux)
License
MIT License - Same as ChatterboxTTS
Credits
- ResembleAI for ChatterboxTTS
- ComfyUI team for the amazing framework
- sounddevice library for audio recording functionality
Links
- Resemble AI ChatterBox
- Model Downloads (Hugging Face): download models here
- ChatterBox Demo
- ComfyUI
- Resemble AI Official Site
Note: The original ChatterBox model includes Resemble AI's Perth watermarking system for responsible AI usage. This ComfyUI integration includes the Perth dependency but has watermarking disabled by default to ensure maximum compatibility. Users can re-enable watermarking by modifying the code if needed, while maintaining the full quality and capabilities of the underlying TTS model.