ComfyUI Extension: ComfyUI_Fill-ChatterBox
Voice Clone and TTS model.
FL ChatterBox
High-quality text-to-speech nodes for ComfyUI powered by ResembleAI's Chatterbox models. Features voice cloning, multilingual synthesis, paralinguistic expressions, and voice conversion.

Features
- Zero-Shot Voice Cloning - Clone any voice from a few seconds of reference audio
- 3 TTS Models - Standard, Turbo (faster), and Multilingual variants
- 23 Languages - Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
- Paralinguistic Tags - Express emotions with tags like [laugh], [sigh], [gasp], [chuckle] (Turbo model)
- Voice Conversion - Transform one voice to sound like another
- Dialog Synthesis - Multi-speaker conversations with up to 4 voices
- Model Caching - Keep models loaded between runs for faster iteration
Nodes
| Node | Description |
|------|-------------|
| FL Chatterbox TTS | Standard high-quality text-to-speech with voice cloning |
| FL Chatterbox Turbo TTS | Faster GPT2-based TTS with paralinguistic tag support |
| FL Chatterbox Multilingual TTS | 23-language TTS with voice cloning |
| FL Chatterbox VC | Voice conversion - transform source audio to target voice |
| FL Chatterbox Dialog TTS | Multi-speaker dialog synthesis with up to 4 voices |
Installation
ComfyUI Manager
Search for "FL ChatterBox" and install.
Manual
cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git
cd ComfyUI_Fill-ChatterBox
pip install -r requirements.txt
Optional: Watermarking Support
pip install resemble-perth
Note: The resemble-perth package may have compatibility issues with Python 3.12+. Nodes will function without watermarking if import fails.
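The fallback described in the note can be pictured as an optional import. This is a minimal sketch, not the extension's actual code: the helper name `try_import` is hypothetical, and `perth` as the import name of the resemble-perth package is an assumption.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # ImportError covers a missing package; other exceptions can surface
        # when a package is installed but incompatible with the interpreter.
        return None


# Assumption: the resemble-perth package is importable as "perth".
perth = try_import("perth")
WATERMARKING_AVAILABLE = perth is not None
print("watermarking available:", WATERMARKING_AVAILABLE)
```

Code that wants to watermark its output would check `WATERMARKING_AVAILABLE` first and return the audio unchanged when it is false.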
Quick Start
- Add FL Chatterbox TTS (or Turbo/Multilingual variant)
- Enter your text in the text field
- Optionally connect reference audio for voice cloning
- Set keep_model_loaded = True for faster subsequent runs
- Generate!
Turbo Model with Expressions
Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech.
Supported tags: [laugh], [sigh], [gasp], [chuckle], [cough], [sniff], [groan], [shush], [clear throat]
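Before sending text to the Turbo node, it can help to check that every bracketed tag is one of the supported ones. The following is a small sketch of such a check. The function name `find_unsupported_tags` is hypothetical, and the exact-match, lowercase comparison is an assumption, since the README does not say how strictly the model treats tag spelling.

```python
import re

# Tag names copied from the "Supported tags" list above.
SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "chuckle", "cough",
    "sniff", "groan", "shush", "clear throat",
}

# Matches anything written in square brackets, e.g. "[laugh]".
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS, in order."""
    return [tag for tag in _TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


sample = "Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech."
print(find_unsupported_tags(sample))
```

An empty result means every tag in the text appears in the supported list.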
Models
| Model | Speed | Languages | Notes |
|-------|-------|-----------|-------|
| Standard | Normal | English | Highest quality |
| Turbo | Fast | English | Paralinguistic tags, GPT2-based |
| Multilingual | Normal | 23 languages | Cross-lingual voice cloning |
Models download automatically on first use to ComfyUI/models/chatterbox/.
Parameters
TTS Parameters
| Parameter | Range | Description |
|-----------|-------|-------------|
| exaggeration | 0.25-2.0 | Emotion intensity |
| cfg_weight | 0.2-1.0 | Pace/classifier-free guidance |
| temperature | 0.05-5.0 | Randomness in generation |
| seed | 0-4.29B | Reproducible generation |
| keep_model_loaded | bool | Cache model between runs |
Turbo Parameters
| Parameter | Range | Description |
|-----------|-------|-------------|
| temperature | 0.05-2.0 | Randomness in generation |
| top_k | 1-5000 | Top-k sampling |
| top_p | 0.1-1.0 | Nucleus sampling threshold |
| repetition_penalty | 1.0-3.0 | Token repetition penalty |
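When node inputs are set from a script rather than the UI, the ranges in the two parameter tables can be checked up front. This is a sketch under assumptions: the dictionary and function names are hypothetical, the bounds are treated as inclusive, and seed is left out because the tables give its upper bound only approximately.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# Bounds copied from the TTS Parameters table.
TTS_RANGES: Dict[str, Range] = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}

# Bounds copied from the Turbo Parameters table.
TURBO_RANGES: Dict[str, Range] = {
    "temperature": (0.05, 2.0),
    "top_k": (1, 5000),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 3.0),
}


def out_of_range(params: Dict[str, float], ranges: Dict[str, Range]) -> Dict[str, tuple]:
    """Return {name: (value, low, high)} for every parameter outside its range.

    Parameters that have no entry in `ranges` are ignored.
    """
    problems = {}
    for name, value in params.items():
        if name in ranges:
            low, high = ranges[name]
            if not (low <= value <= high):
                problems[name] = (value, low, high)
    return problems


print(out_of_range({"temperature": 3.0, "top_p": 0.9}, TURBO_RANGES))
```

Note that temperature has a narrower range on Turbo (up to 2.0) than on the other TTS nodes (up to 5.0), so a value that is valid for one node can be out of range for the other.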
Limitations
- Maximum audio length: ~40 seconds per generation
- Reference audio: Minimum 5-6 seconds recommended
- Turbo paralinguistic tags: English only
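One way to work within the length limit is to split long text at sentence boundaries and generate each piece separately. The sketch below is an illustration only: `split_text` is a hypothetical helper, and the default `max_chars` budget of 400 is an unmeasured assumption, since how many characters fit into roughly 40 seconds depends on the voice and language.

```python
import re


def split_text(text: str, max_chars: int = 400) -> list[str]:
    """Group sentences into chunks of at most `max_chars` characters.

    Sentences are split after '.', '!' or '?'. A single sentence longer
    than `max_chars` is kept whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_text("First sentence. Second one! A third? And a fourth.", max_chars=30):
    print(chunk)
```

Each chunk can then be passed to a TTS node in turn and the resulting audio clips joined afterwards.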
Requirements
- Python 3.10+
- 8GB RAM minimum (16GB+ recommended)
- NVIDIA GPU with 8GB+ VRAM recommended
- CPU and Mac MPS supported
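The supported backends suggest a simple preference order: CUDA when an NVIDIA GPU is present, then MPS on Apple hardware, then CPU. The sketch below shows that order. The function names are hypothetical and the ordering is an assumption about sensible defaults, not a description of how the nodes choose their device. It falls back to "cpu" when PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


print(detect_device())
```

Keeping the decision in `pick_device` separate from the PyTorch queries makes the preference order easy to check without any GPU present.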
License
MIT License - See Chatterbox repo for model licenses.
Changelog
2025-12-28
- Added Turbo TTS node (faster, GPT2-based with paralinguistic tags)
- Added Multilingual TTS node (23 languages)
- Improved model caching using module-level globals
- Centralized model downloads to ComfyUI/models/chatterbox/
2025-07-24
- Added Dialog TTS node for multi-speaker conversations (up to 4 speakers)
- Extended all nodes with seed parameters for reproducible generation
- Isolated audio track outputs per speaker
2025-06-24
- Added seed parameter for reproducible generation
- Made Perth watermarking optional for Python 3.12+ compatibility
2025-05-31
- Added persistent model loading and loading bar
- Added Mac MPS support
- Native inference code (removed chatterbox-tts library dependency)