ComfyUI Extension: ComfyUI_Fill-ChatterBox

Authored by filliptm

191 stars

Voice Clone and TTS model.

README

FL ChatterBox

High-quality text-to-speech nodes for ComfyUI powered by ResembleAI's Chatterbox models. Features voice cloning, multilingual synthesis, paralinguistic expressions, and voice conversion.

Features

  • Zero-Shot Voice Cloning - Clone any voice from a few seconds of reference audio
  • 3 TTS Models - Standard, Turbo (faster), and Multilingual variants
  • 23 Languages - Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
  • Paralinguistic Tags - Express emotions with tags like [laugh], [sigh], [gasp], [chuckle] (Turbo model)
  • Voice Conversion - Transform one voice to sound like another
  • Dialog Synthesis - Multi-speaker conversations with up to 4 voices
  • Model Caching - Keep models loaded between runs for faster iteration

Nodes

| Node | Description |
|------|-------------|
| FL Chatterbox TTS | Standard high-quality text-to-speech with voice cloning |
| FL Chatterbox Turbo TTS | Faster GPT2-based TTS with paralinguistic tag support |
| FL Chatterbox Multilingual TTS | 23-language TTS with voice cloning |
| FL Chatterbox VC | Voice conversion - transform source audio to target voice |
| FL Chatterbox Dialog TTS | Multi-speaker dialog synthesis with up to 4 voices |

Installation

ComfyUI Manager

Search for "FL ChatterBox" and install.

Manual

cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git
cd ComfyUI_Fill-ChatterBox
pip install -r requirements.txt

Optional: Watermarking Support

pip install resemble-perth

Note: The resemble-perth package may have compatibility issues with Python 3.12+. If the import fails, the nodes still work, just without watermarking.
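
The fallback described in the note can be sketched as an optional import. This is a minimal illustration, not the extension's actual code: the helper name `optional_import` and the flag `WATERMARKING_AVAILABLE` are hypothetical, and `perth` as the import name of the resemble-perth package is an assumption.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Deliberately broad: a broken optional dependency can fail with
        # errors other than ImportError while it is being loaded.
        return None


# Assumption: the resemble-perth package is importable as "perth".
perth = optional_import("perth")
WATERMARKING_AVAILABLE = perth is not None
```

Code that would apply a watermark can then check `WATERMARKING_AVAILABLE` and skip that step when it is false.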

Quick Start

  1. Add FL Chatterbox TTS (or Turbo/Multilingual variant)
  2. Enter your text in the text field
  3. Optionally connect reference audio for voice cloning
  4. Set keep_model_loaded = True for faster subsequent runs
  5. Generate!

Turbo Model with Expressions

Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech.

Supported tags: [laugh], [sigh], [gasp], [chuckle], [cough], [sniff], [groan], [shush], [clear throat]
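
A small pre-flight check can catch misspelled tags before generation. The sketch below is illustrative only: `split_tags` is a hypothetical helper, and it assumes tags must match the supported list exactly (the source does not say whether matching is case-sensitive).

```python
import re
from typing import List, Tuple

# Tag names copied from the supported list above.
SUPPORTED_TAGS = {
    "laugh", "sigh", "gasp", "chuckle", "cough",
    "sniff", "groan", "shush", "clear throat",
}

# Matches the text between one pair of square brackets.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(text: str) -> Tuple[List[str], List[str]]:
    """Return (supported, unknown) tags in the order they appear in text."""
    supported: List[str] = []
    unknown: List[str] = []
    for tag in _TAG_PATTERN.findall(text):
        if tag in SUPPORTED_TAGS:
            supported.append(tag)
        else:
            unknown.append(tag)
    return supported, unknown


supported, unknown = split_tags(
    "Hello there! [laugh] Isn't this amazing? [sigh] I just love text to speech."
)
print(supported, unknown)  # ['laugh', 'sigh'] []
```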

Models

| Model | Speed | Languages | Notes |
|-------|-------|-----------|-------|
| Standard | Normal | English | Highest quality |
| Turbo | Fast | English | Paralinguistic tags, GPT2-based |
| Multilingual | Normal | 23 languages | Cross-lingual voice cloning |

Models download automatically on first use to ComfyUI/models/chatterbox/.

Parameters

TTS Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| exaggeration | 0.25-2.0 | Emotion intensity |
| cfg_weight | 0.2-1.0 | Pace/classifier-free guidance |
| temperature | 0.05-5.0 | Randomness in generation |
| seed | 0-4.29B | Reproducible generation |
| keep_model_loaded | bool | Cache model between runs |

Turbo Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| temperature | 0.05-2.0 | Randomness in generation |
| top_k | 1-5000 | Top-k sampling |
| top_p | 0.1-1.0 | Nucleus sampling threshold |
| repetition_penalty | 1.0-3.0 | Token repetition penalty |
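
When driving the nodes from a script, it can help to clamp values to the documented ranges first. The following is a minimal sketch using only the numeric ranges from the two parameter tables; `clamp_params`, `TTS_RANGES`, and `TURBO_RANGES` are hypothetical names, and the seed is left out because its exact upper bound is only given approximately.

```python
from typing import Dict, Tuple

Number = float  # ints such as top_k are accepted as well

# Ranges copied from the TTS and Turbo parameter tables.
TTS_RANGES: Dict[str, Tuple[Number, Number]] = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}
TURBO_RANGES: Dict[str, Tuple[Number, Number]] = {
    "temperature": (0.05, 2.0),
    "top_k": (1, 5000),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (1.0, 3.0),
}


def clamp_params(params: Dict[str, Number],
                 ranges: Dict[str, Tuple[Number, Number]]) -> Dict[str, Number]:
    """Return a copy of params with each known value clamped to its range."""
    clamped = dict(params)
    for name, (low, high) in ranges.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped


print(clamp_params({"exaggeration": 3.0, "cfg_weight": 0.1}, TTS_RANGES))
# {'exaggeration': 2.0, 'cfg_weight': 0.2}
```

Parameters that are not in the range table, such as seed, pass through unchanged.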

Limitations

  • Maximum audio length: ~40 seconds per generation
  • Reference audio: at least 5-6 seconds recommended
  • Turbo paralinguistic tags: English only
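
One way to stay under the per-generation length limit is to split long scripts into sentence-aligned chunks and generate each chunk separately. This is a rough sketch, not a feature of the nodes: `chunk_text` is a hypothetical helper, and the 300-character default is an assumed stand-in for "comfortably under 40 seconds", so tune it for your voice and pace.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```

Each chunk can then be sent through a TTS node with the same reference audio and seed, and the resulting clips joined afterwards.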

Requirements

  • Python 3.10+
  • 8GB RAM minimum (16GB+ recommended)
  • NVIDIA GPU with 8GB+ VRAM recommended
  • CPU and Mac MPS supported
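
The device preference implied by this list (NVIDIA GPU first, then Mac MPS, then CPU) can be sketched as follows. This is an assumption about a sensible ordering, not the extension's own logic; `pick_device` and `detect_device` are hypothetical names, and the PyTorch import is guarded so the snippet also runs where PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; use CPU if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


print(detect_device())
```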

License

MIT License - See Chatterbox repo for model licenses.

Changelog

2025-12-28

  • Added Turbo TTS node (faster, GPT2-based with paralinguistic tags)
  • Added Multilingual TTS node (23 languages)
  • Improved model caching using module-level globals
  • Centralized model downloads to ComfyUI/models/chatterbox/

2025-07-24

  • Added Dialog TTS node for multi-speaker conversations (up to 4 speakers)
  • Extended all nodes with seed parameters for reproducible generation
  • Isolated audio track outputs per speaker

2025-06-24

  • Added seed parameter for reproducible generation
  • Made Perth watermarking optional for Python 3.12+ compatibility

2025-05-31

  • Added persistent model loading and loading bar
  • Added Mac MPS support
  • Native inference code (removed chatterbox-tts library dependency)