A ComfyUI custom node that brings Zonos Text-to-Speech capabilities to your workflows, featuring high-quality speech synthesis and voice cloning.
cd ComfyUI/custom_nodes/
git clone https://github.com/BahaC/ComfyUI-ZonosTTS.git
cd ComfyUI-ZonosTTS
pip install -r requirements.txt
The node provides a simple interface for text-to-speech conversion with advanced options:
text: Input text to synthesize (String)
language: Language code selection (en-us, ja-jp)
model_name: Choice of model architecture:
  Zyphra/Zonos-v0.1-transformer: Faster, lighter model
  Zyphra/Zonos-v0.1-hybrid: Higher quality (requires additional dependencies)
audio_file: Reference audio for voice cloning (optional)
cfg_scale: Control over generation quality (1.0 - 10.0)
audio_path: Path to the generated WAV file

Models are automatically downloaded and cached in:
/workspace/ComfyUI/models/TTS/Zonos/
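The parameter constraints listed above (language codes, model names, cfg_scale range) can be checked before synthesis. The following is a minimal sketch, not the node's actual code: `ZonosInputs` and `validate_inputs` are hypothetical names, and the defaults shown are assumptions, since the README only documents the allowed values.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as documented in the parameter list.
SUPPORTED_LANGUAGES = ("en-us", "ja-jp")
SUPPORTED_MODELS = (
    "Zyphra/Zonos-v0.1-transformer",
    "Zyphra/Zonos-v0.1-hybrid",
)
CFG_SCALE_RANGE = (1.0, 10.0)


@dataclass
class ZonosInputs:
    """Hypothetical container for the node inputs (illustration only)."""
    text: str
    language: str = "en-us"  # assumed default
    model_name: str = "Zyphra/Zonos-v0.1-transformer"  # assumed default
    audio_file: Optional[str] = None  # optional reference audio for voice cloning
    cfg_scale: float = 2.0  # assumed default; only the range is documented


def validate_inputs(inputs: ZonosInputs) -> ZonosInputs:
    """Raise ValueError if any input falls outside the documented options."""
    if not inputs.text.strip():
        raise ValueError("text must not be empty")
    if inputs.language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {inputs.language!r}")
    if inputs.model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {inputs.model_name!r}")
    low, high = CFG_SCALE_RANGE
    if not low <= inputs.cfg_scale <= high:
        raise ValueError(f"cfg_scale must be between {low} and {high}")
    return inputs
```

Calling `validate_inputs(ZonosInputs(text="Hello", language="ja-jp"))` returns the inputs unchanged, while a `cfg_scale` of 11.0 raises `ValueError`.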
The node implements smart model caching.
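How the node caches models is not spelled out here. One common approach, sketched below under that assumption, keeps loaded models in a dictionary keyed by model name and device so that repeated runs reuse the same instance. `get_cached_model` and the `loader` callback are hypothetical names; the real node would pass whatever function actually loads a Zonos model.

```python
from typing import Any, Callable, Dict, Tuple

# Module-level cache: (model_name, device) -> loaded model object.
_MODEL_CACHE: Dict[Tuple[str, str], Any] = {}


def get_cached_model(
    model_name: str,
    device: str,
    loader: Callable[[str, str], Any],
) -> Any:
    """Return a cached model, calling `loader` only on the first request."""
    key = (model_name, device)
    if key not in _MODEL_CACHE:
        # First use of this model/device pair: load it once and remember it.
        _MODEL_CACHE[key] = loader(model_name, device)
    return _MODEL_CACHE[key]


def clear_model_cache() -> None:
    """Drop all cached models, e.g. to free memory before switching models."""
    _MODEL_CACHE.clear()
```

Keying on both name and device avoids handing back a model that was loaded onto a different device than the one requested.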
[Text Input] -> [Zonos TTS] -> [Audio Output]
[Text Input] -> [Zonos TTS] <- [Audio File] == [Audio File]
Generated audio files are saved with unique timestamps:
output/zonos_YYYYMMDD-HHMMSS_UUID.wav
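A filename in this pattern could be produced as in the sketch below. `make_output_path` is a hypothetical helper, and the use of an 8-character hex slice of a UUID4 is an assumption; the pattern above does not say how long the UUID portion is.

```python
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_output_path(output_dir: str = "output",
                     now: Optional[datetime] = None) -> Path:
    """Build a path like output/zonos_YYYYMMDD-HHMMSS_UUID.wav."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Assumption: a short hex slice of a random UUID is unique enough here.
    unique = uuid.uuid4().hex[:8]
    return Path(output_dir) / f"zonos_{stamp}_{unique}.wav"
```

The timestamp keeps files sortable by creation time, and the random suffix prevents collisions when two files are generated within the same second.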
Transformer Model
Hybrid Model
Model Download Fails
Voice Cloning Issues
CUDA Out of Memory
This project is licensed under the terms of the LICENSE file included in the repository.