ComfyUI Extension: ComfyUI_FL-CosyVoice3
FL CosyVoice3 - Advanced Text-to-Speech nodes for ComfyUI. Features zero-shot voice cloning, cross-lingual synthesis, instruction-based control, and voice conversion using the CosyVoice3 model family. Supports 9 languages and 18+ Chinese dialects with automatic model downloading and caching.
Custom Nodes (0)
README
FL CosyVoice3
Advanced text-to-speech nodes for ComfyUI powered by the CosyVoice3 model family. Features zero-shot voice cloning, cross-lingual synthesis, and voice conversion.

Features
- Zero-Shot Voice Cloning - Clone any voice from 3-30 seconds of reference audio
- Cross-Lingual Synthesis - Speak different languages while preserving voice characteristics
- Voice Conversion - Transform one voice to sound like another
- 9 Languages - Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian
- Auto Transcription - Built-in Whisper integration for reference audio
- Speed Control - Adjustable speech rate (0.5x - 2.0x)
Nodes
| Node | Description | |------|-------------| | Model Loader | Downloads and caches CosyVoice models | | Zero-Shot Clone | Clone voices from reference audio | | Cross-Lingual | Generate speech in different languages | | Voice Conversion | Convert source audio to target voice | | Dialog | Multi-speaker dialog synthesis with up to 4 voices | | Audio Crop | Trim audio to specific time ranges |
Installation
ComfyUI Manager
Search for "FL CosyVoice3" and install.
Manual
cd ComfyUI/custom_nodes
git clone https://github.com/filliptm/ComfyUI_FL-CosyVoice3.git
cd ComfyUI_FL-CosyVoice3
pip install -r requirements.txt
Quick Start
- Add FL CosyVoice3 Model Loader and select
Fun-CosyVoice3-0.5B - Connect to Zero-Shot Clone or Cross-Lingual node
- Provide reference audio (3-30 seconds recommended)
- Enter your text and generate
Models
| Model | Size | Notes | |-------|------|-------| | Fun-CosyVoice3-0.5B | ~2GB | Recommended | | CosyVoice2-0.5B | ~2GB | Alternative | | CosyVoice-300M | ~1.2GB | Lightweight |
Models download automatically on first use to ComfyUI/models/cosyvoice/.
Requirements
- Python 3.10+
- 8GB RAM minimum (16GB+ recommended)
- NVIDIA GPU with 8GB+ VRAM recommended (CPU and Mac MPS supported)
License
Apache 2.0