ComfyUI Extension: ComfyUI-FireRedTTS
A ComfyUI integration for FireRedTTS‑2, a real-time multi-speaker TTS system enabling high-quality, emotionally expressive dialogue and monologue synthesis. Leveraging a streaming architecture and context-aware prosody modeling, it supports natural speaker turns and stable long-form generation, ideal for interactive chat and podcast applications.
Features
- Dialogue Generation: Multi-speaker conversation audio generation
- Monologue Generation: Single-speaker narrative audio generation
- Voice Cloning: Zero-shot voice cloning functionality
- Multi-language Support: Chinese, English, Japanese, Korean, French, German, Russian
- Automatic Model Download: Models download automatically on first use
- Device Adaptive: Automatically selects optimal device (CUDA/MPS/CPU)
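The device preference above (CUDA, then MPS, then CPU) is commonly written as below with PyTorch. This is a minimal sketch, not necessarily the extension's own code: `pick_device` is a hypothetical helper name, and the fallback to CPU when PyTorch is missing is an assumption made so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring the fastest available backend."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps may be absent on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```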
Installation
Method 1: ComfyUI Manager (Recommended)
- Open ComfyUI Manager
- Search for "ComfyUI-FireRedTTS" in ComfyUI Manager
- Click Install
Method 2: Manual Installation
- Clone this repository to your ComfyUI custom nodes directory:
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-FireRedTTS.git
- Install dependencies:
cd ComfyUI-FireRedTTS
pip install -r requirements.txt
- Restart ComfyUI
Model Download
On first use, the system will automatically download the FireRedTTS2 model from Hugging Face:
- Model source: FireRedTeam/FireRedTTS2
- Storage location: ComfyUI\models\TTS\FireRedTTS2
- Download size: ~2GB
A progress bar is shown during the download. Once it completes, the model is cached for future use.
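If the automatic download fails, the model can be fetched ahead of time. The sketch below assumes the `huggingface_hub` package is installed and that placing a plain snapshot of the repository in the storage location is sufficient; the README does not confirm how the extension downloads or lays out the files. `model_dir` and `ensure_model` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "FireRedTeam/FireRedTTS2"  # model source named in this README


def model_dir(comfy_root: str) -> Path:
    """Build the storage location given in this README under a ComfyUI root."""
    return Path(comfy_root) / "models" / "TTS" / "FireRedTTS2"


def ensure_model(comfy_root: str) -> Path:
    """Download the model only if the target folder is missing or empty."""
    target = model_dir(comfy_root)
    if target.is_dir() and any(target.iterdir()):
        return target  # something is already there; treat it as cached
    # Lazy import: huggingface_hub is only needed when a download happens.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


# ensure_model("ComfyUI")  # uncomment to pre-fetch (about 2 GB on first run)
```

Note that the emptiness check is deliberately crude: a partially downloaded folder is treated as cached, so delete the folder before retrying after an interrupted download.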
Nodes
FireRedTTS2 Dialogue Node
Generates multi-speaker dialogue audio.
Inputs:
- text_list (STRING): Dialogue text with speaker tags ([S1], [S2])
- temperature (FLOAT): Controls generation randomness (0.1-2.0, default: 0.9)
- topk (INT): Controls sampling range (1-100, default: 30)
- S1 (AUDIO, optional): Reference audio for Speaker 1
- S1_text (STRING, optional): Reference text for Speaker 1
- S2 (AUDIO, optional): Reference audio for Speaker 2
- S2_text (STRING, optional): Reference text for Speaker 2
Outputs:
- audio (AUDIO): Generated dialogue audio
- sample_rate (INT): Audio sample rate (24000 Hz)
FireRedTTS2 Monologue Node
Generates single-speaker monologue audio.
Inputs:
- text (STRING): Input text content
- temperature (FLOAT): Temperature parameter (0.1-2.0, default: 0.75)
- topk (INT): TopK parameter (1-100, default: 20)
- prompt_wav (STRING, optional): Reference audio file path
- prompt_text (STRING, optional): Reference text content
Outputs:
- audio (AUDIO): Generated monologue audio
- sample_rate (INT): Audio sample rate (24000 Hz)
Usage
Speaker Tag Format
Use square brackets to mark different speakers in dialogue text:
[S1]Hello, what a nice day![S2]Yes, perfect for a walk.[S1]Shall we go to the park?[S2]Great idea!
Supported speaker tags:
- [S1]: Speaker 1
- [S2]: Speaker 2
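To make the tag format concrete, the following sketch splits a tagged string into (speaker, utterance) pairs with a regular expression. It only illustrates the format described here; `split_turns` is a hypothetical helper, and the node's real parser may behave differently.

```python
import re
from typing import List, Tuple

# A supported tag, then everything up to the next supported tag or the end.
_TURN = re.compile(r"\[(S[12])\](.*?)(?=\[S[12]\]|$)", re.DOTALL)


def split_turns(text_list: str) -> List[Tuple[str, str]]:
    """Split tagged dialogue text into (speaker, utterance) pairs."""
    return [(speaker, utterance.strip()) for speaker, utterance in _TURN.findall(text_list)]


dialogue = "[S1]Hello, what a nice day![S2]Yes, perfect for a walk."
for speaker, utterance in split_turns(dialogue):
    print(speaker, "->", utterance)
```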
Voice Cloning Setup
For voice cloning, provide both audio and text for each speaker:
Speaker 1 (S1):
- Connect reference audio to the S1 input
- Enter reference text in the S1_text field
Speaker 2 (S2):
- Connect reference audio to the S2 input
- Enter reference text in the S2_text field
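The "both audio and text" rule can be expressed as a small check to run before queuing a workflow. This is an illustrative sketch only: `reference_ok` is a hypothetical helper, and how the node itself reacts to a half-filled pair is not documented in this README.

```python
from typing import Optional


def reference_ok(audio: Optional[object], text: Optional[str]) -> bool:
    """True when a speaker has both reference audio and non-empty text, or neither."""
    has_audio = audio is not None
    has_text = bool(text and text.strip())
    return has_audio == has_text


# A speaker with audio but a blank transcript is flagged as incomplete.
print(reference_ok(audio=object(), text="   "))
```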
Examples
Basic Dialogue Generation
- Add "FireRedTTS2 Dialogue" node
- Input in text_list: [S1]Welcome to our podcast![S2]Today we'll discuss AI development.[S1]That's a fascinating topic indeed.
- Adjust temperature and topk parameters
- Connect audio output to preview or save node
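When the script comes from another tool, the text_list string can be assembled programmatically rather than typed by hand. A minimal sketch, assuming turns arrive as (speaker, utterance) pairs; `build_text_list` is a hypothetical helper, not part of the extension.

```python
from typing import Iterable, Tuple


def build_text_list(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, utterance) pairs into one tagged string with no spaces around tags."""
    parts = []
    for speaker, utterance in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unsupported speaker tag: {speaker!r}")
        parts.append(f"[{speaker}]{utterance.strip()}")
    return "".join(parts)


script = [
    ("S1", "Welcome to our podcast!"),
    ("S2", "Today we'll discuss AI development."),
    ("S1", "That's a fascinating topic indeed."),
]
print(build_text_list(script))
```

The printed string can be pasted directly into the text_list field.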
Voice Cloning Dialogue
- Prepare reference audio files for each speaker
- Connect Speaker 1 reference audio to the S1 input
- Enter Speaker 1 reference text in S1_text: This is a voice sample for speaker one
- Connect Speaker 2 reference audio to the S2 input
- Enter Speaker 2 reference text in S2_text: This is a voice sample for speaker two
Monologue Generation
- Add "FireRedTTS2 Monologue" node
- Input long text content in the text field
- Optionally provide prompt_wav and prompt_text for voice cloning
- Adjust parameters and generate audio
Parameter Guide
Temperature
- Low (0.1-0.5): More stable, consistent speech
- Medium (0.6-1.0): Balanced stability and naturalness
- High (1.1-2.0): More variation and expressiveness, may be unstable
TopK
- Low (1-20): Conservative sampling, more stable speech
- Medium (21-50): Balanced choice
- High (51-100): More diverse sampling, increased variation
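For intuition about how the two parameters interact, here is a generic temperature plus top-k sampler in plain Python. It shows the textbook technique only and is not taken from FireRedTTS-2; `sample_top_k` and the toy logits are made up for the example.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], temperature: float, topk: int,
                 rng: Optional[random.Random] = None) -> int:
    """Pick an index using top-k filtering followed by temperature-scaled softmax."""
    if rng is None:
        rng = random.Random()
    # Keep only the indices of the topk highest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:topk]
    # Dividing by temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [logits[i] / temperature for i in keep]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(keep, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, 0.5, 1.5]
print(sample_top_k(toy_logits, temperature=0.9, topk=2, rng=random.Random(0)))
```

With topk=1 the sampler always returns the highest-scoring index regardless of temperature, which is why low TopK values give the most stable output.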
Troubleshooting
Common Issues
Q: Model download fails
A: Check your network connection and Hugging Face access. Try using a proxy or a mirror site.
Q: CUDA out of memory
A:
- Reduce input text length
- Lower batch size
- Use CPU mode by setting device="cpu" in code
Q: Poor audio quality
A:
- Check that the input text format is correct
- Adjust the temperature parameter (recommended 0.7-1.0)
- Ensure the reference audio is of good quality (if using voice cloning)
Q: Speaker tags not working
A:
- Ensure correct tag format: [S1], [S2], etc.
- Check for extra spaces around tags
- Confirm text contains corresponding speaker tags
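These three checks can be automated before the text reaches the node. The sketch below is a hypothetical linter, not part of the extension: it treats every bracketed token as a candidate tag, so ordinary bracketed text in the dialogue would also be reported.

```python
import re
from typing import List

_ANY_TAG = re.compile(r"\[([^\[\]]*)\]")  # any bracketed token


def check_tags(text_list: str) -> List[str]:
    """Return human-readable problems found in the speaker tags of a dialogue string."""
    problems = []
    tags = list(_ANY_TAG.finditer(text_list))
    if not tags:
        problems.append("no speaker tags found")
    for match in tags:
        name = match.group(1)
        if name not in ("S1", "S2"):
            problems.append(f"unsupported or malformed tag: [{name}]")
        # Look at the characters directly before and after the tag.
        before = text_list[match.start() - 1] if match.start() > 0 else ""
        after = text_list[match.end()] if match.end() < len(text_list) else ""
        if before.isspace() or after.isspace():
            problems.append(f"extra space around [{name}] at position {match.start()}")
    return problems


for problem in check_tags("[S1] Hello there[s2]Hi"):
    print(problem)
```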
Q: Node loading fails
A:
- Check that dependencies are properly installed
- Verify ComfyUI version compatibility
- Check the console for error messages
Performance Optimization
Memory Optimization:
- Long texts are automatically split for processing
- Model instances are cached and reused
- Recommended single text length: under 500 characters
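To stay under the recommended length when preparing text yourself, sentences can be grouped into chunks before they are sent to the node. This is a sketch under stated assumptions: how the extension splits long text internally is not documented here, `split_text` is a hypothetical helper, and the sentence boundary rule (Latin and CJK end punctuation) is a simplification.

```python
import re
from typing import List

# Split on whitespace after sentence-ending punctuation, or directly after CJK punctuation.
_BOUNDARY = re.compile(r"(?<=[.!?。！？])\s+|(?<=[。！？])")


def split_text(text: str, limit: int = 500) -> List[str]:
    """Group sentences into chunks of at most `limit` characters where possible."""
    sentences = [s.strip() for s in _BOUNDARY.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > limit:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", limit=9))
```

A single sentence longer than the limit is kept whole rather than cut mid-sentence, and sentences are rejoined with a space, which also inserts spaces between CJK sentences.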
Speed Optimization:
- The first use requires a model download; subsequent runs are faster
- GPU acceleration significantly improves generation speed
- Batch processing multiple short texts is more efficient than processing a single long text
System Requirements
Minimum:
- Python 3.8+
- 4GB RAM
- 2GB storage space (for models)
Recommended:
- Python 3.9+
- 8GB+ RAM
- NVIDIA GPU (4GB+ VRAM)
- SSD storage
Support
If you encounter issues, please check:
- Dependencies are fully installed
- Models downloaded correctly
- Input format meets requirements
- System resources are sufficient
For more technical details, refer to the project source code and FireRedTTS2 official documentation.