ComfyUI Extension: comfyui-maya1-tts
High-quality text-to-speech ComfyUI custom node powered by Maya1 model
Maya1 TTS ComfyUI Custom Node
A high-quality text-to-speech (TTS) ComfyUI custom node powered by the Maya1 model, featuring multiple voice styles and flexible parameter configuration.
⨠Features
- šµ High-Quality Speech Synthesis: Based on the advanced Maya1 voice generation model
- šØ Multiple Voice Styles: 5 built-in voice presets (male, female, different ages)
- š§ Flexible Parameters: Adjustable temperature, chunk length, and more
- š¾ Automatic Model Management: Models auto-download and cache to ComfyUI models directory
- ā” Smart Caching: Models load once for improved generation efficiency
- š International Support: English UI and logging
Installation
Method 1: Clone directly to ComfyUI custom nodes directory
cd ComfyUI/custom_nodes/
git clone https://github.com/ruanjianlun/comfyui_maya1_tts_alun.git
cd comfyui_maya1_tts_alun
pip install -r requirements.txt
Method 2: Manual Installation
- Copy the entire project folder to the ComfyUI/custom_nodes/ directory
- Install dependencies:
pip install -r requirements.txt
Dependencies
Main dependencies include:
- transformers - Hugging Face model library
- torch - PyTorch deep learning framework
- snac - SNAC audio codec
- soundfile - Audio file processing
- numpy - Numerical computing
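Before launching ComfyUI, it can help to confirm that these packages are importable from the Python environment ComfyUI uses. The following is a minimal sketch, assuming the import names match the package names listed above; `find_missing_modules` is a hypothetical helper and not part of this project.

```python
import importlib.util

# Import names for the packages listed above (assumed to match the pip names).
REQUIRED_MODULES = ["transformers", "torch", "snac", "soundfile", "numpy"]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Maya1 TTS dependencies are importable.")
```

Run it with the same interpreter that starts ComfyUI; a portable or virtual-environment install may use a different Python than the system default.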
Usage
1. Add Node in ComfyUI
- Launch ComfyUI
- Right-click on canvas, select Add Node -> audio -> maya1 -> Maya1 Text to Speech
- The node will appear on the canvas
2. Configure Node Parameters
The node provides the following parameters:
| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| text | String | Text to convert to speech | "Hello, this is a test..." |
| voice_preset | Dropdown | Voice style preset | Male-Mature |
| chunk_length | Integer | Text chunk length (characters) | 50 |
| temperature | Float | Generation temperature (0.1-1.0, higher = more random) | 0.4 |
| custom_description | String (Optional) | Custom voice description | Empty |
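For readers unfamiliar with how ComfyUI nodes declare inputs, the parameters above could map onto an `INPUT_TYPES` declaration roughly as follows. This is a sketch only: the class name `Maya1TTSNodeSketch`, the `min` and `step` values, and the `multiline` flags are assumptions, and the actual declaration in `maya_tts_node.py` may differ. Other attributes a real node needs (such as its return types and entry function) are left out.

```python
class Maya1TTSNodeSketch:
    """Hypothetical node class; the real one lives in maya_tts_node.py."""

    # Preset names as listed in this README; their descriptions are omitted here.
    VOICE_PRESET_NAMES = [
        "Male-Mature",
        "Female-Gentle",
        "Male-Energetic",
        "Female-Professional",
        "Neutral-Broadcast",
    ]

    # Matches the menu path Add Node -> audio -> maya1.
    CATEGORY = "audio/maya1"

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Default text is truncated in the README and kept as-is.
                "text": ("STRING", {"multiline": True, "default": "Hello, this is a test..."}),
                # A list of strings renders as a dropdown in ComfyUI.
                "voice_preset": (cls.VOICE_PRESET_NAMES, {"default": "Male-Mature"}),
                # The lower bound of 1 is an assumption.
                "chunk_length": ("INT", {"default": 50, "min": 1}),
                # Range taken from the table; the step size is an assumption.
                "temperature": ("FLOAT", {"default": 0.4, "min": 0.1, "max": 1.0, "step": 0.05}),
            },
            "optional": {
                "custom_description": ("STRING", {"multiline": True, "default": ""}),
            },
        }
```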
3. Voice Style Presets
5 built-in voice style presets:
- Male-Mature: Male voice around 30 years old, American accent, normal pitch, warm timbre
- Female-Gentle: Female voice around 20 years old, American accent, slightly high pitch, soft timbre
- Male-Energetic: Male voice around 20 years old, American accent, high energy, bright timbre
- Female-Professional: Female voice around 30 years old, British accent, professional tone, clear timbre
- Neutral-Broadcast: Neutral voice, standard accent, clear articulation, balanced timbre
4. Custom Voice Description
If presets don't meet your needs, use the custom_description parameter for custom voice characteristics:
Realistic male voice in the 40s age with british accent.
Deep pitch, authoritative tone, slow pacing.
Description template:
- Age: in the 20s/30s/40s age
- Accent: american/british/australian accent
- Pitch: high/normal/low/deep pitch
- Timbre: warm/bright/soft/clear timbre
- Pacing: fast/moderate/slow/conversational pacing
- Style: energetic/professional/gentle/authoritative tone
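The template fields can be assembled programmatically. The sketch below is one possible way to do it, assuming the sentence shape of the example description above; `build_voice_description` is a hypothetical helper, and the model may not require this exact field order.

```python
def build_voice_description(gender, age_decade, accent, pitch,
                            timbre=None, tone=None, pacing=None):
    """Assemble a voice description from the template fields.

    Optional fields are skipped when left as None.
    """
    first = f"Realistic {gender} voice in the {age_decade} age with {accent} accent."
    traits = [f"{pitch} pitch"]
    if timbre:
        traits.append(f"{timbre} timbre")
    if tone:
        traits.append(f"{tone} tone")
    if pacing:
        traits.append(f"{pacing} pacing")
    second = ", ".join(traits)
    # Capitalize only the first letter so the rest of the wording is untouched.
    second = second[0].upper() + second[1:] + "."
    return f"{first} {second}"


if __name__ == "__main__":
    print(build_voice_description("male", "40s", "british", "deep",
                                  tone="authoritative", pacing="slow"))
```

The resulting string can be pasted into the custom_description field of the node.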
Project Structure
comfyui_alun_maya1/
├── __init__.py # Node registration entry file
├── maya_tts_node.py # Main node implementation
├── config.py # Configuration file (model paths, constants, etc.)
├── requirements.txt # Python dependencies list
├── workflow_example.json # ComfyUI workflow example
├── README.md # Project documentation (English)
└── README.zh-CN.md # Project documentation (Chinese)
Configuration
Model Storage Location
Models will auto-download to:
ComfyUI/models/maya1_tts_alun/
Includes the following model files:
- Maya1 main model (~4GB)
- Maya1 tokenizer
- SNAC audio decoder (~200MB)
Output File Location
Generated audio files are saved to:
ComfyUI/output/maya_tts_XXXXXX.wav
File naming format: maya_tts_{timestamp}.wav
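As an illustration of the naming format, a path of this shape could be built as follows. This is a sketch under assumptions: the README does not say what form the timestamp takes, so integer Unix seconds are used here, and `make_output_path` is a hypothetical helper.

```python
import time
from pathlib import Path


def make_output_path(output_dir="ComfyUI/output", timestamp=None):
    """Build a maya_tts_{timestamp}.wav path.

    The timestamp format (integer Unix seconds) is an assumption.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return Path(output_dir) / f"maya_tts_{timestamp}.wav"


if __name__ == "__main__":
    print(make_output_path())
```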
Audio Specifications
- Sample Rate: 24000 Hz
- Format: WAV (16-bit PCM)
- Channels: Mono
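To check whether a file matches these specifications, the standard-library `wave` module is enough. The sketch below writes a short test tone with the documented parameters and reads the header back; the function names are hypothetical, and the tone stands in for real node output.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 24000  # Hz
SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit PCM
CHANNELS = 1         # mono


def write_test_tone(path, seconds=0.1, freq=440.0):
    """Write a short sine tone using the documented audio parameters."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


def matches_documented_spec(path):
    """Return True if the WAV header reports 24000 Hz, 16-bit, mono."""
    with wave.open(str(path), "rb") as wav:
        actual = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
    return actual == (SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tone = Path(tmp) / "tone.wav"
        write_test_tone(tone)
        print(matches_documented_spec(tone))
```

Pointing `matches_documented_spec` at a file in ComfyUI/output/ is a quick way to confirm the format before feeding the audio to downstream tools.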
Workflow Example
The project includes a basic workflow example workflow_example.json, demonstrating how to:
- Use the Maya1 TTS node to generate speech
- Save the generated audio file
- Preview and play the audio
Import Workflow
- Open ComfyUI
- Click the Load button
- Select the workflow_example.json file
- Click Queue Prompt to start generation
Advanced Configuration
Modify Default Model Path
Edit the get_maya_model_path() function in config.py:
def get_maya_model_path():
    # Custom model path
    return "/your/custom/path/to/models"
Add New Voice Presets
Add to the VOICE_PRESETS dictionary in config.py:
VOICE_PRESETS = {
    # ... existing presets
    "Custom-Preset-Name": "Your custom voice description here",
}
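How a preset and the custom_description parameter interact is not spelled out in this README. The sketch below shows one plausible resolution rule, assuming a non-empty custom description takes precedence over the selected preset; `resolve_description` is a hypothetical helper and the preset text is a placeholder, not the real string from `config.py`.

```python
# Placeholder entries: the real description strings live in config.py.
VOICE_PRESETS = {
    "Male-Mature": "<description of the Male-Mature preset>",
    "Custom-Preset-Name": "Your custom voice description here",
}


def resolve_description(voice_preset, custom_description=""):
    """Return the custom description if non-empty, else the preset's text.

    The precedence rule is an assumption about the node's behavior.
    """
    custom = (custom_description or "").strip()
    if custom:
        return custom
    try:
        return VOICE_PRESETS[voice_preset]
    except KeyError:
        raise ValueError(f"Unknown voice preset: {voice_preset!r}") from None
```

Raising on an unknown preset name makes a typo in a newly added key show up immediately rather than producing speech with an unintended voice.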
Clear Model Cache
To free up VRAM, call:
from config import clear_model_cache
clear_model_cache()
Common Issues
Q1: First run is very slow?
A: The first run downloads the models (~4GB), so it takes a while. The models are cached locally, and subsequent runs are much faster.
Q2: Out of memory errors?
A:
- Reduce the chunk_length parameter (e.g., change to 30)
- Close other VRAM-consuming programs
- Use CPU mode (auto-detects, no configuration needed)
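Device auto-detection of this kind is commonly done with a check like the one below. This is a sketch of the general pattern, not the node's actual code; `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is no GPU backend to query
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

Running it in ComfyUI's Python environment shows which device a check of this kind would select on your machine.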
Q3: Generated speech quality is not ideal?
A:
- Adjust the temperature parameter (0.3-0.5 usually works well)
- Try different voice presets
- Use custom descriptions for more precise voice characteristics
- Ensure input text has correct grammar and punctuation
Q4: Does it support Chinese?
A: The current version primarily supports English text-to-speech; Chinese support is limited. Use English text for best results.
Q5: How to speed up generation?
A:
- Make sure generation runs on a GPU (CUDA)
- Reduce chunk_length (this may affect speech continuity)
- Lower the temperature value
- Process long text in segments
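To make the effect of chunk_length concrete, the sketch below splits text into chunks of at most a given number of characters without breaking words. How the node itself splits text (for example, at sentence boundaries) is not documented here, so this is an assumption; `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_length=50):
    """Greedily pack whole words into chunks of at most chunk_length characters."""
    if chunk_length < 1:
        raise ValueError("chunk_length must be at least 1")
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_length:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single word longer than chunk_length becomes its own chunk.
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Process long text in segments to keep each request small.", 30):
        print(repr(chunk))
```

Smaller chunks mean less work per generation step, but more boundaries where the speech may sound disconnected.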
Changelog
v1.0.0 (2025-01-09)
- Initial release
- Support for Maya1 text-to-speech
- 5 built-in voice presets
- Flexible parameter configuration
- Automatic model management
- Complete English support
License
This project is open source under the MIT License.
Acknowledgments
- Maya Research - Maya1 model
- Hugging Face - Model hosting
- ComfyUI - Workflow platform
Contact
- Author: Alun
- Issues: Please submit on GitHub Issues
Support the Project
If this project helps you, please:
- Star the project
- Report bugs
- Suggest new features
- Contribute code
Enjoy high-quality speech synthesis!