ComfyUI Extension: ComfyUI-Gemini_TTS
A powerful ComfyUI custom node that brings Google's Gemini TTS capabilities directly to your workflow. Generate high-quality speech with 30+ voices supporting both free and paid tiers.
🎙️ ComfyUI-Gemini_TTS
✨ Features
- 30+ Premium Voices: Male and female voices with unique characteristics
- Dual Tier Support: Free tier with generous limits + Paid tier for production use
- Smart Fallback: Automatic model switching when quotas are reached
- Voice Characteristics: Detailed voice info with personality descriptions
- Flexible Configuration: Environment variables, node parameters, or config file
- Robust Error Handling: Clear error messages and automatic retry logic
- Real-time Pricing: Cost estimates for paid tier usage
🚀 Quick Start
1. Installation
- Clone or download this repository to your ComfyUI custom nodes folder:
    cd ComfyUI/custom_nodes/
    git clone https://github.com/ShmuelRonen/ComfyUI-Gemini_TTS.git
- Install dependencies (git clone creates a folder named after the repository):
    cd ComfyUI-Gemini_TTS
    pip install google-generativeai requests torch torchaudio numpy
- Restart ComfyUI. The node will appear as "🎙️ Gemini Text-to-Speech".
2. Get Your API Key
Free Tier (Recommended to Start)
- Go to Google AI Studio
- Sign in with your Google account
- Click "Get API Key" → "Create API Key"
- Select "Create API key in new project"
- Copy your API key (starts with AIza...)
Paid Tier (For Production)
See the Paid Tier Setup section below.
3. Configure the Node
Option A: Environment Variable (Recommended)
export GEMINI_API_KEY="your_api_key_here"
Option B: Direct Input
- Enter your API key directly in the node's api_key field
- The node will save it automatically for future use
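The two options above can be combined into a single lookup. The sketch below is only an illustration: the function name `resolve_api_key` is hypothetical, and the precedence (node field first, then the `GEMINI_API_KEY` environment variable) is an assumption, since the README does not say which source wins.

```python
import os


def resolve_api_key(node_value="", env=None):
    """Return the first non-empty key: node field first, then GEMINI_API_KEY.

    The precedence order is an assumption, not documented behaviour.
    """
    env = os.environ if env is None else env
    for candidate in (node_value, env.get("GEMINI_API_KEY", "")):
        candidate = (candidate or "").strip()
        if candidate:
            return candidate
    raise ValueError(
        "No API key found: set GEMINI_API_KEY or fill in the api_key field"
    )
```

Passing `env` explicitly makes the function easy to exercise without touching the real environment.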
🎭 Available Voices
Female Voices (14 total)
- Aoede - Breezy and natural
- Kore - Firm and confident
- Leda - Youthful and energetic
- Zephyr - Bright and cheerful
- Autonoe - Bright and optimistic
- Callirhoe - Easy-going and relaxed
- Despina - Smooth and flowing
- Erinome - Clear and precise
- Gacrux - Mature and experienced
- Laomedeia - Upbeat and lively
- Pulcherrima - Forward and expressive
- Sulafat - Warm and welcoming
- Vindemiatrix - Gentle and kind
- Achernar - Soft and gentle
Male Voices (16 total)
- Puck - Upbeat and energetic (default)
- Charon - Informative and clear
- Fenrir - Excitable and dynamic
- Orus - Firm and decisive
- Achird - Friendly and approachable
- Algenib - Gravelly texture
- Algieba - Smooth and pleasant
- Alnilam - Firm and strong
- Enceladus - Breathy and soft
- Iapetus - Clear and articulate
- Rasalgethi - Informative and professional
- Sadachbia - Lively and animated
- Sadaltager - Knowledgeable and authoritative
- Schedar - Even and balanced
- Umbriel - Easy-going and calm
- Zubenelgenubi - Casual and conversational
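The usage examples later in this README show voices labelled as "[F] Zephyr" or "[M] Charon". A minimal sketch of how such labels could be built from the lists above and mapped back to a bare voice name is shown here; the helper names are hypothetical and the mapping logic is an assumption about how the dropdown works.

```python
FEMALE_VOICES = [
    "Aoede", "Kore", "Leda", "Zephyr", "Autonoe", "Callirhoe", "Despina",
    "Erinome", "Gacrux", "Laomedeia", "Pulcherrima", "Sulafat",
    "Vindemiatrix", "Achernar",
]
MALE_VOICES = [
    "Puck", "Charon", "Fenrir", "Orus", "Achird", "Algenib", "Algieba",
    "Alnilam", "Enceladus", "Iapetus", "Rasalgethi", "Sadachbia",
    "Sadaltager", "Schedar", "Umbriel", "Zubenelgenubi",
]


def dropdown_labels():
    """Build gender-prefixed labels such as '[F] Zephyr'."""
    return [f"[F] {v}" for v in FEMALE_VOICES] + [f"[M] {v}" for v in MALE_VOICES]


def api_voice_name(label):
    """Strip an optional '[F] ' or '[M] ' prefix and validate the voice name."""
    name = label.split("] ", 1)[-1].strip()
    if name not in FEMALE_VOICES and name not in MALE_VOICES:
        raise ValueError(f"Unknown voice: {label!r}")
    return name
```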
⚙️ Node Parameters
Required Parameters
- prompt: Text to convert to speech (supports "Say:" prefix)
- tts_model: Choose between:
  - gemini-2.5-pro-preview-tts (Higher quality, slower)
  - gemini-2.5-flash-preview-tts (Faster, good quality)
- voice: Select from 30+ available voices
- temperature: Control creativity (0.0-2.0, default: 1.0)
Optional Parameters
- api_key: Enter API key directly (auto-saved)
- auto_fallback_to_flash: Auto-switch to Flash if Pro is rate-limited
- retry_delay: Wait time between retries (10-120 seconds)
- use_paid_tier: Enable paid billing for higher quotas
- billing_project_id: Google Cloud project ID for billing
- aggressive_retry: More retry attempts for better reliability
- show_voice_info: Display voice characteristics in output
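For readers unfamiliar with ComfyUI, parameters like these are normally declared through a node class's `INPUT_TYPES` classmethod. The sketch below is a hypothetical declaration, not the extension's actual source. The class name, the shortened voice list, the `AUDIO` return type, the category, and every default except temperature's 1.0 are assumptions; only the parameter names and the documented ranges come from the lists above.

```python
class GeminiTTSNodeSketch:
    """Hypothetical ComfyUI node declaration mirroring the documented parameters."""

    TTS_MODELS = ["gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts"]
    VOICES = ["[M] Puck", "[M] Charon", "[F] Zephyr"]  # shortened for the example

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                "tts_model": (cls.TTS_MODELS,),
                "voice": (cls.VOICES, {"default": "[M] Puck"}),
                "temperature": (
                    "FLOAT",
                    {"default": 1.0, "min": 0.0, "max": 2.0, "step": 0.1},
                ),
            },
            "optional": {
                "api_key": ("STRING", {"default": ""}),
                # Defaults below are assumed; the README gives only the range.
                "auto_fallback_to_flash": ("BOOLEAN", {"default": True}),
                "retry_delay": ("INT", {"default": 30, "min": 10, "max": 120}),
                "use_paid_tier": ("BOOLEAN", {"default": False}),
                "billing_project_id": ("STRING", {"default": ""}),
                "aggressive_retry": ("BOOLEAN", {"default": False}),
                "show_voice_info": ("BOOLEAN", {"default": False}),
            },
        }

    RETURN_TYPES = ("AUDIO",)  # assumed output type
    FUNCTION = "generate"
    CATEGORY = "audio"  # assumed menu category

    def generate(self, prompt, tts_model, voice, temperature, **optional):
        raise NotImplementedError("API call omitted in this sketch")
```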
💰 Paid Tier Setup
Why Upgrade to Paid Tier?
| Feature | Free Tier | Paid Tier |
|---------|-----------|-----------|
| Quota Limits | Low (good for testing) | High (production ready) |
| Rate Limits | Restrictive | Generous |
| Priority Access | Standard | Premium |
| Cost | Free | ~$0.001-0.02 per request |
Step-by-Step Paid Setup
1. Create Google Cloud Project
- Go to Google Cloud Console
- Click "New Project" or select existing project
- Enter project name (e.g., "my-gemini-tts")
- Note your Project ID (not the name - this is important!)
2. Enable Billing
- In Google Cloud Console, go to Billing
- Click "Link a billing account" or "Enable billing"
- Add a payment method (credit card required)
- Verify billing is active on your project
3. Enable the Gemini API
- Go to APIs & Services > Library
- Search for "Generative Language API"
- Click "Enable" on the Generative Language API
- Wait for activation (usually instant)
4. Create API Key
- Go to APIs & Services > Credentials
- Click "Create Credentials" > "API Key"
- Copy your new API key
- Optional: Restrict the key to "Generative Language API" for security
5. Configure the Node
Set these parameters in the node:
- use_paid_tier: True
- billing_project_id: Your Project ID from step 1
- api_key: Your API key from step 4
💵 Pricing Information
Gemini 2.5 Pro TTS:
- Input: $1.00 per 1M tokens
- Output: $20.00 per 1M tokens
- ~$0.01-0.02 per typical request
Gemini 2.5 Flash TTS:
- Input: $0.50 per 1M tokens
- Output: $10.00 per 1M tokens
- ~$0.005-0.01 per typical request
A typical 20-word sentence costs less than $0.02.
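The per-request figures follow directly from the per-million-token rates. The sketch below shows the arithmetic; the function name is hypothetical and the token counts in the usage note are illustrative placeholders, not measurements of any real request.

```python
# Prices in USD per 1M tokens, copied from the list above.
PRICES_PER_MILLION = {
    "gemini-2.5-pro-preview-tts": {"input": 1.00, "output": 20.00},
    "gemini-2.5-flash-preview-tts": {"input": 0.50, "output": 10.00},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For example, a request assumed to use 30 input tokens and 500 output tokens on the Pro model works out to (30 × 1.00 + 500 × 20.00) / 1,000,000, or about $0.01. Output tokens dominate the cost, so audio length matters far more than prompt length.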
🔧 Troubleshooting
Common Issues
"API key not valid" Error
- Solution: Verify your API key starts with AIza and is ~39 characters
- Check: API key hasn't expired or been deleted
- Verify: You're using the correct key from Google AI Studio or Cloud Console
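The format checks above can be run before any request is sent. This is a rough sketch with a hypothetical helper name, and the length tolerance of two characters is an assumption, since the README only says "~39". Passing the check does not prove the key is valid; it only catches obvious copy-and-paste mistakes.

```python
def check_key_format(key):
    """Return a list of format problems; an empty list means the key looks plausible."""
    problems = []
    key = key.strip()
    if not key.startswith("AIza"):
        problems.append("key does not start with 'AIza'")
    if abs(len(key) - 39) > 2:  # tolerance is an assumption
        problems.append(f"unexpected length {len(key)} (expected about 39)")
    return problems
```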
"Rate limit exceeded" Error
- Free Tier: Wait 60 seconds or try Flash model
- Solution: Enable paid tier for higher quotas
- Temporary: Use auto_fallback_to_flash = True
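The wait-then-fall-back behaviour described above could look roughly like the sketch below. It is not the node's actual implementation: `RateLimitError` is a stand-in for whatever exception the API client raises, `call` is any function that takes a model name and returns audio, and `max_attempts` is a hypothetical knob standing in for the effect of `aggressive_retry`. Retrying the requested model before switching to Flash is also an assumption.

```python
import time

FLASH_MODEL = "gemini-2.5-flash-preview-tts"


class RateLimitError(Exception):
    """Stand-in for the quota error raised by the real API client."""


def generate_with_fallback(call, model, auto_fallback_to_flash=True,
                           retry_delay=10, max_attempts=3, sleep=time.sleep):
    """Try `model` up to max_attempts times, then optionally fall back to Flash.

    Returns a (model_used, result) tuple, or re-raises the last rate-limit error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    models = [model]
    if auto_fallback_to_flash and model != FLASH_MODEL:
        models.append(FLASH_MODEL)

    last_error = None
    for current in models:
        for attempt in range(1, max_attempts + 1):
            try:
                return current, call(current)
            except RateLimitError as err:
                last_error = err
                if attempt < max_attempts:
                    sleep(retry_delay)  # wait before retrying the same model
    raise last_error
```

Injecting `sleep` keeps the function testable without real delays.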
"Billing project not found" Error
- Check: Use Project ID, not project name
- Verify: Project exists and billing is enabled
- Confirm: API key belongs to the same project
"Permission denied" Error
- Verify: Generative Language API is enabled
- Check: API key has proper permissions
- Ensure: Billing is active if using paid tier
Configuration Files
The node creates a config.json file to save your settings:
{
"GEMINI_API_KEY": "your_key_here",
"use_paid_tier": true,
"billing_project_id": "your-project-id"
}
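A file with this shape can be read and written with the standard library alone. The sketch below is an assumption about how such a file might be handled, not the node's own code: the helper names, the fallback to defaults on a corrupt file, and the owner-only permission step are all illustrative choices. Because the file holds the API key in plain text, it is worth keeping it out of version control.

```python
import json
import os
from pathlib import Path

# Keys match the example config.json above; default values are assumptions.
DEFAULTS = {"GEMINI_API_KEY": "", "use_paid_tier": False, "billing_project_id": ""}


def load_config(path):
    """Load settings, falling back to defaults if the file is missing or invalid."""
    config = dict(DEFAULTS)
    path = Path(path)
    if path.exists():
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            config.update(data)
    return config


def save_config(path, config):
    """Write settings as JSON and try to restrict the file to its owner."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    try:
        os.chmod(path, 0o600)  # has limited effect on Windows
    except OSError:
        pass
```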
Debug Information
Check the console output for status messages:
- Green ✅: Successful operations
- Yellow ⚠️: Warnings and fallbacks
- Red ❌: Errors requiring attention
📝 Usage Examples
Basic Text-to-Speech
Prompt: "Hello, welcome to our presentation today."
Model: gemini-2.5-flash-preview-tts
Voice: [F] Zephyr
Temperature: 1.0
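For comparison, the same settings can be expressed as a direct request to the Gemini REST API. The sketch below only builds the request and decodes a response; it reflects the public `generateContent` request shape as I understand it, and it is not taken from this node's source, whose internals the README does not show. The helper names are hypothetical.

```python
import base64

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_tts_request(prompt, model, voice, temperature=1.0):
    """Return (url, json_body) for a single-speaker TTS request."""
    url = f"{API_ROOT}/{model}:generateContent"
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
    return url, body


def extract_audio_bytes(response_json):
    """Decode the base64 audio payload from the first candidate."""
    part = response_json["candidates"][0]["content"]["parts"][0]
    return base64.b64decode(part["inlineData"]["data"])


# Sending the request needs network access and a valid key, for example:
#   import os, requests
#   url, body = build_tts_request("Hello, welcome to our presentation today.",
#                                 "gemini-2.5-flash-preview-tts", "Zephyr")
#   resp = requests.post(url, json=body, timeout=120,
#                        headers={"x-goog-api-key": os.environ["GEMINI_API_KEY"]})
#   resp.raise_for_status()
#   pcm = extract_audio_bytes(resp.json())
```

Note that the API expects the bare voice name ("Zephyr"), not the "[F] Zephyr" label shown in the node's dropdown. According to Google's documentation at the time of writing, the returned audio is raw PCM rather than a WAV file, so it needs a header or conversion before playback.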
Expressive Reading
Prompt: "Say: Once upon a time, in a land far, far away..."
Model: gemini-2.5-pro-preview-tts
Voice: [M] Charon
Temperature: 1.5
Show Voice Info: True
Production Setup
Use Paid Tier: True
Billing Project ID: my-production-project-123
Aggressive Retry: True
Model: gemini-2.5-pro-preview-tts
🛡️ Security Best Practices
- Protect Your API Key: Never commit API keys to version control
- Use Environment Variables: Set GEMINI_API_KEY in your environment
- Restrict API Keys: Limit to specific APIs in Google Cloud Console
- Monitor Usage: Check Google Cloud billing dashboard regularly
- Project Isolation: Use separate projects for development vs production
🔄 Updates and Compatibility
- ComfyUI: Compatible with latest versions
- Python: Requires Python 3.8+
- Dependencies: Auto-updated through pip
- Voice Library: Automatically synced with Google's latest voices
📞 Support
Common Solutions
- Restart ComfyUI after installation or configuration changes
- Check Console Output for detailed error messages
- Verify API Key Format (should start with AIza)
- Confirm Project Settings in Google Cloud Console
Getting Help
- Check the troubleshooting section above
- Review console output for specific error messages
- Verify your Google Cloud project configuration
- Ensure billing is properly enabled for paid tier
📜 License
This project is provided as-is for educational and commercial use. Google Gemini API usage is subject to Google's terms of service and pricing.
🎉 Ready to generate amazing speech with Gemini TTS!
Last updated: May 2025