ComfyUI Extension: Maya1 TTS

Authored by Saganaki22

ComfyUI node for Maya1 TTS - Expressive voice generation with 20+ emotions, voice design, and SNAC neural codec


    ComfyUI-Maya1_TTS

    Expressive Voice Generation with Emotions for ComfyUI

    A ComfyUI node pack for Maya1, a 3B-parameter speech model built for expressive voice generation with rich human emotion and precise voice design.


    https://github.com/user-attachments/assets/1be0c2a0-22fb-4890-9147-d20abeb2e067


    ✨ Features

    Core Features

    • 🎭 Voice Design through natural language descriptions
    • 😊 16 Emotion Tags: laugh, cry, whisper, angry, sigh, gasp, scream, and more
    • ⚡ Real-time Generation with SNAC neural codec (24kHz audio)
    • 🔧 Multiple Attention Mechanisms: SDPA, eager, Flash Attention 2, Sage Attention (1/2)
    • 💾 Quantization Support: 4-bit and 8-bit for memory-constrained GPUs
    • 🛑 Native ComfyUI Cancel: Stop generation anytime
    • 📊 Progress Tracking: Real-time token generation speed (it/s)
    • 🔄 Model Caching: Fast subsequent generations
    • 🎯 Smart VRAM Management: Auto-clears on dtype changes

    Custom Canvas UI

    • 🎨 Beautiful Dark Theme with purple accents and smooth animations
    • 👤 5 Character Presets: Quick-load voice templates (Male US, Female UK, Announcer, Robot, Demon)
    • 🎭 16 Visual Emotion Buttons: One-click emotion tag insertion at cursor position
    • ⛶ Professional HTML Modal Editor: Fullscreen text editor with native textarea for longform content
    • 🔤 Font Size Controls: Adjustable 12-20px font size with visual slider
    • ⌨️ Advanced Keyboard Shortcuts: Ctrl+A, Ctrl+C, Ctrl+V, Ctrl+X, Ctrl+Enter to save, ESC to cancel
    • 🔔 Toast Notifications: Visual feedback for save success and validation errors
    • 📝 Inline Text Editing: Click-to-edit with cursor positioning and drag-to-select
    • 🖱️ Scroll Support: Custom themed scrollbars with mouse wheel scrolling
    • 📱 Responsive Design: Modal adapts to all screen sizes
    • 💡 Contextual Tooltips: Helpful hints on every control
    • 🎬 Collapsible Sections: Clean, organized interface
    • 🔄 Smart Audio Processing: Auto-chunking for long text with crossfade blending for seamless output

    📦 Installation

    <details> <summary><b>Quick Install (Click to expand)</b></summary>

    1. Clone the Repository

    cd ComfyUI/custom_nodes/
    git clone https://github.com/Saganaki22/ComfyUI-Maya1_TTS.git
    cd ComfyUI-Maya1_TTS
    

    2. Install Dependencies

    Core dependencies (required):

    pip install torch>=2.0.0 transformers>=4.50.0 numpy>=1.21.0 snac>=1.0.0
    

    Or install from requirements.txt:

    pip install -r requirements.txt
    
    </details> <details> <summary><b>Optional: Enhanced Performance (Click to expand)</b></summary>

    Quantization (Memory Savings)

    For 4-bit/8-bit quantization support:

    pip install bitsandbytes>=0.41.0
    

    Memory savings:

    • 4-bit: ~6GB VRAM (slight quality loss)
    • 8-bit: ~7GB VRAM (minimal quality loss)

    Accelerated Attention

    Flash Attention 2 (CUDA only):

    pip install flash-attn>=2.0.0
    

    Sage Attention (memory efficient for batch):

    pip install sageattention>=1.0.0
    

    Install All Optional Dependencies

    pip install bitsandbytes flash-attn sageattention
    
    </details> <details> <summary><b>Download Maya1 Model (Click to expand)</b></summary>

    Model Location

    Models go in: ComfyUI/models/maya1-TTS/

    Expected Folder Structure

    After downloading, your model folder should look like this:

    ComfyUI/
    └── models/
        └── maya1-TTS/
            └── maya1/                                # Model name (can be anything)
                ├── chat_template.jinja               # Chat template
                ├── config.json                       # Model configuration
                ├── generation_config.json            # Generation settings
                ├── model-00001-of-00002.safetensors  # Model weights (shard 1)
                ├── model-00002-of-00002.safetensors  # Model weights (shard 2)
                ├── model.safetensors.index.json      # Weight index
                ├── special_tokens_map.json           # Special tokens
                └── tokenizer/                        # Tokenizer subfolder
                    ├── chat_template.jinja           # Chat template (duplicate)
                    ├── special_tokens_map.json       # Special tokens (duplicate)
                    ├── tokenizer.json                # Tokenizer vocabulary (22.9 MB)
                    └── tokenizer_config.json         # Tokenizer config
    

    Critical files required:

    • config.json - Model architecture configuration
    • generation_config.json - Default generation parameters
    • model-00001-of-00002.safetensors & model-00002-of-00002.safetensors - Model weights (2 shards)
    • model.safetensors.index.json - Weight index mapping
    • chat_template.jinja & special_tokens_map.json - In root folder
    • tokenizer/ folder with all 4 tokenizer files

    Note: You can have multiple models by creating separate folders like maya1, maya1-finetuned, etc.
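    A download can be sanity-checked against the critical-files list above with a few lines of Python. This is a minimal sketch; `missing_files` and the hard-coded path are illustrative helpers, not part of the node pack:

```python
# Check that a Maya1 model folder contains every critical file listed above.
from pathlib import Path

REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors.index.json",
    "chat_template.jinja",
    "special_tokens_map.json",
    "tokenizer/tokenizer.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/special_tokens_map.json",
    "tokenizer/chat_template.jinja",
]

def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]

missing = missing_files("ComfyUI/models/maya1-TTS/maya1")
if missing:
    print("Incomplete model download, missing:", missing)
```

    If the list comes back empty, the folder matches the expected structure.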

    Option 1: Hugging Face CLI (Recommended)

    # Install HF CLI
    pip install huggingface-hub
    
    # Create directory
    cd ComfyUI
    mkdir -p models/maya1-TTS
    
    # Download model
    hf download maya-research/maya1 --local-dir models/maya1-TTS/maya1
    

    Option 2: Python Script

    from huggingface_hub import snapshot_download
    
    snapshot_download(
        repo_id="maya-research/maya1",
        local_dir="ComfyUI/models/maya1-TTS/maya1",
        local_dir_use_symlinks=False
    )
    

    Option 3: Manual Download

    1. Go to Maya1 on HuggingFace
    2. Download all files to ComfyUI/models/maya1-TTS/maya1/
    </details> <details> <summary><b>Restart ComfyUI</b></summary>

    Restart ComfyUI to load the new nodes. The node will appear under:

    Add Node → audio → Maya1 TTS (AIO) / Maya1 TTS (AIO) Barebones

    </details>

    🎮 Usage

    Two Node Options

    Maya1 TTS (AIO) - Full custom UI with visual controls (recommended)

    • Beautiful dark theme with character presets, emotion buttons, and modal editor
    • Best user experience with visual feedback and tooltips

    Maya1 TTS (AIO) Barebones - Standard ComfyUI widgets only

    • For users experiencing JavaScript rendering issues (black box)
    • Same functionality, simpler interface
    • All inputs stacked vertically with standard dropdowns and text boxes

    Node: Maya1 TTS (AIO)

    All-in-one node for loading models and generating speech with a beautiful custom canvas UI.

    | Maya1 TTS (AIO) | Maya1 TTS (AIO) Barebones |
    |:---:|:---:|
    | <img width="615" alt="Screenshot 2025-11-07 084153" src="https://github.com/user-attachments/assets/19105cc2-030a-40e3-b4d9-e18bd6d50b65" /> | <img width="648" alt="image" src="https://github.com/user-attachments/assets/7aff4fbb-c434-4d16-b4f9-22017167d455" /> |

    ✨ Custom Canvas Interface

    The node features a completely custom-built interface with:

    Character Presets (Top Row)

    • Click any preset to instantly load a pre-configured voice description
    • 5 presets: ♂️ Male US, ♀️ Female UK, 🎙️ Announcer, 🤖 Robot, 😈 Demon

    Text Fields

    • Voice Description: Describe your desired voice characteristics
    • Text: Your script with optional emotion tags
    • Click inside to edit with full keyboard support
    • Press Enter for new line, Ctrl+Enter to save, Escape to cancel

    Emotion Tags (Collapsible Grid)

    • 16 emotion buttons in 4×4 grid
    • Click any emotion to insert tag at cursor position
    • Tags insert where you're typing, not just at the end
    • Click header to collapse/expand section

    ⛶ Professional HTML Modal (Bottom right of Text field)

    • Click the expand button (⛶) for fullscreen text editing
    • Native HTML textarea with proper newline and whitespace support
    • Font Size Slider: Adjust text size from 12px to 20px with visual A/A controls
    • All 16 emotion buttons available inside modal for quick tag insertion
    • Custom Themed Scrollbar: Purple accents matching the node design
    • Toast Notifications: Green checkmark for "Text Saved", red X for validation errors
    • Empty Text Validation: Prevents saving blank text with helpful error message
    • Keyboard Shortcuts:
      • Ctrl+Enter: Save and close
      • ESC: Cancel without saving
      • Full text selection and clipboard support (Ctrl+A, C, V, X)
    • Responsive Design: Modal adapts to small and large screens, buttons always visible
    • Visual Hints: Subtle grey text under buttons showing keyboard shortcuts

    Keyboard Shortcuts (Inline Editing & Modal)

    • Enter: New line (in multiline text fields)
    • Ctrl+Enter: Save and apply changes
    • Escape: Cancel editing without saving
    • Ctrl+A: Select all text
    • Ctrl+C/V/X: Copy, paste, cut selected text
    • Click outside field: Auto-save (inline editing only)
    <details> <summary><b>Model Settings</b></summary>

    model_name (dropdown)

    • Select from models in ComfyUI/models/maya1-TTS/
    • Model auto-discovered on startup

    dtype (dropdown)

    • 4bit: NF4 quantization (~6GB VRAM, requires bitsandbytes, SLOWER)
    • 8bit: INT8 quantization (~7GB VRAM, requires bitsandbytes, SLOWER)
    • float16: 16-bit half precision (~8-9GB VRAM, FAST, good quality)
    • bfloat16: 16-bit brain float (~8-9GB VRAM, FAST, recommended)
    • float32: 32-bit full precision (~16GB VRAM, highest quality, slower)

    โš ๏ธ IMPORTANT: Quantization (4-bit/8-bit) is SLOWER than float16/bfloat16!

    • Only use quantization if you have limited VRAM (<10GB)
    • If you have 10GB+ VRAM, use float16 or bfloat16 for best speed

    attention_mechanism (dropdown)

    • sdpa: PyTorch SDPA (default, fastest for single TTS)
    • flash_attention_2: Flash Attention 2 (batch inference)
    • sage_attention: Sage Attention (memory efficient)

    device (dropdown)

    • cuda: Use GPU (recommended)
    • cpu: Use CPU (slower)
    </details> <details> <summary><b>Voice & Text Settings</b></summary>

    voice_description

    Describe the voice using natural language. Click inside to edit or use character presets.

    Example:

    Realistic male voice in the 30s with American accent. Normal pitch, warm timbre, conversational pacing.
    

    Voice Components:

    • Age: in their 20s, 30s, 40s, 50s
    • Gender: Male voice, Female voice
    • Accent: American, British, Australian, Indian, Middle Eastern
    • Pitch: high pitch, normal pitch, low pitch
    • Timbre: warm, gravelly, smooth, raspy
    • Pacing: fast pacing, conversational, slow pacing
    • Tone: happy, angry, curious, energetic, calm

    💡 Tip: Use character presets for quick voice templates!

    text

    Text to synthesize with optional emotion tags. Click emotion buttons to insert tags at cursor.

    Example:

    Hello! This is Maya1 <laugh> the best open source voice AI!
    

    💡 Tip: Click the ⛶ expand button for longform text editing in the fullscreen modal!

    </details> <details> <summary><b>Generation Settings</b></summary>

    keep_model_in_vram (boolean)

    • True: Keep model loaded for faster repeated generations
    • False: Clear VRAM after generation (saves memory)
    • Auto-clears when dtype changes

    chunk_longform (boolean) ⚠️ EXPERIMENTAL

    • True: Auto-split long text (>80 words) at sentences, combines audio
    • False: Generate entire text at once (may fail if too long)
    • Note: This feature is experimental and may have quality/timing issues
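    The splitting step above can be sketched as follows. This is illustrative only: the sentence-boundary regex, helper name, and the way the ~80-word budget is enforced are assumptions, not the node's actual implementation:

```python
# Split long text at sentence boundaries so each chunk stays under a word budget.
import re

def chunk_text(text: str, max_words: int = 80) -> list[str]:
    """Group whole sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Flush the current chunk before it would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

    Each chunk is then synthesized separately and the audio segments are blended back together.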

    temperature (0.1-2.0, default: 0.4)

    • Lower = more consistent
    • Higher = more varied/creative

    top_p (0.1-1.0, default: 0.9)

    • Nucleus sampling parameter
    • 0.9 recommended for natural speech

    max_tokens (100-8000, default: 2000)

    • Maximum audio tokens to generate
    • Higher = longer audio

    repetition_penalty (1.0-2.0, default: 1.1)

    • Reduces repetitive speech
    • 1.1 is good default

    seed (integer, default: 0)

    • Use same seed for reproducible results
    • Use ComfyUI's control_after_generate for random/increment
    </details> <details> <summary><b>Outputs</b></summary>

    audio (ComfyUI AUDIO type)

    • 24kHz mono audio
    • Compatible with all ComfyUI audio nodes
    • Connect to PreviewAudio, SaveAudio, etc.
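    For reference, ComfyUI's AUDIO type is a dict carrying a `[batch, channels, samples]` tensor plus the sample rate. A one-second block of 24 kHz silence as a stand-in for generated speech:

```python
# Shape of the AUDIO dict that downstream nodes (PreviewAudio, SaveAudio) consume.
import torch

audio = {
    "waveform": torch.zeros(1, 1, 24000),  # [batch, channels, samples]
    "sample_rate": 24000,                  # Maya1 outputs 24kHz mono
}
```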
    </details>

    Node: Maya1 TTS (AIO) Barebones

    Standard ComfyUI widgets version for users experiencing JavaScript rendering issues.

    When to use Barebones:

    • Custom UI shows as a black box
    • Browser console shows JavaScript errors
    • You prefer simple, standard ComfyUI widgets
    • Working with older ComfyUI versions

    Inputs (in order):

    1. voice_description (multiline text)

      • Describe voice characteristics in natural language
      • Same as main node, just standard text box
    2. text (multiline text)

      • Your script with manual emotion tags like <laugh> or <cry>
      • Type emotion tags manually (no visual buttons in barebones version)
    3. model_name (dropdown)

      • Select Maya1 model from ComfyUI/models/maya1-TTS/
    4. dtype (dropdown)

      • 4bit (BNB), 8bit (BNB), float16, bfloat16, float32
    5. attention_mechanism (dropdown)

      • sdpa (default), flash_attention_2, sage_attention
    6. device (dropdown)

      • cuda (GPU) or cpu
    7. keep_model_in_vram (boolean toggle)

      • Keep model loaded for faster subsequent generations
    8. chunk_longform (boolean toggle)

      • Split long text with crossfading for unlimited length
    9. max_tokens (integer)

      • Max SNAC tokens per chunk (default: 4000)
    10. temperature (float)

      • Generation randomness (default: 0.4)
    11. top_p (float)

      • Nucleus sampling (default: 0.9)
    12. repetition_penalty (float)

      • Reduce repetition (default: 1.1)
    13. seed (integer)

      • 0 = random, or set specific seed for reproducibility
      • Use control_after_generate widget for seed management

    All other features (model loading, VRAM management, chunking, progress tracking) work identically to the main node.
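    The crossfade blending used when combining chunks can be pictured as a linear overlap between the tail of one segment and the head of the next. A minimal sketch, assuming a simple linear ramp (the node's actual fade shape and length are not specified here):

```python
# Blend two audio chunks with a linear crossfade over `fade_samples` samples.
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, fade_samples: int) -> np.ndarray:
    """Overlap the tail of `a` with the head of `b` using complementary ramps."""
    fade_out = np.linspace(1.0, 0.0, fade_samples)
    fade_in = 1.0 - fade_out
    blended = a[-fade_samples:] * fade_out + b[:fade_samples] * fade_in
    return np.concatenate([a[:-fade_samples], blended, b[fade_samples:]])
```

    Because the two ramps sum to 1 everywhere, a constant signal passes through the joint unchanged, which is what makes the chunk boundary inaudible.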


    🎭 Emotion Tags

    Add emotions anywhere in your text using <tag> syntax, or click the visual emotion buttons in the UI!

    Examples:

    Hello! This is amazing <laugh> I can't believe it!
    
    After all we went through <cry> I can't believe he was the traitor.
    
    Wow! <gasp> This place looks incredible!
    
    <details> <summary><b>All 16 Available Emotions (Click to expand)</b></summary>

    Laughter & Joy:

    • <laugh> - Normal laugh
    • <laugh_harder> - Intense laughing
    • <giggle> - Light giggling
    • <chuckle> - Soft chuckle

    Sadness & Sighs:

    • <cry> - Crying
    • <sigh> - Sighing

    Surprise & Breath:

    • <gasp> - Surprised gasp
    • <excited> - Excited tone

    Intensity & Emotion:

    • <whisper> - Whispering
    • <angry> - Angry tone
    • <scream> - Screaming
    • <sarcastic> - Sarcastic delivery

    Natural Sounds:

    • <snort> - Snorting
    • <exhale> - Exhaling
    • <gulp> - Gulping
    • <sing> - Singing
    </details>

    💡 Tip: Click emotion buttons in the node UI to insert tags at cursor position!
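    If you write tags by hand (e.g. in the Barebones node), a few lines of Python can catch typos before generation. The tag set is copied from the table above; the helper name is illustrative:

```python
# Flag any <tag> in a script that is not one of Maya1's 16 emotion tags.
import re

KNOWN_TAGS = {
    "laugh", "laugh_harder", "giggle", "chuckle", "cry", "sigh",
    "gasp", "excited", "whisper", "angry", "scream", "sarcastic",
    "snort", "exhale", "gulp", "sing",
}

def unknown_tags(text: str) -> set[str]:
    """Return every <tag> in the text that Maya1 does not recognize."""
    return set(re.findall(r"<(\w+)>", text)) - KNOWN_TAGS
```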


    🎬 Example Character Speeches

    <details> <summary><b>Generative AI & ComfyUI Examples (Click to expand)</b></summary>

    Example 1: Excited AI Researcher

    Voice Description:

    Female voice in her 30s with American accent. High pitch, energetic tone at high intensity, fast pacing.
    

    Text:

    Oh my god! <laugh> Have you seen the new Stable Diffusion model in ComfyUI? The quality is absolutely incredible! <gasp> I just generated a photorealistic portrait in like 20 seconds. This is game-changing for our workflow!
    

    Example 2: Skeptical Developer

    Voice Description:

    Male voice in his 40s with British accent. Low pitch, calm tone, conversational pacing.
    

    Text:

    I've been testing this new node pack in ComfyUI <sigh> and honestly, I'm impressed. At first I was skeptical about the whole generative AI hype, but <gasp> the control you get with custom nodes is remarkable. This changes everything.
    

    Example 3: Enthusiastic Tutorial Creator

    Voice Description:

    Female voice in her 20s with Australian accent. Normal pitch, warm timbre, energetic tone at medium intensity.
    

    Text:

    Hey everyone! <laugh> Welcome back to my ComfyUI tutorial series! Today we're diving into the most powerful image generation workflow I've ever seen. <gasp> You're not gonna believe how easy this is! Let's get started!
    

    Example 4: Frustrated Beginner

    Voice Description:

    Male voice in his 30s with American accent. Normal pitch, stressed tone at medium intensity, fast pacing.
    

    Text:

    Why won't this workflow run? <angry> I've connected all the nodes exactly like the tutorial showed! <sigh> Wait... Oh no. <laugh> I forgot to load the checkpoint model. Classic beginner mistake! Okay, let's try this again.
    

    Example 5: Amazed AI Artist

    Voice Description:

    Female voice in her 40s with Indian accent. Normal pitch, curious tone, slow pacing, dramatic delivery.
    

    Text:

    When I first discovered ComfyUI <whisper> I thought it was just another image generator. But then <gasp> I realized you can chain workflows together, use custom models, and <laugh> even generate animations! This is the future of digital art!
    

    Example 6: Confident AI Entrepreneur

    Voice Description:

    Male voice in his 50s with Middle Eastern accent. Low pitch, gravelly timbre, slow pacing, confident tone at high intensity.
    

    Text:

    The generative AI revolution is here. <dramatic pause> ComfyUI gives us the tools to build production-ready workflows. <chuckle> While others are still playing with web UIs, we're automating entire creative pipelines. This is how you stay ahead of the curve.
    
    </details>

    โš™๏ธ Advanced Configuration

    <details> <summary><b>Attention Mechanisms Comparison</b></summary>

    | Mechanism | Speed | Memory | Best For | Requirements |
    |-----------|-------|--------|----------|--------------|
    | SDPA | ⚡⚡⚡ | Good | Single TTS generation | PyTorch ≥2.0 |
    | Flash Attention 2 | ⚡⚡ | Good | Batch processing | flash-attn, CUDA |
    | Sage Attention | ⚡⚡ | Excellent | Long sequences | sageattention |

    Why is SDPA fastest for TTS?

    • Optimized for single-sequence autoregressive generation
    • Lower kernel launch overhead (~20μs vs 50-60μs)
    • Flash/Sage Attention shine with batch size ≥8

    Recommendation: Use SDPA (default) for single audio generation.

    </details> <details> <summary><b>Quantization Details</b></summary>

    โš ๏ธ CRITICAL: Quantization is SLOWER than fp16/bf16!

    Memory Usage (Maya1 3B Model)

    | Dtype | VRAM Usage | Speed | Quality |
    |-------|------------|-------|---------|
    | 4-bit NF4 | ~6GB | Slow ⚡ | Good (slight loss) |
    | 8-bit INT8 | ~7GB | Slow ⚡ | Excellent (minimal loss) |
    | float16 | ~8-9GB | Fast ⚡⚡⚡ | Excellent |
    | bfloat16 | ~8-9GB | Fast ⚡⚡⚡ | Excellent |
    | float32 | ~16GB | Medium ⚡⚡ | Perfect |

    4-bit NF4 Quantization

    Features:

    • Uses NormalFloat4 (NF4) for best 4-bit quality
    • Double quantization (nested) for better accuracy
    • Memory usage: ~6GB (vs ~8-9GB for fp16)

    When to use:

    • You have limited VRAM (8GB or less GPU)
    • Speed is not critical (inference is slower due to dequantization)
    • Need to fit model in smaller VRAM

    When NOT to use:

    • You have 10GB+ VRAM → Use float16/bfloat16 instead for better speed!

    8-bit INT8 Quantization

    Features:

    • Standard 8-bit integer quantization
    • Memory usage: ~7GB (vs ~8-9GB for fp16)
    • Minimal quality impact

    When to use:

    • You have moderate VRAM constraints (8-10GB GPU)
    • Want good quality with some memory savings
    • Speed is not critical

    When NOT to use:

    • You have 10GB+ VRAM → Use float16/bfloat16 instead for better speed!

    Why is Quantization Slower?

    Quantized models require dequantization on every forward pass:

    1. Model weights stored in 4-bit/8-bit
    2. Weights dequantized to fp16 for computation
    3. Computation happens in fp16
    4. Extra overhead = slower inference

    Recommendation: Only use quantization if you truly need the memory savings!

    Automatic Dtype Switching

    The node automatically clears VRAM when you switch dtypes:

    🔄 Dtype changed from bfloat16 to 4bit
       Clearing cache to reload model...
    

    This prevents dtype mismatch errors and ensures correct quantization.

    </details> <details> <summary><b>Console Progress Output</b></summary>

    Real-time generation statistics in the console:

    🎲 Seed: 1337
    🎵 Generating speech (max 2000 tokens)...
       Tokens: 500/2000 | Speed: 12.45 it/s | Elapsed: 40.2s
    ✅ Generated 1500 tokens in 120.34s (12.47 it/s)
    

    it/s = iterations per second (tokens/second)

    </details>

    ๐Ÿ› Troubleshooting

    <details> <summary><b>Node Shows as Black Box (JavaScript Issues)</b></summary>

    Issue: Maya1 TTS (AIO) node appears completely black with no widgets visible.

    Quick Fix: Use Maya1 TTS (AIO) Barebones instead!

    • Same functionality, standard ComfyUI widgets only
    • No custom JavaScript required
    • Find it under: Add Node → audio → Maya1 TTS (AIO) Barebones

    Debugging Steps:

    1. Open browser DevTools (F12) → Console tab
    2. Look for JavaScript errors mentioning "maya1" or "Unexpected token"
    3. Try hard refresh: Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac)
    4. Clear browser cache completely
    5. Test in incognito/private window
    6. Check if maya1_tts.js loads in Network tab (should be 200 status)
    7. Disable browser extensions (ad blockers, script blockers)
    8. Update ComfyUI to latest version

    Note: The barebones version is specifically designed for this issue!

    </details> <details> <summary><b>Model Not Found</b></summary>

    Error: No valid Maya1 models found

    Solutions:

    1. Check model location: ComfyUI/models/maya1-TTS/
    2. Download model (see Installation section)
    3. Restart ComfyUI
    4. Check console for model discovery messages
    </details> <details> <summary><b>Out of Memory (OOM)</b></summary>

    Error: CUDA out of memory

    Memory requirements:

    • 4-bit: ~6GB VRAM (slower)
    • 8-bit: ~7GB VRAM (slower)
    • float16/bfloat16: ~8-9GB VRAM (fast, recommended)
    • float32: ~16GB VRAM

    Solutions (try in order):

    1. Use 4-bit dtype if you have ≤8GB VRAM (~6GB usage)
    2. Use 8-bit dtype if you have ~8-10GB VRAM (~7GB usage)
    3. Use float16 if you have 10GB+ VRAM (faster than quantization!)
    4. Enable keep_model_in_vram=False to free VRAM after generation
    5. Reduce max_tokens to 1000-1500
    6. Close other VRAM-heavy applications
    7. Use CPU (much slower but works)

    Note: If you have 10GB+ VRAM, use float16/bfloat16 for best speed!

    </details> <details> <parameter name="summary"><b>Quantization Errors</b></summary>

    Error: bitsandbytes not found

    Solution:

    pip install bitsandbytes>=0.41.0
    

    Error: Quantization requires CUDA

    Solution:

    • 4-bit/8-bit only work on CUDA
    • Switch to float16/bfloat16 for CPU
    </details> <details> <summary><b>No Audio Generated</b></summary>

    Error: No SNAC audio tokens generated!

    Solutions:

    1. Increase max_tokens to 2000-4000
    2. Adjust temperature to 0.3-0.5
    3. Simplify voice description
    4. Check text isn't too long
    5. Try different seed value
    </details> <details> <summary><b>Flash Attention Installation Failed</b></summary>

    Error: flash-attn won't install

    Solution:

    • Flash Attention requires CUDA and specific setup
    • Just use SDPA instead (works great, actually faster for TTS!)
    • SDPA is the recommended default
    </details> <details> <summary><b>Info Button Not Visible</b></summary>

    Issue: Can't see the "?" or "i" icon, only hover tooltip

    Answer: This is normal and working correctly!

    • ComfyUI's DESCRIPTION creates a hover tooltip
    • Some ComfyUI versions show no visible icon
    • Just hover over the node title area to see help
    • Contains all emotion tags and usage examples
    </details>

    📊 Performance Tips

    1. Use float16/bfloat16 if you have 10GB+ VRAM (fastest!)
    2. Use quantization (4-bit/8-bit) ONLY if limited VRAM (<10GB) - slower but fits in memory
    3. Keep SDPA as attention mechanism (fastest for single TTS)
    4. Enable model caching (keep_model_in_vram=True) for multiple generations
    5. Optimize max_tokens: Start with 1500-2000
    6. Batch similar requests with same voice description for efficiency

    โš ๏ธ Speed ranking: float16/bfloat16 (fastest) > float32 > 8-bit > 4-bit (slowest)


    ๐Ÿ—๏ธ Technical Details

    <details> <summary><b>Architecture</b></summary>
    • Model: 3B-parameter Llama-based transformer
    • Audio Codec: SNAC (Speech Neural Audio Codec)
    • Sample Rate: 24kHz mono
    • Frame Structure: 7 tokens per frame (3 hierarchical levels)
    • Token Ranges:
      • SNAC tokens: 128266-156937
      • Text EOS: 128009
      • SNAC EOS: 128258
    • Compression: ~0.98 kbps streaming
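    The 7-tokens-per-frame layout maps onto SNAC's three codebook levels as 1 + 2 + 4 codes per frame. A sketch of unpacking one frame of raw token ids into those levels, using the token range from the list above; the offset subtraction and the code ordering within the frame are illustrative assumptions, not verified against the node's decoder:

```python
# Split one 7-token SNAC frame into coarse / medium / fine codebook levels.
SNAC_TOKEN_START = 128266  # first SNAC audio token id (see token ranges above)

def split_frame(frame_tokens: list[int]) -> tuple[list[int], list[int], list[int]]:
    """Map 7 raw token ids to SNAC's 3 hierarchical levels (1 + 2 + 4 codes)."""
    assert len(frame_tokens) == 7, "SNAC frames are 7 tokens long"
    codes = [t - SNAC_TOKEN_START for t in frame_tokens]
    return [codes[0]], codes[1:3], codes[3:7]
```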
    </details> <details> <summary><b>File Structure</b></summary>
    ComfyUI-Maya1_TTS/
    ├── __init__.py                 # Node registration
    ├── nodes/
    │   ├── __init__.py
    │   └── maya1_tts_combined.py   # AIO node (backend)
    ├── js/
    │   ├── maya1_tts.js            # Custom canvas UI (1800+ lines)
    │   └── config.js               # UI config (presets, emotions, tooltips)
    ├── core/
    │   ├── model_wrapper.py        # Model loading & quantization
    │   ├── snac_decoder.py         # SNAC audio decoding
    │   └── utils.py                # Utilities & cancel support
    ├── resources/
    │   ├── emotions.txt            # 16 emotion tags
    │   └── prompt_examples.txt     # Voice description examples
    ├── pyproject.toml              # Package metadata
    ├── requirements.txt            # Dependencies
    └── README.md                   # This file
    
    </details> <details> <summary><b>ComfyUI Integration</b></summary>
    • Custom Canvas UI: Full JavaScript UI with LiteGraph.js canvas API
    • Cancel Support: Native execution.interruption_requested()
    • Progress Bars: comfy.utils.ProgressBar
    • Audio Format: ComfyUI AUDIO type (24kHz mono)
    • Model Caching: Automatic with dtype change detection
    • VRAM Management: Manual control via toggle
    • Event Handling: Document-level keyboard/mouse capture for proper text editing
    • Visual Feedback: Real-time tooltips, animations, and hover states
    </details>

    ๐Ÿ“ Credits


    📄 License

    Apache 2.0 - See LICENSE

    Maya1 model is also licensed under Apache 2.0 by Maya Research.


    🔗 Links


    📖 Citation

    If you use Maya1 in your research, please cite:

    @misc{maya1voice2025,
      title={Maya1: Open Source Voice AI with Emotional Intelligence},
      author={Maya Research},
      year={2025},
      publisher={Hugging Face},
      howpublished={\url{https://huggingface.co/maya-research/maya1}},
    }
    

    Bringing expressive voice AI to everyone through open source.