ComfyUI Extension: ComfyUI-MusicGen-HF
A standalone ComfyUI custom node package for Facebook's MusicGen using Hugging Face Transformers. Generate high-quality music from text prompts with full support for CUDA, MPS (Apple Silicon), and CPU.
Custom Nodes (0)
README
ComfyUI MusicGen (Hugging Face)
A standalone ComfyUI custom node package for Facebook's MusicGen using Hugging Face Transformers. Generate high-quality music from text prompts with full support for CUDA, MPS (Apple Silicon), and CPU.
✨ Features
- 🎵 Text-to-Music Generation - Create music from natural language descriptions
- 🎛️ Multiple Model Sizes - Choose between small, medium, and large models
- 🍎 Apple Silicon Support - Full MPS acceleration for M1/M2/M3 Macs
- 🚀 Multi-Device Support - Automatic device detection (CUDA/MPS/CPU)
- 🎚️ Advanced Controls - Guidance scale, sampling, temperature controls
- 🎼 Musical Timing Control - BPM-based duration with beat precision
- 🔄 Audio Preview & Looping - Preview with looping and queue management
- 🎧 Smooth Audio Transitions - Crossfading and seamless audio playback
- 📁 Multi-Format Export - Save as WAV, FLAC, MP3, or Opus with quality settings
- 🔧 Standalone Operation - No dependencies on ComfyUI core modifications
- 🎧 Built-in Audio Utilities - Complete audio loading and saving capabilities
🚀 Installation
Option 1: Clone from GitHub
cd /path/to/ComfyUI/custom_nodes/
git clone https://github.com/your-username/ComfyUI-MusicGen-HF.git
cd ComfyUI-MusicGen-HF
pip install -r requirements.txt
Option 2: Manual Installation
-
Navigate to your ComfyUI custom nodes directory:
cd /path/to/ComfyUI/custom_nodes/
-
Create the directory:
mkdir ComfyUI-MusicGen-HF cd ComfyUI-MusicGen-HF
-
Install dependencies:
pip install transformers>=4.30.0 accelerate>=0.20.0 scipy>=1.10.0 torch>=2.0.0 torchaudio>=2.0.0 av>=15.0.0
-
Add the node files (copy
nodes.py
,__init__.py
, andrequirements.txt
) -
Restart ComfyUI
Note: This is a completely standalone package - no modifications to ComfyUI core files are required!
🎵 Usage
Available Nodes
Core Generation
- MusicGen (Hugging Face) - Main text-to-music generation node with conditioning support
- BPM Duration Calculator - Musical timing control with BPM and beat precision
Audio Processing & Preview
- Looping Audio Preview - Preview audio with looping and queue management
- Smooth Audio Queue - Crossfading and seamless audio transitions with precise timing
Audio I/O
- Save Audio (WAV/FLAC/MP3/Opus) - Advanced audio saver with format options
- Load Audio File - Load audio files for conditioning or processing
- Save MusicGen Audio (Legacy) - Simplified audio saver for basic use
🔄 Node System Overview
flowchart TD
A[Text Prompt] --> B[MusicGen HF]
C[BPM Duration Calculator] --> B
D[Load Audio File] --> E[Looping Audio Preview]
E --> F[Smooth Audio Queue]
E --> B
B --> G[Save Audio]
F --> H[Audio Preview/Playback]
B --> I[Generated Audio]
subgraph "Musical Timing"
C
J[BPM: 120.0]
K[Beats: 16.0]
L[Time Signature: 4/4]
J --> C
K --> C
L --> C
end
subgraph "Audio Processing"
E
F
M[Loop Count: 3x]
N[Queue Mode: ON/OFF]
O[Crossfade: 0.5s]
P[Target Duration]
M --> E
N --> E
O --> F
P --> F
end
subgraph "Generation Engine"
B
Q[Model: small/medium/large]
R[Guidance Scale: 3.0]
S[Temperature: 1.0]
T[Conditioning Audio]
Q --> B
R --> B
S --> B
T --> B
end
style B fill:#ff9999
style C fill:#99ccff
style E fill:#99ff99
style F fill:#ffcc99
Workflow Patterns
1. Basic Text-to-Music
Text Prompt → MusicGen → Save Audio
2. Musical Timing Control
BPM Calculator → MusicGen → Save Audio
↑
(120 BPM, 16 beats, 4/4 time)
3. Audio Conditioning with Preview
Load Audio → Looping Preview → MusicGen → Save Audio
↓
conditioning_audio
4. Smooth Playback System
Generated Audio → Smooth Audio Queue → Preview/Playback
↑
(Crossfade, Precise Timing)
Audio Conditioning (Optional)
-
Load Reference Audio:
- Add
Load Audio File
node - Specify path to reference audio file
- Connect to MusicGen's
conditioning_audio
input
- Add
-
Generate Continuation:
- The model will generate music that continues/extends the reference audio
Model Sizes
- Small (~1.5GB): Fast generation, good quality
- Medium (~3.3GB): Better quality, slower generation
- Large (~3.3GB): Best quality, slowest generation
Device Support
The node automatically detects and uses the best available device:
- CUDA (NVIDIA GPUs): Full acceleration with mixed precision
- MPS (Apple Silicon): Optimized for M1/M2/M3 chips
- CPU: Fallback for compatibility
Audio Format Options
| Format | Quality Settings | Best For | |--------|------------------|----------| | WAV | Uncompressed | Highest quality, large files | | FLAC | Lossless | High quality, smaller than WAV | | MP3 | 128k, 192k, 320k, V0 | Good compression, wide compatibility | | Opus | 64k, 96k, 128k, 192k, 320k | Best compression, modern standard |
📚 Node Documentation
MusicGen (Hugging Face)
Main audio generation engine with enhanced conditioning support.
Parameters:
- prompt: Text description of the music to generate
- model_size: small/medium/large model variants
- duration: Length of generated audio (1-30 seconds)
- guidance_scale: Higher values follow prompt more closely (1-10)
- do_sample: Enable sampling for more diverse outputs
- max_new_tokens: Maximum tokens to generate (50-1503)
- seed: Random seed for reproducible generation
- temperature: Sampling randomness (0.1-2.0)
- conditioning_audio: Optional reference audio for continuation
- duration_override: Duration input from BPM Calculator (0 = use manual duration)
BPM Duration Calculator
Converts musical timing (BPM + beats) to precise duration for MusicGen.
Parameters:
- bpm: Beats per minute (60.0-200.0, step: 0.1)
- beats: Number of beats to generate (1.0-64.0, step: 0.25)
- time_signature: Musical time signature (4/4, 3/4, 6/8, 2/4)
Outputs:
- duration: Calculated duration in seconds
- info: Musical timing information and token calculations
Usage Example:
- 120 BPM × 16 beats = 8.0 seconds
- Automatically caps duration to respect 1503 token limit
- Provides precise musical timing for consistent generation
Looping Audio Preview
Enhanced audio preview with looping and intelligent queue management.
Parameters:
- audio: Input audio to preview
- loop_count: Number of preview loops (1-10)
- enable_preview: Toggle preview playback
- queue_mode: Wait for current audio to finish before playing new audio
- instance_id: Unique identifier for multiple preview nodes
Outputs:
- looped_preview: Audio repeated for extended preview
- conditioning_audio: Pass-through for MusicGen conditioning
- info: Queue status, timing, and preview information
Smooth Audio Queue
Advanced audio processing with crossfading and precise timing control.
Parameters:
- audio: Input audio to process
- target_duration: Precise duration target (0 = use original, step: 0.01)
- crossfade_duration: Smooth transition time between clips (0-2.0s)
- queue_id: Identifier for multiple audio streams
- auto_play: Automatically handle transitions when audio changes
Outputs:
- queued_audio: Processed audio with smooth transitions
- conditioning_audio: Original audio for MusicGen
- info: Processing status, timing, and queue information
Features:
- Duration Validation: Rounds to token boundaries (~0.02s) for optimal generation
- Post-processing: Trims or pads audio to exact target duration
- Crossfading: Creates smooth transitions between audio clips
- Queue Management: Handles multiple audio clips with seamless playback
🎼 Advanced Workflow Examples
1. Musical Loop Creation
BPM Calculator (120 BPM, 8 beats, 4/4) → MusicGen → Looping Preview (3x loops)
↓
Smooth Queue → Final Audio
Perfect for creating seamless musical loops with precise timing.
2. Audio Continuation Chain
Load Audio → Looping Preview → MusicGen → Smooth Queue → MusicGen (next) → Save
↓ ↓ ↓ ↓
conditioning preview generation_1 transition generation_2
Create extended compositions by chaining generations with smooth transitions.
3. Multi-Track Preview System
Track 1: Load Audio → Looping Preview (instance: "track1") → Preview 1
Track 2: Generated → Looping Preview (instance: "track2") → Preview 2
Track 3: BPM Calc → MusicGen → Looping Preview (instance: "track3") → Preview 3
Preview multiple audio sources simultaneously with independent queue management.
4. Precise Duration Matching
Target Duration: 7.583s
↓
Audio Source → Smooth Queue (target_duration: 7.583) → Exact 7.583s Output
Create audio clips with sample-accurate duration for synchronization.
5. Crossfade DJ-Style Mixing
Track A → Smooth Queue (crossfade: 2.0s, queue_id: "mix") → Mixed Output
Track B → Smooth Queue (crossfade: 2.0s, queue_id: "mix") → Mixed Output
Create smooth DJ-style transitions between different audio sources.
Example Prompts
Musical Styles
- "80s pop track with bassy drums and synth"
- "classical piano piece in minor key"
- "upbeat jazz with saxophone solo"
- "ambient electronic soundscape"
- "acoustic guitar fingerpicking folk song"
Genre-Specific with BPM
- "house music at 128 BPM with four-on-the-floor kick"
- "slow blues at 70 BPM with guitar and harmonica"
- "drum and bass at 174 BPM with heavy sub bass"
- "waltz at 180 BPM in 3/4 time signature"
Mood and Atmosphere
- "melancholic indie folk with reverb-heavy vocals"
- "energetic punk rock with distorted guitars"
- "peaceful nature sounds with gentle piano"
- "epic orchestral soundtrack with brass section"
Troubleshooting
MPS Issues
If you encounter MPS errors:
- The node will automatically fallback to CPU
- Ensure macOS 12.3+ and PyTorch 2.0+
Memory Issues
- Start with "small" model
- Reduce
max_new_tokens
- Lower
duration
- Disable preview loops if memory constrained
Model Download
- First run downloads models to
~/.cache/huggingface/
- Ensure stable internet connection
- Models are cached for subsequent use
Node-Specific Issues
BPM Duration Calculator
- Duration too long: Automatically capped at token limit (30.06s max)
- Precision issues: Durations rounded to ~0.02s token boundaries
- Time signature: Only affects measure calculations, not duration
Looping Audio Preview
- Queue not working: Check
instance_id
is unique for multiple nodes - Audio not looping: Verify
loop_count > 1
andenable_preview = True
- Memory usage: High loop counts create large audio tensors
Smooth Audio Queue
- Crossfade artifacts: Reduce
crossfade_duration
for short audio clips - Timing drift: Use
target_duration
for precise synchronization - Queue conflicts: Use different
queue_id
for independent streams
📋 Requirements
System Requirements
- Python: 3.8+
- RAM: 8GB+ recommended (4GB minimum, 12GB+ for smooth queue processing)
- Storage: 2-4GB for model cache + additional space for audio processing
- GPU: Optional but recommended (CUDA/MPS support)
- Audio: System audio output for preview functionality
Dependencies
All dependencies are automatically installed via requirements.txt
:
torch>=2.0.0
- Deep learning frameworktorchaudio>=2.0.0
- Audio processingtransformers>=4.30.0
- Hugging Face modelsaccelerate>=0.20.0
- Model optimizationscipy>=1.10.0
- Scientific computingav>=15.0.0
- Audio/video processing
Compatibility
- ✅ Windows (CUDA/CPU)
- ✅ macOS (MPS/CPU) - Intel and Apple Silicon
- ✅ Linux (CUDA/CPU)
- ✅ ComfyUI - All recent versions
🆕 Version History
v1.1.0 (Latest)
- ✅ BPM Duration Calculator - Musical timing with beat precision
- ✅ Looping Audio Preview - Enhanced preview with queue management
- ✅ Smooth Audio Queue - Crossfading and precise timing control
- ✅ Enhanced MusicGen - Duration override and improved conditioning
- ✅ Advanced Workflows - Multi-node audio processing pipelines
v1.0.0
- ✅ Basic MusicGen text-to-audio generation
- ✅ Multi-format audio export (WAV/FLAC/MP3/Opus)
- ✅ Device optimization (CUDA/MPS/CPU)
- ✅ Audio loading and conditioning support
🔗 Links
🎯 Quick Start Guide
- Install: Clone repo +
pip install -r requirements.txt
- Basic: Text → MusicGen → Save Audio
- Musical: BPM Calculator → MusicGen → Save Audio
- Preview: Audio → Looping Preview → MusicGen
- Advanced: Audio → Smooth Queue → Crossfaded Output
Ready to create music? Start with the Basic workflow and explore the advanced features! 🎵