ComfyUI-HunyuanVideoFoley
English | 简体中文
A ComfyUI custom node integrating the inference pipeline of HunyuanVideo-Foley, enabling audio generation from video + text prompts, plus audio-video merging utilities.
Features
- Auto model download on first use (from Hugging Face)
- Video-aware audio generation guided by text prompts
- Audio-video merging to produce a final video with an audio track
Nodes
1) HunyuanVideo-Foley Generate Audio
Generate an AUDIO tensor from a VIDEO input and a text prompt.
Inputs:
- video: VIDEO (LoadVideo / CreateVideo output)
- prompt: Text prompt describing the desired audio
- model: Model setup (auto-detected under models/hunyuan_foley)
- guidance_scale: Float, default 4.5
- num_inference_steps: Int, default 50
- device: auto / cpu / cuda / mps
- gpu_id: CUDA device id
Output:
audio: { "waveform": [B=1, C=1, T], "sample_rate": int }
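This is ComfyUI's standard AUDIO dict layout. A minimal sketch of consuming it outside the graph, assuming torchaudio is installed (the zero tensor below is a placeholder, not real model output):

```python
# Minimal sketch: saving a ComfyUI AUDIO dict to a WAV file with torchaudio.
# The zero tensor stands in for the node's real output.
import torch
import torchaudio

audio = {
    "waveform": torch.zeros(1, 1, 48000),  # [B=1, C=1, T]
    "sample_rate": 48000,
}

# torchaudio.save expects a 2-D [C, T] tensor, so drop the batch dimension.
torchaudio.save("foley_output.wav", audio["waveform"].squeeze(0), audio["sample_rate"])
```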
2) Video Audio Merger
Merge AUDIO and VIDEO into a single video file with an audio track.
Inputs:
- video: VIDEO input
- audio: AUDIO input
- audio_sync_mode: stretch | loop | truncate | pad_silence
- video_codec: copy | libx264 | libx265
- audio_codec: aac | mp3 | copy
- quality: high | medium | low
Output:
merged_video: VIDEO
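The merge relies on FFmpeg (see Dependencies). The sketch below illustrates the kind of mux these options map to, using ffmpeg-python; file names are placeholders and this is not the node's internal code:

```python
# Illustrative mux with ffmpeg-python: copy the video stream, encode audio as AAC,
# and cut to the shorter stream (roughly the "truncate" sync mode).
# File names are placeholders; this mirrors the node's options, not its source.
import ffmpeg

video_in = ffmpeg.input("input_video.mp4")
audio_in = ffmpeg.input("generated_audio.wav")

(
    ffmpeg
    .output(
        video_in.video,
        audio_in.audio,
        "merged_video.mp4",
        vcodec="copy",   # video_codec = copy
        acodec="aac",    # audio_codec = aac
        shortest=None,   # pass -shortest: truncate to the shorter stream
    )
    .overwrite_output()
    .run()
)
```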
Quick Start
```
LoadVideo → HunyuanVideoFoleyGenerateAudio → VideoAudioMerger → SaveVideo
                          ↑                         ↑
                     text prompt               audio input
```
- Load a video (LoadVideo)
- Generate audio via "HunyuanVideo-Foley Generate Audio"
- Merge audio + video via "Video Audio Merger"
- Save or preview the final video (a scripted API example is sketched below)
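The same graph can also be queued from a script through ComfyUI's /prompt HTTP endpoint. The node class names below are taken from the diagram above, but the exact input keys and output slot indices are assumptions; check the node definitions in your install before relying on them:

```python
# Hedged sketch: queue the LoadVideo -> Generate Audio -> Merger workflow via the API.
# Input keys (e.g. "file") and output slot indices are assumptions, not verified values.
import json
import urllib.request

workflow = {
    "1": {"class_type": "LoadVideo", "inputs": {"file": "input_video.mp4"}},
    "2": {"class_type": "HunyuanVideoFoleyGenerateAudio",
          "inputs": {"video": ["1", 0], "prompt": "rain on a tin roof",
                     "guidance_scale": 4.5, "num_inference_steps": 50,
                     "device": "auto", "gpu_id": 0}},
    "3": {"class_type": "VideoAudioMerger",
          "inputs": {"video": ["1", 0], "audio": ["2", 0],
                     "audio_sync_mode": "stretch", "video_codec": "copy",
                     "audio_codec": "aac", "quality": "high"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```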
Dependencies
```
# Use your ComfyUI Python environment
D:\AI\ComfyUI-aki-v1.3\.ext\python.exe -m pip install ffmpeg-python
```
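To confirm that both the Python binding and the underlying ffmpeg/ffprobe binaries are reachable from the ComfyUI environment, a quick probe can help (any local video file works as the test input):

```python
# Quick sanity check that ffmpeg-python can find ffprobe and read a video file.
import ffmpeg

try:
    info = ffmpeg.probe("input_video.mp4")  # substitute any local video file
    print("ffmpeg OK, streams:", [s["codec_type"] for s in info["streams"]])
except ffmpeg.Error as err:
    print("ffprobe failed:", err.stderr.decode(errors="ignore"))
```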
Notes
- On first run, models are downloaded automatically (internet access required); see the pre-download sketch below
- FFmpeg is required for audio/video processing
- CUDA is recommended for best performance
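If you prefer to fetch the weights before the first run (for example on a machine without internet access at inference time), they can be pre-downloaded with huggingface_hub. The repo id below is an assumption; verify it against the node's own download logic. The target folder matches the models/hunyuan_foley path mentioned above:

```python
# Optional pre-download of the model weights with huggingface_hub.
# repo_id is an assumption -- check the node's download code for the actual value.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="tencent/HunyuanVideo-Foley",   # assumed repo id, not verified here
    local_dir="models/hunyuan_foley",       # folder the node auto-detects
)
```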