ComfyUI Extension: HunyuanVideo-Foley Audio Generator
A ComfyUI custom node for generating synchronized audio for videos using the HunyuanVideo-Foley model.
ComfyUI-HunyuanVideo-Foley
Features
- Generate realistic sound effects synchronized with video content
- Support for both video file input and frame batch input from other ComfyUI nodes
- Flexible model selection through UI dropdowns
- Audio output for further processing in ComfyUI workflows
- Optional saving of audio and merged video files
Installation
Clone this repository into your ComfyUI custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/railep/ComfyUI-HunyuanVideo-Foley
cd ComfyUI-HunyuanVideo-Foley
Install the required dependencies (in the same Python environment that runs ComfyUI):
pip install -r requirements.txt
Model Setup
Required Models
1. Diffusion Model & VAE
Download from: https://huggingface.co/tencent/HunyuanVideo-Foley
- Place hunyuanvideo_foley.pth in ComfyUI/models/diffusion_models/
- Place foley_vae_128d_48k.pth in ComfyUI/models/vae/
2. CLIP Vision Model (SigLIP)
Download from: https://huggingface.co/google/siglip-base-patch16-512
- Create the folder ComfyUI/models/clip_vision/siglip2-base-patch16-512/
- Download all files (config.json, model.safetensors, etc.) into this folder
3. CLAP Model
Download from: https://huggingface.co/laion/clap-htsat-unfused
- Create the folder ComfyUI/models/clap/ if it doesn't exist
- Create the subfolder ComfyUI/models/clap/clap-htsat-unfused/
- Download all model files into this folder
4. Synchformer Model
Download from the HunyuanVideo-Foley repository:
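- Place synchformer_state_dict.pth in ComfyUI/syncforner/
Note: the folder is spelled syncforner and sits directly under ComfyUI/, not under models/. Make sure the folder name and location match what your local setup expects.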
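If you prefer to script the downloads, the sketch below uses the huggingface_hub client (`hf_hub_download` for single files, `snapshot_download` for whole repositories). It is a hypothetical helper, not part of this node: the repository IDs and file names are copied from the steps above and are assumptions about what the upstream repositories contain. In particular, the SigLIP repo ID is taken from the link above, while the target folder name suggests SigLIP 2 weights (`google/siglip2-base-patch16-512`), so check which one your version of the node expects.

```python
from pathlib import Path

# Assumption: file names and repo IDs mirror the Model Setup steps in this README.
FOLEY_REPO = "tencent/HunyuanVideo-Foley"


def build_download_plan(comfy_root):
    """Return (repo_id, filename_or_None, local_dir) tuples.

    filename None means "download the whole repository snapshot".
    """
    root = Path(comfy_root)
    models = root / "models"
    return [
        (FOLEY_REPO, "hunyuanvideo_foley.pth", models / "diffusion_models"),
        (FOLEY_REPO, "foley_vae_128d_48k.pth", models / "vae"),
        (FOLEY_REPO, "synchformer_state_dict.pth", root / "syncforner"),
        ("google/siglip-base-patch16-512", None,
         models / "clip_vision" / "siglip2-base-patch16-512"),
        ("laion/clap-htsat-unfused", None, models / "clap" / "clap-htsat-unfused"),
    ]


def download_all(comfy_root):
    """Download every entry of the plan (requires `pip install huggingface_hub`)."""
    # Imported lazily so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    for repo_id, filename, local_dir in build_download_plan(comfy_root):
        local_dir.mkdir(parents=True, exist_ok=True)
        if filename is None:
            snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
        else:
            hf_hub_download(repo_id=repo_id, filename=filename, local_dir=str(local_dir))
```

Usage: `download_all("ComfyUI")`. If a file name does not exist upstream, `hf_hub_download` raises an error; adjust the plan rather than renaming files the node looks for.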
Directory Structure
After setup, your directory structure should look like:
ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── hunyuanvideo_foley.pth
│   ├── vae/
│   │   └── foley_vae_128d_48k.pth
│   ├── clip_vision/
│   │   └── siglip2-base-patch16-512/
│   │       ├── config.json
│   │       └── model.safetensors
│   └── clap/
│       └── clap-htsat-unfused/
│           ├── config.json
│           └── pytorch_model.bin
├── syncforner/
│   └── synchformer_state_dict.pth
└── custom_nodes/
    └── ComfyUI-HunyuanVideo-Foley/
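To confirm the layout before starting ComfyUI, a small check like the following can help. It is a hypothetical helper, not something the node ships; the list of expected files is copied from the tree above, and the SigLIP and CLAP weight file names are only the examples shown there (your download may contain additional or differently named files).

```python
from pathlib import Path

# Relative paths copied from the directory tree in this README.
EXPECTED_FILES = [
    "models/diffusion_models/hunyuanvideo_foley.pth",
    "models/vae/foley_vae_128d_48k.pth",
    "models/clip_vision/siglip2-base-patch16-512/config.json",
    "models/clip_vision/siglip2-base-patch16-512/model.safetensors",
    "models/clap/clap-htsat-unfused/config.json",
    "models/clap/clap-htsat-unfused/pytorch_model.bin",
    "syncforner/synchformer_state_dict.pth",
]


def missing_files(comfy_root):
    """Return the expected relative paths that are not regular files under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Usage: `print(missing_files("ComfyUI"))` prints an empty list when everything is in place.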
Usage
Node: Hunyuan Foley: Generate Audio
Inputs
- prompt: Text description of the audio to generate
- config_name: Configuration file selection
- diffusion_model: Diffusion model, selected from a dropdown
- vae_model: VAE model, selected from a dropdown
- clip_vision_model: CLIP vision (SigLIP) model, selected from a dropdown
- clap_model: CLAP model, selected from a dropdown
- guidance_scale: Classifier-free guidance strength; higher values follow the prompt more closely (default: 4.5)
- num_inference_steps: Number of denoising steps (default: 50)
- save_video: Save the video merged with the generated audio
- save_audio: Save the generated audio file
- video_path (optional): Direct path to a video file
- video (optional): Frame batch input from other nodes
- video_fps: Frames per second of the frame batch input
- output_dir: Output directory for saved files
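Assuming guidance_scale is applied as standard classifier-free guidance (the usual meaning of this parameter in diffusion samplers; this README does not spell out the node's internals), each denoising step combines an unconditional and a prompt-conditioned prediction as sketched below. Plain Python lists stand in for tensors here.

```python
def apply_cfg(uncond, cond, guidance_scale):
    """Standard classifier-free guidance, element-wise: uncond + s * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]
```

A scale of 1.0 returns the conditioned prediction unchanged; larger values push the output further toward the prompt. Because guidance typically needs both predictions at every step, generation time grows roughly in proportion to num_inference_steps.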
Outputs
- audio: Audio tensor for further processing
- sample_rate: Audio sample rate (48000 Hz)
- audio_wav_path: Path to the saved audio file (if saved)
- merged_video_path: Path to the merged video file (if saved)
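If you consume the audio and sample_rate outputs outside ComfyUI, the sketch below writes a mono waveform to a 16-bit PCM WAV with Python's standard `wave` module and returns its duration. It assumes you have already converted the tensor to a flat list of floats in [-1.0, 1.0] (for a PyTorch tensor, something like `audio.squeeze().cpu().tolist()`, assuming a single channel); the exact tensor shape the node returns is not documented here.

```python
import struct
import wave


def write_wav_16bit(path, samples, sample_rate=48000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV; return duration in seconds."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return len(ints) / sample_rate
```

At 48000 Hz, 24000 samples correspond to 0.5 seconds of audio.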
Example Workflow
- Load a video using a Video Loader node.
- Connect the frame output to the video input.
- Set your audio generation prompt.
- Configure save options as needed.
- Run the generation.
Files are saved to output_dir with automatic numbering, for example hunyuan_foley_00001.wav and hunyuan_foley_00001.mp4.
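The numbering scheme can be reproduced with a sketch like this one, for instance when post-processing results in a script. It is a hypothetical reimplementation, not the node's code: it assumes a five-digit counter that continues from the highest number already present for the prefix, so the WAV and MP4 from one run share the same number.

```python
import re
from pathlib import Path


def next_index(output_dir, prefix="hunyuan_foley"):
    """Return 1 + the highest counter found in files named <prefix>_<digits>.<ext>."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.[A-Za-z0-9]+$")
    used = []
    for p in Path(output_dir).iterdir():  # output_dir must already exist
        m = pattern.match(p.name)
        if m:
            used.append(int(m.group(1)))
    return max(used, default=0) + 1


def output_paths(output_dir, prefix="hunyuan_foley"):
    """Return matching (.wav, .mp4) paths for the next free counter."""
    n = next_index(output_dir, prefix)
    base = Path(output_dir)
    return base / f"{prefix}_{n:05d}.wav", base / f"{prefix}_{n:05d}.mp4"
```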
Requirements
- CUDA-capable GPU recommended (8GB+ VRAM)
- Python 3.8+
- PyTorch 2.0+
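A quick way to check these requirements from the Python environment ComfyUI uses is sketched below. The function names are hypothetical; the thresholds are the ones listed above, and the 8GB VRAM figure is treated as a recommendation rather than a hard limit.

```python
import sys


def parse_major_minor(version):
    """Turn a version string such as '2.1.0+cu118' into (2, 1)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_major_minor(torch.__version__) < (2, 0):
        problems.append(f"PyTorch 2.0+ is required (found {torch.__version__})")
    if not torch.cuda.is_available():
        problems.append("No CUDA-capable GPU detected (recommended)")
    else:
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb < 8:
            problems.append(f"GPU has {vram_gb:.1f} GB VRAM; 8GB+ is recommended")
    return problems
```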
Troubleshooting
CLIP Vision Model Loading Error
If you encounter permission errors, ensure the CLIP vision folder contains all necessary files and has proper read permissions.
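To find which entries cause the problem, a helper along these lines (hypothetical, standard library only) lists everything under the folder that the current user cannot read. Directories also need the execute bit on POSIX systems to be traversed, so both flags are checked for them.

```python
import os
from pathlib import Path


def unreadable_entries(folder):
    """List the folder itself if missing/unreadable, else every unreadable file or subfolder."""
    root = Path(folder)
    if not os.access(root, os.R_OK | os.X_OK):
        return [str(root)]
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            p = os.path.join(dirpath, name)
            if not os.access(p, os.R_OK | os.X_OK):
                bad.append(p)
        for name in filenames:
            p = os.path.join(dirpath, name)
            if not os.access(p, os.R_OK):
                bad.append(p)
    return bad
```

Usage: `unreadable_entries("ComfyUI/models/clip_vision/siglip2-base-patch16-512")` returns an empty list when permissions are fine.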
FFmpeg Issues
The node requires FFmpeg for video processing. Install it if not present:
- Windows: download it from https://ffmpeg.org and add it to your PATH
- Linux (Debian/Ubuntu): sudo apt install ffmpeg
- macOS (Homebrew): brew install ffmpeg
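If the merged video is not produced, you can verify that FFmpeg is reachable and mux a saved WAV into a video manually. The sketch below is a hypothetical equivalent, not necessarily the command the node runs: it copies the first video stream unchanged, encodes the generated audio to AAC for MP4, and stops at the shorter of the two streams.

```python
import shutil
import subprocess


def build_merge_command(video_path, audio_path, output_path, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that keeps the video stream and adds the generated audio."""
    return [
        ffmpeg, "-y",
        "-i", str(video_path),
        "-i", str(audio_path),
        "-map", "0:v:0",  # video from the first input
        "-map", "1:a:0",  # audio from the second input (the generated WAV)
        "-c:v", "copy",   # copy video without re-encoding
        "-c:a", "aac",    # encode the WAV audio to AAC for MP4
        "-shortest",      # stop at the end of the shorter stream
        str(output_path),
    ]


def merge(video_path, audio_path, output_path):
    """Run the merge, failing early if ffmpeg is not on PATH."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        raise RuntimeError("FFmpeg not found on PATH")
    subprocess.run(build_merge_command(video_path, audio_path, output_path, ffmpeg), check=True)
```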
Credits
Based on the HunyuanVideo-Foley model by Tencent.
License
This project follows the licensing terms of the original HunyuanVideo-Foley model. Please refer to the original repository for detailed license information.