FoleyCrafter is a video-to-audio generation framework that produces realistic sound effects which are semantically relevant to, and synchronized with, the input video.
FoleyCrafter From: FoleyCrafter
2024/09/06
Added a skip_timesync option, as suggested by @phr00t (thanks!). Time sync can now be turned off, which makes generation much faster.
Fixed "max frame" in timesync, whose default is "150". You can now set "max frames" to "0" to run a full time sync over every frame (takes longer), or to "1" to sync at the video's actual frame rate (possibly the best setting).
Changed the maximum seed value.
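The "max frames" rules above can be sketched in a few lines of Python. This is only an illustration: `resolve_max_frames` is a hypothetical helper, not a function from this repository, and clamping the result to the video length is an assumption.

```python
def resolve_max_frames(max_frames: int, total_frames: int, fps: float) -> int:
    """Return how many frames time sync would use.

    Hypothetical helper that mirrors the changelog rules above;
    it is not the node's actual implementation.
    """
    if max_frames < 0:
        raise ValueError("max_frames must be >= 0")
    if max_frames == 0:
        # 0 means "use every frame": full time sync, slowest option.
        return total_frames
    if max_frames == 1:
        # 1 means "use the video's actual frame rate" as the frame count.
        # Assumption: never exceed the number of frames in the video.
        return min(total_frames, max(1, round(fps)))
    # Any other value is treated as an upper bound on the frame count.
    return min(total_frames, max_frames)
```

With a 300-frame clip at 30 fps, a value of 0 would select all 300 frames and the default of 150 would select 150.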
2024/08/22
Fixed the error that occurred when clip was turned off. The node outputs are now passed as a dictionary to avoid having too many connection lines. If the offline models fail to run, see sections 3.2 and 3.3.
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_FoleyCrafter.git
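Normally no extra libraries need to be installed. If a module is still missing, open "no need requirements.txt", check whether the missing module is listed there, and pip install it separately.
The Aki (秋叶) package runs in sandbox mode, so install missing modules into the system Python libraries. For the official portable package, use python -m pip install <module name>.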
Possible issue: enabling "video_dubbing" merges the generated audio with the video. If this raises an error, open a CMD console and run the following:
python -m pip uninstall moviepy decorator
python -m pip install moviepy decorator
3.1
Model: "ymzhang319/FoleyCrafter".
Download all files and store them in the "ComfyUI/models/foleycrafter" folder using the structure below. With internet access, the files are downloaded automatically.
└── ComfyUI/models/foleycrafter/
    ├── semantic
    │   └── semantic_adapter.bin
    ├── vocoder
    │   ├── vocoder.pt
    │   └── config.json
    ├── temporal_adapter.ckpt
    └── timestamp_detector.pth.tar
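Before running offline, it can help to confirm that the folder matches the tree above. The following is a minimal sketch, not part of this repository: `missing_files` is a hypothetical helper, and the file list is copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "semantic/semantic_adapter.bin",
    "vocoder/vocoder.pt",
    "vocoder/config.json",
    "temporal_adapter.ckpt",
    "timestamp_detector.pth.tar",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Adjust the path to wherever ComfyUI is installed.
    missing = missing_files("ComfyUI/models/foleycrafter")
    print("All files present" if not missing else f"Missing: {missing}")
```

An empty list means every file from the tree was found. Anything listed still has to be downloaded or moved into place.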
3.2
Online: fill in "h94/IP-Adapter". The model is downloaded automatically.
Offline: only part of the repository needs to be downloaded. Fill in only: any_path
The path itself can be anything, but the model files must be stored under models/image_encoder/ inside it, as follows:
└── any_path
    └── models/image_encoder
        ├── model.safetensors
        └── config.json
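The offline rule above (any base path, fixed models/image_encoder/ subfolder) can be checked with a short sketch. `image_encoder_dir` is a hypothetical helper written for illustration and is not part of this repository.

```python
from pathlib import Path


def image_encoder_dir(any_path):
    """Return the image_encoder folder inside any_path, or None if incomplete.

    Hypothetical helper: it only checks the two files named in the tree above.
    """
    encoder = Path(any_path) / "models" / "image_encoder"
    required = ("model.safetensors", "config.json")
    if all((encoder / name).is_file() for name in required):
        return encoder
    return None
```

If this returns None for the path you plan to fill in, the offline layout is probably not what the node expects.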
3.3
Model: "auffusion/auffusion-full-no-adapter". With internet access, it is downloaded automatically.
Offline: only part of the repository needs to be downloaded, using the file structure below. Fill in only: any_path/auffusion/auffusion-full-no-adapter
└── any_path/auffusion/auffusion-full-no-adapter
    ├── model_index.json
    ├── vae
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── unet
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── text_encoder
    │   ├── config.json
    │   └── pytorch_model.bin
    ├── scheduler
    │   └── scheduler_config.json
    ├── feature_extractor
    │   └── preprocessor_config.json
    └── vocoder
        ├── config.json
        └── vocoder.pt
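One possible way to prepare this partial download on a machine with internet access is `snapshot_download` from the `huggingface_hub` package with an `allow_patterns` filter. This is a sketch under assumptions: it presumes `huggingface_hub` is installed and that the repository hosts the files under exactly the names shown in the tree above. `download_auffusion` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

AUFFUSION_REPO = "auffusion/auffusion-full-no-adapter"

# File list copied from the directory tree above.
ALLOW_PATTERNS = [
    "model_index.json",
    "vae/config.json",
    "vae/diffusion_pytorch_model.bin",
    "unet/config.json",
    "unet/diffusion_pytorch_model.bin",
    "tokenizer/merges.txt",
    "tokenizer/special_tokens_map.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/vocab.json",
    "text_encoder/config.json",
    "text_encoder/pytorch_model.bin",
    "scheduler/scheduler_config.json",
    "feature_extractor/preprocessor_config.json",
    "vocoder/config.json",
    "vocoder/vocoder.pt",
]


def download_auffusion(any_path):
    """Download only the listed files into any_path/auffusion/auffusion-full-no-adapter."""
    # Imported here so the pattern list can be inspected without the package.
    from huggingface_hub import snapshot_download

    target = Path(any_path) / "auffusion" / "auffusion-full-no-adapter"
    snapshot_download(
        repo_id=AUFFUSION_REPO,
        allow_patterns=ALLOW_PATTERNS,
        local_dir=target,
    )
    return target
```

The returned folder is the value to fill in for offline use.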
video_dubbing using prompt and negative_prompt (Latest version)
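--semantic_scale: IP-Adapter scale; adjusts how closely the audio matches the video content;
--max_frame: audio length, i.e. the audio alignment interval; 0 uses all frames, 1 uses the video's fps as the interval; currently set to 500;
--controlnet_scale: another audio scale (untested);
--sample_width/sample_width: weights type (model size), unrelated to the video's width and height; do not change it;
--video_dubbing: whether to merge and save the audio and video with the built-in function; can be turned off if you merge them with the VH nodes as in the example workflow.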
"open-mmlab/FoleyCrafter"
@misc{zhang2024pia,
title={FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds},
author={Yiming Zhang and Yicheng Gu and Yanhong Zeng and Zhening Xing and Yuancheng Wang and Zhizheng Wu and Kai Chen},
year={2024},
eprint={2407.01494},
archivePrefix={arXiv},
primaryClass={cs.CV}
}