ComfyUI Extension: ComfyUI_FoleyCrafter

Authored by smthemex

Created

Updated

52 stars

FoleyCrafter is a video-to-audio generation framework which can produce realistic sound effects semantically relevant and synchronized with videos.

Custom Nodes (0)

    README

    ComfyUI_FoleyCrafter

    FoleyCrafter is a video-to-audio generation framework which can produce realistic sound effects semantically relevant and synchronized with videos.

    FoleyCrafter From: FoleyCrafter

    Update

    2024/09/06

    • add skip_timesync function from @phr00t,thanks!

    • fix "max frame" in timesync default is "150",now you can set "max frames" to "0" to get full timesync(more time need),or set "1" to timesync as fps(maybe best set)

    • 基于@phr00t 的建议,时间同步现在设置为可关闭,速度会快很多。然后max frame新增2个功能,设置为0时,读取最大值的帧数,耗时更长,设置为1时,max frame为视频的实际帧率,效果或许最好!

    • seed max changged/修改最大种子数;

    • 2024/08/22

    • 修复clip关闭的错误;节点改成字典,避免太多线了,运行离线模型失败的,请看3.2和3.3内容;

    • Fix the error of closing clip; Change the node to a dictionary to avoid too many lines. If the offline model fails to run, please refer to sections 3.2 and 3.3;

    1.Installation

    In the ./ComfyUI /custom_node directory, run the following:

    git clone https://github.com/smthemex/ComfyUI_FoleyCrafter.git
    

    2.requirements

    按理是不需要装特别的库,如果还是库缺少,请单独安装。或者打开no need requirements.txt,查看缺失的库是否在里面。
    秋叶包因为是沙箱模式,所以把缺失的库安装在系统的python库里,官方的便携包,用python -m pip install 库名。
    If the module is missing, please open "no need requirements.txt" , pip install missing module.

    可能会出现的问题,开启video_dubbing 是合成音视频,如果报错,打开控制台CMD,按以下步骤操作:
    Possible issues may arise. Enabling "video_fubbing" produces synthesized audio and video. If an error occurs, open the CMD console and follow these steps:

    python -m pip uninstall moviepy decorator
    python -m pip install moviepy decorator
    

    3 Need models

    3.1
    "ymzhang319/FoleyCrafter" link , 全部下载,并按如下结构存放在ComfyUI/models/foleycrafter 文件夹下,联外网会自动下载:
    Download all and store them in the "ComfyUI/models/foleycrafter folder" according to the following structure,online will auto download:

    └── ComfyUI/models/foleycrafter/
        ├── semantic
        │   ├── semantic_adapter.bin
        ├── vocoder
        │   ├── vocoder.pt
        │   ├── config.json
        ├── temporal_adapter.ckpt
        │   │
        └── timestamp_detector.pth.tar
    

    3.2
    online,fill "h94/IP-Adapter" link, 离线使用时,部分下载,文件结构如下,联外网会自动下载,if offline, Partial download, file structure as follows,online will auto download: 离线使用时,只需要填写:any_path 。。。。 When used offline, only need to fill in: any_path;
    虽然是随意地址,但是模型存放路径必须是models/image_encoder/(模型文件);

    └── any_path
        ├── models/image_encoder
        │   ├── model.safetensors
        │   ├── config.json
    

    3.3
    "auffusion/auffusion-full-no-adapter" link, 离线使用时,部分下载,文件结构如下,联外网会自动下载,if offline, Partial download, file structure as follows,online will auto download:
    离线使用时,只需要填写:any_path/auffusion/auffusion-full-no-adapter 。。。When used offline, only need to fill in:any_path/auffusion/auffusion-full-no-adapter;

    ├── any_path/auffusion/auffusion-full-no-adapter
    |      ├──model_index.json
    |      ├──vae
    |          ├── config.json
    |          ├── diffusion_pytorch_model.bin
    |      ├──unet
    |          ├── config.json
    |          ├── diffusion_pytorch_model.bin
    |      ├──tokenizer
    |          ├── merges.txt
    |          ├── special_tokens_map.json
    |          ├── tokenizer_config.json
    |          ├── vocab.json
    |      ├── text_encoder
    |          ├── config.json
    |          ├── pytorch_model.bin  
    |      ├── scheduler
    |          ├── scheduler_config.json
    |      ├──feature_extractor
    |          ├──preprocessor_config.json
    |      ├──vocoder
    |          ├──config.json
    |          ├──vocoder.pt
    

    4 Example

    video_dubbing using prompt and negative_prompt (Latest version)

    5 Function Description of Nodes

    --semantic_scale :ip adatpter scale 调整音频的跟视频的相似度;
    --max_frame :audio length 音频对齐间隔,为0时全选,为1时间隔数为fps,目前设置500;
    --controlnet_scale: another audio sacle 未测试;
    --sample_width/sample_width: weights type, don't change it 模型尺寸,不要动,这个跟视频长宽无关;
    --video_dubbing: save or not(if using example) 是否用内置的音视频合成,如果用示例的VH合成,可以关闭。

    6 Citation

    "open-mmlab/FoleyCrafter"

    @misc{zhang2024pia,
      title={FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds},
      author={Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen},
      year={2024},
      eprint={2407.01494},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }