ComfyUI Extension: ComfyUI_EchoMimic

Authored by smthemex


    ComfyUI_EchoMimic

    You can use EchoMimic & EchoMimic V2 in ComfyUI.

    EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
    EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation


    New Updates:

    • Added alignment of the input image to the reference image (enabled automatically when pose_normal_sapiens is selected; usable with all three driving methods, see the example images below), and fixed the previous mask-alignment error.

    • V2 now has three pose-driving methods, just like V1. First: set infer_mode to audio_drive and pick one of the default poses listed in pose_dir to use the bundled npy pose files. Second: set infer_mode to audio_drive and point pose_dir at an existing npy folder (located under the ...ComfyUI/input/tensorrt_lite directory). Third: set infer_mode to pose_normal_dwpose or pose_normal_sapiens, connect a video to the video_images input, and make sure yolov8m.pt and sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2 are under ...ComfyUI/models/echo_mimic (see the illustrations and the workflows in example; download links are listed below).

    • Because the Sapiens pose method is called, YOLO's library ultralytics must be installed: pip install ultralytics (a quick check follows below).
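
    A quick way to confirm the pose models and the ultralytics library are in place before running a workflow (a minimal sketch; the paths assume it is run from the ComfyUI root directory):

    import importlib.util
    from pathlib import Path

    # The two models the pose_normal_sapiens path expects (see section 3).
    model_dir = Path("models/echo_mimic")
    for name in ("yolov8m.pt",
                 "sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2"):
        print(name, "found" if (model_dir / name).is_file() else "MISSING")

    # ultralytics (YOLO) is required by the Sapiens pose method.
    print("ultralytics installed:",
          importlib.util.find_spec("ultralytics") is not None)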


    1. Installation

    In the ./ComfyUI/custom_nodes directory, run the following:

    git clone https://github.com/smthemex/ComfyUI_EchoMimic.git
    

    2. Requirements

    pip install -r requirements.txt
    pip install --no-deps facenet-pytorch
    

    Notice

    • If ComfyUI crashes after pip install facenet-pytorch, uninstall torch and then reinstall it; the versions below are only an example:
    pip uninstall torchaudio torchvision torch xformers
    pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
    pip install xformers
    
    • If you are using the portable ComfyUI package, open CMD in the python_embeded directory and run:
    python -m pip uninstall torchaudio torchvision torch xformers
    python -m pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
    python -m pip install xformers
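
    Either way, a quick sanity check that the reinstalled stack imports cleanly and sees the GPU (a minimal sketch):

    import torch
    import torchvision
    import torchaudio

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("torchaudio:", torchaudio.__version__)
    print("CUDA build:", torch.version.cuda)   # e.g. 12.4
    print("CUDA available:", torch.cuda.is_available())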
    
    • If ffmpeg reports an error:
    pip uninstall ffmpeg   
    pip install ffmpeg-python  
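
    To confirm the right package ended up installed (ffmpeg-python is imported as ffmpeg; the plain "ffmpeg" package lacks the stream-building API):

    import ffmpeg  # provided by ffmpeg-python

    # ffmpeg-python exposes input()/output() stream builders; checking for
    # them distinguishes it from the unrelated "ffmpeg" package.
    print(hasattr(ffmpeg, "input") and hasattr(ffmpeg, "output"))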
    
    • If any other module is missing, just pip install the missing module.

    Troubleshooting errors with stable-audio-tools / other audio issues

    If you are using conda and Python > 3.12, uninstall everything and downgrade Python:

    pip uninstall torchaudio torchvision torch xformers ffmpeg
    
    conda uninstall python
    conda install python=3.11.9
    
    pip install --upgrade pip wheel
    conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
    # or install torch 2.4
    conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
    

    You should already have most of these packages if you installed the custom nodes from their git URLs:

    pip install flash-attn spandrel opencv-python diffusers jwt bitsandbytes omegaconf decord carvekit insightface easydict open_clip ffmpeg-python taming onnxruntime
    

    3. Models Required


    • 3.1 Models shared by V1 & V2:
      If you can reach Hugging Face directly, the required models are downloaded automatically on first use; no manual download is needed.
      3.1.1 unet link
      3.1.2 V1 & V2 audio link
      3.1.3 vae (stabilityai/sd-vae-ft-mse) link
      3.1.4 optional: hallo upscale huggingface # auto download
    ├── ComfyUI/models/echo_mimic
    |         ├── unet
    |             ├── diffusion_pytorch_model.bin
    |             ├── config.json
    |         ├── audio_processor
    |             ├── whisper_tiny.pt
    |         ├── vae
    |             ├── diffusion_pytorch_model.safetensors
    |             ├── config.json
    
    
    • 3.2 V1 models:
      V1 address link
      Audio-Driven Algo Inference
    ├── ComfyUI/models/echo_mimic
    |         ├── denoising_unet.pth
    |         ├── face_locator.pth
    |         ├── motion_module.pth
    |         ├── reference_unet.pth
    

    Audio-Driven Algo Inference acc (accelerated)

    ├── ComfyUI/models/echo_mimic
    |         ├── denoising_unet_acc.pth
    |         ├── face_locator.pth
    |         ├── motion_module_acc.pth
    |         ├── reference_unet.pth
    

    Pose-Driven Algo Inference

    ├── ComfyUI/models/echo_mimic
    |         ├── denoising_unet_pose.pth
    |         ├── face_locator_pose.pth
    |         ├── motion_module_pose.pth
    |         ├── reference_unet_pose.pth
    

    Pose-Driven Algo Inference ACC (accelerated)

    ├── ComfyUI/models/echo_mimic
    |         ├── denoising_unet_pose_acc.pth
    |         ├── face_locator_pose.pth
    |         ├── motion_module_pose_acc.pth
    |         ├── reference_unet_pose.pth
    

    • 3.3 V2 models:
      The models below download automatically; you can also add them manually:
      Model address: huggingface

    ├── ComfyUI/models/echo_mimic/v2
    |         ├── denoising_unet.pth
    |         ├── motion_module.pth
    |         ├── pose_encoder.pth
    |         ├── reference_unet.pth
    

    YOLOv8m download link
    sapiens pose download link
    The sapiens pose model can be quantized to fp16; for details, see my sapiens plugin link.

    ├── ComfyUI/models/echo_mimic
    |         ├── yolov8m.pt
    |         ├── sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2  or sapiens_1b_goliath_best_goliath_AP_639_torchscript_fp16.pt2
    

    4. Examples


    • Automatically align the input image;

    • V2: load a custom video to drive the video;

    • V2: choose a custom pose to drive the video;

    • EchoMimic V2 using the official default pose files (new version);

    • motion_sync: extract facial features directly from a video (optionally with audio synchronization) and generate a pkl model of the reference video (old version);

    • Normal audio-driven inference workflow (old version);

    • Normal audio-driven inference workflow with 2x upscaling to 1024*1024 (old version);

    • Pose from a pre-generated pkl model (old version);

    • The VH nodes used in the examples: ComfyUI-VideoHelperSuite.


    5. Function Description


    --infer_mode: audio-driven video generation, "audio_drived" and "audio_drived_acc";
    --infer_mode: pose generation from a reference pkl model file, "pose_normal" and "pose_acc";
    ----motion_sync: if enabled and video_file contains a video file, generates a pkl file and a video of the reference video; the pkl file is saved under the input\tensorrt_lite directory, and you must restart ComfyUI before it can be used again;
    ----motion_sync: if disabled and pose_mode is not none, reads the pkl files in the selected pose_mode directory and generates a pose video; if pose_mode is empty, generates a video based on the default assets\test_pose_demo_pose.

    Special options:
    --save_video: enable it if you do not want to use the VH nodes; off by default;
    --draw_mouse: you can try it;
    --length: number of frames; duration equals length/fps (for example, length=100 at fps=25 yields a 4-second clip);
    --acc models: 6 steps are enough, at a slight cost in quality;
    --lowvram: low-VRAM users can enable it;
    --built-in proportional cropping of the input image.
    Special notes:
    --setting cfg to 1 is only valid in turbo mode; other settings will report an error.
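
    To inspect a pkl file produced by motion_sync, a minimal sketch (it assumes the file is a standard Python pickle; the folder and file names here are hypothetical):

    import pickle
    from pathlib import Path

    # motion_sync writes pkl files under ComfyUI/input/tensorrt_lite;
    # "my_pose/0.pkl" is only an example name.
    pkl_path = Path("ComfyUI/input/tensorrt_lite/my_pose/0.pkl")

    with pkl_path.open("rb") as f:
        data = pickle.load(f)

    # Print the top-level structure to see what the pose data looks like.
    print(type(data))
    if isinstance(data, dict):
        for key, value in data.items():
            print(key, type(value))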


    Previous updates:

    • The magnification factor of facecrop_ratio is 1/facecrop_ratio: if set to 0.5, the face is magnified 2x. Lower facecrop_ratio only when the face occupies a very small portion of the reference image or driving video; a value of 1 or 0 means no cropping.

    • Added auto-download code for detection_Resnet50_Final.pth and RealESRGAN_x2plus.pth (stored at comfyUI/models/upscale_models/RealESRGAN_x2plus.pth and comfyUI/models/Hallo/facelib/detection_Resnet50_Final.pth). On first use, keep the realesrgan and face_detection_model menus set to 'none' and the models download automatically; if models are already listed in the menus, select one.

    • Added a hallo2 2x-upscale node; the input video must be 512*512 square, and the output is 1024*1024.

    • After successfully installing the latest facenet-pytorch with torch 2.2.0+CUDA, you can uninstall the 2.2.0-based torch, torchvision, torchaudio, and xformers and reinstall higher versions; the uninstall/install commands in the Notice above are an example (installing torch 2.4).

    • Added a lowvram mode for users with 6 GB or 8 GB of VRAM. Note that it is very slow and uses a lot of memory when enabled; try it with care.

    • Changed how the vae model is loaded: it was moved to the ComfyUI/models/echo_mimic/vae path (see the model layout above), and the priority of loading models from Hugging Face was lowered, which helps users without a proxy.


    6. Citation

    EchoMimic

    @misc{chen2024echomimic,
      title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
      author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
      year={2024},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }
    

    EchoMimicV2

    @misc{meng2024echomimic,
      title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},
      author={Rang Meng and Xingyu Zhang and Yuming Li and Chenguang Ma},
      year={2024},
      eprint={2411.10061},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }
    

    hallo2

    @misc{cui2024hallo2,
    	title={Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation},
    	author={Jiahao Cui and Hui Li and Yao Yao and Hao Zhu and Hanlin Shang and Kaihui Cheng and Hang Zhou and Siyu Zhu and Jingdong Wang},
    	year={2024},
    	eprint={2410.07718},
    	archivePrefix={arXiv},
    	primaryClass={cs.CV}
    }
    

    sapiens

    @article{khirodkar2024sapiens,
      title={Sapiens: Foundation for Human Vision Models},
      author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
      journal={arXiv preprint arXiv:2408.12569},
      year={2024}
    }