You can use EchoMimic & EchoMimic V2 in ComfyUI. If a module is missing, install it with `pip install`.

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
EchoMimic V2: Towards Striking, Simplified, and Semi-Body Human Animation
In the `./ComfyUI/custom_nodes` directory, run the following:

```
git clone https://github.com/smthemex/ComfyUI_EchoMimic.git
cd ComfyUI_EchoMimic
pip install -r requirements.txt
```
To use the V1 version:

```
pip install --no-deps facenet-pytorch
```
If torch, torchvision, or xformers end up broken, reinstall them (use `python -m pip` instead of `pip` for the portable/embedded Python):

```
pip uninstall torchaudio torchvision torch xformers
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install xformers
```
If you hit an ffmpeg error:

```
pip uninstall ffmpeg
pip install ffmpeg-python
```
If you are using conda with Python > 3.12, uninstall everything and downgrade Python:

```
pip uninstall torchaudio torchvision torch xformers ffmpeg
conda uninstall python
conda install python=3.11.9
pip install --upgrade pip wheel
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

or install torch 2.4:

```
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```
Most of these packages will already be present if you have installed other custom nodes from git URLs:

```
pip install flash-attn spandrel opencv-python diffusers jwt bitsandbytes omegaconf decord carvekit insightface easydict open_clip ffmpeg-python taming onnxruntime
```
3.1 Models shared by V1 & V2:
If you can connect to Hugging Face directly, the required models are downloaded automatically on first run; no manual download is needed.
├── ComfyUI/models/echo_mimic
|     ├── unet
|     |     ├── diffusion_pytorch_model.bin
|     |     ├── config.json
|     ├── audio_processor
|     |     ├── whisper_tiny.pt
├── ComfyUI/models/vae
|     ├── diffusion_pytorch_model.safetensors (or renamed to sd-vae-ft-mse.safetensors)
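If the automatic download fails, you can sanity-check the layout above with a small stdlib script. The `missing_models` helper and the `ComfyUI/models` path are illustrative, not part of the node; adjust the root to your install:

```python
from pathlib import Path

# Model files shared by EchoMimic V1 and V2, relative to ComfyUI/models
# (paths taken from the tree above).
SHARED_FILES = [
    "echo_mimic/unet/diffusion_pytorch_model.bin",
    "echo_mimic/unet/config.json",
    "echo_mimic/audio_processor/whisper_tiny.pt",
    "vae/diffusion_pytorch_model.safetensors",
]

def missing_models(models_root):
    """Return the shared model files not yet present under models_root."""
    root = Path(models_root)
    return [p for p in SHARED_FILES if not (root / p).is_file()]

if __name__ == "__main__":
    for p in missing_models("ComfyUI/models"):
        print("missing:", p)
```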
3.2 V1 models:
├── ComfyUI/models/echo_mimic
|     ├── denoising_unet.pth
|     ├── face_locator.pth
|     ├── motion_module.pth
|     ├── reference_unet.pth

Audio-driven accelerated (acc) inference:
|     ├── denoising_unet_acc.pth
|     ├── motion_module_acc.pth
Pose-driven inference:
├── ComfyUI/models/echo_mimic
|     ├── denoising_unet_pose.pth
|     ├── face_locator_pose.pth
|     ├── motion_module_pose.pth
|     ├── reference_unet_pose.pth

Pose-driven accelerated (acc) inference:
|     ├── denoising_unet_pose_acc.pth
|     ├── motion_module_pose_acc.pth
3.3 V2 models:
V2 uses the models below; they are downloaded automatically, or you can add them manually.
Model address: huggingface
├── ComfyUI/models/echo_mimic/v2
|     ├── denoising_unet.pth
|     ├── motion_module.pth
|     ├── pose_encoder.pth
|     ├── reference_unet.pth

Accelerated (acc) version:
|     ├── denoising_unet_acc.pth
|     ├── motion_module_acc.pth
YOLOv8m download link
sapiens pose download link
The sapiens pose model can be quantized to fp16; see my sapiens plugin for details.
├── ComfyUI/models/echo_mimic
|     ├── yolov8m.pt
|     ├── sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2 or sapiens_1b_goliath_best_goliath_AP_639_torchscript_fp16.pt2
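The fp16 conversion mentioned above can be sketched with plain TorchScript: load the scripted model, cast it to half precision, and re-save it. The snippet below demonstrates the round-trip on a tiny stand-in module; for the real conversion you would `torch.jit.load` the `.pt2` file listed above. This is an illustrative sketch, not the plugin's actual code:

```python
import os
import tempfile

import torch

# Stand-in for the real pose model; for the actual conversion, replace with
# torch.jit.load("sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2").
model = torch.jit.script(torch.nn.Linear(4, 4))

# Cast all parameters and buffers to fp16, roughly halving the file size.
model = model.half()

# Save the fp16 copy and reload it to confirm it round-trips.
path = os.path.join(tempfile.gettempdir(), "model_fp16.pt2")
torch.jit.save(model, path)
reloaded = torch.jit.load(path)
print(next(reloaded.parameters()).dtype)
```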
V2 loading a custom driving video:

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/example.png" width="60%">

V2 with a custom pose-driven video:

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/cropC.png" width="60%">

Echomimic_v2 with the official default pose files (new version):

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/v2.gif" width="60%">

motion_sync: extract facial features directly from a video (optionally with audio sync) while generating a PKL model for the reference video (old version):

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/video2video.gif" width="60%">

Normal audio-driven inference, the new-version workflow:

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/example_.png" width="60%">

Pose from PKL: generating a video from a pre-generated PKL model (old version):

<img src="https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/new.png" width="60%">

The VH node used in the examples: ComfyUI-VideoHelperSuite
Inference options:
infer_mode: audio-driven video generation, "audio_drived" and "audio_drived_acc";
infer_mode: pose-driven generation referencing a PKL model file, "pose_normal" and "pose_acc";
motion_sync: if enabled and a video file is set in video_file, a PKL file is generated along with a reference video; the PKL file is saved in the input/tensorrt_lite directory. To use it again, restart ComfyUI.
motion_sync: if disabled and pose_mode is not "none", the PKL file in the directory named by pose_mode is read and a pose video is generated; if pose_mode is empty, the video is generated from the default assets/test_pose_demo_pose.
Special options:
--save_video: turn on if you do not want to use VH nodes; off by default;
--draw_mouse: you can try it out;
--length: the number of frames; duration equals length/fps;
--the ACC model needs only 6 steps, with slightly reduced quality;
--built-in proportional image cropping.
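As a quick illustration of the length option above, the clip duration follows directly from the frame count and fps (the numbers here are just example values):

```python
# duration (seconds) = length (total frames) / fps
length = 120  # example frame count
fps = 24      # example output frame rate
duration = length / fps
print(duration)  # 5.0 -> a 120-frame clip at 24 fps lasts 5 seconds
```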
Note in particular:
--in turbo (acc) mode the cfg value must be set to 1, otherwise an error will be reported.
Previous updates:
EchoMimic

```
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
EchoMimic-V2

```
@misc{meng2024echomimic,
  title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},
  author={Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma},
  year={2024},
  eprint={2411.10061},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
sapiens

```
@article{khirodkar2024sapiens,
  title={Sapiens: Foundation for Human Vision Models},
  author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2408.12569},
  year={2024}
}
```