A voice conversion extension node for ComfyUI based on FreeVC, enabling high-quality voice conversion within the ComfyUI framework.
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
cd ComfyUI-FreeVC_wrapper
pip install librosa transformers numpy torch noisereduce
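After installing, it can be useful to confirm that the packages are visible to the same Python interpreter that runs ComfyUI. The following is a minimal sketch, not part of this project: the helper name `find_missing_packages` is hypothetical, and the package list is simply copied from the `pip install` command above.

```python
import importlib.util

# Packages named in the pip install command above.
REQUIRED_PACKAGES = ["librosa", "transformers", "numpy", "torch", "noisereduce"]


def find_missing_packages(packages):
    """Return the names in `packages` that cannot be located for import."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the Python executable that ComfyUI uses (for example, the one inside its virtual environment or portable install), since a system-wide Python may report different results.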
a. Voice Conversion Models: All three model checkpoint files are available in a single Google Drive folder: Download All Model Checkpoints (Google Drive)
After downloading, extract the archive and place the checkpoints folder in the freevc directory:
ComfyUI-FreeVC_wrapper/freevc/
b. Speaker Encoder:
Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:
| Component | Filename | Required For |
|-----------|----------|--------------|
| Speaker Encoder | pretrained_bak_5805000.pt | FreeVC, FreeVC (24kHz), D-FreeVC, and D-FreeVC (24kHz) models |
Direct download link:
Your final directory structure should look like this:
ComfyUI-FreeVC_wrapper/
└── freevc/
    ├── checkpoints/
    │   ├── freevc.pth          # Standard 16kHz model
    │   ├── freevc-s.pth        # Source-filtering based model
    │   └── freevc-24.pth       # High-quality 24kHz model
    └── speaker_encoder/
        └── ckpt/
            └── pretrained_bak_5805000.pt  # Speaker encoder checkpoint
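Missing or misplaced checkpoints are a common cause of file-not-found errors, so it may help to check the layout before launching ComfyUI. The sketch below is an illustration rather than part of the node: the function name `find_missing_files` is hypothetical, the file list is copied from the directory structure above, and the path in the `__main__` section assumes the script is run from the folder that contains `ComfyUI`.

```python
from pathlib import Path

# Relative paths taken from the directory structure above.
EXPECTED_FILES = [
    "freevc/checkpoints/freevc.pth",
    "freevc/checkpoints/freevc-s.pth",
    "freevc/checkpoints/freevc-24.pth",
    "freevc/speaker_encoder/ckpt/pretrained_bak_5805000.pt",
]


def find_missing_files(node_root):
    """Return the expected files that do not exist under `node_root`."""
    root = Path(node_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed install location; adjust to match your ComfyUI setup.
    missing = find_missing_files("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper")
    for rel in missing:
        print("Missing:", rel)
    if not missing:
        print("All checkpoint files found.")
```

The check only confirms that each file exists at the expected path; it does not validate that a download completed or that a checkpoint loads correctly.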
1. In ComfyUI, locate the "FreeVC Voice Converter v2 🎤" node under the "audio/voice conversion" category.
2. Connect your inputs:
3. Configure the conversion parameters:
4. Connect the output to your desired audio output node.
File Not Found Errors:
CUDA Out of Memory:
Audio Quality Issues:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this in your research, please cite:
@article{li2022freevc,
  title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
  author={Li, Jingyi and Tu, Weiping and Xiao, Li},
  journal={arXiv preprint arXiv:2210.15418},
  year={2022}
}