A voice conversion extension node for ComfyUI based on FreeVC, enabling high-quality voice conversion capabilities within the ComfyUI framework.
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
cd ComfyUI-FreeVC_wrapper
pip install librosa transformers numpy torch
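After installing, a quick check can confirm that the four packages from the `pip install` line are importable in the Python environment ComfyUI runs in. This is a minimal sketch, not part of the wrapper; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Packages named in the pip install command above.
REQUIRED = ["librosa", "transformers", "numpy", "torch"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter that launches ComfyUI; a different virtual environment may report different results.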
a. Voice Conversion Models:
Download the following checkpoint files from HuggingFace and place them in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/checkpoints directory:
| Model | Filename | Description |
|-------|----------|-------------|
| FreeVC | freevc.pth | Standard 16kHz model |
| FreeVC-s | freevc-s.pth | Source-filtering based model |
| FreeVC (24kHz) | freevc-24.pth | High-quality 24kHz model |
Direct download links:
b. Speaker Encoder:
Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:
| Component | Filename | Required For |
|-----------|----------|--------------|
| Speaker Encoder | pretrained_bak_5805000.pt | FreeVC and FreeVC (24kHz) models |
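Once the files are downloaded, the expected layout can be verified with a short script. The sketch below takes the directories and filenames from sections a and b; the function name `missing_checkpoints` and the assumption that the script runs from the folder containing `ComfyUI/` are illustrative only.

```python
from pathlib import Path

# Paths relative to the ComfyUI-FreeVC_wrapper directory, as listed in sections a and b.
EXPECTED_FILES = [
    "freevc/checkpoints/freevc.pth",
    "freevc/checkpoints/freevc-s.pth",
    "freevc/checkpoints/freevc-24.pth",
    "freevc/speaker_encoder/ckpt/pretrained_bak_5805000.pt",
]


def missing_checkpoints(wrapper_dir):
    """Return the expected checkpoint paths that are not files under wrapper_dir."""
    root = Path(wrapper_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the current directory is the one that contains ComfyUI/.
    wrapper = Path("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper")
    missing = missing_checkpoints(wrapper)
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("All checkpoint files found.")
```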
Direct download link:
File Not Found Errors:
freevc.pth, freevc-s.pth, freevc-24.pth
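When a file-not-found error persists even though the checkpoints were downloaded, a renamed or mis-cased file is one possible cause. The following sketch lists files in the checkpoint directory whose names are close to, but not exactly, the expected ones. It uses the standard-library `difflib` module; the helper name `near_misses` and the 0.6 similarity cutoff are assumptions chosen for illustration.

```python
import difflib
from pathlib import Path

# Exact filenames expected in freevc/checkpoints.
EXPECTED = ["freevc.pth", "freevc-s.pth", "freevc-24.pth"]


def near_misses(checkpoint_dir, expected=EXPECTED):
    """Map each missing expected filename to similarly named files that are present."""
    directory = Path(checkpoint_dir)
    if directory.is_dir():
        present = [p.name for p in directory.iterdir() if p.is_file()]
    else:
        present = []
    report = {}
    for name in expected:
        if name in present:
            continue  # exact match exists, nothing to report
        # Candidates with a similarity ratio of at least 0.6, best first.
        report[name] = difflib.get_close_matches(name, present, n=3, cutoff=0.6)
    return report


if __name__ == "__main__":
    folder = "ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper/freevc/checkpoints"
    for wanted, candidates in near_misses(folder).items():
        print(f"{wanted}: not found; similar files: {candidates or 'none'}")
```

An empty candidate list suggests the file was never placed in the directory, while a non-empty one points to a name that needs correcting.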
CUDA Out of Memory:
Audio Format Issues:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this in your research, please cite:
@article{li2022freevc,
  title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
  author={Li, Jingyi and Tu, Weiping and Xiao, Li},
  journal={arXiv preprint arXiv:2210.15418},
  year={2022}
}