ComfyUI Extension: ComfyUI-FreeVC_wrapper

Authored by ShmuelRonen

50 stars

A voice conversion extension node for ComfyUI based on FreeVC, enabling high-quality voice conversion capabilities within the ComfyUI framework.

    Features

    • Support for multiple FreeVC models:
      • FreeVC: Standard 16kHz model
      • FreeVC-s: Source-filtering based model
      • FreeVC (24kHz): High-quality 24kHz model
    • Stereo and mono audio support
    • Automatic audio resampling
    • Integrated with ComfyUI's audio processing pipeline
    • GPU acceleration support (CUDA)

    Installation

    1. Clone the extension into your ComfyUI custom_nodes directory:
    cd ComfyUI/custom_nodes
    git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
    cd ComfyUI-FreeVC_wrapper

    2. Install the required Python packages:
    pip install librosa transformers numpy torch

    3. Download the required checkpoints:

    a. Voice Conversion Models: Download the following checkpoint files from HuggingFace and place them in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/checkpoints directory:

    | Model | Filename | Description |
    |-------|----------|-------------|
    | FreeVC | freevc.pth | Standard 16kHz model |
    | FreeVC-s | freevc-s.pth | Source-filtering based model |
    | FreeVC (24kHz) | freevc-24.pth | High-quality 24kHz model |

    Direct download links:

    b. Speaker Encoder: Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:

    | Component | Filename | Required For |
    |-----------|----------|--------------|
    | Speaker Encoder | pretrained_bak_5805000.pt | FreeVC and FreeVC (24kHz) models |

    Direct download link:
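
Once the files are downloaded, a quick check can confirm they landed where the node expects them. The following is a minimal sketch, not part of the extension: the function name `find_missing_checkpoints` and the default ComfyUI location are assumptions for illustration, while the directories and file names are the ones listed in the tables above.

```python
from pathlib import Path

# Directories (relative to the extension folder) and file names
# as listed in the installation tables.
CHECKPOINT_DIR = Path("freevc") / "checkpoints"
ENCODER_DIR = Path("freevc") / "speaker_encoder" / "ckpt"

EXPECTED_FILES = {
    "FreeVC": CHECKPOINT_DIR / "freevc.pth",
    "FreeVC-s": CHECKPOINT_DIR / "freevc-s.pth",
    "FreeVC (24kHz)": CHECKPOINT_DIR / "freevc-24.pth",
    "Speaker Encoder": ENCODER_DIR / "pretrained_bak_5805000.pt",
}


def find_missing_checkpoints(extension_root):
    """Return {component: expected path} for every file that is absent.

    `extension_root` is the ComfyUI-FreeVC_wrapper folder inside custom_nodes.
    (Hypothetical helper, not provided by the extension.)
    """
    root = Path(extension_root)
    return {
        name: root / rel
        for name, rel in EXPECTED_FILES.items()
        if not (root / rel).is_file()
    }


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains ComfyUI.
    root = Path("ComfyUI") / "custom_nodes" / "ComfyUI-FreeVC_wrapper"
    missing = find_missing_checkpoints(root)
    if not missing:
        print("All checkpoint files found.")
    for name, path in missing.items():
        print(f"Missing {name}: {path}")
```

The check only looks at file presence and exact names; it does not validate that a download completed or that the file contents are intact.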

    Usage

    1. In ComfyUI, locate the "FreeVC Voice Conversion" node under the "audio/voice conversion" category
    2. Connect your inputs:
      • Source audio: The audio you want to convert
      • Reference audio: The target voice style
      • Select model type: Choose between FreeVC, FreeVC-s, or FreeVC (24kHz)
    3. Connect the output to your desired audio output node

    Model Selection Guide

    • FreeVC: Best for general purpose voice conversion at 16kHz
    • FreeVC-s: Better preservation of source speech content, recommended for maintaining clarity
    • FreeVC (24kHz): Highest quality output with better audio fidelity

    Known Issues and Troubleshooting

    1. File Not Found Errors:
      • Ensure all checkpoint files are in the correct directories (freevc/checkpoints for the models, freevc/speaker_encoder/ckpt for the speaker encoder)
      • Verify that file names match exactly: freevc.pth, freevc-s.pth, freevc-24.pth, pretrained_bak_5805000.pt
    2. CUDA Out of Memory:
      • Try processing shorter audio clips
      • Run on the CPU if GPU memory is insufficient
    3. Audio Format Issues:
      • The node automatically handles stereo to mono conversion
      • Input audio at any sample rate is resampled automatically
      • Trim silence from audio files for better results
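
For the out-of-memory case, "processing shorter audio clips" can mean splitting a long recording into fixed-length pieces before conversion. The sketch below only computes the sample ranges for such a split. The helper `plan_chunks` and the 10-second default are assumptions for illustration; the node itself is not documented to chunk audio.

```python
def plan_chunks(num_samples, sample_rate, max_seconds=10.0):
    """Split [0, num_samples) into consecutive (start, end) sample ranges.

    Each range covers at most `max_seconds` of audio; the last one may be
    shorter. (Hypothetical helper, not part of the extension.)
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and duration must be > 0")
    chunk_len = max(1, int(sample_rate * max_seconds))
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# Example: 25 seconds of 16 kHz audio split into pieces of at most 10 seconds.
print(plan_chunks(25 * 16000, 16000))
# [(0, 160000), (160000, 320000), (320000, 400000)]
```

Each range can then be sliced out of the source audio, converted separately, and the results concatenated. Fixed boundaries may cut through a word, so splitting at pauses in the speech is likely to give cleaner joins.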

    Contributing

    Contributions are welcome! Please feel free to submit a Pull Request.

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

    Acknowledgments

    Citation

    If you use this in your research, please cite:

    @article{li2022freevc,
      title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
      author={Li, Jingyi and Tu, Weiping and Xiao, Li},
      journal={arXiv preprint arXiv:2210.15418},
      year={2022}
    }