ComfyUI Extension: ComfyUI_IndexTTS

Authored by billwuhao

38 stars

IndexTTS Voice Cloning Nodes for ComfyUI. High-quality voice cloning, very fast, supports Chinese and English, and allows custom voice styles.

    README

    IndexTTS Voice Cloning Node for ComfyUI

    Very high voice cloning quality, extremely fast, supports Chinese and English, and custom voice tones.

    📣 Updates

    [2025-05-30]⚒️: Released v1.2.0. Adds two-person dialogue and speaker preview. pynini now installs normally on Windows, so the node no longer runs as a feature-limited TTS version.

    IndexTTS 正式发布1.5 版本了,效果666,晕XUAN4是一种GAN3觉,我爱你!,I love you!,“我爱你”的英语是“I love you”,2.5平方电线,共465篇,约315万字,2002年的第一场雪,下在了2003年.

    https://github.com/user-attachments/assets/b67891f2-0982-4540-8c3b-1a870305466f

    [2025-05-14]⚒️: Supports v1.5. Download the following models into the ComfyUI\models\TTS\Index-TTS path and rename them as shown:

    • https://huggingface.co/IndexTeam/IndexTTS-1.5/blob/main/bigvgan_generator.pth → bigvgan_generator_v1_5.pth
    • https://huggingface.co/IndexTeam/IndexTTS-1.5/blob/main/bpe.model → bpe_v1_5.model
    • https://huggingface.co/IndexTeam/IndexTTS-1.5/blob/main/gpt.pth → gpt_v1_5.pth
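
    The rename step above can be scripted. The following is a minimal sketch, assuming the three files were already downloaded into one folder under their upstream names. The function name install_v1_5_models and both directory arguments are hypothetical and not part of this extension.

```python
import shutil
from pathlib import Path

# Upstream file name -> name expected in the Index-TTS model folder
# (taken from the list above).
V1_5_RENAMES = {
    "bigvgan_generator.pth": "bigvgan_generator_v1_5.pth",
    "bpe.model": "bpe_v1_5.model",
    "gpt.pth": "gpt_v1_5.pth",
}


def install_v1_5_models(download_dir, model_dir):
    """Copy the downloaded v1.5 files into model_dir under their renamed names.

    Returns the list of destination paths that were written.
    Raises FileNotFoundError if any expected download is missing.
    """
    download_dir = Path(download_dir)
    model_dir = Path(model_dir)
    missing = [name for name in V1_5_RENAMES if not (download_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing downloads: {', '.join(missing)}")
    model_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source_name, target_name in V1_5_RENAMES.items():
        target = model_dir / target_name
        shutil.copy2(download_dir / source_name, target)  # copy, keep the original
        written.append(target)
    return written
```

    For example, install_v1_5_models(r"D:\Downloads", r"ComfyUI\models\TTS\Index-TTS") would copy the files into place. Both paths here are placeholders.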

    [2025-05-02]⚒️: DeepSpeed acceleration is available and requires DeepSpeed to be installed. For Windows, see DeepSpeed Installation. The speedup is not significant.

    [2025-04-30]⚒️: Released v1.0.0.

    Usage

    Descriptions of the important parameters (less important parameters are not covered individually):

    • max_mel_tokens: Controls the length of the generated speech. Increase this value for long texts.
    • max_text_tokens_per_sentence: Maximum number of tokens per sentence. Smaller values give faster inference but use more memory and may affect quality.
    • sentences_bucket_max_size: Maximum capacity for sentence bucketing. Larger values give faster inference but use more memory and may affect quality.
    • fast_inference: Enables fast inference.
    • custom_cuda_kernel: Enables the custom CUDA kernel. The CUDA kernel extension is built automatically on the first run.
    • dialogue_audio_s2: The second speaker's audio for two-person dialogue. Connecting this input automatically enables dialogue mode. In dialogue mode, the input text must use the following format ([S1] marks the first speaker, [S2] marks the second speaker):
    [S1] 轻喘像风掠过耳畔, 
    [S2] 你靠近时,连呼吸都慢了半拍。
    [S1] 指尖在我锁骨上游移, 
    [S2] 仿佛试探一扇未曾开启的门。
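
    When dialogue text is generated by a script, the tagged format above can be built and checked before it is passed to the node. This is a minimal sketch that assumes one turn per line, as in the sample. The helpers format_dialogue and parse_dialogue are hypothetical and are not functions provided by this extension, and the node's own parsing may be more permissive.

```python
import re

# One dialogue line: a speaker tag ([S1] or [S2]) followed by the spoken text.
_TURN = re.compile(r"^\[(S[12])\]\s*(.+)$")


def format_dialogue(turns):
    """Join (speaker, text) pairs into one tagged line per turn."""
    lines = []
    for speaker, text in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        text = text.strip()
        if not text:
            raise ValueError("empty turn")
        lines.append(f"[{speaker}] {text}")
    return "\n".join(lines)


def parse_dialogue(script):
    """Split tagged text back into (speaker, text) pairs, skipping blank lines."""
    turns = []
    for number, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = _TURN.match(line)
        if match is None:
            raise ValueError(f"line {number} does not start with [S1] or [S2]")
        turns.append((match.group(1), match.group(2).strip()))
    return turns
```

    Running parse_dialogue on the text before synthesis catches untagged lines early, which is cheaper than discovering them after a long generation.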
    
    • Loading Audio:

    image

    • Preview Speaker:

    I will store the speaker audio for all TTS nodes in a single path, ComfyUI\models\TTS\speakers. These nodes include IndexTTS, CSM, Dia, KokoroTTS, MegaTTS, QuteTTS, SparkTTS, StepAudioTTS, and others.

    image

    • Two-person Dialogue:

    image

    Installation

    • Windows: First, install the following dependencies:

    Download the pynini wheel that matches your Python version from pynini-windows-wheels.

    Example (replace cp3xx with your Python tag, such as cp310 for Python 3.10):

    D:\AIGC\python\py310\python.exe -m pip install pynini-2.1.6.post1-cp3xx-cp3xx-win_amd64.whl
    D:\AIGC\python\py310\python.exe -m pip install importlib_resources
    D:\AIGC\python\py310\python.exe -m pip install "WeTextProcessing>=1.0.4" --no-deps
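
    To find out which wheel file matches a given interpreter, the cp3xx placeholder can be derived from the running Python version. This is a minimal sketch. The helper pynini_wheel_name is hypothetical, the version string is copied from the example command, and there is no guarantee that a wheel exists for every Python version.

```python
import sys

# Version string taken from the example command above; adjust it if the
# wheel collection offers a different release.
PYNINI_VERSION = "2.1.6.post1"


def pynini_wheel_name(version_info=None):
    """Return the wheel file name for the given (major, minor) Python version."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    tag = f"cp{major}{minor}"  # CPython tag, e.g. cp310 for Python 3.10
    return f"pynini-{PYNINI_VERSION}-{tag}-{tag}-win_amd64.whl"


if __name__ == "__main__":
    print(pynini_wheel_name())
```

    Run it with the same python.exe that ComfyUI uses, so the printed name matches the interpreter that will install the wheel.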
    
    • Linux, macOS, Windows:
    cd ComfyUI/custom_nodes
    git clone https://github.com/billwuhao/ComfyUI_IndexTTS.git
    cd ComfyUI_IndexTTS
    pip install -r requirements.txt
    
    # For the ComfyUI portable build, use its bundled interpreter (python_embeded) instead:
    ./python_embeded/python.exe -m pip install -r requirements.txt
    

    Model Download

    • Models must be downloaded manually into the ComfyUI\models\TTS\Index-TTS path:

    The Index-TTS directory should contain the following files:

    bigvgan_generator.pth
    bpe.model
    gpt.pth
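
    A quick check before starting ComfyUI can confirm that the files are in place. This is a minimal sketch. The helper missing_model_files is hypothetical, and the required list only covers the three names in the listing above, which may not be exhaustive.

```python
from pathlib import Path

# File names from the listing above.
REQUIRED_FILES = ("bigvgan_generator.pth", "bpe.model", "gpt.pth")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]
```

    For example, missing_model_files(r"ComfyUI\models\TTS\Index-TTS") returns an empty list when all three files are present.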
    

    Acknowledgements