ComfyUI Extension: MW-ComfyUI_MegaTTS3

Authored by billwuhao

Created

Updated

193 stars

Lightweight and Efficient, 🎧Ultra High-Quality Voice Cloning, Chinese and English.

README

中文 | English

MegaTTS3 Voice Cloning Nodes for ComfyUI

High-quality voice cloning, supporting both Chinese and English, with cross-lingual cloning capabilities. Supports custom voice cloning!!! Extra-long text!!! Two-person dialogue!!! Full pynini installation on Windows, no more stripped-down TTS!!!.

📣 Updates

[2025-06-07]⚒️: v2.0.0. Supports custom voice cloning, extra-long text, two-person dialogue, and full pynini installation on Windows, no more stripped-down TTS!.

[S1] MegaTTS 真开源版本来了,效果666
[S2] 晕 xuan4 是一种 gan3 觉
[S1] 我爱你!I love you!“我爱你”的英语是“I love you”
[S2] 2.5平方电线,共465篇,约315万字
[S1] 2002年的第一场雪,下在了2003年

https://github.com/user-attachments/assets/b734e6bd-9303-4311-b3a4-618241ca6535

[2025-04-28]⚒️: Added a voice preview node. Preview the voice first, then clone if you're satisfied. Thanks to @chenpipi0807 for the idea😍. You can create categorized subfolders within the speakers folder.

[2025-04-06]⚒️: Released v1.0.0.

Usage

  • Single-person cloning (separate long text with blank lines):

image

  • Two-person dialogue:

image

Installation

  • For Windows, install the following dependencies first:

pynini-windows-wheels Download the pynini wheel file corresponding to your Python version.

Example:

D:\AIGC\python\py310\python.exe -m pip install pynini-2.1.6.post1-cp3xx-cp3xx-win_amd64.whl
D:\AIGC\python\py310\python.exe -m pip install importlib_resources
D:\AIGC\python\py310\python.exe -m pip install WeTextProcessing>=1.0.4 --no-deps
  • Then, proceed with the normal installation:
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_MegaTTS3.git
cd ComfyUI_MegaTTS3
pip install -r requirements.txt

# For python_embeded
./python_embeded/python.exe -m pip install -r requirements.txt

Model Download

  • Models and voices need to be downloaded manually and placed in the ComfyUI\models\TTS directory:

MegaTTS3 Download the entire folder and place it in the TTS directory.

  • For the VAE encoder model, which enables custom voice cloning without .npy files, please follow our WeChat Official Account to obtain it. Place it in the TTS\MegaTTS3\wavvae folder:
<img src="https://github.com/billwuhao/ComfyUI_MegaTTS3/blob/main/images/gzh.webp" alt="" width="200" height="200">
  • Google Cloud Drive

  • Please place the audio in the TTS\speakers directory. I will unify all speaker audios for TTS nodes into the ComfyUI\models\TTS\speakers path. These nodes include IndexTTS, CSM, Dia, KokoroTTS, MegaTTS, QuteTTS, SparkTTS, StepAudioTTS, etc.

The structure is as follows:

.
│  .gitattributes
│  config.json
│  README.md
│
├─aligner_lm
│      config.yaml
│      model_only_last.ckpt
│
├─diffusion_transformer
│      config.yaml
│      model_only_last.ckpt
│
├─duration_lm
│      config.yaml
│      model_only_last.ckpt
│
├─g2p
│      added_tokens.json
│      config.json
│      generation_config.json
│      latest
│      merges.txt
│      model.safetensors
│      special_tokens_map.json
│      tokenizer.json
│      tokenizer_config.json
│      trainer_state.json
│      vocab.json
│
└─wavvae
        config.yaml
        decoder.ckpt
        model_only_last.ckpt

Credits

Donation

Your appreciation is my greatest motivation! Thank you for supporting me with a cup of coffee!

<img src="https://github.com/billwuhao/ComfyUI_MegaTTS3/blob/main/images/20250607012102.jpg" alt="" width="200" height="200">