ComfyUI Extension: MW-ComfyUI_MegaTTS3
Lightweight and Efficient, 🎧Ultra High-Quality Voice Cloning, Chinese and English.
MegaTTS3 Voice Cloning Nodes for ComfyUI
High-quality voice cloning for Chinese and English, with cross-lingual cloning. Supports custom voice cloning, extra-long text, and two-person dialogue. pynini can now be fully installed on Windows, so TTS is no longer stripped down there.
📣 Updates
[2025-06-07]⚒️: v2.0.0. Supports custom voice cloning, extra-long text, two-person dialogue, and full pynini installation on Windows (no more stripped-down TTS).
[S1] MegaTTS 真开源版本来了,效果666
[S2] 晕 xuan4 是一种 gan3 觉
[S1] 我爱你!I love you!“我爱你”的英语是“I love you”
[S2] 2.5平方电线,共465篇,约315万字
[S1] 2002年的第一场雪,下在了2003年
https://github.com/user-attachments/assets/b734e6bd-9303-4311-b3a4-618241ca6535
[2025-04-28]⚒️: Added a voice preview node. Preview the voice first, then clone if you're satisfied. Thanks to @chenpipi0807 for the idea😍. You can create categorized subfolders within the speakers folder.
[2025-04-06]⚒️: Released v1.0.0.
Usage
- Single-person cloning (separate long text with blank lines):
- Two-person dialogue:
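The two input conventions above (blank lines between chunks of long text, and `[S1]` / `[S2]` tags for dialogue) can be illustrated with a minimal Python sketch. The helper names `split_long_text` and `parse_dialogue` are hypothetical and are not part of the nodes; this only shows how such text could be pre-checked before pasting it into a node, not how the nodes process it internally.

```python
import re

# Matches a line such as "[S1] some text" and captures the tag and the text.
SPEAKER_TAG = re.compile(r"^\[(S\d+)\]\s*(.*)$")


def split_long_text(text: str) -> list[str]:
    """Split text into chunks at blank lines, dropping empty chunks."""
    chunks = re.split(r"\n\s*\n", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def parse_dialogue(script: str) -> list[tuple[str, str]]:
    """Parse lines like '[S1] hello' into (speaker, text) pairs."""
    turns = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between turns
        match = SPEAKER_TAG.match(line)
        if match is None:
            raise ValueError(f"line has no [S1]/[S2] tag: {line!r}")
        turns.append((match.group(1), match.group(2)))
    return turns


if __name__ == "__main__":
    print(split_long_text("First paragraph.\n\nSecond paragraph."))
    print(parse_dialogue("[S1] Hello\n[S2] Hi there"))
```

Raising an error on an untagged line is a design choice for this sketch: it surfaces formatting mistakes early instead of silently assigning the line to a speaker.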
Installation
- For Windows, install the following dependencies first:
pynini-windows-wheels: download the pynini wheel file that matches your Python version.
Example:
D:\AIGC\python\py310\python.exe -m pip install pynini-2.1.6.post1-cp3xx-cp3xx-win_amd64.whl
D:\AIGC\python\py310\python.exe -m pip install importlib_resources
D:\AIGC\python\py310\python.exe -m pip install "WeTextProcessing>=1.0.4" --no-deps
- Then, proceed with the normal installation:
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_MegaTTS3.git
cd ComfyUI_MegaTTS3
pip install -r requirements.txt
# Or, if you use the embedded Python (python_embeded), run this instead:
./python_embeded/python.exe -m pip install -r requirements.txt
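To pick the pynini wheel that matches your Python version, it can help to print the CPython tag of the interpreter that ComfyUI actually uses. The sketch below is only an illustration; `cpython_wheel_tag` is a hypothetical helper name, and the filename pattern it prints is taken from the example wheel name in the Windows step above.

```python
import sys


def cpython_wheel_tag(version_info=sys.version_info) -> str:
    """Return the CPython tag used in wheel filenames, e.g. 'cp310' for Python 3.10."""
    return f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    tag = cpython_wheel_tag()
    # The wheel to download should carry this tag twice in its filename.
    print(f"Look for a pynini wheel whose name contains {tag}-{tag}-win_amd64")
```

Run it with the same interpreter you install into (for example the `python.exe` under `python_embeded`), since a system-wide Python may report a different version.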
Model Download
- Models and voices need to be downloaded manually and placed in the ComfyUI\models\TTS directory:
MegaTTS3: download the entire folder and place it in the TTS directory.
- For the VAE encoder model, which enables custom voice cloning without .npy files, please follow our WeChat Official Account to obtain it. Place it in the TTS\MegaTTS3\wavvae folder.
- Please place the speaker audio in the TTS\speakers directory. I will consolidate the speaker audio for all TTS nodes under the ComfyUI\models\TTS\speakers path. These nodes include IndexTTS, CSM, Dia, KokoroTTS, MegaTTS, QuteTTS, SparkTTS, StepAudioTTS, etc.
The structure of the MegaTTS3 folder is as follows:
.
│  .gitattributes
│  config.json
│  README.md
│
├─aligner_lm
│      config.yaml
│      model_only_last.ckpt
│
├─diffusion_transformer
│      config.yaml
│      model_only_last.ckpt
│
├─duration_lm
│      config.yaml
│      model_only_last.ckpt
│
├─g2p
│      added_tokens.json
│      config.json
│      generation_config.json
│      latest
│      merges.txt
│      model.safetensors
│      special_tokens_map.json
│      tokenizer.json
│      tokenizer_config.json
│      trainer_state.json
│      vocab.json
│
└─wavvae
        config.yaml
        decoder.ckpt
        model_only_last.ckpt
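
After a manual download it is easy to miss a file, so a quick layout check can save a failed first run. The following is a minimal sketch, assuming the MegaTTS3 folder sits at ComfyUI\models\TTS\MegaTTS3 relative to where the script is run; `missing_model_files` is a hypothetical helper, and the file list is a subset of the tree above, not a list the nodes themselves consult.

```python
from pathlib import Path

# A subset of the files shown in the tree above, relative to the MegaTTS3 folder.
EXPECTED_FILES = [
    "config.json",
    "aligner_lm/config.yaml",
    "aligner_lm/model_only_last.ckpt",
    "diffusion_transformer/config.yaml",
    "diffusion_transformer/model_only_last.ckpt",
    "duration_lm/config.yaml",
    "duration_lm/model_only_last.ckpt",
    "g2p/config.json",
    "g2p/model.safetensors",
    "g2p/tokenizer.json",
    "wavvae/config.yaml",
    "wavvae/decoder.ckpt",
    "wavvae/model_only_last.ckpt",
]


def missing_model_files(model_dir) -> list[str]:
    """Return the expected relative paths that are not present under model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust to your own ComfyUI installation.
    model_dir = Path("ComfyUI") / "models" / "TTS" / "MegaTTS3"
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```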
Credits
Donation
Your appreciation is my greatest motivation! Thank you for supporting me with a cup of coffee!
<img src="https://github.com/billwuhao/ComfyUI_MegaTTS3/blob/main/images/20250607012102.jpg" alt="" width="200" height="200">