ComfyUI Extension: comfyui_GLM_TTS
ComfyUI nodes for GLM-TTS, a high-quality text-to-speech system supporting zero-shot voice cloning.
ComfyUI-GLM-TTS
Installation
- Clone this repository into your ComfyUI/custom_nodes/ directory:

      cd ComfyUI/custom_nodes
      git clone https://github.com/karas17/ComfyUI-GLM-TTS.git

  (If you downloaded this folder directly, just rename it to ComfyUI-GLM-TTS.)
- Install dependencies:

      pip install -r requirements.txt

  Note: torch, torchaudio, and transformers are required but expected to be in your ComfyUI environment. onnxruntime (or onnxruntime-gpu) is required for the frontend. If you encounter issues, install onnxruntime-gpu.
- Models: By default, models are loaded from ComfyUI/models/GLM-TTS. If missing, they will be auto-downloaded from the Hugging Face repository zai-org/GLM-TTS to that location. Windows example: <ComfyUI install directory>\models\GLM-TTS

  Structure expected inside the model path:
  - speech_tokenizer/
  - vq32k-phoneme-tokenizer/
  - llm/
  - flow/
  - frontend/ (contains campplus.onnx, spk2info.pt)

  You can also provide an absolute model_path to a prepared directory with the above structure.
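Before starting ComfyUI, it can help to confirm that the dependencies and the model directory are in place. The following is a minimal sketch, not part of this extension: the helper names missing_modules and missing_model_entries are hypothetical, and the lists of modules, subdirectories, and files are simply copied from the installation notes above. It uses only the Python standard library.

```python
import importlib.util
from pathlib import Path

# Module names copied from the installation note.
# The onnxruntime-gpu package is also imported as "onnxruntime".
REQUIRED_MODULES = ["torch", "torchaudio", "transformers", "onnxruntime"]

# Directory and file names copied from the expected model layout.
EXPECTED_DIRS = ["speech_tokenizer", "vq32k-phoneme-tokenizer", "llm", "flow", "frontend"]
EXPECTED_FRONTEND_FILES = ["campplus.onnx", "spk2info.pt"]


def missing_modules(names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_model_entries(model_dir):
    """Return the expected subdirectories and frontend files absent from model_dir."""
    root = Path(model_dir)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [
        "frontend/" + f
        for f in EXPECTED_FRONTEND_FILES
        if not (root / "frontend" / f).is_file()
    ]
    return missing


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains ComfyUI/.
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
    print("Missing model entries:", missing_model_entries("ComfyUI/models/GLM-TTS"))
```

Run it with the same Python interpreter that ComfyUI uses. An empty list for both checks suggests the environment matches the description above; missing model entries are not necessarily a problem, since the models are auto-downloaded when absent.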
Usage
- GLM-TTS Loader:
  - model_path: Path to your models. The default, GLM-TTS, resolves to ComfyUI/models/GLM-TTS. Absolute paths are supported (Windows example: <ComfyUI install directory>\models\GLM-TTS).
  - sample_rate: 24000 (default) or 32000.
- GLM-TTS Sampler:
  - model: Connect from Loader.
  - text: Text to speak.
  - reference_audio (Optional): Audio for voice cloning.
  - reference_text (Optional): Transcript of the reference audio (improves cloning).
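The Loader inputs described above can be illustrated with a short sketch. This is an assumption about the behavior, not the extension's actual code: resolve_model_path and check_sample_rate are hypothetical helpers, and the rule that a non-absolute model_path is placed under ComfyUI's models/ folder is inferred from the default GLM-TTS resolving to ComfyUI/models/GLM-TTS.

```python
from pathlib import Path

# Values listed for the Loader's sample_rate input.
SUPPORTED_SAMPLE_RATES = (24000, 32000)


def resolve_model_path(model_path, comfy_root):
    """Hypothetical helper: absolute paths are used as-is, while other
    values are treated as folder names under <comfy_root>/models."""
    candidate = Path(model_path)
    if candidate.is_absolute():
        return candidate
    return Path(comfy_root) / "models" / candidate


def check_sample_rate(sample_rate=24000):
    """Hypothetical helper: accept only the two documented sample rates."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}, got {sample_rate}"
        )
    return sample_rate
```

For example, resolve_model_path("GLM-TTS", "ComfyUI") yields the path ComfyUI/models/GLM-TTS, while an absolute path to a prepared directory is returned unchanged.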
Notes
- The first run may be slow because the models have to be loaded (and downloaded first, if they are missing).
- Ensure your reference_audio is clear.