Using Spark-TTS in ComfyUI. Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Spark-TTS is an efficient LLM-based text-to-speech model that can clone voices from speech in various languages.
[2025-03-21] ⚒️: Refactored code, optional model unloading, faster generation speed. Added more tunable parameters. Supports cross-lingual voice cloning.
[2025-03-07] ⚒️: Released version v1.0.0. Added a new recording node, MW Audio Recorder for Spark, which records audio from a microphone and shows the recording progress in a progress bar.
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_SparkTTS.git
cd ComfyUI_SparkTTS
pip install -r requirements.txt
# For the portable ComfyUI build, use its bundled python_embeded interpreter instead:
./python_embeded/python.exe -m pip install -r requirements.txt
Download the following models to the ComfyUI\models\TTS folder.
Move the Step-Audio-speakers folder from this repository to the ComfyUI\models\TTS folder.
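The move step above can also be scripted. The following is a minimal Python sketch, assuming the repository was cloned into ComfyUI/custom_nodes as in the install commands; install_speakers is a hypothetical helper, not part of ComfyUI_SparkTTS. It copies rather than moves the folder so the git clone stays intact, and it skips the copy entirely when the destination already exists, so speakers you have already customized are never overwritten.

```python
import shutil
from pathlib import Path


def install_speakers(repo_dir: Path, tts_models_dir: Path) -> Path:
    """Copy <repo>/Step-Audio-speakers into ComfyUI/models/TTS (hypothetical helper).

    If the destination folder already exists (for example because
    ComfyUI_StepAudioTTS installed it), nothing is copied.
    """
    src = repo_dir / "Step-Audio-speakers"
    dst = tts_models_dir / "Step-Audio-speakers"
    if dst.exists():
        # Keep the existing folder and any custom speakers in it.
        return dst
    if not src.is_dir():
        raise FileNotFoundError(f"speaker folder not found: {src}")
    # Create ComfyUI/models/TTS if it does not exist yet, then copy the tree.
    tts_models_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)
    return dst
```

For example, install_speakers(Path("ComfyUI/custom_nodes/ComfyUI_SparkTTS"), Path("ComfyUI/models/TTS")), run from the directory that contains ComfyUI; adjust both paths to your installation.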
The structure should look like this:
ComfyUI\models\TTS
├── Spark-TTS-0.5B
└── Step-Audio-speakers
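Before launching ComfyUI, you can check this layout programmatically. The following is a minimal Python sketch; missing_tts_dirs is a hypothetical helper, not part of ComfyUI_SparkTTS, and it only checks that the two folders from the tree above exist, not that the model files inside them are complete.

```python
from pathlib import Path

# Folder names taken from the expected layout of ComfyUI/models/TTS.
EXPECTED_SUBDIRS = ("Spark-TTS-0.5B", "Step-Audio-speakers")


def missing_tts_dirs(tts_models_dir: Path) -> list[str]:
    """Return the expected subfolders that are absent from tts_models_dir."""
    return [name for name in EXPECTED_SUBDIRS if not (tts_models_dir / name).is_dir()]
```

An empty result, e.g. from missing_tts_dirs(Path(r"ComfyUI\models\TTS")), means both folders are in place.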
Note: If you have already installed ComfyUI_StepAudioTTS, there is no need to move the Step-Audio-speakers folder, since the two node packs share the same audio and configuration files.
You can then add and customize your own speakers in the ComfyUI\models\TTS\Step-Audio-speakers folder. Make sure the speaker names in the configuration match exactly: