ComfyUI-SparkTTS is a custom ComfyUI node implementation of SparkTTS, an advanced text-to-speech system that harnesses the power of large language models (LLMs) to generate highly accurate and natural-sounding speech.
2025/03/21: Update ComfyUI-SparkTTS to v1.1.0 (update.md)
ComfyUI-SparkTTS provides the following main functionalities:
To install through ComfyUI-Manager, search for Comfyui-SparkTTS and install it. Then install requirements.txt from the ComfyUI-SparkTTS folder:
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
Alternatively, clone this repository into your ComfyUI custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-SparkTTS
Then install requirements.txt from the ComfyUI-SparkTTS folder:
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
To install with comfy-cli instead, first make sure comfy-cli is installed:
pip install comfy-cli
If you don't have ComfyUI installed yet, install it with:
comfy install
Then install ComfyUI-SparkTTS with the following command:
comfy node registry-install Comfyui-Spark-TTS
Finally, install requirements.txt from the ComfyUI-SparkTTS folder:
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
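Once the nodes are installed, you may want to confirm where the model ended up. The short sketch below checks for a non-empty SparkTTS-2.0 folder under ComfyUI/models/TTS/SparkTTS/. The helper find_sparktts_model is hypothetical, and the folder layout is assumed from the paths given in this README.

```python
from pathlib import Path
from typing import Optional


def find_sparktts_model(comfyui_root: str) -> Optional[Path]:
    """Return the SparkTTS model folder if it exists and contains files, else None.

    Hypothetical helper: assumes the layout <comfyui_root>/models/TTS/SparkTTS/SparkTTS-2.0.
    """
    model_dir = Path(comfyui_root) / "models" / "TTS" / "SparkTTS" / "SparkTTS-2.0"
    # An empty folder usually means a manual download did not complete.
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return model_dir
    return None


if __name__ == "__main__":
    found = find_sparktts_model("ComfyUI")
    print(found if found else "SparkTTS model not found; it will be downloaded on first use.")
```

Pass the path to your ComfyUI folder. A result of None means the model files are not in place yet, and the node is expected to download them the first time it runs.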
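The model is downloaded automatically to ComfyUI/models/TTS/SparkTTS/ the first time you use the custom node. If you download it manually, place the model files in the ComfyUI/models/TTS/SparkTTS/SparkTTS-2.0 folder.

The voice creation node allows you to create a customized voice by adjusting parameters.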
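This node's gender input takes female or male. Its pitch and speed inputs each take one of five levels: very_low, low, moderate, high, or very_high. A minimal sketch of validating those choices before synthesis follows. validate_voice_params is a hypothetical helper written for illustration, not a function from the node's code.

```python
from typing import Dict

GENDERS = ("female", "male")
LEVELS = ("very_low", "low", "moderate", "high", "very_high")


def validate_voice_params(gender: str, pitch: str, speed: str) -> Dict[str, str]:
    """Check voice-creation choices and return them lowercased (hypothetical helper)."""
    params = {"gender": gender.lower(), "pitch": pitch.lower(), "speed": speed.lower()}
    if params["gender"] not in GENDERS:
        raise ValueError(f"gender must be one of {GENDERS}, got {gender!r}")
    # pitch and speed share the same five-level scale.
    for name in ("pitch", "speed"):
        if params[name] not in LEVELS:
            raise ValueError(f"{name} must be one of {LEVELS}, got {params[name]!r}")
    return params


print(validate_voice_params("Female", "moderate", "high"))
# → {'gender': 'female', 'pitch': 'moderate', 'speed': 'high'}
```

Rejecting unknown values early gives a clear error message instead of an unexpected voice.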
Inputs:
- text: Text to synthesize.
- gender: Gender of the voice (female or male).
- pitch: Pitch level of the voice (very_low, low, moderate, high, very_high).
- speed: Speed level of the voice (very_low, low, moderate, high, very_high).
- batch_texts (optional): Additional texts for better control over pacing and intonation.

Outputs:
- audio: Generated audio with the customized voice.

The voice cloning node allows you to clone a voice from a reference audio sample.
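The reference_text input should be the transcript of the reference audio, so it helps to know how long the clip is and what format it uses. The sketch below reads these properties with Python's standard wave module. It assumes the reference clip is a PCM WAV file on disk. Inside a ComfyUI workflow the node receives audio from an upstream node instead, so describe_wav is a hypothetical helper for checking a clip before you load it.

```python
import wave
from typing import Dict


def describe_wav(path: str) -> Dict[str, float]:
    """Report sample rate, channel count and duration of a PCM WAV clip (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,  # frames per channel divided by frames per second
        }


# Example (the file name is hypothetical):
# info = describe_wav("reference.wav")
```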
Inputs:
- text: Text to synthesize with the cloned voice.
- reference_audio: The audio sample to clone the voice from.
- reference_text: Transcript of the reference audio, used to improve cloning quality.
- max_tokens: Controls the maximum length of the generated speech.
- batch_texts (optional): Additional texts for better control over pacing and intonation.

Outputs:
- audio: Generated audio with the cloned voice.

Another cloning node allows you to clone a voice from a reference audio sample with control over pitch and speed.
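Like the other synthesis nodes, this node accepts an optional batch_texts input for better control over pacing and intonation. This README does not describe that input's exact format. The sketch below therefore shows only one hypothetical way to break a long script into sentence-sized pieces that could be supplied as batch texts. split_into_batches, its sentence regex, and the max_chars limit are illustrative assumptions.

```python
import re
from typing import List


def split_into_batches(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters (hypothetical helper).

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    batches: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                batches.append(current)
            current = sentence
    if current:
        batches.append(current)
    return batches


script = "Hello there. This is a test! Does it split? Yes."
print(split_into_batches(script, max_chars=30))
# → ['Hello there. This is a test!', 'Does it split? Yes.']
```

Splitting at sentence boundaries keeps each piece a natural unit of intonation, which is the kind of control batch_texts is described as providing.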
Inputs:
- text: Text to synthesize with the cloned voice.
- reference_audio: The audio sample to clone the voice from.
- reference_text: Transcript of the reference audio, used to improve cloning quality.
- pitch: Pitch level of the voice.
- speed: Speed level of the voice.
- max_tokens: Controls the maximum length of the generated speech.
- batch_texts (optional): Additional texts for better control over pacing and intonation.

Outputs:
- audio: Generated audio with the cloned voice.

The recording node allows you to record audio directly.
Inputs:
- recording: Set to True to start recording audio.
- recording_duration: Recording duration in seconds.
- sample_rate: Audio sample rate.
- noise_threshold: Noise reduction threshold.
- smoothing_kernel_size: Size of the kernel used to smooth the audio signal.

Outputs:
- audio: Recorded audio data.

See the example_workflows directory for example workflows.
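The recording node's noise_threshold and smoothing_kernel_size inputs suggest a simple clean-up pass on the recorded signal. The sketch below is an illustration only, and the node's actual processing may differ. It applies a noise gate that zeroes samples quieter than the threshold, then smooths the result with a moving-average kernel. It uses NumPy, and the function name clean_recording and the gate-then-smooth order are assumptions.

```python
import numpy as np


def clean_recording(audio: np.ndarray, noise_threshold: float, smoothing_kernel_size: int) -> np.ndarray:
    """Gate low-level noise, then smooth with a moving average (illustrative sketch).

    audio: 1-D float array of samples, assumed to lie in [-1.0, 1.0].
    """
    # Noise gate: samples whose magnitude is below the threshold are treated as silence.
    gated = np.where(np.abs(audio) < noise_threshold, 0.0, audio)
    if smoothing_kernel_size <= 1:
        return gated
    # Moving-average kernel; mode="same" keeps the output as long as the input.
    kernel = np.ones(smoothing_kernel_size) / smoothing_kernel_size
    return np.convolve(gated, kernel, mode="same")


sample_rate = 16000          # stands in for the sample_rate input
recording_duration = 0.5     # stands in for recording_duration, in seconds
num_samples = int(recording_duration * sample_rate)  # 8000 samples
t = np.arange(num_samples) / sample_rate
rng = np.random.default_rng(0)
signal = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.005 * rng.standard_normal(num_samples)
cleaned = clean_recording(signal, noise_threshold=0.01, smoothing_kernel_size=5)
print(cleaned.shape)  # (8000,)
```

A larger smoothing kernel removes more high-frequency noise but also dulls the recording. A higher threshold silences more background hiss but can clip quiet speech.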
SparkTTS currently supports the following languages:
GPL-3.0 License