ComfyUI-TTS is a tool that converts strings within ComfyUI to audio so you can hear what's written. My objective was to use it with LLM AI models, but I wanted to leave the door open for many other uses.
Where This Fits
TTS stands for "text to speech": it converts the written word into audio you can hear. It does not do the reverse, converting audio to text (that would be speech recognition).
Piper-tts was the first TTS program I chose to implement because it's designed to be easy to integrate. Its feature set is less complete than some alternatives, but it works simply and reliably.
Piper-tts uses ONNX models, each paired with a JSON config file that should have the same base name as the .onnx file but with a .json extension. I noticed some of the downloadable models don't follow this convention, and it's up to you to rename them (sorry).
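As a sketch of how you could fix mismatched filenames yourself, a small stdlib-only script like the one below renames `voice.onnx.json` sidecars to `voice.json` so each model matches the convention described above. The `.onnx.json` sidecar pattern and the function name are assumptions for illustration, not part of this project:

```python
from pathlib import Path

def fix_config_names(model_dir: str) -> list[tuple[str, str]]:
    """Rename 'voice.onnx.json' sidecars to 'voice.json' so each
    .onnx model has a config with the same base name."""
    renamed = []
    for onnx in Path(model_dir).glob("*.onnx"):
        sidecar = onnx.with_suffix(".onnx.json")  # e.g. voice.onnx.json
        target = onnx.with_suffix(".json")        # e.g. voice.json
        if sidecar.exists() and not target.exists():
            sidecar.rename(target)
            renamed.append((sidecar.name, target.name))
    return renamed
```

Run it once over your models folder before loading the voices in ComfyUI.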
ComfyUI lets us run Stable Diffusion through a flow-graph layout, and ComfyUI-Manager makes installing custom nodes like this one straightforward.
Why I Made This
I wanted to integrate text generation and image generation AI in one interface and see what other people can come up with to use them. TTS is just one aspect of being able to use text generation.
Features:
Currently lets you load ONNX models in a manner consistent with other ComfyUI models and use them to generate audio output from text.
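Outside of ComfyUI, text-to-audio generation with a Piper model boils down to feeding text to the Piper engine. As a hedged sketch (assuming the `piper` command-line tool is installed, and that its `--model` and `--output_file` flags behave as documented for piper-tts; the helper names here are hypothetical), a minimal standalone invocation might look like:

```python
import shutil
import subprocess
from pathlib import Path

def build_piper_command(model_path: str, output_wav: str) -> list[str]:
    # Assemble the piper CLI invocation; flag names assume the
    # standard piper-tts command-line interface.
    return ["piper", "--model", str(model_path), "--output_file", str(output_wav)]

def synthesize(text: str, model_path: str, output_wav: str) -> Path:
    """Feed text to piper on stdin and write a WAV file."""
    if shutil.which("piper") is None:
        raise RuntimeError("piper CLI not found on PATH")
    subprocess.run(build_piper_command(model_path, output_wav),
                   input=text.encode("utf-8"), check=True)
    return Path(output_wav)
```

The node wraps the same idea in ComfyUI's model-loading conventions, so the .onnx and .json files live alongside your other models.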
Upcoming Features:
Expand the Piper-tts function options
Then begin implementing basic XTTSv2 support
This is a very recent release, so expect only basic functionality for now.
Conclusion
We appreciate your interest in TTS for ComfyUI. Feel free to explore and provide feedback or report any issues you encounter. Your contributions and suggestions are valuable to the project.