ComfyUI Extension: ComfyUI F5-TTS
Text to speech with F5-TTS
README
ComfyUI node to make text to speech audio with your own voice.
Using F5-TTS https://github.com/SWivid/F5-TTS
Instructions
- Put a .wav file of the voice you'd like to use into ComfyUI's input folder. Remove any background music or noise.
- Add a .txt file of the same name containing what was said.
- Press refresh to see it in the node.
- The input/F5-TTS and input/audio folders will also work.
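To check the pairing rule above before pressing refresh, something like the following sketch may help. It is not part of the node; the function name find_missing_transcripts is made up for illustration, and the folder path you pass in is whichever input folder you use.

```python
from pathlib import Path


def find_missing_transcripts(input_dir):
    """Return names of .wav files that have no same-named .txt file beside them."""
    folder = Path(input_dir)
    missing = []
    for wav in sorted(folder.glob("*.wav")):
        # voice.wav -> voice.txt, voice.deep.wav -> voice.deep.txt
        if not wav.with_suffix(".txt").is_file():
            missing.append(wav.name)
    return missing
```

Calling find_missing_transcripts("input") from the ComfyUI folder returns a list of voice samples that still need a transcript; an empty list means every sample is paired.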
You can use the examples here...
- Examples voices
- Simple workflow
- Workflow with input audio only, using OpenAI's Whisper to get the text
- Workflow with all features
- Effortlessly Clone Your Own Voice by using ComfyUI and Almost in Real-Time! (Step-by-Step Tutorial & Workflow Included)
Other languages / custom models...
If you have more models, put the model and vocab .txt files into the models/checkpoints/F5-TTS folder. Give the .txt vocab file and the .pt model file the same name. Press "refresh" and the model should appear under the "model" selection.
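A small sketch for checking the naming rule above: it lists the model names that have both a .pt file and a same-named .txt vocab file. The helper name list_model_pairs is hypothetical, and whether the node itself hides unpaired models is an assumption this sketch does not test.

```python
from pathlib import Path


def list_model_pairs(model_dir):
    """Return stems that have both a .pt model and a same-named .txt vocab file."""
    folder = Path(model_dir)
    return sorted(
        pt.stem
        for pt in folder.glob("*.pt")
        if pt.with_suffix(".txt").is_file()
    )
```

For example, list_model_pairs("models/checkpoints/F5-TTS") returns the names that are correctly paired; any .pt file missing from the result still needs its vocab file renamed.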
Custom F5-TTS languages on huggingface
I haven't tried these... Finnish, French, German, Greek, Hindi, Hungarian, Italian, Japanese, Malaysian, Norwegian, Polish, Portuguese BR, Russian, Spanish, Turkish, Thai, Vietnamese; Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu
Multi voices output...
Use the F5-TTS Audio node (not the "from input" node).
Put your sample voice files into the input folder like...
voice.wav
voice.txt
voice.deep.wav
voice.deep.txt
voice.chipmunk.wav
voice.chipmunk.txt
Then you can use prompts for different voices...
{main} Hello World this is the end
{deep} This is the narrator
{chipmunk} Please, I need more helium
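The tag-to-file naming in the example above can be sketched in Python as follows. This is only an illustration of the convention, not the node's actual parser: the function name split_by_voice is made up, and the rule that {main} selects voice.wav while {deep} selects voice.deep.wav is inferred from the file list above.

```python
import re

# A line such as "{deep} This is the narrator"
TAG = re.compile(r"^\{(\w+)\}\s*(.*)$")


def split_by_voice(prompt, base="voice"):
    """Map each tagged prompt line to (sample wav name, text to speak)."""
    segments = []
    for line in prompt.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue  # this sketch simply skips untagged lines
        name, text = match.groups()
        # Assumption: "main" means the untagged sample, e.g. voice.wav
        wav = f"{base}.wav" if name == "main" else f"{base}.{name}.wav"
        segments.append((wav, text))
    return segments
```

Running it on the three prompt lines above pairs them with voice.wav, voice.deep.wav and voice.chipmunk.wav in order.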
Multi voice input...
Put a sentence from voice 1 and a sentence from voice 2 into the input audio sample. F5-TTS cuts the audio off at 15 seconds, so don't make it too long. Example
BigVGAN models.
To use BigVGAN, you have to add a little dot to make it work with ComfyUI...
In the file custom_nodes/ComfyUI-F5-TTS/F5-TTS/src/third_party/BigVGAN/bigvgan.py
Add a little dot on the line at the top that says...
from utils import init_weights, get_padding
so it becomes...
from .utils import init_weights, get_padding
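If you'd rather not edit the file by hand, a sketch like this applies the same one-line change. The function name patch_bigvgan_import is made up for illustration; pass it the bigvgan.py path given above. It only touches a line that matches the old import exactly, so running it twice is harmless.

```python
from pathlib import Path

OLD = "from utils import init_weights, get_padding"
NEW = "from .utils import init_weights, get_padding"


def patch_bigvgan_import(path):
    """Turn the absolute import into a relative one; return True if the file changed."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        if line.strip() == OLD:
            lines[i] = line.replace(OLD, NEW)
            changed = True
    if changed:
        file.write_text("".join(lines), encoding="utf-8")
    return changed
```

The change will be lost whenever the submodule is updated or re-cloned, so it may need to be applied again after an update.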
Tips...
- F5-TTS cuts your voice sample off at 15 seconds. It may cut off in the middle of a word, and only the audio is cut, not the text. Make sure your input samples are shorter than 15 seconds.
- If you're using the ComfyUI-Whisper node you will also need to install ffmpeg
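To check a sample against the 15-second limit mentioned in the tips, a minimal sketch using Python's standard wave module is shown below. The function names are made up for illustration, and the wave module only reads uncompressed PCM .wav files, so other encodings will raise an error.

```python
import wave

MAX_SECONDS = 15  # limit stated in the tips above


def wav_duration_seconds(path):
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path, limit=MAX_SECONDS):
    """True if the sample is shorter than the limit and will not be cut off."""
    return wav_duration_seconds(path) < limit
```

For example, is_short_enough("input/voice.wav") tells you whether the sample needs trimming before use.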
Install from git
It's best to install from ComfyUI-manager because it will update all your custom_nodes when you click "update all". With git, you will have to update manually.
Clone this repository into custom_nodes and run this to install from git
cd custom_nodes/ComfyUI-F5-TTS
git submodule update --init --recursive
pip install -r requirements.txt
Changes
1.0.22: Added TDHS (time-domain harmonic scaling) to advanced node.
1.0.21: Added advanced node.
1.0.19: Added model_type.