A set of ComfyUI nodes providing multiple audio processing tools.
Audio is the bridge between text, video, and images; a video without audio or text feels bland. This project currently includes the following main nodes:
Examples:
1. Add subtitles to a video
2. Use ComfyUI_EraX-WoW-Turbo for automatic speech recognition, then add the recognized text to the video as subtitles
3. Combine ComfyUI_EraX-WoW-Turbo, ComfyUI_gemmax, ComfyUI_SparkTTS, and ComfyUI-LatentSyncWrapper for automatic speech recognition, translation, voice cloning, lip sync, and subtitling (a detailed example workflow is in workflow-examples)
4. Crop audio to any time range
5. Adjust audio volume, speed, pitch, echo, and more
6. Remove silent parts from audio and recordings
7. Audio watermark embedding (when embedding is disabled, an existing watermark is detected automatically)
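Item 1 above adds subtitles to a video. The README does not say which subtitle format the node reads; assuming the common SRT format, the sketch below shows how timed text segments (for example, speech recognition output) map to numbered SRT cues. `srt_timestamp` and `to_srt` are hypothetical helpers for illustration, not part of ComfyUI_AudioTools.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Turn (start_s, end_s, text) segments into SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical recognition output: (start seconds, end seconds, text).
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.2, "and welcome")]))
```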
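Item 4 above crops audio to an arbitrary time range. Conceptually, cropping converts the start and end times to sample indices and slices between them. This minimal sketch assumes mono audio held as a Python list of float samples; `crop_audio` is a hypothetical name, not the node's API.

```python
def crop_audio(samples: list[float], sample_rate: int, start_s: float, end_s: float) -> list[float]:
    """Return the samples between start_s and end_s (in seconds)."""
    if start_s < 0 or end_s < start_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    # Convert times to sample indices; clamp the end to the signal length.
    start = int(round(start_s * sample_rate))
    end = min(len(samples), int(round(end_s * sample_rate)))
    return samples[start:end]


if __name__ == "__main__":
    sample_rate = 8000
    audio = [0.0] * (sample_rate * 5)  # 5 s of stand-in data
    clip = crop_audio(audio, sample_rate, 1.0, 2.5)
    print(len(clip) / sample_rate)  # 1.5
```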
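Item 5 above covers volume, speed, pitch, and echo. The sketch below illustrates the basic math behind three of these on mono float samples; it is not how the nodes are implemented, and the function names are hypothetical. Volume is a gain in decibels, speed is a naive linear-interpolation resample (which also shifts pitch, like playing a tape faster), and echo mixes in one delayed, attenuated copy. Pitch shifting that preserves duration needs a more involved algorithm and is left out.

```python
def change_volume(samples: list[float], gain_db: float) -> list[float]:
    """Scale by a gain in decibels, clipping to the [-1, 1] range."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def change_speed(samples: list[float], rate: float) -> list[float]:
    """Resample by linear interpolation.

    rate > 1 shortens the audio and raises the pitch; rate < 1 does the opposite.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    out = []
    for i in range(int(len(samples) / rate)):
        pos = i * rate          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def add_echo(samples: list[float], sample_rate: int, delay_s: float, decay: float) -> list[float]:
    """Mix in one delayed copy: y[n] = x[n] + decay * x[n - d], then clip."""
    d = int(round(delay_s * sample_rate))
    if d < 0:
        raise ValueError("delay must be non-negative")
    out = list(samples)
    for n in range(d, len(samples)):
        out[n] += decay * samples[n - d]
    return [max(-1.0, min(1.0, v)) for v in out]
```

For real audio files, SoX provides comparable `vol`, `speed`, `pitch`, and `echo` effects.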
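Item 6 above removes silent parts. One simple approach, not necessarily the one the node uses, splits the signal into short frames and drops frames whose RMS level falls below a threshold in dBFS. The threshold and frame length below are illustrative defaults, and `remove_silence` is a hypothetical name.

```python
import math


def remove_silence(samples: list[float], sample_rate: int,
                   threshold_db: float = -40.0, frame_ms: float = 20.0) -> list[float]:
    """Drop frames whose RMS level is below threshold_db (dBFS)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    kept: list[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    sr = 1000
    audio = [0.5] * 100 + [0.0] * 300 + [0.5] * 100  # tone, gap, tone
    print(len(remove_silence(audio, sr)))  # 200
```

Hard cuts between kept frames can produce clicks; practical tools usually keep a little padding or crossfade at each cut.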
The SilentCipher model files for the watermark node go in the ComfyUI\models\TTS\SilentCipher\44_1_khz\73999_iteration directory.
[2025-03-28]⚒️: Added watermark embedding node.
[2025-03-26]⚒️: Released version v1.0.0.
Install SoX (sox) and add it to the system PATH.
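To confirm that sox is reachable, a program can look it up on the PATH the same way the nodes would need to. This sketch uses only the Python standard library; `sox_available` is a hypothetical helper name.

```python
import shutil
import subprocess


def sox_available() -> bool:
    """True if a `sox` executable can be found on the system PATH."""
    return shutil.which("sox") is not None


if __name__ == "__main__":
    path = shutil.which("sox")
    if path is None:
        print("sox was not found on PATH; install it and add its folder to PATH.")
    else:
        # `sox --version` reports the installed SoX version.
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(result.stdout.strip() or result.stderr.strip())
```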
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_AudioTools.git
cd ComfyUI_AudioTools
pip install -r requirements.txt
# For the portable build, use its embedded Python instead (run from ComfyUI_AudioTools)
../../../python_embeded/python.exe -m pip install -r requirements.txt