ComfyUI Extension: ComfyUI-audio-speed

Authored by ptmaster

Created 2 months ago

Updated about a month ago

12 stars

This node pack is designed to adjust audio playback speed within ComfyUI, particularly to sync audio with models like FantasyTalking (WAN) that require specific frame rates. It can also be used for general-purpose audio speed control.


README

ComfyUI-audio-speed

6/24/2025: Audio from different sources in ComfyUI does not share a single sample rate, and merging clips directly causes errors, so I added a node that converts audio to a uniform 48 kHz sample rate.
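As a rough illustration of what converting to a common 48 kHz rate involves, here is a minimal sketch in plain Python. It is not this node's code: the function name `resample_linear` and the use of linear interpolation are assumptions made for the example, and a production resampler would normally also apply a low-pass filter to limit aliasing.

```python
def resample_linear(samples, src_rate, dst_rate=48_000):
    """Resample a mono signal (list of floats) by linear interpolation.

    Hypothetical helper for illustration only; no anti-aliasing filter.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    # Output length keeps the clip duration the same at the new rate.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the end
        hi = min(lo + 1, last)     # right neighbour, clamped to the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

Once every clip has passed through a step like this, all clips share one sample rate and can be concatenated sample by sample.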


5/30/2025: When using FantasyTalking for voiceovers with WAN (通义万相, the Tongyi Wanxiang video model), I noticed that it only supports audio aligned to 23 FPS, while I usually work at 16 FPS. My approach is to feed the sampler audio sped up by about 1.4x (the 23/16 ratio) and then use the original audio when rendering the final video. ComfyUI currently has no native audio speed node, so I wrote one. The node defaults to 0.7x speed (equivalent to 1.4x acceleration). It also works as a general-purpose audio speed node for any speed you need.
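The 1.4x and 0.7x figures are reciprocals of each other, both derived from the 23 and 16 FPS values. The sketch below only works through that arithmetic; the helper names `fps_ratio` and `scaled_duration` are hypothetical and are not part of this node pack.

```python
from fractions import Fraction


def fps_ratio(model_fps, workflow_fps):
    """Exact ratio between the two frame rates, e.g. 23/16."""
    if model_fps <= 0 or workflow_fps <= 0:
        raise ValueError("frame rates must be positive")
    return Fraction(model_fps, workflow_fps)


def scaled_duration(duration_s, speed):
    """Duration of a clip after playing it back at `speed` times normal."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return duration_s / float(speed)


ratio = fps_ratio(23, 16)
print(float(ratio))      # 1.4375, the "about 1.4x" acceleration
print(float(1 / ratio))  # roughly 0.696, which rounds to the 0.7 default
```

For example, a clip of 107/16 = 6.6875 seconds sped up by 23/16 lasts 107/23 seconds, roughly 4.65 seconds.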

https://github.com/user-attachments/assets/e138cc75-037e-4747-830b-6ede8df1769d

https://github.com/user-attachments/assets/19452e03-0222-4092-a259-8f6acfda9de3

A typical configuration: 107 frames over about 4 seconds on the input side become 154 frames over about 7 seconds on the output side.

The frame_info output reports the audio's frame count, which you can use to set the frame count of the video rendering model automatically instead of entering it by hand. To check intermediate results, connect a Preview Arbitrary or Display Anything node to frame_info and confirm that the audio and video frame counts line up.
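A frame count of this kind can be derived from the number of audio samples, the sample rate, and the target frame rate. The sketch below shows one way to do it. The function name `audio_frame_count`, the integer-FPS restriction, and the choice to round up are assumptions made for the example; the node's actual frame_info calculation may differ.

```python
def audio_frame_count(num_samples, sample_rate, fps):
    """Video frames needed to cover an audio clip (hypothetical helper).

    Uses integer ceiling division so a partial last frame still counts
    and no floating-point rounding is involved.
    """
    if sample_rate <= 0 or fps <= 0:
        raise ValueError("sample_rate and fps must be positive")
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return -(-(num_samples * fps) // sample_rate)


# A 6.6875 s clip at 48 kHz has 321,000 samples.
print(audio_frame_count(321_000, 48_000, 16))  # 107
print(audio_frame_count(321_000, 48_000, 23))  # 154
```

These numbers are consistent with the 107 and 154 frame counts in the configuration above: 107 × 23/16 = 153.8125, which rounds up to 154.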