ComfyUI Extension: ComfyUI-TangoFlux

Authored by LucipherDev

82 stars

ComfyUI Custom Nodes for 'TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching'. This generates high-quality 44.1kHz audio up to 30 seconds using just a text prompt.

    README

    ComfyUI-TangoFlux

    ComfyUI Custom Nodes for "TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching". These nodes, adapted from the official implementations, generate high-quality 44.1kHz audio up to 30 seconds long from just a text prompt.

    Installation

    1. Navigate to your ComfyUI's custom_nodes directory:
    cd ComfyUI/custom_nodes
    
    2. Clone this repository:
    git clone https://github.com/LucipherDev/ComfyUI-TangoFlux
    
    3. Install requirements:
    cd ComfyUI-TangoFlux
    python install.py
    

    Or Install via ComfyUI Manager

    Check out some demos from the official demo page

    Example Workflow

    example_workflow

    Usage

    Models can be downloaded using the install.py script

    models_folder_structure

    Manual Download:

    • Download TangoFlux from here into models/tangoflux
    • Download text encoders from here into models/text_encoders/google-flan-t5-large

    (Include everything as shown in the screenshot above. Do not rename anything.)
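
After a manual download, a quick check that both folders exist can catch a misplaced or renamed directory before ComfyUI is started. The sketch below is only an illustration: the function name `missing_model_dirs` is hypothetical, and it assumes the two paths listed above are relative to the ComfyUI root directory. It does not verify the files inside the folders.

```python
from pathlib import Path

# Folder names taken from the manual download steps above.
# Assumption: they are relative to the ComfyUI root directory.
EXPECTED_DIRS = (
    "models/tangoflux",
    "models/text_encoders/google-flan-t5-large",
)


def missing_model_dirs(comfyui_root):
    """Return the expected model folders that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own installation.
    missing = missing_model_dirs("ComfyUI")
    if missing:
        print("Missing model folders:", ", ".join(missing))
    else:
        print("Both model folders are present.")
```

An empty list means both folders were found. Anything returned should be created or re-downloaded without renaming.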

    The nodes can be found in the "TangoFlux" category as TangoFluxLoader, TangoFluxSampler, TangoFluxVAEDecodeAndPlay.

    The audio output of the TangoFluxVAEDecodeAndPlay node can be used as the audio input for the ComfyUI-VideoHelperSuite VideoCombine node. (This will not sync the audio to the video.)

    teacache_options

    TeaCache can speed up TangoFlux by roughly 2x without much loss of audio quality, and it requires no additional training.
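
To give a feel for why a cache threshold trades speed for quality, here is a deliberately simplified toy of threshold-based step caching. It is not the extension's code and not the TeaCache algorithm as published: the function name `run_with_step_cache` is hypothetical, the change measure is a plain relative L1 distance between consecutive inputs, and treating values such as 0.25 or 0.4 as thresholds of this kind is an assumption.

```python
def run_with_step_cache(inputs, expensive_step, threshold):
    """Apply expensive_step to each input vector, but reuse the last computed
    residual (output - input) while the accumulated relative change of the
    inputs stays below threshold. Returns (outputs, number_of_real_calls)."""
    outputs = []
    computed = 0
    accumulated = 0.0
    prev_input = None
    cached_residual = None

    for x in inputs:
        if prev_input is not None:
            # Relative L1 change between this input and the previous one.
            num = sum(abs(a - b) for a, b in zip(x, prev_input))
            den = sum(abs(b) for b in prev_input) or 1.0
            accumulated += num / den

        if cached_residual is None or accumulated >= threshold:
            # Inputs drifted too far (or nothing is cached yet): real call.
            y = expensive_step(x)
            cached_residual = [yi - xi for yi, xi in zip(y, x)]
            accumulated = 0.0
            computed += 1
        else:
            # Cheap path: approximate the output with the cached residual.
            y = [xi + ri for xi, ri in zip(x, cached_residual)]

        prev_input = x
        outputs.append(y)

    return outputs, computed


if __name__ == "__main__":
    steps = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2], [2.0, 2.0]]
    double = lambda v: [2 * e for e in v]
    _, calls = run_with_step_cache(steps, double, threshold=0.25)
    print(f"real calls: {calls} of {len(steps)}")
```

In this toy, a larger threshold skips more real calls and so runs faster, at the cost of reusing a staler residual. That is the same direction as the speed and quality trade-off described above.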

    📈 Inference Latency Comparisons on a Single A800

    | TangoFlux | TeaCache (0.25) | TeaCache (0.4) |
    |:---------:|:---------------:|:--------------:|
    | ~4.08 s   | ~2.42 s         | ~1.95 s        |
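
The speedup factors implied by these latencies can be worked out directly. The short calculation below uses only the three approximate timings from the table. The helper name `speedups` is just for illustration.

```python
# Approximate latencies in seconds, copied from the table above.
BASELINE_S = 4.08
TEACACHE_S = {
    "TeaCache (0.25)": 2.42,
    "TeaCache (0.4)": 1.95,
}


def speedups(baseline, timed):
    """Return baseline / latency for every entry in timed."""
    return {name: baseline / seconds for name, seconds in timed.items()}


if __name__ == "__main__":
    for name, factor in speedups(BASELINE_S, TEACACHE_S).items():
        print(f"{name}: {factor:.2f}x")
    # TeaCache (0.25): 1.69x
    # TeaCache (0.4): 2.09x
```

So the "2x" figure corresponds to the 0.4 setting, while 0.25 gives a smaller gain of about 1.7x.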

    Citation

    @misc{hung2024tangofluxsuperfastfaithful,
          title={TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization}, 
          author={Chia-Yu Hung and Navonil Majumder and Zhifeng Kong and Ambuj Mehrish and Rafael Valle and Bryan Catanzaro and Soujanya Poria},
          year={2024},
          eprint={2412.21037},
          archivePrefix={arXiv},
          primaryClass={cs.SD},
          url={https://arxiv.org/abs/2412.21037}, 
    }
    
    @article{liu2024timestep,
      title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
      author={Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and Qiu, Haonan and Zhao, Yuzhong and Zhang, Yingya and Ye, Qixiang and Wan, Fang},
      journal={arXiv preprint arXiv:2411.19108},
      year={2024}
    }