ComfyUI Extension: ComfyUI-Lightning

Authored by shenduldh

Accelerate FLUX inference speed in ComfyUI.

⚡ComfyUI-Lightning

Introduction

This repository integrates all the tricks I know to speed up Flux inference:

  1. Use TeaCache, FBCache, MBCache, or ToCa;
  2. Skip some unnecessary blocks;
  3. Compile and quantize the model;
  4. Use fast cuDNN attention kernels (see the sketch after this list);
  5. Use SageAttention or SpargeAttn;
  6. Fix `AttributeError: 'SymInt' object has no attribute 'size'` to speed up recompilation after a resolution change.
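
As a rough illustration of items 3, 4, and 6, the snippet below shows how a forward pass might opt into PyTorch's fused cuDNN attention backend and how `torch.compile` can be configured to tolerate resolution changes. This is a minimal sketch assuming PyTorch >= 2.5 (for `SDPBackend.CUDNN_ATTENTION`); the tensor shapes and `flux_model` are hypothetical stand-ins, not this repository's actual code.

    import torch
    import torch.nn.functional as F
    from torch.nn.attention import SDPBackend, sdpa_kernel

    # Hypothetical stand-ins for Flux attention inputs: (batch, heads, tokens, head_dim).
    q = torch.randn(1, 24, 4096, 128, device="cuda", dtype=torch.bfloat16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Route scaled_dot_product_attention through the fused cuDNN kernel.
    with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v)

    # Compiling with dynamic=True treats the token count as dynamic, so changing
    # the image resolution does not force a full recompilation (the SymInt issue
    # in item 6). `flux_model` is a hypothetical handle to the loaded transformer.
    # flux_model = torch.compile(flux_model, mode="max-autotune", dynamic=True)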

MBCache extends FBCache to cache multiple blocks. The code is adapted from SageAttention, ComfyUI-TeaCache, comfyui-flux-accelerator, and Comfy-WaveSpeed. See those repositories for more details.
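
The idea behind these caches fits in a few lines: if the first transformer block's output barely changes between consecutive denoising steps, the remaining blocks are skipped and a cached residual is reused. The snippet below is a conceptual sketch of FBCache-style skipping, not the implementation used here; the threshold value and class layout are illustrative assumptions.

    import torch

    class FirstBlockCacheSketch:
        """Conceptual FBCache-style skip logic (illustrative, not this repo's code)."""

        def __init__(self, threshold=0.1):  # threshold is an assumed default
            self.threshold = threshold
            self.prev_first = None       # first-block output from the previous step
            self.cached_residual = None  # (final output - first-block output)

        def __call__(self, x, blocks):
            first = blocks[0](x)
            if self.prev_first is not None and self.cached_residual is not None:
                # Relative change of the first block's output between steps.
                rel = (first - self.prev_first).abs().mean() / self.prev_first.abs().mean()
                if rel.item() < self.threshold:
                    self.prev_first = first
                    return first + self.cached_residual  # skip the remaining blocks
            out = first
            for block in blocks[1:]:
                out = block(out)
            self.cached_residual = out - first
            self.prev_first = first
            return out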

Updates

  • [2025/3/10] Add SpargeAttn. For more details, see Usage.
  • [2025/2/27] Add ToCa.
  • [2025/1/24] Now supports Sana. Generate 1024×1024 images within 2 s. All the code is adapted from Sana.

Usage

For Flux

<img src="./assets/flux_generation_results.png" alt="Flux Generation Results" width="80%"/>

You can use XXCache, SageAttention, and torch.compile as shown in the following examples:

<img src="./assets/FBCache.png" alt="FBCache" width="80%"/> <img src="./assets/TeaCache.png" alt="TeaCache" width="80%"/> <img src="./assets/MBCache.png" alt="MBCache" width="80%"/>

More specifically:

  1. Download the Flux diffusion model and VAE image decoder from FLUX.1-dev or FLUX.1-schnell. Put the flux1-dev.safetensors or flux1-schnell.safetensors file into models/diffusion_models and the ae.safetensors file into models/vae (see the folder layout after this list);
  2. Download the Flux text encoders from flux_text_encoders and put all the .safetensors files into models/clip;
  3. Run the example workflow.
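
After these steps, the relevant part of the ComfyUI model folder should look roughly like this (the text-encoder file names are those published in the flux_text_encoders repository and may differ in your download):

    ComfyUI/
    └── models/
        ├── diffusion_models/
        │   └── flux1-dev.safetensors      # or flux1-schnell.safetensors
        ├── vae/
        │   └── ae.safetensors
        └── clip/
            ├── clip_l.safetensors
            └── t5xxl_fp16.safetensors     # or t5xxl_fp8_e4m3fn.safetensors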

For Sana

<img src="./assets/sana_generation_results.png" alt="Sana Generation Results" width="80%"/>
  1. Download the Sana diffusion model from the Model Zoo and put the .pth file into models/diffusion_models;
  2. Download the Gemma text encoder from google/gemma-2-2b-it, unsloth/gemma-2b-it-bnb-4bit, or Efficient-Large-Model/gemma-2-2b-it and put the whole folder into models/text_encoders;
  3. Download the DCAE image decoder from mit-han-lab/dc-ae-f32c32-sana-1.0 and put the .safetensors file into models/vae (see the folder layout after this list);
  4. Run the example workflow.
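
The resulting layout is analogous to the Flux one; the checkpoint and decoder file names below are placeholders for whichever files you downloaded:

    ComfyUI/
    └── models/
        ├── diffusion_models/
        │   └── <sana_checkpoint>.pth
        ├── text_encoders/
        │   └── gemma-2-2b-it/             # the whole downloaded folder
        └── vae/
            └── <dcae_decoder>.safetensors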

For SpargeAttn

SpargeAttn is an attention acceleration method built on SageAttention that requires hyperparameter tuning before use. The tuning process is as follows:

  1. First, install SpargeAttn by following the steps below. If you run into installation problems, see the original repository;

    git clone https://github.com/thu-ml/SpargeAttn.git
    cd ./SpargeAttn
    pip install -e .
    
  2. If you do not have a hyperparameter file yet, perform a few rounds of quality fine-tuning to produce one: enable enable_tuning_mode on the Apply SpargeAttn node and run generation, e.g. generating 50-step 512×512 images for 10 different prompts (very time-consuming);

    <img src="./assets/spargeattn_autotune.png" alt="SpargeAttn Autotune" width="35%"/>
    • The skip_DoubleStreamBlocks and skip_SingleStreamBlocks arguments skip blocks that do not need SpargeAttn, mainly so it can work together with TeaCache and FBCache.
    • Enable parallel_tuning to use multiple GPUs and speed up tuning. In this case, start ComfyUI with the --disable-cuda-malloc argument.
    • [New] Following the author's upstream code updates, restrictions on the l1 and pv_l1 tuning parameters have been relaxed.
  3. Turn off enable_tuning_mode and use the Save Finetuned SpargeAttn Hyperparams node to save your hyperparameter file (a conceptual sketch of the save/load round trip follows this list);

    <img src="./assets/spargeattn_saving.png" alt="SpargeAttn Saving" width="90%"/>
  4. Remove or disable the Save Finetuned SpargeAttn Hyperparams node and place the saved hyperparameter file in the models/checkpoints folder, then load it in the Apply SpargeAttn node;

    <img src="./assets/spargeattn_loading.png" alt="SpargeAttn Loading" width="90%"/>
  5. Enjoy yourself.
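
Conceptually, the save/load round trip in steps 3 and 4 amounts to serializing the tuned per-block sparsity thresholds and restoring them later. The sketch below illustrates that idea with plain torch.save / torch.load; the dictionary contents, block names, and file name are assumptions, not the node's actual format.

    import torch

    # Hypothetical tuned hyperparameters: one entry per attention block.
    tuned_hyperparams = {
        "double_blocks.0.attn": {"l1": 0.06, "pv_l1": 0.065},
        "single_blocks.0.attn": {"l1": 0.07, "pv_l1": 0.072},
    }

    # What "Save Finetuned SpargeAttn Hyperparams" does in spirit:
    torch.save(tuned_hyperparams, "ComfyUI/models/checkpoints/sparge_hyperparams.pt")

    # What "Apply SpargeAttn" does in spirit when a hyperparameter file is given:
    restored = torch.load("ComfyUI/models/checkpoints/sparge_hyperparams.pt")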

To make hyperparameter tuning easier, I've provided an example workflow here. By default, this workflow generates a 50-step 512×512 image for each of the 10 preset prompts (which you can modify as you see fit). Click the Queue button to start tuning. Of course, make sure your environment is set up correctly before you start. Again, this process is very time-consuming.

If you have a well-tuned hyperparameter file, feel free to share it.