This repository explains how to accelerate image generation in ComfyUI using Pruna, an inference optimization engine that makes AI models faster, smaller, cheaper, and greener. ComfyUI is a popular node-based GUI for image generation models, for which we provide custom compilation and caching nodes that accelerate inference while preserving output quality.
Our nodes support both Stable Diffusion (SD) and Flux models.
In this repository, you'll find:
Currently, running our nodes requires a Linux system with a GPU. To set up your environment, follow the steps below:

Install Pruna:

```bash
pip install pruna==0.2.2
```

or Pruna Pro:

```bash
pip install pruna_pro==0.2.2.post2
```

To use Pruna Pro, you also need to export your Pruna token:

```bash
export PRUNA_TOKEN=<your_token_here>
```

To use the `x_fast` compiler, you need to install additional dependencies:

```bash
pip install pruna[stable-fast]==0.2.2
```

Note: Pruna Pro is required to use the caching nodes or the `x_fast` compilation mode.
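If you want to sanity-check the environment before launching ComfyUI, a minimal check like the one below (standard library only) confirms that the packages are installed and that the token is exported. The snippet is illustrative and not part of this repository:

```python
# Minimal environment check (illustrative): verify the Pruna install and the token.
import os
from importlib import metadata

for package in ("pruna", "pruna_pro"):
    try:
        print(f"{package}: {metadata.version(package)}")
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")

# Pruna Pro reads the token from the environment (see the export above).
print("PRUNA_TOKEN set:", bool(os.environ.get("PRUNA_TOKEN")))
```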
Clone this repository into your ComfyUI `custom_nodes` folder:

```bash
cd <path_to_comfyui>/custom_nodes
git clone https://github.com/PrunaAI/ComfyUI_pruna.git
```

Then launch ComfyUI:

```bash
cd <path_to_comfyui> && python main.py --disable-cuda-malloc --gpu-only
```
The Pruna nodes will appear in the nodes menu in the Pruna category.

Important note: To use compilation (either in the Pruna compile node or in the caching nodes), you need to launch ComfyUI with the `--disable-cuda-malloc` flag; otherwise, the node may not function properly. For optimal performance, we also recommend setting the `--gpu-only` flag.
We provide two types of workflows: one using a Stable Diffusion model and another based on Flux. To these models, we apply caching, compilation, or a combination of the two.
| Node | Stable Diffusion | Flux |
|------|------------------|------|
| Compilation | SD Compilation (Preview) | Flux Compilation (Preview) |
| Adaptive Caching | SD Adaptive Caching (Preview) | Flux Adaptive Caching (Preview) |
| Periodic Caching | SD Periodic Caching (Preview) | Flux Periodic Caching (Preview) |
| Auto Caching | SD Auto Caching (Preview) | Flux Auto Caching (Preview) |
To load the workflow: open the Workflow tab, as shown here, and select the file.

To run the workflow, make sure that you have first set up the desired model.
You have two options for the base model:

- Download the model and place it in `<path_to_comfyui>/models/checkpoints`, then load it with the Load Checkpoint node.
- Alternatively, use the model in diffusers format: place it in `<path_to_comfyui>/models/diffusers` and replace the Load Checkpoint node with a DiffusersLoader node.

The node is tested using the SafeTensors format, so for the sake of reproducibility, we recommend using that format. However, we don't expect any performance differences between the two.
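If you prefer to fetch a checkpoint from the command line rather than the browser, a sketch along these lines works with `huggingface_hub`; the repository and file names are placeholders for whichever SD checkpoint you choose, not specific recommendations:

```python
# Hedged sketch: download a SafeTensors checkpoint into ComfyUI's checkpoints folder.
# repo_id and filename are placeholders; substitute the SD model you want to use.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="<org>/<sd-model-repo>",       # placeholder repository id
    filename="<model>.safetensors",        # placeholder checkpoint file
    local_dir="<path_to_comfyui>/models/checkpoints",
)
```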
After loading the model, you can choose the desired workflow, and you're all set!
Note: In this example, we use the Stable Diffusion v1.4 model. However, our nodes are compatible with any other SD model — feel free to use your favorite one!
To use Flux, you must separately download all model components—including the VAE, CLIP, and diffusion model weights—and place them in the appropriate folder.
Steps to set up Flux:
1. For the CLIP models: get the following files and move them to `<path_to_comfyui>/models/clip/`.
2. For the VAE model: get the VAE model and move it to the `<path_to_comfyui>/models/vae/` directory.
3. For the Flux model: you first need to request access to the model here. Once you have access, download the weights and move them to `<path_to_comfyui>/models/diffusion_models/`.
Now, just load the workflow and you're ready to go!
Through the GUI, you can configure various optimization settings for the compilation and caching nodes.
We currently support two compilation modes: `x_fast` and `torch_compile`, with `x_fast` set as the default.
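For orientation, this is roughly what such a compilation step looks like with Pruna's Python API outside ComfyUI. It is a sketch under the assumption that the node wraps Pruna's `smash` entry point; the node's actual internals may differ, and the model name and dtype are just examples:

```python
# Hedged sketch: compiling a diffusion pipeline with Pruna outside ComfyUI.
# The node's internals may differ; the model name and dtype are only examples.
import torch
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

smash_config = SmashConfig()
smash_config["compiler"] = "torch_compile"  # "x_fast" requires Pruna Pro
smashed_pipe = smash(model=pipe, smash_config=smash_config)

image = smashed_pipe("a photo of an astronaut riding a horse").images[0]
```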
We offer three caching nodes, each implementing a different caching strategy. For more details on the underlying algorithms, see the Pruna documentation.
Below is a summary of the available parameters for each caching node.
Common Parameters for All Caching Nodes:
| Parameter | Options | Description |
|-----------|---------|-------------|
| `compiler` | `torch_compile`, `none` | Compiler to apply on top of caching |
| `cache_mode` | `default`, `taylor` | Caching mode (`default` reuses previous steps, `taylor` uses a Taylor expansion for a more accurate approximation) |
Node-Specific Parameters:
Adaptive Caching:
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `threshold` | 0.001 - 0.2 | 0.01 | Difference threshold between the current and previous latent before caching. Higher is faster but reduces quality |
| `max_skip_steps` | 1 - 5 | 4 | Maximum number of consecutive steps that can be skipped. Higher is faster but reduces quality |
Periodic Caching:
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `cache_interval` | 1 - 7 | 2 | How often to compute and cache the model output |
| `start_step` | 0 - 10 | 2 | Number of steps to wait before starting to cache |
Auto Caching:
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| `speed_factor` | 0.0 - 1.0 | 0.5 | Controls inference latency. Lower values yield faster inference but may compromise quality |
Note: Caching and `x_fast` compilation require access to the Pruna Pro version.
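For orientation, the node settings above map conceptually onto a Pruna Pro configuration along the lines of the sketch below. The cacher identifier and the way node-specific knobs are passed are assumptions inferred from the GUI options, not verified `pruna_pro` API names; consult the Pruna documentation for the exact keys:

```python
# Hedged sketch only: the strings below mirror the node's GUI options and are NOT
# guaranteed to match the pruna_pro API exactly.
from pruna_pro import SmashConfig, smash  # Pruna Pro import path assumed

smash_config = SmashConfig()
smash_config["cacher"] = "adaptive"         # assumed identifier for the Adaptive Caching strategy
smash_config["compiler"] = "torch_compile"  # compiler applied on top of caching
# Node-specific knobs (e.g. threshold, max_skip_steps) would be set here, using
# whatever parameter names the Pruna documentation specifies.

# smashed_pipe = smash(model=pipe, smash_config=smash_config)  # `pipe` as in the earlier sketch
```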
The node was tested on an NVIDIA L40S GPU. Below, we compare the performance of the base model with models optimized using Pruna's compilation and caching nodes. We run two types of experiments: one using 50 denoising steps and another using 28 steps. We compare the iterations per second (as reported by ComfyUI) and the end-to-end time required to generate a single image.
Hyperparameters: For caching, we used the `taylor` mode and the `torch_compile` compiler, along with the default hyperparameters. Note that for Stable Diffusion models, `x_fast` typically delivers better performance than `torch_compile`, whereas for Flux models, `torch_compile` tends to outperform `x_fast`.
For questions, feedback or community discussions, feel free to join our Discord.
For bug reports or technical issues, please open an issue in this repository.