Pruna nodes for ComfyUI

This repository explains how to accelerate image generation in ComfyUI using Pruna, an inference optimization engine that makes AI models faster, smaller, cheaper, and greener. ComfyUI is a popular node-based GUI for image generation models, for which we provide the following nodes:

  • A compilation node that optimizes inference speed through model compilation. While this technique fully preserves output quality, performance gains can vary depending on the model.
  • Caching nodes that smartly reuse intermediate computations to accelerate inference with minimal quality degradation. In particular, we provide three caching nodes, each using a different caching strategy:
    • Adaptive Caching: Dynamically adjusts caching for each prompt by identifying the optimal inference steps to reuse cached outputs.
    • Periodic Caching: Caches model outputs at fixed intervals, reusing them in subsequent steps to reduce computation.
    • Auto Caching: Automatically determines the optimal caching schedule to achieve a target latency reduction with minimal quality trade-off.
    By adjusting the hyperparameters of these nodes, you can achieve the best trade-off between speed and output quality for your specific use case.

Our nodes support both Stable Diffusion (SD) and Flux models.


Installation

Prerequisites

Currently, running our nodes requires a Linux system with a GPU. To set up your environment, follow the steps below (example commands for the first two steps are sketched after the list):

  1. Create a conda environment
  2. Install ComfyUI
  3. Install the latest version of Pruna or Pruna Pro:
     • To install Pruna:
       pip install pruna==0.2.2
     • To install Pruna Pro:
       pip install pruna_pro==0.2.2.post2
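For reference, one common way to complete steps 1 and 2 is sketched below; the environment name and Python version are illustrative choices, not requirements:

# Step 1: create and activate a conda environment (name and Python version are examples)
conda create -n comfyui-pruna python=3.10 -y
conda activate comfyui-pruna
# Step 2: clone ComfyUI and install its requirements
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt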

To use Pruna Pro, you also need to:

  1. Export your Pruna token as an environment variable:
     export PRUNA_TOKEN=<your_token_here>
  2. [Optional] If you want to use the x_fast compiler, install the additional dependencies:
     pip install pruna[stable-fast]==0.2.2

Note: Pruna Pro is required to use the caching node or the x_fast compilation mode.
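As an optional sanity check (not part of the official setup), you can confirm that Pruna is importable from the active environment:

# Optional: verify that Pruna was installed into the active environment
python -c "import pruna"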

Steps

  1. Navigate to your ComfyUI installation's custom_nodes folder:
     cd <path_to_comfyui>/custom_nodes
  2. Clone this repository:
     git clone https://github.com/PrunaAI/ComfyUI_pruna.git
  3. Launch ComfyUI, for example, with:
     cd <path_to_comfyui> && python main.py --disable-cuda-malloc --gpu-only

The Pruna nodes will appear in the nodes menu in the Pruna category.

Important note: To use compilation (either in the Pruna compile node or in the caching nodes), you need to launch ComfyUI with the --disable-cuda-malloc flag; otherwise the node may not function properly. For optimal performance, we also recommend setting the --gpu-only flag.

Usage

Workflows

We provide two types of workflows: one using a Stable Diffusion model and another based on Flux. To each model, we apply caching, compilation, or a combination of the two.

| Node | Stable Diffusion | Flux |
|------|------------------|------|
| Compilation | SD Compilation (Preview) | Flux Compilation (Preview) |
| Adaptive Caching | SD Adaptive Caching (Preview) | Flux Adaptive Caching (Preview) |
| Periodic Caching | SD Periodic Caching (Preview) | Flux Periodic Caching (Preview) |
| Auto Caching | SD Auto Caching (Preview) | Flux Auto Caching (Preview) |

To load the workflow:

  • Drag and drop the provided JSON file into the ComfyUI window, or
  • Click Open in the Workflow tab and select the file

To run the workflow, make sure that you have first set up the desired model.

Model Setup

Example 1: Stable Diffusion

You have two options for the base model:

Option 1: SafeTensors Format (Recommended)
  1. Download the safetensors version
  2. Place it in <path_to_comfyui>/models/checkpoints
Option 2: Diffusers Format
  1. Download the Diffusers version of SD v1.4
  2. Place it in <path_to_comfyui>/models/diffusers
  3. Replace the Load Checkpoint node with a DiffusersLoader node

The node is tested using the SafeTensors format, so for the sake of reproducibility, we recommend using that format. However, we don't expect any performance differences between the two.
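For illustration, placing a downloaded checkpoint might look like the following; the filename and download location are examples, not requirements:

# Illustrative only: move the downloaded SD checkpoint into ComfyUI's checkpoints folder
mv ~/Downloads/sd-v1-4.safetensors <path_to_comfyui>/models/checkpoints/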

After loading the model, you can choose the desired workflow, and you're all set!

Note: In this example, we use the Stable Diffusion v1.4 model. However, our nodes are compatible with any other SD model — feel free to use your favorite one!

Example 2: Flux

To use Flux, you must download each model component separately (the VAE, CLIP, and diffusion model weights) and place them in the appropriate folders.

Steps to set up Flux:

  1. For the CLIP models: Get the following files:

    Move them to <path_to_comfyui>/models/clip/.

  2. For the VAE model: Get the VAE model, and move it to <path_to_comfyui>/models/vae/ directory.

  3. For the Flux model: You first need to request access to the model here. Once you have access, download the weights and move them to <path_to_comfyui>/models/diffusion_models/.
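For orientation, once all files are downloaded, moving them into place might look like the following; the filenames are illustrative and depend on the specific variants you downloaded:

# Example only: filenames depend on the CLIP, VAE, and Flux variants you chose
mv clip_l.safetensors t5xxl_fp16.safetensors <path_to_comfyui>/models/clip/
mv ae.safetensors <path_to_comfyui>/models/vae/
mv flux1-dev.safetensors <path_to_comfyui>/models/diffusion_models/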

Now, just load the workflow and you're ready to go!

Hyperparameters

Through the GUI, you can configure various optimization settings for the compilation and caching nodes.

Compilation node

We currently support two compilation modes: x_fast and torch_compile, with x_fast set as the default.

Caching nodes

We offer three caching nodes, each implementing a different caching strategy. For more details on the underlying algorithms, see the Pruna documentation.

Below is a summary of the available parameters for each caching node.

Common Parameters for All Caching Nodes:

| Parameter | Options | Description |
|-----------|---------|-------------|
| compiler | torch_compile, none | Compiler to apply on top of caching |
| cache_mode | default, taylor | Caching mode (default reuses previous steps, taylor uses a Taylor expansion for a more accurate approximation) |

Node-Specific Parameters:

Adaptive Caching:

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| threshold | 0.001 - 0.2 | 0.01 | Difference threshold between the current and previous latent before caching. Higher is faster but reduces quality |
| max_skip_steps | 1 - 5 | 4 | Maximum consecutive steps that can be skipped. Higher is faster but reduces quality |

Periodic Caching:

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| cache_interval | 1 - 7 | 2 | How often to compute and cache the model output |
| start_step | 0 - 10 | 2 | Number of steps to wait before starting to cache |

Auto Caching:

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| speed_factor | 0.0 - 1.0 | 0.5 | Controls inference latency. Lower values yield faster inference but may compromise quality |

Note: Caching and x_fast compilation require access to the Pruna Pro version.

Performance

The node was tested on an NVIDIA L40S GPU. Below, we compare the performance of the base model with models optimized using Pruna's compilation and caching nodes. We run two types of experiments: one using 50 denoising steps and another using 28 steps. We compare the iterations per second (as reported by ComfyUI) and the end-to-end time required to generate a single image.

50 steps

[Performance plots: iterations per second and end-to-end generation time at 50 steps]

28 steps

[Performance plots: iterations per second and end-to-end generation time at 28 steps]

Hyperparameters: For caching, we used the taylor mode and the torch_compile compiler, along with the default hyperparameters.

Note that for Stable Diffusion models, x_fast typically delivers better performance than torch_compile, whereas for Flux models, torch_compile tends to outperform x_fast.

Contact

For questions, feedback or community discussions, feel free to join our Discord.

For bug reports or technical issues, please open an issue in this repository.