ComfyUI Extension: ComfyUI-MultiGPU

Authored by pollockjj

Created

Updated

17 stars

This extension adds CUDA device selection to supported loader nodes in ComfyUI. By monkey-patching ComfyUI’s memory management, each model component (like UNet, Clip, or VAE) can be loaded on a specific GPU. Examples included are multi-GPU workflows for SDXL, FLUX, LTXVideo, and Hunyuan Video for both standard and GGUF loader nodes.

Custom Nodes (0)

    README

    ComfyUI-MultiGPU

    Experimental nodes for using multiple GPUs as well as offloading model components to the CPU in a single ComfyUI workflow

    This extension adds device selection capabilities to model loading nodes in ComfyUI. It monkey patches the memory management of ComfyUI in a hacky way and is neither a comprehensive solution is nor is it well-tested on any edge-case CUDA/CPU solutions. Use at your own risk.

    Note: This does not add parallelism. The workflow steps are still executed sequentially just with model components loaded on different GPUs or offloaded to the CPU where allowed. Any potential speedup comes from not having to constantly load and unload models from VRAM.

    Installation

    Installation via ComfyUI-Manager is preferred. Simply search for ComfyUI-MultiGPU in the list of nodes and follow installation instructions.

    Manual Installation

    Clone this repository inside ComfyUI/custom_nodes/.

    Nodes

    The extension automatically creates MultiGPU versions of loader nodes. Each MultiGPU node has the same functionality as its original counterpart but adds a device parameter that allows you to specify the GPU to use.

    Currently supported nodes (automatically detected if available):

    • Standard ComfyUI model loaders:
      • CheckpointLoaderSimpleMultiGPU
      • CLIPLoaderMultiGPU
      • ControlNetLoaderMultiGPU
      • DualCLIPLoaderMultiGPU
      • TripleCLIPLoaderMultiGPU
      • UNETLoaderMultiGPU
      • VAELoaderMultiGPU
    • GGUF loaders (requires ComfyUI-GGUF):
      • UnetLoaderGGUFMultiGPU (supports quantized models like flux1-dev-gguf)
      • UnetLoaderGGUFAdvancedMultiGPU
      • CLIPLoaderGGUFMultiGPU
      • DualCLIPLoaderGGUFMultiGPU
      • TripleCLIPLoaderGGUFMultiGPU
    • XLabAI FLUX ControlNet (requires x-flux-comfy):
      • LoadFluxControlNetMultiGPU
    • Florence2 (requires ComfyUI-Florence2):
      • Florence2ModelLoaderMultiGPU
      • DownloadAndLoadFlorence2ModelMultiGPU
    • LTX Video Custom Checkpoint Loader (requires ComfyUI-LTXVideo):
      • LTXVLoaderMultiGPU
    • NF4 Checkpoint Format Loader(requires ComfyUI_bitsandbytes_NF4):
      • CheckpointLoaderNF4MultiGPU

    All MultiGPU nodes available for your install can be found in the "multigpu" category in the node menu.

    Example workflows

    All workflows have been tested on a 2x 3090 setup.

    Split FLUX.1-dev across two GPUs

    • examples/flux1dev_2gpu.json This workflow loads a FLUX.1-dev model and splits its components across two GPUs. The UNet model is loaded on GPU 1 while the text encoders and VAE are loaded on GPU 0.

    Split FLUX.1-dev between the CPU and a single GPU

    • examples/flux1dev_cpu_1gpu_GGUF.json This workflow demonstrates splitting a quantized, GGUF FLUX.1-dev model between a CPU and a single GPU. The UNet model is loaded on the GPU, while the VAE and text encoders are handled by the CPU. Requires ComfyUI-GGUF.

    Using GGUF quantized models across GPUs

    Using GGUF quantized models across a CPU and a single GPU for video generation

    • examples/hunyuan_cpu_1gpu_GGUF.json This workflow demonstrates using quantized GGUF models for Hunyan Video split across the CPU and one GPU. In this instance, a quantized video model's UNet and VAE are on GPU 0, whereas a split of one standard and one GGUF model text encoder are on the CPU. Requires ComfyUI-GGUF.

    Using GGUF quantized models across GPUs for video generation

    • examples/hunyuan_2gpu_GGUF.json This workflow demonstrates using quantized GGUF models for Hunyan Video split across multiple GPUs. In this instance, a quantized video model's UNet is on GPU 0 whereas the VAE and text encoders are on GPU 1. Requires ComfyUI-GGUF.

    Loading two SDXL checkpoints on different GPUs

    • examples/sdxl_2gpu.json This workflow loads two SDXL checkpoints on two different GPUs. The first checkpoint is loaded on GPU 0, and the second checkpoint is loaded on GPU 1.

    FLUX.1-dev and SDXL in the same workflow

    • examples/flux1dev_sdxl_2gpu.json This workflow loads a FLUX.1-dev model and an SDXL model in the same workflow. The FLUX.1-dev model has its UNet on GPU 1 with VAE and text encoders on GPU 0, while the SDXL model uses separate allocations on GPU 0.

    Image to Prompt to Image to Video Generation Pipeline

    1. Loading the Florence2 model on the CPU and providing a starting image for analysis and generating a text response
    2. Loading FLUX.1 Dev UNET on GPU 1, with CLIP and VAE on the CPU and generating an image using the Florence2 text as a prompt
    3. Loading the LTX Video UNet and VAE on GPU 2, and LTX-encoded CLIP on the CPU, and taking the resulting FLUX.1 image and provide it as the starting image for an LTX Video image-to-video generation
    4. Generate a 5 second video based on the provided image All models are distributed across available the available CPU and GPUs with no model reloading on dual 3090s. Requires ComfyUI-GGUF and ComfyUI-LTXVideo

    LLM-Guided Video Generation

    1. Using a local LLM (loaded on first GPU via llama.cpp) to take a text suggestion and craft an LTX Video promot
    2. Feeding the enhanced prompt to LTXVideo (loaded on second GPU) for video generation Requires appropriate LLM. Requires ComfyUI-GGUF.

    Support

    If you encounter problems, please open an issue. Attach the workflow if possible.

    Credits

    Originally created by Alexander Dzhoganov. Implementation improved by City96. Currently maintained by pollockjj.