ComfyUI Extension: multiGPU Upscaler
multiGPU batch-parallel upscaling nodes for ComfyUI.
ComfyUI-multiGPU-upscaler
Features
This extension is designed to:
- Use 1–10 GPUs efficiently.
- By default, auto-detect and use up to 2 GPUs.
- Split batched images across GPUs and upscale them in parallel.
- Use robust tiled upscaling with OOM-safe fallback.
- Work great with RealESRGAN/ESRGAN-style models (e.g. `RealESRGAN_4xplus`).
Tested with:
- Dual RTX 3060 setup
- SDXL generation + 4x RealESRGAN upscaling, batch 4–8
- Result: roughly 10–14% faster than single-GPU upscaling in the runs listed under Performance.
Requirements
- NVIDIA GPUs only. This extension relies on CUDA for device management and communication.
Performance
Here are some sample benchmarks comparing a standard Upscale Image (using Model) node against the multiGPU_upscaler node.
- Test Setup: Dual RTX 3060
- Workflow: SDXL Generation + 4x RealESRGAN Upscaling
- Resolution: 1024x1024 upscaled to 4K (4096x4096)
- Batch Size: 8 (split as 4 images per GPU in the multi-GPU test)
| Run Type | Standard Upscaler (1 GPU) | multiGPU Upscaler (2 GPUs) | Speedup |
| :--- | :--- | :--- | :--- |
| Cold Run (gen0) | 261.81s | 233.53s | ~10.8% |
| Run 1 (gen1) | 259.57s | 223.22s | ~14.0% |
| Run 2 (gen2) | 251.65s | 226.82s | ~9.9% |
Results show a speedup of roughly 10–14% in all three runs, cold and repeated alike, from parallelizing the upscale task across both GPUs.
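The Speedup column appears to be the share of wall-clock time saved relative to the single-GPU run. The sketch below recomputes it from the timings in the table under that assumption; the function name `time_saved_percent` is illustrative and not part of the extension.

```python
# Timings in seconds, copied from the benchmark table: (1 GPU, 2 GPUs).
runs = {
    "Cold Run (gen0)": (261.81, 233.53),
    "Run 1 (gen1)": (259.57, 223.22),
    "Run 2 (gen2)": (251.65, 226.82),
}


def time_saved_percent(single_gpu_s: float, multi_gpu_s: float) -> float:
    """Percentage of wall-clock time saved relative to the single-GPU run."""
    return (single_gpu_s - multi_gpu_s) / single_gpu_s * 100.0


# Round to one decimal place, matching the precision used in the table.
speedups = {
    name: round(time_saved_percent(single, multi), 1)
    for name, (single, multi) in runs.items()
}

for name, percent in speedups.items():
    print(f"{name}: ~{percent}%")
```

Computed this way, the three runs come out at 10.8%, 14.0% and 9.9%, which matches the table.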
Installation
- Go to your ComfyUI `custom_nodes` directory. Example:
  `cd ComfyUI/custom_nodes`
- Clone this repository:
  `git clone https://github.com/alludus/ComfyUI-multiGPU-upscaler.git`
  Or download the ZIP from GitHub and extract it to:
  `ComfyUI/custom_nodes/ComfyUI-multiGPU-upscaler/`
- Ensure the structure looks like this:
  `ComfyUI/custom_nodes/ComfyUI-multiGPU-upscaler/__init__.py`
  `ComfyUI/custom_nodes/ComfyUI-multiGPU-upscaler/multiGPU_upscaler.py`
- Restart ComfyUI.
All nodes will appear under the category: `multiGPU_upscaler`
Nodes
multiGPU_upscaler: Multi-GPU Batch Parallel
Main node. Splits the batch across multiple GPUs and upscales in parallel.
Best for:
- Batch size ≥ 4
- 2 or more GPUs
- Post-generation upscaling (e.g. SDXL → 8 images → 4x RealESRGAN upscale)
Inputs:
- `upscale_model`: Load this using a standard `Load Upscale Model` node.
- `image`: Batched input tensor from ComfyUI.
- `device_list`: How to select GPUs.
  - `auto` (default): Uses up to `auto_max_devices` GPUs with the most free VRAM.
  - Custom list (examples: `cuda:0,cuda:1` or `0,1,2`).
  - Up to 10 GPUs supported.
- `auto_max_devices`:
  - Default: `2`
  - Used only when `device_list = "auto"`.
  - Limits the number of GPUs auto mode uses.
- `primary_share`:
  - Default: `0.5`
  - Approximate fraction of the batch assigned to the first (best) GPU.
  - If one GPU is stronger or has more free VRAM, increase this value (e.g. `0.7`).
- `tile_size`:
  - Default: `512`
  - Starting tile size on all GPUs. Automatically reduced on OOM.
- `min_tile_size`:
  - Default: `128`
  - Smallest allowed tile size before failing.
- `overlap`:
  - Default: `32`
  - Tile overlap in pixels.
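As a rough illustration of how the `device_list` and `auto_max_devices` inputs could be interpreted, here is a minimal sketch. It is not the extension's actual code: `parse_device_list`, `resolve_devices` and the `free_vram` mapping are hypothetical names. In a real setup the free-memory figures could come from `torch.cuda.mem_get_info`; here they are passed in as plain numbers so the example runs without a GPU.

```python
MAX_DEVICES = 10  # the README states that up to 10 GPUs are supported


def parse_device_list(device_list: str) -> list[str]:
    """Turn "cuda:0,cuda:1" or "0,1,2" into ["cuda:0", "cuda:1", ...]."""
    devices: list[str] = []
    for token in device_list.split(","):
        token = token.strip()
        if not token:
            continue
        # Bare indices such as "1" are treated as shorthand for "cuda:1".
        name = f"cuda:{token}" if token.isdigit() else token
        if name not in devices:
            devices.append(name)
    return devices[:MAX_DEVICES]


def resolve_devices(device_list: str, auto_max_devices: int,
                    free_vram: dict[str, int]) -> list[str]:
    """Pick devices; free_vram maps each available device to its free bytes."""
    if device_list.strip().lower() == "auto":
        # Auto mode: rank available devices by free VRAM, most free first.
        ranked = sorted(free_vram, key=lambda name: free_vram[name], reverse=True)
        return ranked[:min(auto_max_devices, MAX_DEVICES)]
    # Explicit list: keep the requested order, drop unavailable devices.
    return [name for name in parse_device_list(device_list) if name in free_vram]


# Hypothetical free-memory readings in bytes for three available GPUs.
free = {"cuda:0": 6_000_000_000, "cuda:1": 11_000_000_000, "cuda:2": 9_000_000_000}
print(resolve_devices("auto", 2, free))
print(resolve_devices("0, 1", 2, free))
```

With these made-up readings, auto mode picks `cuda:1` and `cuda:2`, while the explicit list `0, 1` resolves to `cuda:0` and `cuda:1` regardless of free memory.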
Behavior:
- Determines GPUs:
  - If `device_list` is set: Uses exactly that set (filtered by availability).
  - If `device_list = "auto"`: Uses up to `auto_max_devices` GPUs with the most free VRAM.
- Splits Batch:
  - First GPU receives about `primary_share` of the images.
  - Remaining GPUs share the rest.
- Executes:
  - Spawns a worker thread for each GPU.
  - Each worker instantiates its own copy of the model on that GPU.
  - Each worker runs tiled upscaling on its subset with OOM-safe tiling.
- Finishes:
  - Outputs are concatenated in the original batch order.
  - If any worker errors or OOMs, it falls back to a single-GPU tiled upscale on the best available GPU.
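The split, execute and finish steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the extension's implementation: `split_batch` and `upscale_parallel` are hypothetical names, the rounding rule for `primary_share` is a guess, and `upscale_fn(device, images)` stands in for the per-GPU tiled upscale. At least one device is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch_size: int, num_devices: int, primary_share: float) -> list[int]:
    """Return how many images each device gets; index 0 is the primary GPU."""
    if num_devices == 1:
        return [batch_size]
    primary = min(batch_size, max(1, round(batch_size * primary_share)))
    rest = batch_size - primary
    others = num_devices - 1
    # Spread the remaining images as evenly as possible over the other GPUs.
    counts = [rest // others + (1 if i < rest % others else 0) for i in range(others)]
    return [primary] + counts


def upscale_parallel(images: list, devices: list[str], primary_share: float,
                     upscale_fn) -> list:
    """Run upscale_fn(device, chunk) in one thread per device, keeping order."""
    counts = split_batch(len(images), len(devices), primary_share)
    chunks, start = [], 0
    for count in counts:
        chunks.append(images[start:start + count])
        start += count
    try:
        with ThreadPoolExecutor(max_workers=len(devices)) as pool:
            # map() yields results in submission order, so flattening them
            # restores the original batch order.
            results = list(pool.map(upscale_fn, devices, chunks))
    except Exception:
        # Mirrors the described fallback: redo the whole batch on one GPU,
        # here simply the first device in the list.
        return upscale_fn(devices[0], images)
    return [image for chunk in results for image in chunk]


def fake_upscale(device: str, chunk: list) -> list:
    """Stand-in for the real upscale: tags each image with its device."""
    return [f"{image}@{device}" for image in chunk]


print(split_batch(8, 2, 0.5))
print(upscale_parallel(list("abcd"), ["cuda:0", "cuda:1"], 0.5, fake_upscale))
```

Threads rather than processes fit this pattern because each worker spends its time inside GPU calls, and `pool.map` keeps the output aligned with the input without any explicit index bookkeeping.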
Recommended Settings
For a setup like:
- 2x RTX 3060
- SDXL generation
- RealESRGAN_4xplus 4x upscaling
- Batch size 4–8
Recommended Node: multiGPU_upscaler: Multi-GPU Batch Parallel
- Settings:
  - `device_list`: `auto`
  - `auto_max_devices`: `2`
  - `primary_share`: `0.5`
  - `tile_size`: `512`
  - `min_tile_size`: `128`
  - `overlap`: `32`
This configuration lets the extension pick the two best GPUs, splits work evenly, and uses robust tiling.
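To get a feel for what `tile_size` 512 and `overlap` 32 mean for a 1024x1024 input, the sketch below lays out overlapping tiles along each axis. The layout rule (a fixed stride of `tile_size - overlap`, with the last tile shifted back to stay inside the image) is an assumption for illustration; the extension's actual tiling may differ, and `tile_starts` and `tile_boxes` are hypothetical names.

```python
def tile_starts(length: int, tile_size: int, overlap: int) -> list[int]:
    """Start offsets along one axis; neighbouring tiles share at least `overlap` pixels."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    # Shift the final tile back so it ends exactly at the image border.
    starts.append(length - tile_size)
    return starts


def tile_boxes(width: int, height: int, tile_size: int, overlap: int) -> list[tuple]:
    """(x0, y0, x1, y1) boxes that together cover a width x height image."""
    xs = tile_starts(width, tile_size, overlap)
    ys = tile_starts(height, tile_size, overlap)
    return [
        (x, y, min(x + tile_size, width), min(y + tile_size, height))
        for y in ys
        for x in xs
    ]


boxes = tile_boxes(1024, 1024, tile_size=512, overlap=32)
print(len(boxes), "tiles per 1024x1024 image")
```

Under this layout a 1024x1024 image takes 9 tiles at `tile_size` 512 and 25 tiles at 256, which illustrates the trade-off: smaller tiles need less VRAM per step but require more model passes per image.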
Tips & Debugging
- If you encounter OOM:
  - Lower `tile_size` (e.g. to `256`).
  - Optionally increase `min_tile_size` to reduce retries.
- If one GPU is stronger:
  - Increase `primary_share` (e.g. `0.6`–`0.8`) so it does more work.
- Debugging:
  - Watch the ComfyUI console/log for `[multiGPU]` messages.
  - Use `nvidia-smi` to confirm multiple GPUs are active during upscaling.
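The OOM tips above are easier to reason about with a concrete retry loop in mind. The sketch below assumes the tile size is halved after each out-of-memory error until `min_tile_size` is reached; the halving rule is an assumption, and `upscale_with_fallback` and `FakeOOM` are hypothetical names. `FakeOOM` stands in for `torch.cuda.OutOfMemoryError` so the example runs without a GPU.

```python
class FakeOOM(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError so the sketch runs anywhere."""


def upscale_with_fallback(upscale_tiled, tile_size: int, min_tile_size: int,
                          oom_error=FakeOOM):
    """Call upscale_tiled(size), shrinking the tile size after each OOM.

    Returns (result, attempted_sizes). Re-raises the OOM error once the
    attempt at min_tile_size has also failed.
    """
    attempts = []
    size = tile_size
    while True:
        attempts.append(size)
        try:
            return upscale_tiled(size), attempts
        except oom_error:
            if size <= min_tile_size:
                raise  # nothing smaller is allowed, give up
            # Assumed policy: halve the tile size, but never go below the minimum.
            size = max(min_tile_size, size // 2)


def needs_small_tiles(size: int) -> str:
    """Simulated upscale that only fits in memory at 128 px tiles or smaller."""
    if size > 128:
        raise FakeOOM(f"simulated OOM at tile size {size}")
    return f"ok@{size}"


result, attempts = upscale_with_fallback(needs_small_tiles, tile_size=512, min_tile_size=128)
print(result, attempts)
```

In this simulation the defaults go through three attempts (512, 256, 128) before succeeding. Starting at `tile_size` 256 would skip the first failed attempt, and raising `min_tile_size` to 256 would stop after two attempts and report the error instead of trying 128.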
License
This project is released under the Apache 2.0 License.
See the LICENCE file for details.