ComfyUI Extension: ComfyUI Batch BBox Detector
Process batches of images (e.g., video frames) with ultralytics bbox detection in ComfyUI. This custom node package extends the standard BboxDetectorCombined to handle batches efficiently, making it ideal for processing 300+ video frames.
Features
- Batch Processing: Handle multiple images in a single operation
- Memory Efficient: Chunked processing prevents out-of-memory errors on large batches
- Three Implementations: Choose the best approach for your workflow
- Error Handling: Robust processing that doesn't crash on individual frame failures
- ComfyUI Integration: Seamless integration with existing ImpactPack nodes
Installation
Method 1: ComfyUI Manager (Recommended when available)
- Open ComfyUI Manager
- Search for "Batch BBox Detector"
- Click Install
Method 2: Manual Installation
- Navigate to your ComfyUI custom_nodes directory:
  cd ComfyUI/custom_nodes/
- Clone or copy this repository:
  git clone https://github.com/yourusername/comfyui_bbox_batch.git
  Or manually create the directory structure:
  mkdir comfyui_bbox_batch
  cd comfyui_bbox_batch
- Copy the following files:
  - __init__.py
  - nodes/ directory (with all its files)
  - README.md
- Restart ComfyUI
Node Types
1. BBox Detector (Batch)
Best for: Small to medium batches (< 100 frames)
Standard batch processor that processes each frame sequentially and returns stacked results.
Inputs:
- bbox_detector (BBOX_DETECTOR): The detector model
- images (IMAGE): Batch of images [B, H, W, C]
- threshold (FLOAT): Detection confidence threshold (0.0-1.0, default: 0.5)
- dilation (INT): Mask expansion amount (-512 to 512, default: 4)
- return_type (OPTIONAL): "mask_only" or "image_with_boxes"
Outputs:
- images (IMAGE): Processed images [B, H, W, C]
- masks (MASK): Detection masks [B, H, W]
2. BBox Detector (Batch ForEach)
Best for: List-based workflows, when memory efficiency is critical
Uses ComfyUI's INPUT_IS_LIST pattern for more memory-efficient processing in certain workflows.
Inputs: Same as standard batch processor
Outputs: Lists of images and masks
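A minimal sketch of the list-based pattern, assuming ComfyUI's INPUT_IS_LIST and OUTPUT_IS_LIST class attributes. The class name ExampleForEachNode and its detect method are hypothetical placeholders, not this package's actual code:

```python
class ExampleForEachNode:
    """Hypothetical list-based node; illustrates the pattern only."""

    INPUT_IS_LIST = True            # every input arrives as a Python list
    OUTPUT_IS_LIST = (True, True)   # each output is returned as a list
    RETURN_TYPES = ("IMAGE", "MASK")
    FUNCTION = "process"
    CATEGORY = "examples/detection"  # assumed category name

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "images": ("IMAGE",),
                "threshold": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 1.0}),
            }
        }

    def detect(self, image, threshold):
        # Placeholder: a real node would call the bbox detector here.
        return None

    def process(self, images, threshold):
        # With INPUT_IS_LIST, scalar widgets are wrapped in a list too.
        threshold_value = threshold[0]
        out_images, out_masks = [], []
        for image in images:
            out_images.append(image)
            out_masks.append(self.detect(image, threshold_value))
        return (out_images, out_masks)
```

Because each list item is handled on its own, only one item's intermediate results need to be alive at a time, which is where the memory benefit of this style comes from.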
3. BBox Detector (Batch Chunked) ⭐ RECOMMENDED
Best for: Large batches (300+ frames), video processing
Processes batches in chunks with memory management between chunks. This is the recommended node for video frame processing.
Inputs:
- Same as standard batch processor, plus:
  - chunk_size (INT): Number of frames per chunk (1-128, default: 32)
Outputs: Same as standard batch processor
Memory Management:
- Automatically clears CUDA cache between chunks
- Progress tracking for large batches
- Prevents OOM errors on long videos
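The chunking behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's real implementation: iter_chunks, process_in_chunks, detect_frame and after_chunk are hypothetical names, and frames can be any sliceable sequence.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_chunks(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover range(total) in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def process_in_chunks(
    frames: Sequence,
    detect_frame: Callable,                          # hypothetical per-frame detector call
    chunk_size: int = 32,
    after_chunk: Callable[[], None] = lambda: None,  # hook for freeing memory
) -> List:
    """Run detect_frame on every frame, calling after_chunk between chunks."""
    results = []
    total = len(frames)
    for start, end in iter_chunks(total, chunk_size):
        for frame in frames[start:end]:
            results.append(detect_frame(frame))
        after_chunk()  # release cached memory before the next chunk
        print(f"Processed {end}/{total} frames")  # simple progress report
    return results
```

Passing torch.cuda.empty_cache as after_chunk would mirror the cache clearing described above, assuming PyTorch with CUDA is installed.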
Usage Examples
Basic Video Frame Processing
Workflow:
Video Loader → Video to Image Batch → BBox Detector (Batch Chunked) → Output
Recommended Settings for 300+ frames:
- threshold: 0.5 (adjust based on detection quality needs)
- dilation: 4 (increase for larger mask coverage)
- chunk_size: 16-32 (lower for less VRAM, higher for faster processing)
Workflow with Mask Processing
Workflow:
Video Loader → Video to Image Batch → BBox Detector (Batch Chunked) → MaskToImage → Save Image
↓
Save Mask
Integration with ImpactPack
Workflow:
Load Checkpoint → UltralyticsDetectorProvider → BBox Detector (Batch Chunked)
↓
Image Batch ──────────────────────────────────┘
Performance Tips
Memory Optimization
- Adjust Chunk Size:
  - 8GB VRAM: chunk_size = 8-16
  - 12GB VRAM: chunk_size = 16-32
  - 24GB+ VRAM: chunk_size = 32-64
- Image Resolution:
  - Lower resolution reduces memory usage
  - Consider downscaling before detection if appropriate
- Enable Mixed Precision:
  - If your detector supports it, enable FP16/mixed precision
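The chunk size guidance above can be turned into a small lookup. The helper name suggest_chunk_size is hypothetical, and two behaviours are assumptions that go beyond the guidance: VRAM sizes between the listed tiers fall back to the nearest lower tier, and anything below 8 GB uses the 8 GB range.

```python
from typing import Tuple


def suggest_chunk_size(vram_gb: float) -> Tuple[int, int]:
    """Return a (low, high) chunk_size range for the given VRAM in GB."""
    if vram_gb >= 24:
        return (32, 64)
    if vram_gb >= 12:
        return (16, 32)
    # Assumption: cards with less than 8 GB also use the most conservative range.
    return (8, 16)


low, high = suggest_chunk_size(12)
print(f"Try chunk_size between {low} and {high}")
```

If PyTorch is available, torch.cuda.get_device_properties(0).total_memory reports the first GPU's total memory in bytes, which can be divided by 1024**3 to get a value for vram_gb.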
Speed Optimization
- Use Appropriate Node:
  - Small batches (< 50): Use standard BboxDetectorBatch
  - Large batches (> 100): Use BboxDetectorBatchChunked
- Batch Processing:
  - Larger chunk sizes = faster processing (if memory allows)
  - Balance between speed and stability
- GPU Utilization:
  - Ensure CUDA is available and properly configured
  - Monitor GPU usage with nvidia-smi
Parameters Explained
threshold (Float, 0.0-1.0)
Sets the minimum confidence a detection needs in order to be kept. Higher values keep only the more confident detections but may miss objects.
- 0.3-0.4: Detect more objects, more false positives
- 0.5 (default): Balanced detection
- 0.6-0.8: Fewer detections, higher confidence
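To make the trade-off concrete, here is a toy sketch of confidence filtering. The detections are made-up (label, confidence) pairs, filter_by_confidence is a hypothetical helper, and whether the real detector compares with >= or > is an assumption.

```python
from typing import List, Tuple

Detection = Tuple[str, float]  # (label, confidence), illustrative format only


def filter_by_confidence(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep detections whose confidence is at least the threshold."""
    return [d for d in detections if d[1] >= threshold]


# Made-up sample detections for illustration.
sample = [("person", 0.90), ("dog", 0.45), ("cat", 0.35)]

for threshold in (0.3, 0.5, 0.8):
    kept = filter_by_confidence(sample, threshold)
    print(threshold, [label for label, _ in kept])
```

Lowering the threshold lets the weaker "dog" and "cat" candidates through, which is the same mechanism that admits more false positives on real footage.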
dilation (Int, -512 to 512)
Expands or contracts the detection mask.
- Negative values: Contract mask (useful for precise boundaries)
- 0: No change
- Positive values: Expand mask (useful for including surrounding context)
- 4 (default): Slight expansion for better coverage
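The effect of the sign can be illustrated with a pure-Python sketch on a small binary mask. This assumes a square neighbourhood whose radius is the absolute dilation value; the node's actual kernel shape and size may differ, and dilate_mask is a hypothetical helper.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 values


def dilate_mask(mask: Mask, amount: int) -> Mask:
    """Expand (amount > 0) or contract (amount < 0) a binary mask."""
    if amount == 0:
        return [row[:] for row in mask]  # unchanged copy
    radius = abs(amount)
    height, width = len(mask), len(mask[0])
    grow = amount > 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the in-bounds pixels of the square window around (y, x).
            window = [
                mask[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # Dilation takes the window maximum, erosion the minimum.
            row.append(max(window) if grow else min(window))
        out.append(row)
    return out


mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1                    # single detected pixel in the centre
grown = dilate_mask(mask, 1)      # becomes a 3x3 block
shrunk = dilate_mask(grown, -1)   # contracts back to the single pixel
```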
chunk_size (Int, 1-128)
Number of frames to process before clearing memory cache.
- Lower (8-16): Better memory efficiency, slower processing
- Higher (32-64): Faster processing, more memory usage
- 32 (default): Good balance for most systems
Troubleshooting
Out of Memory Errors
Problem: RuntimeError: CUDA out of memory
Solutions:
- Reduce chunk_size to 8 or 16
- Lower input image resolution
- Close other GPU-intensive applications
- Use BboxDetectorBatchChunked instead of standard batch
Empty Masks
Problem: All masks are black/empty
Solutions:
- Lower threshold (try 0.3-0.4)
- Verify detector model is loaded correctly
- Check if objects are present in images
- Ensure image format is correct (RGB, proper range)
Slow Processing
Problem: Processing takes too long
Solutions:
- Increase chunk_size (if memory allows)
- Use BboxDetectorBatch for small batches
- Reduce image resolution
- Verify GPU is being utilized (check with nvidia-smi)
Detection Quality Issues
Problem: Poor detection results
Solutions:
- Adjust threshold value
- Try different bbox detector models
- Ensure images are properly preprocessed
- Check image quality and resolution
Technical Details
Tensor Formats
- IMAGE tensors: [B, H, W, C] - Batch, Height, Width, Channels
- MASK tensors: [B, H, W] - Batch, Height, Width
- All tensors use the same device as input images
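A quick consistency check for the layouts above might look like this. It works on plain shape tuples so it runs without PyTorch; check_shapes is a hypothetical helper, not part of the package.

```python
from typing import Sequence


def check_shapes(image_shape: Sequence[int], mask_shape: Sequence[int]) -> bool:
    """Validate that IMAGE is [B, H, W, C] and MASK is the matching [B, H, W]."""
    if len(image_shape) != 4:
        raise ValueError(f"IMAGE must be [B, H, W, C], got {tuple(image_shape)}")
    if len(mask_shape) != 3:
        raise ValueError(f"MASK must be [B, H, W], got {tuple(mask_shape)}")
    if tuple(image_shape[:3]) != tuple(mask_shape):
        raise ValueError("MASK batch/height/width must match the IMAGE tensor")
    return True


# 300 RGB frames at 512x512 and their masks.
check_shapes((300, 512, 512, 3), (300, 512, 512))
```

Inside a node, the same check could be called with tuple(images.shape) and tuple(masks.shape).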
Error Handling
- Individual frame failures are caught and logged
- Failed frames return empty masks and original images
- Processing continues even if some frames fail
- Error messages indicate which frame failed and why
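The fallback behaviour listed above could be sketched like this. The names detect_with_fallback, detect_frame and make_empty_mask are hypothetical, and the real nodes may log or recover differently.

```python
import logging
from typing import Any, Callable, List, Sequence, Tuple

logger = logging.getLogger("bbox_batch_sketch")  # hypothetical logger name


def detect_with_fallback(
    frames: Sequence[Any],
    detect_frame: Callable[[Any], Tuple[Any, Any]],  # returns (image, mask)
    make_empty_mask: Callable[[Any], Any],           # builds an empty mask for a frame
) -> Tuple[List[Any], List[Any]]:
    """Process every frame; a failed frame yields the original image and an empty mask."""
    images, masks = [], []
    for index, frame in enumerate(frames):
        try:
            image, mask = detect_frame(frame)
        except Exception as exc:
            # Record which frame failed and why, then carry on with the batch.
            logger.warning("Frame %d failed: %s", index, exc)
            image, mask = frame, make_empty_mask(frame)
        images.append(image)
        masks.append(mask)
    return images, masks
```

Because the failure is handled per frame, the output lists always have the same length as the input, so downstream nodes see a complete batch.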
Memory Management
- CUDA cache is cleared between chunks (chunked processor)
- Tensors are kept on the same device throughout processing
- Intermediate results are properly cleaned up
Compatibility
- ComfyUI: Latest version
- Dependencies:
- PyTorch
- NumPy
- Ultralytics (for bbox detection models)
- ImpactPack (for BBOX_DETECTOR type)
Development
Project Structure
comfyui_bbox_batch/
├── __init__.py # Package initialization with node mappings
├── nodes/
│ ├── __init__.py # Nodes package initialization
│ ├── bbox_batch_detector.py # Standard batch processor
│ ├── bbox_batch_detector_foreach.py # ForEach list-based processor
│ └── bbox_batch_detector_chunked.py # Chunked processor for large batches
└── README.md # This file
Node Architecture
Each node class follows ComfyUI's node pattern:
- INPUT_TYPES: Defines input parameters
- RETURN_TYPES: Defines output types
- FUNCTION: Name of the processing method
- CATEGORY: Node category in UI
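For orientation, a skeleton following this pattern might look like the sketch below. The class name ExampleBatchNode, its category string, and the pass-through body are assumptions for illustration; the input names and ranges are taken from the node descriptions earlier in this README.

```python
class ExampleBatchNode:
    """Hypothetical node skeleton; not one of the package's real classes."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "bbox_detector": ("BBOX_DETECTOR",),
                "images": ("IMAGE",),
                "threshold": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 1.0, "step": 0.01}),
                "dilation": ("INT", {"default": 4, "min": -512, "max": 512}),
            },
            "optional": {
                # A list of strings is rendered as a dropdown in the UI.
                "return_type": (["mask_only", "image_with_boxes"],),
            },
        }

    RETURN_TYPES = ("IMAGE", "MASK")
    RETURN_NAMES = ("images", "masks")
    FUNCTION = "process"              # name of the method ComfyUI will call
    CATEGORY = "examples/detection"   # assumed menu location

    def process(self, bbox_detector, images, threshold, dilation, return_type="mask_only"):
        # Placeholder body: a real node would run detection on each frame.
        masks = []
        return (images, masks)


# ComfyUI discovers nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"ExampleBatchNode": ExampleBatchNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExampleBatchNode": "Example Batch Node"}
```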
License
MIT License - See LICENSE file for details
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Support
- Issues: Report bugs on GitHub Issues
- Discussions: Ask questions in GitHub Discussions
- ComfyUI Discord: Get help in the custom nodes channel
Changelog
Version 1.0.0 (Current)
- Initial release
- Three node implementations (Standard, ForEach, Chunked)
- Support for batches of 300+ frames
- Memory-efficient chunked processing
- Comprehensive error handling
Credits
- Inspired by BboxDetectorCombined from ImpactPack
- Built for the ComfyUI community
- Ultralytics for bbox detection capabilities
Acknowledgments
Thanks to the ComfyUI and ImpactPack communities for their excellent work and support.