ComfyUI Extension: ComfyUI-Grounding
Grounding for dummies, simplest workflow
ComfyUI-Grounding
Grounding toolbox

- 8 Nodes Total - 2 Loaders + 2 Detectors + 2 SAM2 + 2 Utilities
- 6 Model Families - GroundingDINO, MM-GroundingDINO, OWLv2, Florence-2, YOLO-World, SA2VA
- 33 Models - 19 bbox detection + 6 mask generation + 8 SAM2 variants
- Smart Caching - Instant reload
- Batch Processing - Multiple images at once
- Built-in Masks - No separate BboxToMask node needed
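To illustrate what a bbox-to-mask conversion involves, here is a minimal pure-Python sketch. It is not the extension's actual implementation: `bbox_to_mask` is a hypothetical helper, the `(x1, y1, x2, y2)` pixel format and the half-open box convention are assumptions, and the real nodes presumably work on image tensors rather than nested lists.

```python
def bbox_to_mask(box, height, width):
    """Rasterize one (x1, y1, x2, y2) pixel box into a height x width binary mask.

    Assumption: the box is half-open, so x2 and y2 are excluded.
    """
    x1, y1, x2, y2 = box
    # Clamp to the image bounds so boxes that overshoot do not index out of range.
    x1, x2 = max(0, int(x1)), min(width, int(x2))
    y1, y2 = max(0, int(y1)), min(height, int(y2))
    return [
        [1 if (y1 <= y < y2 and x1 <= x < x2) else 0 for x in range(width)]
        for y in range(height)
    ]


mask = bbox_to_mask((1, 1, 3, 2), height=3, width=4)
for row in mask:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```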
Visual Demos
Model Switching
Switch between 19+ detection models with a single dropdown. One node for everything.

SA2VA Vision-Language Segmentation
For when Florence-2 isn't enough: SA2VA has far more advanced semantic understanding and reasoning capabilities.

SAM2 Support

Batch Processing
Process multiple images simultaneously with all nodes supporting batch operations.

Label Splitting Logic
Control label separation: use periods for multiple labels, commas for single compound labels.
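The splitting rule can be sketched in a few lines of Python. This is an illustration of the rule as described, not the node's actual code; `split_labels` is a hypothetical name.

```python
def split_labels(prompt: str) -> list[str]:
    """Split a prompt into labels on periods only; commas stay inside a label."""
    return [part.strip() for part in prompt.split(".") if part.strip()]


print(split_labels("dog. cat."))         # ['dog', 'cat']
print(split_labels("small, brown dog"))  # ['small, brown dog']
```

Under this rule, "dog. cat." yields two separate labels, while "small, brown dog" stays a single compound label.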

Installation
cd ComfyUI/custom_nodes/
git clone https://github.com/PozzettiAndrea/ComfyUI-Grounding
cd ComfyUI-Grounding
pip install -r requirements.txt
On first startup, example assets and workflows are auto-installed.
The Nodes
Detection Nodes
1. Grounding Model Loader - 19+ models in dropdown (see footnotes)
2. Grounding Detector - Universal detector, splits labels on "." only
Mask Generation Nodes
3. Grounding Mask Loader - Florence-2 (2) + SA2VA (4) models
4. Grounding Mask Detector - Direct masks from text, outputs masks + overlays + descriptions
SAM2 Segmentation Nodes
5. SAM2 Model Loader - SAM2/2.1 (8 variants), auto-downloads, fp16/bf16/fp32
6. SAM2 Segment - Segment from bboxes or points
Utility Nodes
7. Bounding Box Visualizer - Custom line width
8. Batch Crop and Pad - Uniform sizing for batches
Example Workflows
- normal_grounding.json - Detection + SAM2 segmentation
- batch_normal_grounding.json - Multi-image processing
- mask_grounding.json - Direct SA2VA masking
Advanced Features
Detection modes: single_box_mode (top result only) • single_box_per_prompt_mode (best per label)
Output formats: list_only (SAM2-compatible) • dict_with_data (with labels/scores)
Prompt format: Use periods for multiple labels "dog. cat." • Use commas for single label "small, brown dog"
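As a rough sketch of how the two detection modes narrow a result set, the snippet below filters a list of scored detections. The `Detection` type and `filter_detections` function are hypothetical names introduced for illustration. Only the mode names come from the options above, and how the node treats both flags being set is not specified, so giving `single_box_mode` precedence here is an assumption.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    score: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2 (assumed format)


def filter_detections(dets, single_box_mode=False, single_box_per_prompt_mode=False):
    """Keep the top detection overall, the best one per label, or everything."""
    if not dets:
        return []
    if single_box_mode:
        # Top result only: the single highest-scoring box across all labels.
        return [max(dets, key=lambda d: d.score)]
    if single_box_per_prompt_mode:
        # Best per label: remember the highest-scoring box seen for each label.
        best = {}
        for d in dets:
            if d.label not in best or d.score > best[d.label].score:
                best[d.label] = d
        return list(best.values())
    return list(dets)


dets = [
    Detection("dog", 0.9, (10, 10, 50, 50)),
    Detection("dog", 0.6, (60, 10, 90, 40)),
    Detection("cat", 0.7, (20, 60, 70, 95)),
]
print([d.label for d in filter_detections(dets, single_box_mode=True)])
# ['dog']
print([(d.label, d.score) for d in filter_detections(dets, single_box_per_prompt_mode=True)])
# [('dog', 0.9), ('cat', 0.7)]
```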
Credits
- GroundingDINO - IDEA-Research
- OWLv2 - Google Research
- Florence-2 - Microsoft Research
- YOLO-World - Ultralytics
License
MIT License
Footnotes
Full list of models:
<div style="font-size: 0.75em; line-height: 1.4;">
- GroundingDINO: SwinT OGC (694MB) - IDEA-Research/grounding-dino-tiny
- GroundingDINO: SwinB (938MB) - IDEA-Research/grounding-dino-base
- MM-GroundingDINO: Tiny O365+GoldG (50.4 mAP) - openmmlab-community/mm_grounding_dino_tiny_o365v1_goldg
- MM-GroundingDINO: Tiny O365+GoldG+GRIT (50.5 mAP) - openmmlab-community/mm_grounding_dino_tiny_o365v1_goldg_grit
- MM-GroundingDINO: Tiny O365+GoldG+V3Det (50.6 mAP) - openmmlab-community/mm_grounding_dino_tiny_o365v1_goldg_v3det
- MM-GroundingDINO: Base O365+GoldG+V3Det (52.5 mAP) - openmmlab-community/mm_grounding_dino_base_o365v1_goldg_v3det
- MM-GroundingDINO: Base All Datasets (59.5 mAP) - openmmlab-community/mm_grounding_dino_base_all
- MM-GroundingDINO: Large O365v2+OIv6+GoldG (53.0 mAP) - openmmlab-community/mm_grounding_dino_large_o365v2_oiv6_goldg
- MM-GroundingDINO: Large All Datasets (60.3 mAP) - openmmlab-community/mm_grounding_dino_large_all
- OWLv2: Base Patch16 - google/owlv2-base-patch16
- OWLv2: Large Patch14 - google/owlv2-large-patch14
- OWLv2: Base Patch16 Ensemble - google/owlv2-base-patch16-ensemble
- OWLv2: Large Patch14 Ensemble - google/owlv2-large-patch14-ensemble
- Florence-2: Base (0.23B params) - microsoft/Florence-2-base
- Florence-2: Large (0.77B params) - microsoft/Florence-2-large
- YOLO-World: v8s (Small) - yolov8s-worldv2.pt
- YOLO-World: v8m (Medium) - yolov8m-worldv2.pt
- YOLO-World: v8l (Large) - yolov8l-worldv2.pt
- YOLO-World: v8x (Extra Large) - yolov8x-worldv2.pt