ComfyUI Qwen2.5-VL Object Detection Node
This repository provides a custom ComfyUI node for running object detection with the Qwen2.5-VL model. The node downloads the selected model on demand, runs a detection prompt, and outputs bounding boxes that can be used with segmentation nodes such as SAM2.
Nodes
DownloadAndLoadQwenModel
Downloads a chosen Qwen2.5-VL model into `models/Qwen` and returns the loaded model and processor. You can choose which device to load the model onto (e.g. `cuda:1` if you have multiple GPUs), the precision for the checkpoint (INT4, INT8, BF16, FP16 or FP32) and whether to use FlashAttention or SDPA. FlashAttention is automatically replaced with SDPA when FP32 precision is selected, because FlashAttention does not support FP32.
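For orientation, here is a minimal sketch of what this loading flow can look like using the Hugging Face `huggingface_hub` and `transformers` APIs; the model id, target directory and option wiring below are illustrative assumptions, not the node's exact code.

```python
import torch
from huggingface_hub import snapshot_download
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"           # hypothetical: whichever model is selected
local_dir = "models/Qwen/Qwen2.5-VL-7B-Instruct"   # download target inside ComfyUI

# snapshot_download resumes a partially completed download by default,
# which is what lets an interrupted attempt pick up where it left off.
snapshot_download(repo_id=model_id, local_dir=local_dir)

dtype = torch.bfloat16  # BF16 here; FP16/FP32 map to torch.float16/torch.float32
# FlashAttention has no FP32 kernels, hence the automatic SDPA fallback.
attn = "sdpa" if dtype == torch.float32 else "flash_attention_2"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    local_dir, torch_dtype=dtype, attn_implementation=attn
).to("cuda:1")           # any device string works, e.g. "cuda:0" or "cpu"
processor = AutoProcessor.from_pretrained(local_dir)
```

The INT4/INT8 options would additionally require a quantization backend such as bitsandbytes, which this sketch omits.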
QwenVLDetection
Runs a detection prompt on an input image using the loaded model. The node outputs a JSON list of bounding boxes of the form `{"bbox_2d": [x1, y1, x2, y2], "label": "object"}` and a separate list of coordinates. Boxes are sorted by detection confidence, and you can control which ones are returned:
- `bbox_selection` – `all` returns every box (the default); comma-separated indices such as `0,1,2` or `0,2` return only the selected boxes, sorted by detection confidence
- `merge_boxes` – when enabled, merge the selected boxes into a single bounding box
- `score_threshold` – drop boxes with a confidence score below this value when available
The bounding boxes are converted to absolute pixel coordinates so they can be passed to SAM2 nodes.
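The selection behaviour described above could look roughly like this; it is a hedged sketch, not the node's actual code, and the helper names and the optional `score` field handling are assumptions.

```python
import json

def select_boxes(raw_json: str, bbox_selection: str = "all",
                 score_threshold: float = 0.0) -> list:
    """Parse the model's JSON reply and apply the selection options."""
    boxes = json.loads(raw_json)  # [{"bbox_2d": [x1, y1, x2, y2], "label": ...}, ...]
    # Keep only boxes whose confidence clears the threshold; a box without
    # a reported score is kept, matching "when available" above.
    boxes = [b for b in boxes if b.get("score", 1.0) >= score_threshold]
    if bbox_selection.strip() != "all":
        wanted = [int(i) for i in bbox_selection.split(",")]
        boxes = [boxes[i] for i in wanted if 0 <= i < len(boxes)]
    return boxes

def merge_selected(boxes: list) -> list:
    """Union of the selected (non-empty) boxes as one [x1, y1, x2, y2] box."""
    x1s, y1s, x2s, y2s = zip(*(b["bbox_2d"] for b in boxes))
    return [min(x1s), min(y1s), max(x2s), max(y2s)]
```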
BBoxesToSAM2
Wraps a list of bounding boxes into the `BBOXES` batch format expected by ComfyUI-segment-anything-2 and compatible nodes such as `sam_2_ultra.py`.
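A minimal sketch of the wrapping step. The exact `BBOXES` layout is an assumption here: one list of `[x1, y1, x2, y2]` boxes per input image, batched into an outer list.

```python
def bboxes_to_sam2(boxes: list) -> list:
    """[[x1, y1, x2, y2], ...] -> one BBOXES entry per image in the batch."""
    return [boxes]

# One image with two detected boxes:
bboxes = bboxes_to_sam2([[32, 48, 210, 300], [15, 20, 90, 140]])
# -> [[[32, 48, 210, 300], [15, 20, 90, 140]]]
```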
Usage
- Place this repository inside your `ComfyUI/custom_nodes` directory.
- From the Download and Load Qwen2.5-VL Model node, select the model you want to use, choose the desired precision (INT4/INT8/BF16/FP16/FP32), the attention implementation (FlashAttention or SDPA) and, if necessary, the device (such as `cuda:1`) where it should be loaded. The snapshot download will resume automatically if a previous attempt was interrupted. FlashAttention is replaced with SDPA automatically when used with FP32 precision.
- Connect the output model to Qwen2.5-VL Object Detection, provide an image and the object you want to locate (e.g. `cat`). Optionally set `score_threshold` to filter out low-confidence boxes, use `bbox_selection` to choose specific ones (e.g. `0,2`) and enable `merge_boxes` if you want them merged. The node will automatically build the detection prompt and return the selected boxes in JSON.
- Pass the bounding boxes through Prepare BBoxes for SAM2 before feeding them into the SAM2 workflow (an illustrative trace of these values follows this list).
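For reference, a hedged trace of the values flowing through these steps; the coordinates and label are invented for illustration.

```python
# Illustrative values only; coordinates and label are invented.
detections_json = '[{"bbox_2d": [112, 64, 480, 392], "label": "cat"}]'  # JSON output
coords = [[112, 64, 480, 392]]   # the node's separate coordinate list
sam2_bboxes = [coords]           # after Prepare BBoxes for SAM2, ready for SAM2
```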