ComfyUI Extension: Qwen2-VL wrapper for ComfyUI
ComfyUI Qwen2-VL wrapper that supports text-based and single-image queries.
Custom Nodes (66)
- AIO Aux Preprocessor
- AnimalPose Estimator (AP10K)
- Anime Face Segmentor
- Anime Lineart
- AnyLine Lineart
- BAE Normal Map
- Binary Lines
- Canny Edge
- Color Pallete
- ControlNetAuxSimpleAddText
- Preprocessor Selector
- DensePose Estimator
- Depth Anything
- Depth Anything V2 - Relative
- Diffusion Edge (batch size ↑ => speed ↑, VRAM ↑)
- DSINE Normal Map
- DWPose Estimator
- Execute All ControlNet Preprocessors
- Colorize Facial Parts from PoseKPS
- Fake Scribble Lines (aka scribble_hed)
- HED Soft-Edge Lines
- Enchance And Resize Hint Images
- Generation Resolution From Image
- Generation Resolution From Latent
- Image Intensity
- Image Luminance
- Inpaint Preprocessor
- LeReS Depth Map (enable boost for leres++)
- Realistic Lineart
- Standard Lineart
- Manga Lineart (aka lineart_anime_denoise)
- Mask Optical Flow (DragNUWA)
- MediaPipe Face Mesh
- MeshGraphormer Hand Refiner
- MeshGraphormer Hand Refiner With External Detector
- Metric3D Depth Map
- Metric3D Normal Map
- MiDaS Depth Map
- MiDaS Normal Map
- M-LSD Lines
- OneFormer ADE20K Segmentor
- OneFormer COCO Segmentor
- OpenPose Pose
- PiDiNet Soft-Edge Lines
- Pixel Perfect Resolution
- PyraCanny
- Qwen2.5
- Qwen2.5VL
- Render Pose JSON (Animal)
- Render Pose JSON (Human)
- SAM Segmentor
- Save Pose Keypoints
- Scribble PiDiNet Lines
- Scribble Lines
- Scribble XDoG Lines
- Semantic Segmentor (legacy, alias for UniFormer)
- Content Shuffle
- TEEDPreprocessor
- Tile
- TTPlanet Tile GuidedFilter
- TTPlanet Tile Simple
- UniFormer Segmentor
- Unimatch Optical Flow
- Upper Body Tracking From PoseKps (InstanceDiffusion)
- Zoe Depth Anything
- Zoe Depth Map
ComfyUI Qwen VL Nodes
This repository provides ComfyUI nodes that wrap the latest vision-language and language-only checkpoints from the Qwen family. Both Qwen3 VL and Qwen2.5 VL models are supported for multimodal reasoning, alongside text-only Qwen2.5 models for prompt generation.
What's New
- Added support for the Qwen3 VL family (`Qwen3-VL-4B-Thinking`, `Qwen3-VL-8B-Thinking`, etc.).
- Retained compatibility with existing Qwen2.5 VL models.
- Text-only workflows continue to use the Qwen2.5 instruct checkpoints.
Sample Workflows
- Multimodal workflow example: `workflow/Qwen2VL.json`
- Text generation workflow example: `workflow/qwen25.json`

Installation
You can install through ComfyUI Manager (search for "Qwen-VL wrapper for ComfyUI") or manually:
- Clone the repository:
  `git clone https://github.com/alexcong/ComfyUI_QwenVL.git`
- Change into the project directory:
  `cd ComfyUI_QwenVL`
- Install dependencies (ensure you are inside your ComfyUI virtual environment if you use one):
  `pip install -r requirements.txt`
Supported Nodes
- Qwen2VL node – Multimodal generation with Qwen3 VL and Qwen2.5 VL checkpoints. Accepts images or videos as optional inputs alongside text prompts.
- Qwen2 node – Text-only generation backed by Qwen2.5 instruct models, with optional quantization for lower memory usage.
Both nodes expose parameters for temperature, maximum token count, quantization (none/4-bit/8-bit), and manual seeding. Set `keep_model_loaded` to `True` to cache the model between runs. The sketch below shows roughly what these options correspond to.
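For readers curious what those widgets map to under the hood, here is a minimal sketch of a single-image query against a Qwen2.5 VL checkpoint using the Hugging Face transformers API directly. This is not the wrapper's internal code; the model id, image path, and sampling values are illustrative assumptions.

```python
# Minimal sketch of a single-image Qwen2.5 VL query via transformers.
# NOT the wrapper's internal code; model id, image path, and sampling
# values are illustrative assumptions.
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    Qwen2_5_VLForConditionalGeneration,
)

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # placeholder checkpoint

# The node's "4-bit" quantization option corresponds roughly to a
# bitsandbytes config like this one.
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, quantization_config=quant, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# One image plus a text prompt, rendered through the model's chat template.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open("example.jpg")  # placeholder path

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Temperature, max token count, and seed map onto standard generate() knobs.
torch.manual_seed(42)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Strip the prompt tokens and decode only the newly generated text.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Inside ComfyUI none of this is necessary; the nodes wire the same options through their input widgets.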
Model Storage
Downloaded models are stored under `ComfyUI/models/LLM/`.
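The nodes fetch checkpoints on first use. If you want to pre-download one (for example, for an offline machine), something like the following sketch should place the files under that folder; the exact subfolder naming is an assumption, so check what the node creates on first run.

```python
# Hypothetical pre-download helper; the nodes normally fetch models on
# first use. The subfolder naming below is an assumption.
from pathlib import Path
from huggingface_hub import snapshot_download

comfy_root = Path("ComfyUI")  # adjust to your installation
repo_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # any supported checkpoint

target = comfy_root / "models" / "LLM" / repo_id.split("/")[-1]
snapshot_download(repo_id=repo_id, local_dir=str(target))
print(f"Model files placed in {target}")
```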