QwenVL for ComfyUI
The ComfyUI-QwenVL custom node integrates the powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This advanced node enables seamless multimodal AI capabilities within your ComfyUI workflows, allowing for efficient text generation, image understanding, and video analysis.
📰 News & Updates
- 2025/12/22: v2.0.0 Added GGUF-backed nodes and Prompt Enhancer nodes. [Update]

> [!IMPORTANT]
> Install a vision-capable llama-cpp-python build before running the GGUF nodes (see `docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md`).

- 2025/11/10: v1.1.0 Runtime overhaul with an attention-mode selector, flash-attn auto-detection, smarter caching, and quantization/torch.compile controls in both nodes. [Update]
- 2025/10/31: v1.0.4 Custom models supported. [Update]
- 2025/10/22: v1.0.3 Models list updated. [Update]
- 2025/10/17: v1.0.0 Initial Release
  - Support for Qwen3-VL and Qwen2.5-VL series models.
  - Automatic model downloading from Hugging Face.
  - On-the-fly quantization (4-bit, 8-bit, FP16).
  - Preset and custom prompt system for flexible and easy use.
  - Includes both a standard and an advanced node for users of all levels.
  - Hardware-aware safeguards for FP8 model compatibility.
  - Image and video (frame sequence) input support.
  - "Keep Model Loaded" option for improved performance on sequential runs.
  - Seed parameter for reproducible generation.
✨ Features
- Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
- Prompt Enhancers: Dedicated text-only prompt enhancers for both HF and GGUF backends.
- Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control.
- Multi-Model Support: Easily switch between various official Qwen-VL models.
- Automatic Model Download: Models are downloaded automatically on first use.
- Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options.
- Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
- Reproducible Generation: Use the seed parameter to get consistent outputs.
- Memory Management: "Keep Model Loaded" option to retain the model in VRAM for faster processing.
- Image & Video Support: Accepts both single images and video frame sequences as input.
- Robust Error Handling: Provides clear error messages for hardware or memory issues.
- Clean Console Output: Minimal and informative console logs during operation.
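The "Keep Model Loaded" option can be pictured as a small cache keyed by model name and quantization settings. The sketch below is illustrative only, not the node's actual code; `load_fn` is a hypothetical stand-in for the real model loader.

```python
# Illustrative sketch of a "keep model loaded" cache (NOT the node's actual code).
# load_fn is a hypothetical stand-in for the real model loader.

_cache = {}

def get_model(model_name, quantization, keep_loaded, load_fn):
    """Return a cached model when possible; otherwise load (and maybe cache) it."""
    key = (model_name, quantization)
    if key in _cache:
        return _cache[key]          # reuse the resident model: no reload cost
    model = load_fn(model_name, quantization)
    if keep_loaded:
        _cache.clear()              # keep at most one model resident in VRAM
        _cache[key] = model
    return model
```

With `keep_loaded` enabled, repeated runs with the same model and quantization skip the expensive load step entirely, which is why sequential runs are much faster.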
🚀 Installation

1. Clone this repository into your `ComfyUI/custom_nodes` directory:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-QwenVL.git
```

2. Install the required dependencies:

```bash
cd ComfyUI/custom_nodes/ComfyUI-QwenVL
pip install -r requirements.txt
```

3. Restart ComfyUI.
🔧 Node Overview
Transformers (HF) Nodes
- QwenVL: Quick vision-language inference (image/video + preset/custom prompts).
- QwenVL (Advanced): Full control over sampling, device, and performance settings.
- QwenVL Prompt Enhancer: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).
GGUF (llama.cpp) Nodes
- QwenVL (GGUF): GGUF vision-language inference.
- QwenVL (GGUF Advanced): Extended GGUF controls (context, GPU layers, etc.).
- QwenVL Prompt Enhancer (GGUF): GGUF text-only prompt enhancement.
🧩 GGUF Nodes (llama.cpp backend)
This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).
- Nodes: `QwenVL (GGUF)`, `QwenVL (GGUF Advanced)`, `QwenVL Prompt Enhancer (GGUF)`
- Model folder (default): `ComfyUI/models/llm/GGUF/` (configurable via `gguf_models.json`)
- Vision requirement: install a vision-capable `llama-cpp-python` wheel that provides `Qwen3VLChatHandler` / `Qwen25VLChatHandler`. See `docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md`.
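Because the vision requirement depends on which llama-cpp-python build you have, a quick environment check like the following can save a failed workflow run. This is a best-effort sketch under the assumption that the handlers live in `llama_cpp.llama_chat_format`; module layout may vary across builds.

```python
import importlib.util

def has_qwen_vision_support():
    """Best-effort check that llama-cpp-python is installed with Qwen VL chat handlers."""
    if importlib.util.find_spec("llama_cpp") is None:
        return False  # llama-cpp-python is not installed at all
    try:
        from llama_cpp import llama_chat_format
    except ImportError:
        return False
    # Assumption: vision-capable builds expose these handler classes here.
    return any(hasattr(llama_chat_format, name)
               for name in ("Qwen3VLChatHandler", "Qwen25VLChatHandler"))

print("Qwen VL GGUF support:", has_qwen_vision_support())
```

If this prints `False`, install a vision-capable wheel as described in `docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md` before using the GGUF vision nodes.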
🗂️ Config Files
- HF models: `hf_models.json`
  - `hf_vl_models`: vision-language models (used by the QwenVL nodes).
  - `hf_text_models`: text-only models (used by the Prompt Enhancer).
- GGUF models: `gguf_models.json`
- System prompts: `AILab_System_Prompts.json` (includes both VL prompts and prompt-enhancer styles).
📥 Download Models
The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.
HF Vision Models (Qwen-VL)
| Model | Link |
| :---- | :---- |
| Qwen3-VL-2B-Instruct | Download |
| Qwen3-VL-2B-Thinking | Download |
| Qwen3-VL-2B-Instruct-FP8 | Download |
| Qwen3-VL-2B-Thinking-FP8 | Download |
| Qwen3-VL-4B-Instruct | Download |
| Qwen3-VL-4B-Thinking | Download |
| Qwen3-VL-4B-Instruct-FP8 | Download |
| Qwen3-VL-4B-Thinking-FP8 | Download |
| Qwen3-VL-8B-Instruct | Download |
| Qwen3-VL-8B-Thinking | Download |
| Qwen3-VL-8B-Instruct-FP8 | Download |
| Qwen3-VL-8B-Thinking-FP8 | Download |
| Qwen3-VL-32B-Instruct | Download |
| Qwen3-VL-32B-Thinking | Download |
| Qwen3-VL-32B-Instruct-FP8 | Download |
| Qwen3-VL-32B-Thinking-FP8 | Download |
| Qwen2.5-VL-3B-Instruct | Download |
| Qwen2.5-VL-7B-Instruct | Download |
HF Text Models (Qwen3)
| Model | Link |
| :---- | :---- |
| Qwen3-0.6B | Download |
| Qwen3-4B-Instruct-2507 | Download |
| qwen3-4b-Z-Image-Engineer | Download |
GGUF Models (Manual Download)
| Group | Model | Repo | Alt Repo | Model Files | MMProj |
| :-- | :-- | :-- | :-- | :-- | :-- |
| Qwen text (GGUF) | Qwen3-4B-GGUF | Qwen/Qwen3-4B-GGUF | | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf | |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | Qwen/Qwen3-VL-4B-Instruct-GGUF | | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | Qwen/Qwen3-VL-8B-Instruct-GGUF | | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | Qwen/Qwen3-VL-4B-Thinking-GGUF | | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | Qwen/Qwen3-VL-8B-Thinking-GGUF | | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |
📖 Usage
Basic Usage
- Add the "QwenVL" node from the 🧪AILab/QwenVL category.
- Select the model_name you wish to use.
- Connect an image or video (image sequence) source to the node.
- Write your prompt using the preset or custom field.
- Run the workflow.
Advanced Usage
For more control, use the "QwenVL (Advanced)" node. This gives you access to detailed generation parameters like temperature, top_p, beam search, and device selection.
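As a rough intuition for what `top_p` does: at each step the candidate tokens are sorted by probability, and sampling is restricted to the smallest set whose cumulative probability reaches the threshold. The stdlib sketch below is illustrative only, not the node's actual sampling code.

```python
def top_p_filter(probs, top_p):
    """Return the indices of the smallest token set whose cumulative
    probability reaches top_p (nucleus filtering), highest-probability first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break  # the "nucleus" is complete; everything else is discarded
    return sorted(kept)
```

A lower `top_p` keeps fewer candidates (more focused output); `top_p = 1.0` keeps the full distribution. A higher `temperature` flattens the probabilities before this step, which is why the two settings interact.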
⚙️ Parameters
| Parameter | Description | Default | Range | Node(s) |
| :---- | :---- | :---- | :---- | :---- |
| model_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | - | Standard & Advanced |
| quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |
| preset_prompt | A selection of pre-defined prompts for common tasks. | "Describe this..." | Any text | Standard & Advanced |
| custom_prompt | Overrides the preset prompt if provided. | | Any text | Standard & Advanced |
| max_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |
| keep_model_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |
| seed | A seed for reproducible results. | 1 | 1 - 2^64-1 | Standard & Advanced |
| temperature | Controls randomness. Higher values = more creative. (Used when num_beams is 1.) | 0.6 | 0.1-1.0 | Advanced Only |
| top_p | Nucleus sampling threshold. (Used when num_beams is 1.) | 0.9 | 0.0-1.0 | Advanced Only |
| num_beams | Number of beams for beam search. > 1 disables temperature/top_p sampling. | 1 | 1-10 | Advanced Only |
| repetition_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |
| frame_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |
| device | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |
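The `frame_count` parameter subsamples the incoming frame sequence before it is sent to the model. Uniform sampling across the clip is the usual approach; the sketch below is illustrative and may differ from the node's exact strategy.

```python
def sample_frame_indices(total_frames, frame_count):
    """Pick frame_count indices spread uniformly across a clip of total_frames."""
    if frame_count <= 1:
        return [0] if total_frames > 0 else []
    if total_frames <= frame_count:
        return list(range(total_frames))  # clip is short: keep every frame
    # Evenly spaced positions from the first frame to the last.
    step = (total_frames - 1) / (frame_count - 1)
    return [round(i * step) for i in range(frame_count)]
```

Higher `frame_count` values give the model more temporal context at the cost of a longer input sequence (and more VRAM per run).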
💡 Quantization Options
| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |
| :---- | :---- | :---- | :---- | :---- | :---- |
| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |
| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |
| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower* | Good | Low VRAM GPUs (<8GB) |
* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.
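A back-of-the-envelope way to read the table: weight memory is roughly parameter count times bits per weight, divided by 8. These are rough estimates only; real usage adds activations, the KV cache, and framework overhead on top.

```python
def approx_weight_gb(params_billion, bits):
    """Approximate weight memory in GB: parameters * bits / 8 (overhead ignored)."""
    return params_billion * 1e9 * bits / 8 / 1e9

# Rough weight footprint of a 4B-parameter model at each quantization level:
for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{approx_weight_gb(4, bits):.0f} GB")
```

This is why an 8B model that will not fit in FP16 on an 8GB card can still run at 4-bit, with the speed caveat noted above.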
🤔 Setting Tips
| Setting | Recommendation |
| :---- | :---- |
| Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the -FP8 version for better performance. |
| Memory Mode | Keep keep_model_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |
| Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |
| Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep_model_loaded enabled) will be much faster. |
🧠 About the Models
This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.
🗺️ Roadmap
✅ Completed (v1.0.0)
- ✅ Support for Qwen3-VL and Qwen2.5-VL models.
- ✅ Automatic model downloading and management.
- ✅ On-the-fly 4-bit, 8-bit, and FP16 quantization.
- ✅ Hardware compatibility checks for FP8 models.
- ✅ Image and video (frame sequence) input support.
🙏 Credits
- Qwen Team: Alibaba Cloud - For developing and open-sourcing the powerful Qwen-VL models.
- ComfyUI: comfyanonymous - For the incredible and extensible ComfyUI platform.
- llama-cpp-python: JamePeng/llama-cpp-python - GGUF backend with vision support used by the GGUF nodes.
- ComfyUI Integration: 1038lab - Developer of this custom node.
📜 License
This repository's code is released under the GPL-3.0 License.
