QwenVL for ComfyUI
The ComfyUI-QwenVL custom node integrates the Qwen-VL series of large vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL. It brings multimodal AI directly into your ComfyUI workflows, enabling text generation, image understanding, and video analysis.
📰 News & Updates
- 2025/11/11: v1.1.0 Major Performance Updates [Update]
  - New `attention_mode` option (`auto`, `flash_attention_2`, `sdpa`) with automatic Flash-Attention v2 detection (a sketch of the selection logic follows this list).
  - Added `use_torch_compile` (Torch 2.1+) to accelerate inference on CUDA with `torch.compile(mode="reduce-overhead")`.
  - Added `device` override allowing manual selection (`auto`, `cuda`, `cpu`, `mps`).
  - Smarter VRAM management with automatic quantization downgrade when memory is low.
- 2025/10/31: v1.0.4 Custom Models Supported [Update]
- 2025/10/22: v1.0.3 Models list updated [Update]
- 2025/10/17: v1.0.0 Initial Release
- Support for Qwen3-VL and Qwen2.5-VL series models.
- Automatic model downloading from Hugging Face.
- On-the-fly quantization (4-bit, 8-bit, FP16).
- Preset and Custom Prompt system for flexible and easy use.
- Includes both a standard and an advanced node for users of all levels.
- Hardware-aware safeguards for FP8 model compatibility.
- Image and Video (frame sequence) input support.
- "Keep Model Loaded" option for improved performance on sequential runs.
- Seed parameter for reproducible generation.
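The v1.1.0 backend selection and optional compilation can be pictured with a short sketch. This is a minimal illustration, assuming hypothetical helpers named `pick_attention_backend` and `maybe_compile`; it is not the node's actual code.

```python
# Illustrative sketch of attention-backend fallback and optional torch.compile.
import importlib.util
import torch

def pick_attention_backend(requested: str = "auto") -> str:
    """Resolve `attention_mode`: try Flash-Attention v2, else fall back to SDPA."""
    if requested != "auto":
        return requested
    has_flash = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if has_flash and torch.cuda.is_available() else "sdpa"

def maybe_compile(model, use_torch_compile: bool):
    """Apply torch.compile (Torch 2.1+) on CUDA only; otherwise return unchanged."""
    if use_torch_compile and torch.cuda.is_available():
        return torch.compile(model, mode="reduce-overhead")
    return model
```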
✨ Features
- Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
- Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control.
- Multi-Model Support: Easily switch between various official Qwen-VL models.
- Automatic Model Download: Models are downloaded automatically on first use.
- Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options.
- Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
- Reproducible Generation: Use the seed parameter to get consistent outputs.
- Memory Management: "Keep Model Loaded" option to retain the model in VRAM for faster processing.
- Image & Video Support: Accepts both single images and video frame sequences as input.
- Robust Error Handling: Provides clear error messages for hardware or memory issues.
- Clean Console Output: Minimal and informative console logs during operation.
- Flash-Attention v2 Integration: Automatically enabled when available for faster attention layers.
- Torch Compile Optimization: Optional JIT compilation via `use_torch_compile` for extra throughput.
- Advanced Device Handling: Auto-detects CUDA, Apple Silicon (MPS), or CPU; can be overridden manually.
- Dynamic Memory Enforcement: Automatically adjusts quantization level based on VRAM availability (a sketch follows this list).
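As a rough illustration of the dynamic memory enforcement described above, the sketch below downgrades the requested quantization when free VRAM looks insufficient. The thresholds and the helper name `enforce_quantization` are assumptions, not the node's internals.

```python
# Hedged sketch: downgrade quantization based on free VRAM.
import torch

def enforce_quantization(requested: str, param_count_b: float) -> str:
    """Downgrade quantization if free VRAM can't hold the weights (rough estimate)."""
    if not torch.cuda.is_available():
        return requested
    free_gb = torch.cuda.mem_get_info()[0] / 1024**3
    if requested == "none" and free_gb < 2.0 * param_count_b:   # FP16: ~2 GB per B params
        requested = "8-bit"
    if requested == "8-bit" and free_gb < 1.0 * param_count_b:  # INT8: ~1 GB per B params
        requested = "4-bit"
    return requested
```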
🚀 Installation
1. Clone this repository into your `ComfyUI/custom_nodes` directory:
   ```bash
   cd ComfyUI/custom_nodes
   git clone https://github.com/1038lab/ComfyUI-QwenVL.git
   ```
2. Install the required dependencies:
   ```bash
   cd ComfyUI/custom_nodes/ComfyUI-QwenVL
   pip install -r requirements.txt
   ```
3. Restart ComfyUI.
📥 Download Models
The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.
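If you prefer to script the manual download, `huggingface_hub` can fetch a repository snapshot into the expected folder. A minimal sketch; the repo id shown is an assumption based on the model names in the table below, so check the actual download links.

```python
# Hedged sketch: fetch a model snapshot into ComfyUI's Qwen-VL folder.
from pathlib import Path
from huggingface_hub import snapshot_download

models_dir = Path("ComfyUI/models/LLM/Qwen-VL")
repo_id = "Qwen/Qwen3-VL-4B-Instruct"  # assumed repo id; verify against the table

snapshot_download(
    repo_id=repo_id,
    local_dir=models_dir / repo_id.split("/")[-1],
)
```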
| Model | Link |
| :---- | :---- |
| Qwen3-VL-2B-Instruct | Download |
| Qwen3-VL-2B-Thinking | Download |
| Qwen3-VL-2B-Instruct-FP8 | Download |
| Qwen3-VL-2B-Thinking-FP8 | Download |
| Qwen3-VL-4B-Instruct | Download |
| Qwen3-VL-4B-Thinking | Download |
| Qwen3-VL-4B-Instruct-FP8 | Download |
| Qwen3-VL-4B-Thinking-FP8 | Download |
| Qwen3-VL-8B-Instruct | Download |
| Qwen3-VL-8B-Thinking | Download |
| Qwen3-VL-8B-Instruct-FP8 | Download |
| Qwen3-VL-8B-Thinking-FP8 | Download |
| Qwen3-VL-32B-Instruct | Download |
| Qwen3-VL-32B-Thinking | Download |
| Qwen3-VL-32B-Instruct-FP8 | Download |
| Qwen3-VL-32B-Thinking-FP8 | Download |
| Qwen2.5-VL-3B-Instruct | Download |
| Qwen2.5-VL-7B-Instruct | Download |
📖 Usage
Basic Usage
- Add the "QwenVL" node from the π§ͺAILab/QwenVL category.
- Select the model_name you wish to use.
- Connect an image or video (image sequence) source to the node.
- Write your prompt using the preset or custom field.
- Run the workflow.
Advanced Usage
For more control, use the "QwenVL (Advanced)" node. This gives you access to detailed generation parameters like temperature, top_p, beam search, and device selection.
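These advanced parameters map closely onto standard Hugging Face `generate()` arguments. The sketch below shows that mapping under the assumption of typical transformers usage; `run_generation` is an illustrative helper, not the node's API.

```python
# Hedged sketch: advanced node parameters -> transformers generate() kwargs.
import torch

def run_generation(model, inputs, seed=1, max_tokens=1024, temperature=0.6,
                   top_p=0.9, num_beams=1, repetition_penalty=1.2):
    """Map the advanced node parameters onto a generate() call."""
    torch.manual_seed(seed)                  # `seed` -> reproducible sampling
    return model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        do_sample=(num_beams == 1),          # beam search (> 1) disables sampling
        temperature=temperature,             # used only when sampling
        top_p=top_p,
        num_beams=num_beams,
        repetition_penalty=repetition_penalty,
    )
```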
⚙️ Parameters
| Parameter | Description | Default | Range | Node(s) |
| :---- | :---- | :---- | :---- | :---- |
| model_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | - | Standard & Advanced |
| quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |
| attention_mode | Attention backend. auto tries Flash-Attn v2 when available, falls back to SDPA. | auto | auto, flash_attention_2, sdpa | Standard & Advanced |
| use_torch_compile | Enable torch.compile(mode="reduce-overhead") for extra CUDA throughput (Torch 2.1+). | False | True/False | Advanced Only |
| device | Override automatic device selection. | auto | auto, cuda, cpu, mps | Advanced Only |
| preset_prompt | A selection of pre-defined prompts for common tasks. | "Describe this..." | Any text | Standard & Advanced |
| custom_prompt | Overrides the preset prompt if provided. | | Any text | Standard & Advanced |
| max_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |
| keep_model_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |
| seed | A seed for reproducible results. | 1 | 1 - 2^64-1 | Standard & Advanced |
| temperature | Controls randomness. Higher values = more creative. (Used when num_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |
| top_p | Nucleus sampling threshold. (Used when num_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |
| num_beams | Number of beams for beam search. > 1 disables temperature/top_p sampling. | 1 | 1-10 | Advanced Only |
| repetition_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |
| frame_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |
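The `frame_count` behavior can be illustrated with even frame sampling over a ComfyUI IMAGE batch (shape `[frames, H, W, C]`). A minimal sketch; `sample_frames` is a hypothetical helper, not the node's actual implementation.

```python
# Hedged sketch: pick `frame_count` evenly spaced frames from a frame sequence.
import torch

def sample_frames(frames: torch.Tensor, frame_count: int = 16) -> torch.Tensor:
    """Return up to `frame_count` evenly spaced frames from the input batch."""
    total = frames.shape[0]
    if total <= frame_count:
        return frames
    idx = torch.linspace(0, total - 1, frame_count).round().long()
    return frames[idx]
```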
💡 Quantization Options
| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |
| :---- | :---- | :---- | :---- | :---- | :---- |
| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |
| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |
| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower* | Good | Low VRAM GPUs (<8GB) |
* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.
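On-the-fly 4-bit/8-bit loading in transformers is typically done through `BitsAndBytesConfig`. A hedged sketch, assuming a recent transformers version with `AutoModelForImageTextToText`; the node's real loader may differ.

```python
# Hedged sketch: on-the-fly quantized loading with transformers + bitsandbytes.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

repo_id = "Qwen/Qwen2.5-VL-7B-Instruct"

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    quantization_config=quant_config,  # omit entirely for full FP16
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id)
```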
🤔 Setting Tips
| Setting | Recommendation |
| :---- | :---- |
| Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the -FP8 version for better performance. |
| Memory Mode | Keep keep_model_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |
| Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |
| Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep_model_loaded enabled) will be much faster. |
🧠 About Model
This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.
🗺️ Roadmap
✅ Completed (v1.0.0)
- ✅ Support for Qwen3-VL and Qwen2.5-VL models.
- ✅ Automatic model downloading and management.
- ✅ On-the-fly 4-bit, 8-bit, and FP16 quantization.
- ✅ Hardware compatibility checks for FP8 models.
- ✅ Image and Video (frame sequence) input support.
🔮 Future Plans
- GGUF format support for CPU and wider hardware compatibility.
- Integration of more vision-language models.
- Advanced parameter options for fine-tuning generation.
- Support for additional video processing features.
🙏 Credits
- Qwen Team: Alibaba Cloud - For developing and open-sourcing the powerful Qwen-VL models.
- ComfyUI: comfyanonymous - For the incredible and extensible ComfyUI platform.
- ComfyUI Integration: 1038lab - Developer of this custom node.
📜 License
This repository's code is released under the GPL-3.0 License.