# ComfyUI-QwenVL-MultiImage 🧪
A powerful ComfyUI custom node that integrates Qwen2.5-VL and Qwen3-VL vision-language models with multi-image support. Process multiple images simultaneously with advanced AI capabilities for image understanding, comparison, and analysis.
## ✨ Features

- 🖼️ Multi-Image Support: Process multiple images in a single inference
- 🤖 Latest Models: Support for Qwen2.5-VL and Qwen3-VL series
- 💾 Flexible Quantization: 4-bit, 8-bit, and FP16 options for different VRAM requirements
- ⚡ Model Caching: Keep models loaded in VRAM for faster subsequent runs
- 🎛️ Two Node Variants: Standard and Advanced nodes for different use cases
- 🔧 Full Parameter Control: Temperature, top_p, top_k, beam search, and more (Advanced node)
- 🎯 GPU Optimized: Flash Attention 2 support and efficient memory management
## 📦 Installation

### Method 1: ComfyUI Manager (Recommended)

1. Open ComfyUI Manager
2. Search for "QwenVL Multi-Image"
3. Click Install

### Method 2: Manual Installation

1. Navigate to your ComfyUI custom nodes directory:

       cd ComfyUI/custom_nodes/

2. Clone this repository:

       git clone https://github.com/YOUR_USERNAME/ComfyUI-QwenVL-MultiImage.git

3. Install the dependencies:

       cd ComfyUI-QwenVL-MultiImage
       pip install -r requirements.txt

4. Restart ComfyUI
## 🎯 Supported Models

### Qwen3-VL Series (Latest)

| Model | Size | HuggingFace Link |
|-------|------|------------------|
| Qwen3-VL-4B-Instruct | 4B | Download |
| Qwen3-VL-8B-Instruct | 8B | Download |
| Qwen3-VL-32B-Instruct | 32B | Download |
| Qwen3-VL-8B-Thinking | 8B | Download |
| Qwen3-VL-32B-Thinking | 32B | Download |
| Qwen3-VL-8B-Instruct-FP8 | 8B | Download |
| Qwen3-VL-32B-Instruct-FP8 | 32B | Download |

### Qwen2.5-VL Series

| Model | Size | HuggingFace Link |
|-------|------|------------------|
| Qwen2.5-VL-2B-Instruct | 2B | Download |
| Qwen2.5-VL-3B-Instruct | 3B | Download |
| Qwen2.5-VL-7B-Instruct | 7B | Download |
| Qwen2.5-VL-72B-Instruct | 72B | Download |
Models will be automatically downloaded from HuggingFace on first use.
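If you would rather fetch the weights ahead of time (for example on a faster connection), you can pre-download them into the same HuggingFace cache the node reads from. A minimal sketch: the `repo_id_for` and `prefetch` helpers and the `Qwen/<model-name>` mapping are assumptions based on the official Qwen repo naming on HuggingFace, not part of this node.

```python
def repo_id_for(model_name: str) -> str:
    """Map a dropdown model name to its HuggingFace repo id.
    Assumes the official 'Qwen/<model>' naming; adjust if a model differs."""
    return f"Qwen/{model_name}"

def prefetch(model_name: str) -> str:
    """Download every file of the model into the local HF cache.
    Network- and disk-heavy: even the 4B model is several GB."""
    from huggingface_hub import snapshot_download  # deferred: heavy dependency
    return snapshot_download(repo_id=repo_id_for(model_name))

# Usage (requires network access and disk space):
#   prefetch("Qwen3-VL-4B-Instruct")
```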
## 🚀 Usage

### Basic Usage (Standard Node)

1. Add the "🧪 QwenVL Multi-Image" node from the 🧪AILab/QwenVL category
2. Connect one or more image sources to the node:
   - `images`: Main image input (can be a batch)
   - `images_batch_2`: Optional second batch
   - `images_batch_3`: Optional third batch
3. Select your desired model from the dropdown
4. Write your prompts:
   - `system_prompt`: System instructions for the AI
   - `user_prompt`: Your question or task
5. Configure quantization and other settings
6. Run the workflow
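Conceptually, the three image inputs behave like one concatenated batch: the model sees every connected image, in input order. A rough sketch with plain Python lists standing in for ComfyUI IMAGE tensors (the function is illustrative, not the node's internal API):

```python
def merge_image_inputs(images, images_batch_2=None, images_batch_3=None):
    """Flatten the node's three optional image inputs into one ordered list.
    Order matters: it is the order the model 'sees' the images in."""
    merged = list(images)
    for extra in (images_batch_2, images_batch_3):
        if extra is not None:
            merged.extend(extra)
    return merged

# Two images on the main input plus one on batch 2 -> three images total
```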
### Advanced Usage (Advanced Node)

Use the "🧪 QwenVL Multi-Image (Advanced)" node for fine-grained control:
- `temperature`: Controls randomness (0.1-2.0)
- `top_p`: Nucleus sampling threshold (0.0-1.0)
- `top_k`: Top-k sampling (1-100)
- `num_beams`: Beam search width (1-10)
- `repetition_penalty`: Penalize repeated tokens (1.0-2.0)
- `device`: Force specific device (auto/cuda/cpu)
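These knobs map naturally onto HuggingFace-style generation keyword arguments. A hedged sketch of how such a node might bundle them (the function and the exact forwarding are illustrative assumptions; only the parameter names, defaults, and ranges come from this README):

```python
def build_generation_kwargs(temperature=0.7, top_p=0.9, top_k=50,
                            num_beams=1, repetition_penalty=1.1,
                            max_tokens=1024):
    """Bundle Advanced-node settings into `model.generate(**kwargs)` form.
    Defaults mirror this README's Advanced Parameters table."""
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.1-2.0")
    return {
        "do_sample": num_beams == 1,   # beam search is used without sampling here
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "num_beams": num_beams,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_tokens,  # HuggingFace's name for the output cap
    }
```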
## ⚙️ Parameters

### Common Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| images | Main image input (supports batches) | Required | - |
| model_name | Qwen-VL model to use | Qwen3-VL-4B-Instruct | See model list |
| system_prompt | System instructions | "You are a helpful assistant." | Any text |
| user_prompt | Your question/task | "Describe these images..." | Any text |
| quantization | Memory optimization mode | 8-bit (Balanced) | FP16/8-bit/4-bit |
| max_tokens | Maximum output length | 1024 | 64-4096 |
| keep_model_loaded | Cache model in VRAM | True | True/False |
| seed | Random seed | 1 | 1 to 2^32-1 |

### Advanced Parameters (Advanced Node Only)

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| temperature | Sampling randomness | 0.7 | 0.1-2.0 |
| top_p | Nucleus sampling | 0.9 | 0.0-1.0 |
| top_k | Top-k sampling | 50 | 1-100 |
| repetition_penalty | Penalize repeats | 1.1 | 1.0-2.0 |
| num_beams | Beam search width | 1 | 1-10 |
| device | Device selection | auto | auto/cuda/cpu |
## 💡 Quantization Guide

| Mode | Precision | VRAM Usage | Speed | Quality | Recommended For |
|------|-----------|------------|-------|---------|-----------------|
| None (FP16) | 16-bit | High | Fastest | Best | 16GB+ VRAM |
| 8-bit (Balanced) | 8-bit | Medium | Fast | Very Good | 8GB+ VRAM |
| 4-bit (VRAM-friendly) | 4-bit | Low | Slower | Good | <8GB VRAM |
Note: FP8 models (pre-quantized) automatically use optimized precision and ignore the quantization setting.
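The VRAM tiers above follow from simple arithmetic: the weights alone need roughly `parameters × bits / 8` bytes, and activations, the KV cache, and vision features add overhead on top. A back-of-the-envelope helper (illustrative, not part of the node):

```python
def approx_weight_vram_gb(params_billions: float, bits: int) -> float:
    """Approximate VRAM for the model weights only: params * bits/8 bytes.
    Real usage is higher once activations and the KV cache are counted."""
    return params_billions * bits / 8  # 1e9 params * (bits/8) bytes ~= this many GB

# An 8B model: ~16 GB at FP16, ~8 GB at 8-bit, ~4 GB at 4-bit
```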
## 🎨 Example Use Cases

### 1. Multi-Image Comparison
Compare multiple product photos, analyze differences, or identify similarities.
Prompt: "Compare these images and describe the key differences between them."

### 2. Sequential Analysis
Analyze a sequence of images showing a process or timeline.
Prompt: "Describe the progression shown across these images."

### 3. Multi-View Understanding
Process multiple angles or views of the same object.
Prompt: "Based on these different views, provide a comprehensive description of the object."

### 4. Batch Description
Generate captions or descriptions for multiple images simultaneously.
Prompt: "Provide a detailed caption for each image."
## 🔧 Troubleshooting

### Out of Memory Errors
- Switch to a smaller model (e.g., 2B or 3B)
- Enable more aggressive quantization (4-bit)
- Reduce `max_tokens`
- Process fewer images at once
- Set `keep_model_loaded` to False

### Slow Performance
- Use FP8 models on supported hardware (RTX 40 series)
- Enable `keep_model_loaded` for repeated inference
- Use FP16 quantization on high-VRAM systems
- Ensure CUDA is properly installed
### Model Download Issues
Models are downloaded from HuggingFace on first use. If you encounter issues:
- Check your internet connection
- Verify HuggingFace is accessible
- Manually download models to your HuggingFace cache directory
- Check available disk space (models can be 2-150GB)
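For manual downloads it helps to know where `huggingface_hub` looks by default. This helper reproduces the documented lookup order (`HF_HUB_CACHE`, then `HF_HOME`, then `~/.cache/huggingface`); it is a convenience sketch, not part of the node:

```python
import os
from pathlib import Path

def hf_hub_cache() -> Path:
    """Directory where huggingface_hub stores downloaded models.
    Mirrors the documented defaults: HF_HUB_CACHE wins outright,
    otherwise the cache is <HF_HOME or ~/.cache/huggingface>/hub."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    base = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
    return base / "hub"

# Place manually downloaded model repos under this directory so the node finds them.
```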
### Import Errors
If you get import errors after installation, reinstall the dependencies:

    cd ComfyUI/custom_nodes/ComfyUI-QwenVL-MultiImage
    pip install -r requirements.txt --upgrade
## 🎯 Tips & Best Practices

### Model Selection
- For most users: Start with `Qwen3-VL-4B-Instruct` (balanced performance)
- Low VRAM (<8GB): Use `Qwen2.5-VL-2B-Instruct` with 4-bit quantization
- Best quality: Use `Qwen3-VL-32B-Instruct` with FP16 (requires 24GB+ VRAM)
- RTX 40 series: Use FP8 variants for optimal speed

### Memory Management
- Enable `keep_model_loaded` if running multiple inferences
- Disable it if you need to switch between different models
- Use 8-bit quantization as a good balance between quality and VRAM
### Prompt Engineering
- Be specific about what you want from multiple images
- Use system prompts to set the context and behavior
- For comparisons, explicitly ask to compare or contrast
- Number your images in the prompt if order matters
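The last tip can be automated: prefixing the prompt with explicit image labels gives the model stable names to refer back to. A small illustrative helper (the function name and wording are examples, not part of the node):

```python
def numbered_prompt(task: str, n_images: int) -> str:
    """Prefix a task with 'Image 1, Image 2, ...' labels so the model's
    answer can reference images by their input order."""
    labels = ", ".join(f"Image {i}" for i in range(1, n_images + 1))
    return f"You are given {n_images} images: {labels}. {task}"

# numbered_prompt("Compare them.", 2)
```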
### Performance
- First load is always slower (downloading/caching)
- Subsequent runs with cached models are much faster
- Batch multiple images when possible instead of separate inferences
## 🛠️ Development

### Requirements
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
- 8GB+ VRAM recommended (4GB minimum with quantization)

### Building from Source

    git clone https://github.com/YOUR_USERNAME/ComfyUI-QwenVL-MultiImage.git
    cd ComfyUI-QwenVL-MultiImage
    pip install -r requirements.txt
## 🙏 Credits
- Qwen Team (Alibaba Cloud): For developing the Qwen-VL models
- ComfyUI: For the excellent node-based interface
- 1038lab/ComfyUI-QwenVL: For the original QwenVL node implementation that inspired this project
## 📄 License
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.
## 🤝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📮 Support
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Search existing Issues
- Open a new issue with detailed information
## ⭐ Star History
If you find this project useful, please consider giving it a star!
Note: Replace YOUR_USERNAME with your actual GitHub username before publishing.