ComfyUI Extension: ComfyUI-GGUF-VisionLM
ComfyUI nodes for running GGUF quantized Qwen2.5-VL models using llama.cpp
ComfyUI-GGUF-VisionLM
Run GGUF vision language models locally in ComfyUI: text, image, and video analysis with a low memory footprint. Supports Qwen2.5-VL, Qwen3-VL, Llama 3, LLaVA, MiniCPM-V, Moondream, and more VLMs on consumer hardware.
✨ Features
- 🚀 Local Execution - Run VLMs completely locally on your hardware
- 💾 Low Memory - GGUF quantization (Q4_K_M, Q8_0, etc.)
- 🎯 Multi-Modal - Text generation, image analysis, video understanding
- 📦 Auto Download - One-click model download from dropdown
- 🤖 Smart Matching - Automatic mmproj detection and download
- ⚙️ YAML Config - Easy model management without code changes
- 🔄 Batch Processing - Process multiple images at once
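The memory savings from GGUF quantization can be estimated from bits per weight. A back-of-the-envelope sketch (the bits-per-weight averages are approximations, and real usage adds KV-cache and mmproj overhead on top):

```python
# Rough size estimate for model weights at different GGUF quantizations.
# Bits-per-weight values are approximate averages for each scheme
# (assumption for illustration; actual files vary slightly).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Approximate weight size in GiB for a model of the given parameter count."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 2**30

for quant in ("F16", "Q8_0", "Q4_K_M"):
    print(f"7B @ {quant}: ~{approx_size_gb(7.0, quant):.1f} GiB")
```

For a 7B model this works out to roughly 13 GiB at F16 versus about 4 GiB at Q4_K_M, which is why the 4-bit variants fit on consumer GPUs.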
📦 Supported Models
Vision Language Models
- Qwen Series: 2.5-VL (3B/7B), 3-VL (8B)
- Llama 3: Vision models
- LLaVA: Multiple variants
- MiniCPM-V: Lightweight VLMs
- Moondream: Ultra-light models
- And more...
Total: 8+ model families, 16+ variants
🚀 Quick Start
1. Installation
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/walke2019/ComfyUI-GGUF-VisionLM
cd ComfyUI-GGUF-VisionLM
pip install -r requirements.txt
```
2. Install llama-cpp-python
CUDA (NVIDIA GPU):

```shell
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

CPU only:

```shell
pip install llama-cpp-python
```

Metal (macOS):

```shell
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```
3. Usage
- Restart ComfyUI
- Add node: 🔥 Qwen2.5-VL GGUF (All-in-One)
- Select model from dropdown:
  - Local models: direct filename
  - Downloadable models: `[⬇️ Series] filename`
- Connect image input and execute
That's it! Models auto-download on first use.
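Conceptually, the auto-download step turns a `[⬇️ Series] filename` dropdown entry into a HuggingFace file URL. A minimal sketch of that resolution, assuming this entry format and an illustrative series-to-repo mapping (the node's actual downloader may differ):

```python
import re

def resolve_download(entry, repo_for_series):
    """Resolve a dropdown entry to a HuggingFace download URL.

    Downloadable entries look like '[⬇️ Qwen2.5-VL] model-q4.gguf';
    bare filenames are local models and return None. The entry format
    and the series-to-repo mapping are assumptions for illustration.
    """
    m = re.match(r"\[⬇️ (?P<series>[^\]]+)\] (?P<file>\S+)$", entry)
    if m is None:
        return None  # local model: nothing to download
    repo = repo_for_series[m["series"]]
    return f"https://huggingface.co/{repo}/resolve/main/{m['file']}"

url = resolve_download(
    "[⬇️ Qwen2.5-VL] qwen2.5-vl-7b-q4_k_m.gguf",
    {"Qwen2.5-VL": "user/qwen2.5-vl-7b-GGUF"},
)
print(url)
```

The `resolve/main` URL scheme is HuggingFace's standard direct-download path for a file in a repo's main branch.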
📖 Nodes
🔥 Qwen2.5-VL GGUF (All-in-One)
All-in-one node combining model loading and inference.
Key Parameters:
- `model` - Select from dropdown (local or downloadable)
- `image` - Input image
- `prompt` - Description prompt (default: "Describe this image in detail.")
- `max_tokens` - Max generation length (default: 512)
- `temperature` - Sampling temperature (default: 0.7)
- `n_ctx` - Context window (default: 4096)
- `n_gpu_layers` - GPU layers to offload (-1 = all)
Output:
- `description` - Generated text
Other Nodes
- Load Qwen2.5-VL GGUF Model - Separate model loading
- Qwen2.5-VL GGUF Describe Image - Single image description
- Qwen2.5-VL GGUF Batch Describe - Batch processing
🔧 Configuration
Adding New Models
Edit `model_registry.yaml`:
```yaml
vision_language_models:
  your-series:
    series_name: "Your Series"
    business_type: "image_analysis"
    models:
      - model_name: "Your-Model-7B"
        repo: "user/repo-GGUF"
        mmproj: "mmproj-file.gguf"
        variants:
          - name: "Q4_K_M"
            file: "model-q4.gguf"
            size: "~4GB"
            recommended: true
```
Restart ComfyUI to see new models.
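The `recommended: true` flag lets a loader pick a sensible default variant. A sketch of that selection, operating on the Python dict a YAML parser would produce for the entry above (the helper name is hypothetical):

```python
# The example model_registry.yaml entry, as a parsed dict.
registry = {
    "vision_language_models": {
        "your-series": {
            "series_name": "Your Series",
            "business_type": "image_analysis",
            "models": [{
                "model_name": "Your-Model-7B",
                "repo": "user/repo-GGUF",
                "mmproj": "mmproj-file.gguf",
                "variants": [
                    {"name": "Q4_K_M", "file": "model-q4.gguf",
                     "size": "~4GB", "recommended": True},
                ],
            }],
        }
    }
}

def default_variant(registry, series, model_name):
    """Return the recommended variant, falling back to the first listed one."""
    for model in registry["vision_language_models"][series]["models"]:
        if model["model_name"] == model_name:
            variants = model["variants"]
            return next((v for v in variants if v.get("recommended")), variants[0])
    raise KeyError(model_name)

print(default_variant(registry, "your-series", "Your-Model-7B")["file"])
```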
Model Registry
All models are configured in `model_registry.yaml`:
- Image Analysis - Vision Language Models
- Text Generation - Text-only models
- Video Analysis - Video understanding models
💡 Tips
Performance
- Use Q4_K_M for the best quality/speed balance
- Set `n_gpu_layers=-1` for full GPU offload
- Enable Flash Attention for faster inference
Network
- First download may take time (models are large)
- Use proxy if needed for HuggingFace access
- Downloads are cached and resumable
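Resumable downloads typically work via HTTP Range requests: if a partial file already exists, only the remaining bytes are requested. A stdlib-only sketch of the header logic (the extension's actual downloader is not shown here and may differ):

```python
import os
import tempfile

def resume_headers(dest_path):
    """Build request headers that resume a partial download.

    If dest_path already holds N bytes, ask the server for bytes N
    onward; a fresh download sends no Range header. The caller would
    then open dest_path in append mode when a Range header was sent.
    """
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            return {"Range": f"bytes={offset}-"}
    return {}

# Demo: a 10-byte partial file resumes from byte 10.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")
print(resume_headers(f.name))
os.remove(f.name)
```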
Troubleshooting
Model not showing?
- Check `model_registry.yaml` syntax
- Run `python3 test_registry.py` to verify
- Restart ComfyUI
Download failed?
- Check network connection
- Verify HuggingFace repo URL
- Check available disk space
mmproj not found?
- The system auto-downloads missing mmproj files
- Or manually specify one via the `mmproj_path` parameter
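Automatic mmproj detection usually comes down to a filename heuristic: pair the model with the GGUF file in the same repo whose name starts with `mmproj`. A sketch of one such heuristic (an assumption about the node's actual matching logic):

```python
def find_mmproj(repo_files):
    """Pick the multimodal projector file from a repo file listing.

    Heuristic (assumed, for illustration): the mmproj is the GGUF file
    whose name starts with 'mmproj'. Returns None if no candidate exists,
    in which case the user would set mmproj_path manually.
    """
    candidates = [f for f in repo_files
                  if f.lower().startswith("mmproj") and f.endswith(".gguf")]
    return candidates[0] if candidates else None

files = ["model-q4.gguf", "mmproj-model-f16.gguf", "README.md"]
print(find_mmproj(files))
```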
📚 Documentation
- Quick Start: see above
- Model Registry: edit `model_registry.yaml`
- Testing: run `python3 test_registry.py`
🤝 Contributing
Contributions welcome! To add a new model:
- Edit `model_registry.yaml` and add the model configuration
- Test with `python3 test_registry.py`
- Submit a PR
📄 License
MIT License - see LICENSE file
🙏 Acknowledgments
- llama.cpp - GGUF support
- llama-cpp-python - Python bindings
- Qwen Team - Excellent VLMs
- ComfyUI Community
Made with ❤️ for the ComfyUI community
If you find this useful, please ⭐ star the repo!