ComfyUI Extension: ComfyUI-GGUF-VisionLM
ComfyUI nodes for running GGUF quantized Qwen2.5-VL models using llama.cpp
ComfyUI-GGUF-VisionLM
Run GGUF vision language models locally in ComfyUI: text, image, and video analysis with a low memory footprint. Supports Qwen2.5-VL, Qwen3-VL, Llama 3, LLaVA, MiniCPM-V, Moondream, and more VLMs on consumer hardware.
✨ Features
- 🚀 Local Execution - Run VLMs completely locally on your hardware
- 💾 Low Memory - GGUF quantization (Q4_K_M, Q8_0, etc.)
- 🎯 Multi-Modal - Text generation, image analysis, video understanding
- 📦 Auto Download - One-click model download from dropdown
- 🤖 Smart Matching - Automatic mmproj detection and download
- ⚙️ YAML Config - Easy model management without code changes
- 🔄 Batch Processing - Process multiple images at once
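The memory savings from GGUF quantization can be estimated from bits per weight. A back-of-the-envelope sketch (the bits-per-weight averages are approximations, and real usage adds KV-cache and mmproj overhead on top):

```python
# Rough size estimate for model weights at different GGUF quantizations.
# Bits-per-weight values are approximate averages for each scheme
# (assumption for illustration; actual files vary slightly).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Approximate weight size in GiB for a model of the given parameter count."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 2**30

for quant in ("F16", "Q8_0", "Q4_K_M"):
    print(f"7B @ {quant}: ~{approx_size_gb(7.0, quant):.1f} GiB")
```

For a 7B model this works out to roughly 13 GiB at F16 versus about 4 GiB at Q4_K_M, which is why the 4-bit variants fit on consumer GPUs.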
📦 Supported Models
Vision Language Models
- Qwen Series: 2.5-VL (3B/7B), 3-VL (8B)
- Llama 3: Vision models
- LLaVA: Multiple variants
- MiniCPM-V: Lightweight VLMs
- Moondream: Ultra-light models
- And more...
Total: 8+ model families, 16+ variants
🚀 Quick Start
1. Installation
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/walke2019/ComfyUI-GGUF-VisionLM
cd ComfyUI-GGUF-VisionLM
pip install -r requirements.txt
```
2. Install llama-cpp-python
CUDA (NVIDIA GPU):

```shell
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

CPU only:

```shell
pip install llama-cpp-python
```

Metal (macOS):

```shell
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```
3. Usage
- Restart ComfyUI
- Add node: 🔥 Qwen2.5-VL GGUF (All-in-One)
- Select model from dropdown:
  - Local models: direct filename
  - Downloadable models: `[⬇️ Series] filename`
- Connect image input and execute
That's it! Models auto-download on first use.
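Conceptually, the auto-download step turns a `[⬇️ Series] filename` dropdown entry into a HuggingFace file URL. A minimal sketch of that resolution, assuming this entry format and an illustrative series-to-repo mapping (the node's actual downloader may differ):

```python
import re

def resolve_download(entry, repo_for_series):
    """Resolve a dropdown entry to a HuggingFace download URL.

    Downloadable entries look like '[⬇️ Qwen2.5-VL] model-q4.gguf';
    bare filenames are local models and return None. The entry format
    and the series-to-repo mapping are assumptions for illustration.
    """
    m = re.match(r"\[⬇️ (?P<series>[^\]]+)\] (?P<file>\S+)$", entry)
    if m is None:
        return None  # local model: nothing to download
    repo = repo_for_series[m["series"]]
    return f"https://huggingface.co/{repo}/resolve/main/{m['file']}"

url = resolve_download(
    "[⬇️ Qwen2.5-VL] qwen2.5-vl-7b-q4_k_m.gguf",
    {"Qwen2.5-VL": "user/qwen2.5-vl-7b-GGUF"},
)
print(url)
```

The `resolve/main` URL scheme is HuggingFace's standard direct-download path for a file in a repo's main branch.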
📖 Nodes
🔥 Qwen2.5-VL GGUF (All-in-One)
All-in-one node combining model loading and inference.
Key Parameters:
- `model` - Select from dropdown (local or downloadable)
- `image` - Input image
- `prompt` - Description prompt (default: "Describe this image in detail.")
- `max_tokens` - Max generation length (default: 512)
- `temperature` - Sampling temperature (default: 0.7)
- `n_ctx` - Context window (default: 4096)
- `n_gpu_layers` - GPU layers to offload (-1 = all)
Output:
- `description` - Generated text
Other Nodes
- Load Qwen2.5-VL GGUF Model - Separate model loading
- Qwen2.5-VL GGUF Describe Image - Single image description
- Qwen2.5-VL GGUF Batch Describe - Batch processing
🔧 Configuration
Adding New Models
Edit `model_registry.yaml`:
```yaml
vision_language_models:
  your-series:
    series_name: "Your Series"
    business_type: "image_analysis"
    models:
      - model_name: "Your-Model-7B"
        repo: "user/repo-GGUF"
        mmproj: "mmproj-file.gguf"
        variants:
          - name: "Q4_K_M"
            file: "model-q4.gguf"
            size: "~4GB"
            recommended: true
```
Restart ComfyUI to see new models.
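The `recommended: true` flag lets a loader pick a sensible default variant. A sketch of that selection, operating on the Python dict a YAML parser would produce for the entry above (the helper name is hypothetical):

```python
# The example model_registry.yaml entry, as a parsed dict.
registry = {
    "vision_language_models": {
        "your-series": {
            "series_name": "Your Series",
            "business_type": "image_analysis",
            "models": [{
                "model_name": "Your-Model-7B",
                "repo": "user/repo-GGUF",
                "mmproj": "mmproj-file.gguf",
                "variants": [
                    {"name": "Q4_K_M", "file": "model-q4.gguf",
                     "size": "~4GB", "recommended": True},
                ],
            }],
        }
    }
}

def default_variant(registry, series, model_name):
    """Return the recommended variant, falling back to the first listed one."""
    for model in registry["vision_language_models"][series]["models"]:
        if model["model_name"] == model_name:
            variants = model["variants"]
            return next((v for v in variants if v.get("recommended")), variants[0])
    raise KeyError(model_name)

print(default_variant(registry, "your-series", "Your-Model-7B")["file"])
```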
Model Registry
All models are configured in `model_registry.yaml`:
- Image Analysis - Vision Language Models
- Text Generation - Text-only models
- Video Analysis - Video understanding models
💡 Tips
Performance
- Use Q4_K_M for the best quality/speed balance
- Set `n_gpu_layers=-1` for full GPU offload
- Enable Flash Attention for faster inference
Network
- First download may take time (models are large)
- Use proxy if needed for HuggingFace access
- Downloads are cached and resumable
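Resumable downloads typically work via HTTP Range requests: if a partial file already exists, only the remaining bytes are requested. A stdlib-only sketch of the header logic (the extension's actual downloader is not shown here and may differ):

```python
import os
import tempfile

def resume_headers(dest_path):
    """Build request headers that resume a partial download.

    If dest_path already holds N bytes, ask the server for bytes N
    onward; a fresh download sends no Range header. The caller would
    then open dest_path in append mode when a Range header was sent.
    """
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            return {"Range": f"bytes={offset}-"}
    return {}

# Demo: a 10-byte partial file resumes from byte 10.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")
print(resume_headers(f.name))
os.remove(f.name)
```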
Troubleshooting
Model not showing?
- Check `model_registry.yaml` syntax
- Run `python3 test_registry.py` to verify
- Restart ComfyUI
Download failed?
- Check network connection
- Verify HuggingFace repo URL
- Check available disk space
mmproj not found?
- The system auto-downloads missing mmproj files
- Or manually specify one via the `mmproj_path` parameter
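Automatic mmproj detection usually comes down to a filename heuristic: pair the model with the GGUF file in the same repo whose name starts with `mmproj`. A sketch of one such heuristic (an assumption about the node's actual matching logic):

```python
def find_mmproj(repo_files):
    """Pick the multimodal projector file from a repo file listing.

    Heuristic (assumed, for illustration): the mmproj is the GGUF file
    whose name starts with 'mmproj'. Returns None if no candidate exists,
    in which case the user would set mmproj_path manually.
    """
    candidates = [f for f in repo_files
                  if f.lower().startswith("mmproj") and f.endswith(".gguf")]
    return candidates[0] if candidates else None

files = ["model-q4.gguf", "mmproj-model-f16.gguf", "README.md"]
print(find_mmproj(files))
```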
📚 Documentation
- Quick Start: see above
- Model Registry: edit `model_registry.yaml`
- Testing: run `python3 test_registry.py`
🤝 Contributing
Contributions welcome! To add a new model:
- Edit `model_registry.yaml` and add the model configuration
- Test with `python3 test_registry.py`
- Submit a PR
📄 License
MIT License - see LICENSE file
🙏 Acknowledgments
- llama.cpp - GGUF support
- llama-cpp-python - Python bindings
- Qwen Team - Excellent VLMs
- ComfyUI Community
Made with ❤️ for the ComfyUI community
If you find this useful, please ⭐ star the repo!