ComfyUI Extension: Qwen2.5-VL GGUF Nodes

ComfyUI nodes for running GGUF-quantized Qwen2.5-VL models using llama.cpp

    ComfyUI-GGUF-VLM

    Complete GGUF model support for ComfyUI with local and Nexa SDK inference modes.

    🌟 Features

    Two Core Capabilities

    1. 💬 Text Models - Text-to-Text generation
      • Qwen3, LLaMA3, DeepSeek-R1, Mistral, etc.
      • Local GGUF models or Remote API services
    2. 🖼️ Vision Models - Image-Text-to-Text analysis
      • Qwen2.5-VL, Qwen3-VL, LLaVA, MiniCPM-V, etc.
      • Single image, multi-image comparison, video analysis

    Key Features

    • ✅ Unified interface - simple node structure organized by model capability
    • ✅ Multiple backends - GGUF (llama-cpp), Transformers, Remote API
    • ✅ Automatic model detection - smart model loading and compatibility checks
    • ✅ Thinking mode support - DeepSeek-R1, Qwen3-Thinking
    • ✅ Multi-image analysis - compare up to 6 images simultaneously
    • ✅ Device optimization - CUDA, MPS, CPU with auto-detection (see the sketch after this list)
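
    The auto-detection logic itself lives in utils/device_optimizer.py; a minimal sketch of how backend selection is typically done with torch (a hypothetical helper pick_device, not the extension's actual code):

    import torch

    def pick_device() -> str:
        """Hypothetical helper: pick the best available compute backend."""
        if torch.cuda.is_available():           # NVIDIA GPUs
            return "cuda"
        if torch.backends.mps.is_available():   # Apple Silicon
            return "mps"
        return "cpu"                            # portable fallback

    print(pick_device())  # e.g. "cuda" on a CUDA-capable machine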

    🤖 Supported Models

    💬 Text Models (Text-to-Text)

    Qwen Series:

    • Qwen3, Qwen2.5, Qwen-Chat
    • Qwen3-Thinking (with thinking mode)

    LLaMA Series:

    • LLaMA-3.x, LLaMA-2
    • Mistral, Mixtral

    Other Models:

    • DeepSeek-R1 (with thinking mode)
    • Phi-3, Gemma, Yi

    🖼️ Vision Models (Image-Text-to-Text)

    Qwen-VL Series:

    • Qwen2.5-VL (recommended)
    • Qwen3-VL

    LLaVA Series:

    • LLaVA-1.5, LLaVA-1.6
    • LLaVA-NeXT

    Other Vision Models:

    • MiniCPM-V-2.6
    • Phi-3-Vision
    • InternVL

    💡 Note: Models must be in GGUF format for local inference, or accessible via the Nexa/Ollama API for remote inference.

    📦 Installation

    1. Install ComfyUI Custom Node

    cd ComfyUI/custom_nodes
    git clone https://github.com/walke2019/ComfyUI-GGUF-VLM.git
    cd ComfyUI-GGUF-VLM
    pip install -r requirements.txt
    

    2. For Nexa SDK Mode (Optional)

    # Install Nexa SDK
    pip install nexaai
    
    # Start Nexa service
    nexa serve
    

    The service will be available at http://127.0.0.1:11434
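
    To verify the service from Python rather than curl, you can query the OpenAI-compatible /v1/models endpoint (the same endpoint used in Troubleshooting below); this sketch assumes the usual OpenAI-style response shape with a data array:

    import requests

    # List models served by `nexa serve`
    resp = requests.get("http://127.0.0.1:11434/v1/models", timeout=5)
    resp.raise_for_status()
    for model in resp.json().get("data", []):
        print(model.get("id"))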

    🚀 Quick Start

    Using Text Generation (Local GGUF)

    Recommended for local GGUF files

    [Text Model Loader]
    ├─ model: Select your GGUF file
    └─ device: cuda/cpu/mps
        ↓
    [Text Generation]
    ├─ max_tokens: 256  ← Recommended for single paragraph
    ├─ temperature: 0.7
    ├─ top_p: 0.8
    ├─ top_k: 40
    ├─ repetition_penalty: 1.1
    ├─ enable_thinking: False
    └─ prompt: "Your prompt here"
        ↓
    Output: context, thinking
    

    Features:

    • ✅ Direct file access
    • ✅ No service required
    • ✅ Fast and simple
    • ✅ Stop sequences prevent over-generation
    • ✅ Automatic paragraph merging
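
    These nodes are built on llama-cpp-python, so a rough standalone equivalent of the Loader → Generation chain above looks like this (a sketch, not the node's actual code; the model path is a placeholder):

    from llama_cpp import Llama

    # Text Model Loader equivalent: load a local GGUF file
    llm = Llama(
        model_path="/workspace/ComfyUI/models/LLM/GGUF/your-model.gguf",  # placeholder
        n_ctx=8192,       # context window (node default)
        n_gpu_layers=-1,  # -1 offloads all layers to the GPU
    )

    # Text Generation equivalent, with the recommended sampling settings
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Your prompt here"}],
        max_tokens=256,
        temperature=0.7,
        top_p=0.8,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "System:"],  # subset of the node's stop sequences
    )
    print(out["choices"][0]["message"]["content"])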

    Using Nexa SDK Mode

    Recommended for Nexa SDK ecosystem

    Step 1: Download Model

    # Download a model using Nexa CLI
    nexa pull mradermacher/Huihui-Qwen3-4B-Instruct-2507-abliterated-GGUF:Q8_0 --model-type llm
    
    # Check downloaded models
    nexa list
    

    Step 2: Use in ComfyUI

    [Nexa Model Selector]
    ├─ base_url: http://127.0.0.1:11434
    ├─ refresh_models: ☐
    └─ system_prompt: (optional)
        ↓
    [Nexa SDK Text Generation]
    ├─ preset_model: Select from dropdown (auto-populated)
    ├─ max_tokens: 256
    ├─ temperature: 0.7
    └─ prompt: "Your prompt here"
        ↓
    Output: context, thinking
    

    Features:

    • ✅ Centralized model management
    • ✅ Auto-populated model list
    • ✅ Supports the nexa pull workflow
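
    Outside ComfyUI, the same generation can be exercised over HTTP; this sketch assumes nexa serve exposes the standard OpenAI-style /v1/chat/completions route alongside the /v1/models endpoint referenced elsewhere in this README:

    import requests

    payload = {
        "model": "mradermacher/Huihui-Qwen3-4B-Instruct-2507-abliterated-GGUF:Q8_0",
        "messages": [{"role": "user", "content": "Your prompt here"}],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    resp = requests.post("http://127.0.0.1:11434/v1/chat/completions",
                         json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])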

    📋 Available Nodes

    Text Generation Nodes (Local GGUF)

    🔷 Text Model Loader

    Load GGUF models from /workspace/ComfyUI/models/LLM/GGUF/

    Parameters:

    • model: Select from available GGUF files
    • device: cuda/cpu/mps
    • n_ctx: Context window (default: 8192)
    • n_gpu_layers: GPU layers (-1 for all)

    Output:

    • model: Model configuration

    🔷 Text Generation

    Generate text with loaded GGUF model

    Parameters:

    • model: From Text Model Loader
    • max_tokens: Maximum tokens (1-8192, recommended: 256)
    • temperature: Temperature (0.0-2.0)
    • top_p: Top-p sampling (0.0-1.0)
    • top_k: Top-k sampling (0-100)
    • repetition_penalty: Repetition penalty (1.0-2.0)
    • enable_thinking: Enable thinking mode
    • prompt: Input prompt (at bottom for easy editing)

    Outputs:

    • context: Generated text
    • thinking: Thinking process (if enabled)

    Features:

    • ✅ Stop sequences: ["User:", "System:", "\n\n\n", "\n\n##", "\n\nNote:", "\n\nThis "]
    • ✅ Automatic paragraph merging for single-paragraph prompts
    • ✅ Detailed console logging
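
    The README does not spell out the merging algorithm; a plausible minimal version (a hypothetical merge_paragraphs, for illustration only) collapses blank-line breaks into a single paragraph:

    import re

    def merge_paragraphs(text: str) -> str:
        """Hypothetical sketch: collapse paragraph breaks into one paragraph."""
        # Replace runs of blank lines with a single space, then tidy spacing
        merged = re.sub(r"\n\s*\n+", " ", text.strip())
        return re.sub(r"[ \t]+", " ", merged)

    print(merge_paragraphs("First paragraph.\n\nSecond paragraph."))
    # -> "First paragraph. Second paragraph."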

    Nexa SDK Nodes

    🔷 Nexa Model Selector

    Configure Nexa SDK service

    Parameters:

    • base_url: Service URL (default: http://127.0.0.1:11434)
    • refresh_models: Refresh model list
    • system_prompt: System prompt (optional)

    Output:

    • model_config: Configuration for Text Generation

    🔷 Nexa SDK Text Generation

    Generate text using Nexa SDK

    Parameters:

    • model_config: From Model Selector
    • preset_model: Select from dropdown (auto-populated from nexa list)
    • custom_model: Custom model ID (format: author/model:quant)
    • auto_download: Auto-download if missing
    • max_tokens: Maximum tokens (recommended: 256)
    • temperature, top_p, top_k, repetition_penalty: Generation parameters
    • enable_thinking: Enable thinking mode
    • prompt: Input prompt (at bottom)

    Outputs:

    • context: Generated text
    • thinking: Thinking process (if enabled)

    Preset Models:

    • DavidAU/Qwen3-8B-64k-Josiefied-Uncensored-HORROR-Max-GGUF:Q6_K
    • mradermacher/Huihui-Qwen3-4B-Instruct-2507-abliterated-GGUF:Q8_0
    • prithivMLmods/Qwen3-4B-2507-abliterated-GGUF:Q8_0

    🔷 Nexa Service Status

    Check Nexa SDK service status

    Parameters:

    • base_url: Service URL
    • refresh: Refresh model list

    Output:

    • status: Service status and model list

    🎯 Best Practices

    For Single-Paragraph Output

    System Prompt:

    You are an expert prompt generator. Output ONLY in English.
    
    **CRITICAL: Output EXACTLY ONE continuous paragraph. Maximum 400 words.**
    

    Parameters:

    max_tokens: 256  ← Key setting!
    temperature: 0.7
    top_p: 0.8
    top_k: 20
    

    Why max_tokens=256?

    • ✅ Prevents over-generation
    • ✅ The model completes the task without extra commentary
    • ✅ Cuts typical output from ~2700 chars (11 paragraphs) to ~1300 chars (1 paragraph)

    For Multi-Turn Conversations

    Include history directly in prompt:

    User: Hello
    Assistant: Hi! How can I help?
    User: Tell me a joke
    

    No separate conversation-history parameter is needed.
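
    If you assemble the history programmatically, a tiny helper (a hypothetical build_prompt) is all that is needed:

    def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
        """Hypothetical helper: inline chat history into one prompt string."""
        lines = [f"{role}: {text}" for role, text in history]
        lines.append(f"User: {new_message}")
        return "\n".join(lines)

    print(build_prompt([("User", "Hello"), ("Assistant", "Hi! How can I help?")],
                       "Tell me a joke"))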

    💭 Thinking Mode

    Automatically extracts thinking process from models like DeepSeek-R1 and Qwen3-Thinking.

    Supported Tags:

    • <think>...</think> (DeepSeek-R1, Qwen3)
    • <thinking>...</thinking>
    • [THINKING]...[/THINKING]

    Usage:

    [Text Generation]
    ├─ enable_thinking: True
    └─ prompt: "Explain your reasoning"
        ↓
    Outputs:
    ├─ context: Final answer (thinking tags removed)
    └─ thinking: Extracted thinking process
    

    Disable Thinking:

    • Set enable_thinking: False
    • Or add no_think to system prompt
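
    Tag extraction along these lines can be sketched as follows (a hypothetical split_thinking handling the <think> style; the node itself recognizes all three tag styles):

    import re

    THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

    def split_thinking(raw: str) -> tuple[str, str]:
        """Hypothetical sketch: separate <think>...</think> from the answer."""
        thinking = "\n".join(m.strip() for m in THINK_RE.findall(raw))
        context = THINK_RE.sub("", raw).strip()
        return context, thinking

    context, thinking = split_thinking("<think>Check units.</think>42 meters.")
    print(context)   # -> 42 meters.
    print(thinking)  # -> Check units.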

    📊 Mode Comparison

    | Feature | Text Generation (Local) | Nexa SDK |
    |---------|------------------------|----------|
    | Setup | Copy GGUF file | nexa pull |
    | Service | Not required | Requires nexa serve |
    | Model Management | Manual | CLI (nexa list, nexa pull) |
    | Use Case | Local files, production | Nexa ecosystem, shared models |
    | Speed | Fast | Fast (via service) |
    | Flexibility | Any GGUF file | Only nexa pull models |

    Recommendation:

    • Use Text Generation for local GGUF files
    • Use Nexa SDK if you're already in the Nexa ecosystem

    šŸ› Troubleshooting

    Output Too Long (Multiple Paragraphs)

    Problem: Model generates 11 paragraphs instead of 1

    Solution:

    1. Reduce max_tokens from 512 to 256
    2. Strengthen system prompt: Add "EXACTLY ONE paragraph"
    3. Stop sequences are already configured

    Nexa Service Not Available

    Problem: āŒ Nexa SDK service is not available

    Solution:

    1. Start service: nexa serve
    2. Check: curl http://127.0.0.1:11434/v1/models
    3. Verify URL in node

    Model Not in Dropdown

    Problem: Downloaded model doesn't appear in Nexa SDK dropdown

    Solution:

    1. Check: nexa list
    2. Click "refresh_models" in Nexa Model Selector
    3. Restart ComfyUI

    0B Entries in nexa list

    Problem: nexa list shows invalid entries with a size of 0B

    Solution:

    # Clean up invalid entries
    rm -rf ~/.cache/nexa.ai/nexa_sdk/models/local
    rm -rf ~/.cache/nexa.ai/nexa_sdk/models/workspace
    find ~/.cache/nexa.ai/nexa_sdk/models -name "*.lock" -delete
    
    # Verify
    nexa list
    

    šŸ“ Directory Structure

    ComfyUI-GGUF-VLM/
    ├── README.md                       # This file
    ├── requirements.txt                # Dependencies
    ├── __init__.py                     # Node registration
    ├── config/
    │   └── paths.py                    # Path configuration
    ├── core/
    │   ├── inference_engine.py        # GGUF inference engine
    │   ├── model_loader.py            # Model loader
    │   └── inference/
    │       ├── nexa_engine.py         # Nexa SDK engine
    │       └── transformers_engine.py # Transformers engine
    ├── nodes/
    │   ├── text_node.py               # Text Generation nodes
    │   ├── nexa_text_node.py          # Nexa SDK nodes
    │   ├── vision_node.py             # Vision nodes
    │   └── system_prompt_node.py      # System prompt config
    └── utils/
        ├── device_optimizer.py        # Device optimization
        └── system_prompts.py          # System prompt presets
    

    🔄 Recent Updates

    v2.2 (2025-10-29)

    • ✅ Simplified Nexa Model Selector - removed unused models_dir and model_source
    • ✅ Removed unused outputs - cleaner node interface
    • ✅ Moved prompt to bottom - better UX for long prompts
    • ✅ Removed conversation_history - use the prompt directly
    • ✅ Stop sequences - prevent over-generation
    • ✅ Paragraph merging - clean single-paragraph output
    • ✅ Dynamic model list - auto-populated from the Nexa SDK API
    • ✅ Detailed logging - debug-friendly console output

    v2.1

    • ✅ Nexa SDK integration
    • ✅ Preset model list
    • ✅ Thinking mode support

    v2.0

    • ✅ GGUF mode with llama-cpp-python
    • ✅ ComfyUI /models/LLM integration

    šŸ“ Requirements

    llama-cpp-python>=0.2.0
    transformers>=4.30.0
    torch>=2.0.0
    Pillow>=9.0.0
    requests>=2.25.0
    nexaai  # Optional, for Nexa SDK mode
    

    🤝 Contributing

    Contributions are welcome! Please:

    1. Fork the repository
    2. Create a feature branch
    3. Make your changes
    4. Test thoroughly
    5. Submit a pull request

    📄 License

    MIT License - see LICENSE file for details

    🔗 Links

    • Nexa SDK: https://github.com/NexaAI/nexa-sdk
    • ComfyUI: https://github.com/comfyanonymous/ComfyUI