ComfyUI Extension: ComfyUI Janus Pro Vision

Authored by ShmuelRonen

Created

Updated

17 stars

A ComfyUI custom node extension that integrates the Janus-Pro-7B vision-language model from DeepSeek AI on your's local computer, enabling powerful image understanding and multi-turn conversation capabilities.

Custom Nodes (0)

    README

    ComfyUI Janus Pro Vision

    A ComfyUI custom node extension that integrates the Janus-Pro-7B vision-language model from DeepSeek AI on your's local computer, enabling powerful image understanding and multi-turn conversation capabilities.

    Vision Mode (One or two images)

    image

    Chat Mode (One or two images)

    Screenshot 2025-01-29 213437

    Features

    • 🖼️ Advanced Image Analysis: Leverages Janus-Pro-7B's capabilities for detailed image understanding and description
    • 💬 Multi-turn Chat: Supports interactive conversations about images with context awareness
    • 🔄 Dual Image Support: Can analyze relationships between two images simultaneously
    • 🚀 Automatic Model Download: Downloads model files automatically on first use
    • ⚙️ Flexible Configuration: Customizable parameters for generation and image processing
    • 🎯 ComfyUI Integration: Seamless integration with ComfyUI workflow

    Installation

    1. Clone this repository into your ComfyUI custom nodes folder:
    cd ComfyUI/custom_nodes
    git clone https://github.com/ShmuelRonen/ComfyUI-Janus_pro_vision.git
    
    1. Install required dependencies:
    pip install requests
    pip install tqdm
    
    1. The model files will be automatically downloaded on first use from DeepSeek's HuggingFace repository.

    2. If automatic model download failes you can download them manualy to models\Janus-Pro folder:

    git clone https://huggingface.co/deepseek-ai/Janus-Pro-7B
    

    Available Nodes

    1. Janus-7b-Pro Model Loader (Upload)

    Handles model loading and management.

    • Input: None (uses default model path)
    • Output: JANUS_MODEL (model object for use in analyzer)

    2. Janus Vision 7b Pro (Chat)

    Main analysis node with chat capabilities.

    Inputs:

    • janus_model: Model object from loader node
    • image_a: Primary image for analysis
    • image_b: (Optional) Secondary image for comparison
    • prompt: Text prompt/question about the image(s)
    • chat_mode: Enable/disable chat functionality
    • seed: Random seed for generation
    • temperature: Generation temperature (0.0 - 2.0)
    • top_p: Top-p sampling parameter (0.0 - 1.0)
    • max_tokens: Maximum generation length
    • image_size: Target image size for processing (512-2048)
    • frame_size: Border thickness for image display (1-10)
    • reset_chat: Clear chat history

    Outputs:

    • response: Model's response text
    • chat_history: Formatted chat history (in chat mode)

    Configuration

    Image Processing Parameters

    • image_size: Controls the maximum dimension while maintaining aspect ratio (default: 1024)

      • Range: 512 to 2048 pixels
      • Steps: 64 pixels
      • Example: If image is 2000x1000px and image_size=1024:
        • Width will be scaled to 1024
        • Height will be scaled proportionally to 512
    • frame_size: Border thickness for visual separation (default: 2)

      • Range: 1 to 10 pixels
      • Example values:
        • frame_size=1: Thin border
        • frame_size=2: Standard border
        • frame_size=5: Thick border
        • frame_size=10: Very thick border

    Generation Parameters

    • temperature: Controls response randomness
      • 0.1: More focused and deterministic
      • 0.7: More creative and varied
    • top_p: Nucleus sampling parameter (0.95 recommended)
    • max_tokens: Maximum length of generated response

    Model Information

    This extension uses the Janus-Pro-7B model from DeepSeek AI, which offers:

    • Strong image understanding capabilities
    • Multi-turn conversation support
    • High-quality natural language generation
    • Support for image comparison and analysis

    Requirements

    • ComfyUI
    • Python 3.8+
    • PyTorch
    • Transformers library
    • requests
    • tqdm

    License

    This project is MIT licensed. The Janus-Pro-7B model has its own license from DeepSeek AI.

    Acknowledgments

    • DeepSeek AI for the Janus-Pro-7B model
    • ComfyUI community for the framework and support

    Contributing

    Contributions are welcome! Please feel free to submit a Pull Request.