ComfyUI Extension: GLM-4V Image Descriptor

Authored by linjian-ufo

Professional AI image description generator based on the Zhipu AI GLM-4V multimodal model. Batch-generates accurate, detailed image descriptions in Chinese and English.

Custom Nodes (0)

    README

    GLM-4V Image Descriptor

    🚀 Professional AI Image Description Generator
    Based on the Zhipu AI GLM-4V multimodal model, it batch-generates accurate, detailed image descriptions in Chinese and English.

    The author's WeChat accounts are linyu9418 and linjian257.

    ✨ Core Features

    • 🎯 Smart Batch Processing - Process entire folders of images with one click
    • 📝 Same-name File Output - Automatically generates a .txt description file with the same name as each image
    • ⚡ High-Performance Optimization - Supports 4-bit quantization, GPU acceleration, and memory management optimization
    • 🔧 Flexible Configuration - Customizable prompts, model selection, and output formats
    • 🌐 Bilingual Support - Supports Chinese and English interfaces and documentation
    • 📊 Multiple Output Formats - Supports TXT, JSON, CSV, and other output formats
    • 🔌 ComfyUI Integration - Fully compatible with ComfyUI workflows

    🖼️ Demo

    Output Example

    Input Image: sunset_beach.jpg
    Output File: sunset_beach.txt
    Content: "A breathtaking sunset scene over a serene beach with golden sand, where gentle waves lap against the shore while vibrant orange and pink hues paint the sky, creating a peaceful and romantic atmosphere."
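The same-name convention in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: `caption_path_for` and `save_caption` are hypothetical helper names, and it assumes the description is written next to the image as UTF-8 text.

```python
from pathlib import Path


def caption_path_for(image_path: str) -> Path:
    """Return the same-name .txt path for an image (sunset_beach.jpg -> sunset_beach.txt)."""
    return Path(image_path).with_suffix(".txt")


def save_caption(image_path: str, description: str) -> Path:
    """Write the description next to the image and return the file that was written."""
    out_path = caption_path_for(image_path)
    # Strip stray whitespace from the model output and end the file with a newline.
    out_path.write_text(description.strip() + "\n", encoding="utf-8")
    return out_path
```

Calling `save_caption("sunset_beach.jpg", text)` would produce `sunset_beach.txt` in the same folder as the image.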

    Workflow example: single-image reverse prompting and batch tagging (单图反推-批量打标)

    🚀 Quick Start

    System Requirements

    • Python: 3.8 or higher
    • GPU: NVIDIA GPU with 8GB+ VRAM recommended
    • Operating System: Windows / Linux / macOS
    • Memory: 16GB+ system RAM
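A quick way to check a machine against these requirements is a small Python script. The sketch below is only an illustration: `python_ok` and `gpu_summary` are hypothetical names, and the GPU check assumes PyTorch is the framework in use (it degrades gracefully when PyTorch is not installed).

```python
import sys


def python_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the minimum version listed above."""
    return sys.version_info >= minimum


def gpu_summary() -> str:
    """Describe the first CUDA GPU, or explain why none could be inspected."""
    try:
        import torch  # optional: only needed for the GPU check
    except ImportError:
        return "PyTorch not installed; cannot inspect the GPU"
    if not torch.cuda.is_available():
        return "no CUDA GPU detected"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM"


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("GPU:", gpu_summary())
```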

    Installation Steps

    1. Clone the Repository
    git clone https://github.com/your-username/ComfyUI_GLM4V_voltspark.git
    cd ComfyUI_GLM4V_voltspark
    
    2. Install Dependencies
    pip install -r requirements.txt
    # Or, with ComfyUI's embedded Python on Windows, using the Aliyun PyPI mirror:
    your_path\python_embeded\python.exe -m pip install -r your_path\custom_nodes\ComfyUI_GLM4V_voltspark\requirements.txt --index-url https://mirrors.aliyun.com/pypi/simple
    
    # Install the latest transformers from source (recommended)
    python -m pip install git+https://github.com/huggingface/transformers.git
    
    # Or install a specific version
    pip install transformers==4.54.0
    
    3. Model Download and Installation

    Method 1: Baidu Netdisk Download (Recommended for Users in China)

    glmv4_4bit Model:

    • 📁 Download Link: Baidu Netdisk
    • 🔑 Extraction Code: qbdq
    • 📦 File Name: glmv4_4bit.7z
    • 📂 Extract Path: ComfyUI/models/glmv4_4bit/

    GLM-4.1V-9B-Thinking Model:

    • 📁 Download Link: Baidu Netdisk
    • 🔑 Extraction Code: 9n27
    • 📦 File Name: GLM-4.1V-9B-Thinking.rar
    • 📂 Extract Path: ComfyUI/models/GLM-4.1V-9B-Thinking/

    Method 2: Auto Download

    The model is downloaded automatically from Hugging Face on first run. Make sure you have a stable internet connection.

    4. Plugin Installation
    # Extract the plugin to ComfyUI's custom_nodes directory
    ComfyUI/custom_nodes/ComfyUI_GLM4V_voltspark/
    

    📂 Complete Installation Directory Structure

    After installation, your directory structure should look like this:

    ComfyUI/
    ├── models/
    │   ├── glmv4_4bit/                    # GLM-4V 4-bit quantized model directory
    │   │   ├── config.json
    │   │   ├── modeling_chatglm.py
    │   │   ├── pytorch_model-*.bin
    │   │   └── ...
    │   └── GLM-4.1V-9B-Thinking/          # GLM-4.1V full model directory
    │       ├── config.json
    │       ├── modeling_chatglm.py
    │       ├── pytorch_model-*.bin
    │       └── ...
    └── custom_nodes/
        └── ComfyUI_GLM4V_voltspark/       # This plugin directory
            ├── glm4v.py
            ├── __init__.py
            ├── requirements.txt
            ├── README.md
            └── ...
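To confirm that an installation matches the tree above, a short script can look for a few of the listed files. This is a sketch under assumptions: `missing_paths` is a hypothetical helper, only files named in the tree are checked, and both model folders are included even though you may have downloaded just one of them.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the ComfyUI root.
EXPECTED_PATHS = [
    "models/glmv4_4bit/config.json",
    "models/GLM-4.1V-9B-Thinking/config.json",
    "custom_nodes/ComfyUI_GLM4V_voltspark/glm4v.py",
    "custom_nodes/ComfyUI_GLM4V_voltspark/__init__.py",
    "custom_nodes/ComfyUI_GLM4V_voltspark/requirements.txt",
]


def missing_paths(comfyui_root: str) -> list:
    """Return the expected paths that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own installation folder.
    for rel in missing_paths("ComfyUI"):
        print("missing:", rel)
```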
    

    Usage

    ComfyUI Nodes

    1. Node Types

      • GLM-4V Generate - Single image processing
      • GLM-4V Batch Generate - Batch image processing
    2. Usage Steps

      • Search for "GLM-4V" in ComfyUI
      • Add nodes to workflow
      • Configure input parameters
      • Execute workflow

    📋 Detailed Configuration

    Preset Parameters

    | Parameter | Default Value | Description |
    |-----------|---------------|-------------|
    | Prompt | describe this image,Describe in long sentence form, without using Markdown format. | Optimized preset prompt |
    | Model | glmv4_4bit | 4-bit quantized GLM-4V model |
    | Unload Policy | Never | Keep model loaded |
    | Output Format | TXT | Plain text format output |
    | Max Images | 100 | Maximum images per batch |
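If you drive the same settings from a script, the presets can be kept in one place and overridden selectively. The sketch below only mirrors the table: the class name `DescriptorSettings` and its field names are hypothetical and are not the node's real input names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DescriptorSettings:
    """Preset values copied from the table above (field names are illustrative)."""
    prompt: str = (
        "describe this image,Describe in long sentence form, "
        "without using Markdown format."
    )
    model: str = "glmv4_4bit"
    unload_policy: str = "Never"
    output_format: str = "TXT"
    max_images: int = 100


DEFAULTS = DescriptorSettings()

# Override only what differs from the presets, e.g. a smaller, higher-quality run.
quality_run = replace(DEFAULTS, model="GLM-4.1V-9B-Thinking", max_images=20)
```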

    Supported Image Formats

    • 📷 Common Formats: JPG, JPEG, PNG, BMP
    • 🎨 Professional Formats: TIFF, WEBP
    • 📐 Resolution: Supports various resolutions with automatic optimization
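For scripts that prepare a folder before batch processing, the format list above can be turned into a simple filter. This is a hedged sketch rather than the plugin's own logic: `find_images` is a hypothetical helper, extension matching is assumed to be case-insensitive, and the default cap of 100 comes from the Max Images preset.

```python
from pathlib import Path

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}


def find_images(folder: str, max_images: int = 100) -> list:
    """Return up to max_images supported image files from folder, sorted by name."""
    candidates = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    return candidates[:max_images]
```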

    Model Selection

    | Model Name | Size | Features | Recommended Use |
    |------------|------|----------|-----------------|
    | glmv4_4bit | ~6.7GB | 4-bit quantization, fast speed | Daily batch processing |
    | GLM-4.1V-9B-Thinking | ~12.0GB | Full precision, high quality | High-quality descriptions |

    🛠️ Advanced Features

    Custom Prompts

    You can customize prompts according to your needs:

    # Detailed description mode
    "Please provide a detailed description of this image, including objects, colors, composition, mood, and artistic style."
    
    # Brief description mode  
    "Describe this image briefly and accurately."
    
    # Professional photography mode
    "Analyze this image from a photographer's perspective, describing composition, lighting, and technical aspects."
    

    Batch Processing Options

    • ✅ Auto Save - Generates a same-name txt file for each image
    • 🔄 Overwrite Mode - Choose whether to overwrite existing files
    • 📊 Progress Monitoring - Real-time progress display and statistics
    • 🛑 Interrupt & Resume - Supports pausing and resuming processing
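One common way to get these behaviors is to skip images whose txt file already exists unless overwriting is requested, so an interrupted run can simply be started again. The sketch below illustrates that idea only; it is not the node's implementation. `caption_batch` is a hypothetical function, and `describe` stands in for whatever call produces a description for one image.

```python
from pathlib import Path
from typing import Callable, Iterable


def caption_batch(
    image_paths: Iterable[Path],
    describe: Callable[[Path], str],
    overwrite: bool = False,
) -> dict:
    """Write a same-name .txt per image and return simple run statistics."""
    stats = {"written": 0, "skipped": 0, "failed": 0}
    for image_path in image_paths:
        image_path = Path(image_path)
        out_path = image_path.with_suffix(".txt")
        # Resume support: an existing caption is kept unless overwrite is on.
        if out_path.exists() and not overwrite:
            stats["skipped"] += 1
            continue
        try:
            out_path.write_text(describe(image_path), encoding="utf-8")
            stats["written"] += 1
        except Exception as exc:  # keep going so one bad image does not stop the batch
            stats["failed"] += 1
            print(f"failed: {image_path}: {exc}")
    return stats
```

Running the function a second time with `overwrite=False` only processes images that do not yet have a caption file.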

    Output Format Options

    1. TXT Format - Plain text descriptions with same names as images
    2. JSON Format - Structured data with metadata
    3. CSV Format - Tabular data for easy analysis
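The three formats differ mainly in how the same image-to-description mapping is laid out on disk. The following sketch uses only the Python standard library to show one possible layout. The function name `export_descriptions`, the file names `descriptions.json` and `descriptions.csv`, and the `image`/`description` field names are assumptions for illustration; the plugin's real JSON and CSV schemas may differ.

```python
import csv
import json
from pathlib import Path


def export_descriptions(results: dict, out_dir: str, fmt: str = "TXT") -> list:
    """Write results (image file name -> description) and return the files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fmt = fmt.upper()
    if fmt == "TXT":
        # One plain-text file per image, named after the image.
        written = []
        for name, text in results.items():
            path = out / Path(name).with_suffix(".txt").name
            path.write_text(text, encoding="utf-8")
            written.append(path)
        return written
    if fmt == "JSON":
        # A single structured file with one record per image.
        path = out / "descriptions.json"
        records = [{"image": n, "description": t} for n, t in results.items()]
        path.write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        return [path]
    if fmt == "CSV":
        # A single table with a header row, convenient for spreadsheets.
        path = out / "descriptions.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["image", "description"])
            writer.writerows(results.items())
        return [path]
    raise ValueError(f"unsupported format: {fmt}")
```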

    📁 Project Structure

    ComfyUI_GLM4V_voltspark/
    ├── 📄 glm4v.py                     # ComfyUI node core implementation
    ├── 🔧 requirements.txt             # Python dependencies list
    ├── 🔌 __init__.py                  # ComfyUI plugin registration file
    ├── 📖 使用说明.md                   # Detailed usage documentation (Chinese)
    ├── 📖 README_CN.md                 # Chinese documentation
    ├── 📖 README.md                    # English documentation
    └── 📁 Example/                     # Example files and workflows
        ├── 单图反推-批量打标.json        # ComfyUI workflow example
        └── 单图反推-批量打标.png         # Workflow screenshot
    

    🐛 Troubleshooting

    Common Issues & Solutions

    Q: Application fails to start

    A: Check the Python version and make sure the dependencies are installed

    python --version  # Ensure Python 3.8+
    pip install -r requirements.txt --upgrade
    

    Q: Model download fails

    A: Check your network connection, or try a mirror source

    # Set Hugging Face mirror
    export HF_ENDPOINT=https://hf-mirror.com
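The `export` line above is POSIX shell syntax. Where setting the variable in the shell is inconvenient, the same effect can be had from Python, as in this sketch. `use_hf_mirror` is a hypothetical helper, and it assumes it is called before `huggingface_hub` or `transformers` is imported, since the endpoint is read from the environment when those libraries load.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT for this process unless one is already configured.

    Returns the endpoint that is in effect afterwards.
    """
    return os.environ.setdefault("HF_ENDPOINT", endpoint)
```

Because `setdefault` is used, a value already exported in the shell takes precedence over the default mirror.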
    

    Q: GPU memory insufficient

    A: Use the 4-bit quantized model or reduce the batch size

    • Select glmv4_4bit model
    • Reduce maximum number of images processed
    • Close other GPU applications

    Q: Slow processing speed

    A: Optimize settings to improve performance

    • Ensure GPU acceleration is used
    • Set unload policy to "Never"
    • Check CUDA driver version

    Log Analysis

    The program displays detailed log information during runtime:

    • ✅ Success: Green status, normal operation
    • ⚠️ Warning: Yellow status, needs attention
    • ❌ Error: Red status, needs handling

    🤝 Contributing

    We welcome community contributions! Ways to participate:

    1. 🍴 Fork the repository
    2. 🔧 Create a feature branch (git checkout -b feature/amazing-feature)
    3. 💾 Commit your changes (git commit -m 'Add amazing feature')
    4. 📤 Push the branch (git push origin feature/amazing-feature)
    5. 🔄 Create a Pull Request

    Development Environment Setup

    # Install development dependencies
    pip install -r requirements.txt
    pip install jupyter matplotlib tqdm  # Optional development tools
    
    # Run tests
    python -m pytest tests/  # If test files exist
    
    # Code formatting
    black . --line-length 88
    

    📄 License

    This project is licensed under the MIT License - see the LICENSE file for details.

    🙏 Acknowledgments

    • Zhipu AI - For providing the GLM-4V multimodal model
    • Hugging Face - For model hosting and inference framework
    • ComfyUI - For the powerful AI workflow platform
    • All developers who contribute to the open source community

    📞 Contact Us

    📈 Changelog

    v0.3.42 (Latest Version)

    • ✅ Complete GUI interface design
    • ✅ Optimized batch processing performance
    • ✅ Stable model loading mechanism
    • ✅ Complete Chinese and English documentation

    Coming Soon

    • 🔄 More model support
    • 🎨 Interface theme customization
    • 📊 Advanced data analysis features
    • 🌐 Extended multilingual support

    <div align="center"> <p><strong>⭐ If this project helps you, please give us a star! ⭐</strong></p> <p>Made with ❤️ by the Community</p> </div>