ComfyUI Extension: GLM-4V Image Descriptor
A professional AI image description generator based on the Zhipu AI GLM-4V multimodal model. It batch-generates accurate, detailed descriptions for images in Chinese and English.
GLM-4V Image Descriptor
Author WeChat accounts: linyu9418 and linjian257
Core Features
- Smart Batch Processing - Process entire folders of images with one click
- Same-name File Output - Automatically generate txt description files with the same names as the images
- High Performance Optimization - Supports 4-bit quantization, GPU acceleration, and memory management optimization
- Flexible Configuration - Customizable prompts, model selection, and output formats
- Bilingual Support - Supports Chinese and English interfaces and documentation
- Multiple Output Formats - Supports TXT, JSON, CSV and other output formats
- ComfyUI Integration - Fully compatible with ComfyUI workflows
Demo
Output Example
Input Image: sunset_beach.jpg
Output File: sunset_beach.txt
Content: "A breathtaking sunset scene over a serene beach with golden sand, where gentle waves lap against the shore while vibrant orange and pink hues paint the sky, creating a peaceful and romantic atmosphere."
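The same-name output rule shown in this example can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: the function names `description_path` and `save_description` are hypothetical, and UTF-8 is assumed as the output encoding so that Chinese descriptions are written correctly.

```python
from pathlib import Path


def description_path(image_path: str) -> Path:
    """Return the .txt path that sits next to the image and shares its stem."""
    return Path(image_path).with_suffix(".txt")


def save_description(image_path: str, text: str) -> Path:
    """Write the description next to the image (UTF-8 is an assumption)."""
    target = description_path(image_path)
    target.write_text(text, encoding="utf-8")
    return target


print(description_path("sunset_beach.jpg"))  # sunset_beach.txt
```

Because only the suffix is swapped, the description file always lands in the same folder as its image.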
Quick Start
System Requirements
- Python: 3.8 or higher (recent transformers releases, including the 4.54.0 pin below, require Python 3.9+)
- GPU: NVIDIA GPU with 8 GB+ VRAM recommended
- Operating System: Windows / Linux / macOS
- Memory: 16 GB+ system RAM
Installation Steps
- Clone the Repository
git clone https://github.com/your-username/ComfyUI_GLM4V_voltspark.git
cd ComfyUI_GLM4V_voltspark
- Install Dependencies
pip install -r requirements.txt
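Or, with the embedded Python of the ComfyUI Windows portable build (replace your_path with your install location):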
your_path\python_embeded\python.exe -m pip install -r your_path\custom_nodes\ComfyUI_GLM4V_voltspark\requirements.txt --index-url https://mirrors.aliyun.com/pypi/simple
# Upgrade transformers library to latest version (recommended)
python -m pip install git+https://github.com/huggingface/transformers.git
# Or install specific version
pip install transformers==4.54.0
- Model Download and Installation
Method 1: Download from Baidu Netdisk (recommended for users in mainland China)
glmv4_4bit model:
- Download Link: Baidu Netdisk
- Extraction Code: `qbdq`
- File Name: `glmv4_4bit.7z`
- Extract Path: `ComfyUI/models/glmv4_4bit/`
GLM-4.1V-9B-Thinking model:
- Download Link: Baidu Netdisk
- Extraction Code: `9n27`
- File Name: `GLM-4.1V-9B-Thinking.rar`
- Extract Path: `ComfyUI/models/GLM-4.1V-9B-Thinking/`
Method 2: Automatic Download
The model is downloaded automatically from Hugging Face on first run. Ensure a stable internet connection.
- Plugin Installation
# Extract the plugin to ComfyUI's custom_nodes directory
ComfyUI/custom_nodes/ComfyUI_GLM4V_voltspark/
Complete Installation Directory Structure
After installation, your directory structure should look like this:
ComfyUI/
├── models/
│   ├── glmv4_4bit/                  # GLM-4V 4-bit quantized model directory
│   │   ├── config.json
│   │   ├── modeling_chatglm.py
│   │   ├── pytorch_model-*.bin
│   │   └── ...
│   └── GLM-4.1V-9B-Thinking/        # GLM-4.1V full model directory
│       ├── config.json
│       ├── modeling_chatglm.py
│       ├── pytorch_model-*.bin
│       └── ...
└── custom_nodes/
    └── ComfyUI_GLM4V_voltspark/     # This plugin directory
        ├── glm4v.py
        ├── __init__.py
        ├── requirements.txt
        ├── README.md
        └── ...
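A quick way to confirm that the layout matches is to check for the expected directories from Python. The sketch below is an illustration only: `missing_dirs` is a hypothetical helper, and it assumes both model folders are wanted, although only the model you actually select needs to be present.

```python
from pathlib import Path

# Relative paths taken from the directory structure in this section.
EXPECTED_DIRS = [
    "models/glmv4_4bit",
    "models/GLM-4.1V-9B-Thinking",
    "custom_nodes/ComfyUI_GLM4V_voltspark",
]


def missing_dirs(comfyui_root: str) -> list:
    """Return the expected directories that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dirs("ComfyUI")` from the folder that contains your install returns an empty list when everything is in place, and the missing relative paths otherwise.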
Usage
ComfyUI Nodes
Node Types
- GLM-4V Generate - Single image processing
- GLM-4V Batch Generate - Batch image processing
Usage Steps
- Search for "GLM-4V" in ComfyUI
- Add nodes to workflow
- Configure input parameters
- Execute workflow
Detailed Configuration
Preset Parameters
| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| Prompt | `describe this image,Describe in long sentence form, without using Markdown format.` | Optimized preset prompt |
| Model | `glmv4_4bit` | 4-bit quantized GLM-4V model |
| Unload Policy | `Never` | Keep model loaded |
| Output Format | `TXT` | Plain text format output |
| Max Images | `100` | Maximum images per batch |
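For scripting around the nodes, the preset values can be captured in a small settings object. The following is a sketch under assumptions: `DescriptorSettings` and its field names are hypothetical and are not taken from the plugin's source; only the default values come from the table.

```python
from dataclasses import dataclass


@dataclass
class DescriptorSettings:
    """Hypothetical container mirroring the preset parameter table."""

    prompt: str = (
        "describe this image,Describe in long sentence form, "
        "without using Markdown format."
    )
    model: str = "glmv4_4bit"
    unload_policy: str = "Never"
    output_format: str = "TXT"
    max_images: int = 100

    def validate(self) -> None:
        """Reject values outside the options this README documents."""
        if self.output_format.upper() not in {"TXT", "JSON", "CSV"}:
            raise ValueError(f"unsupported output format: {self.output_format}")
        if self.max_images < 1:
            raise ValueError("max_images must be at least 1")
```

Overriding a single field, for example `DescriptorSettings(model="GLM-4.1V-9B-Thinking")`, leaves the other presets untouched.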
Supported Image Formats
- Common Formats: JPG, JPEG, PNG, BMP
- Professional Formats: TIFF, WEBP
- Resolution: Supports various resolutions with automatic optimization
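Collecting a folder's images by extension might look like the sketch below. It is an assumption about how such a filter could work, not the plugin's implementation; `collect_images` is a hypothetical name, the extension set simply mirrors the formats listed here, and the `max_images` default mirrors the Max Images preset.

```python
from pathlib import Path

# Lower-case extensions for the formats listed in this section.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}


def collect_images(folder: str, max_images: int = 100) -> list:
    """Return up to max_images image paths from folder, sorted by name."""
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    return paths[:max_images]
```

Comparing the lower-cased suffix means files such as `PHOTO.JPG` are picked up as well.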
Model Selection
| Model Name | Size | Features | Recommended Use |
|------------|------|----------|-----------------|
| `glmv4_4bit` | ~6.7GB | 4-bit quantization, fast speed | Daily batch processing |
| `GLM-4.1V-9B-Thinking` | ~12.0GB | Full precision, high quality | High-quality descriptions |
Advanced Features
Custom Prompts
You can customize prompts according to your needs:
# Detailed description mode
"Please provide a detailed description of this image, including objects, colors, composition, mood, and artistic style."
# Brief description mode
"Describe this image briefly and accurately."
# Professional photography mode
"Analyze this image from a photographer's perspective, describing composition, lighting, and technical aspects."
Batch Processing Options
- Auto Save - Generates a same-name txt file for each image
- Overwrite Mode - Choose whether to overwrite existing files
- Progress Monitoring - Real-time progress display and statistics
- Interrupt & Resume - Supports pausing and resuming processing
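One simple way the overwrite and resume options could interact is to skip any image that already has a description file. The sketch below illustrates that idea only; `pending_images` is a hypothetical helper, and the plugin may track progress differently.

```python
from pathlib import Path


def pending_images(image_paths, overwrite: bool = False) -> list:
    """Return the images that still need a description.

    With overwrite=False, images that already have a same-name .txt file
    are skipped, so an interrupted run can continue where it stopped.
    """
    pending = []
    for image in map(Path, image_paths):
        if overwrite or not image.with_suffix(".txt").exists():
            pending.append(image)
    return pending
```

Running the batch again with `overwrite=False` then only processes the images left over from the previous run.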
Output Format Options
- TXT Format - Plain text descriptions with same names as images
- JSON Format - Structured data with metadata
- CSV Format - Tabular data for easy analysis
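To make the difference between the three formats concrete, here is a standard-library sketch that writes the same results each way. The file names `descriptions.json` and `descriptions.csv` and the `image` / `description` field names are assumptions for illustration; the plugin's real output layout may differ.

```python
import csv
import json
from pathlib import Path


def write_results(results: dict, out_dir: str, fmt: str = "TXT") -> None:
    """Write {image_name: description} pairs as TXT, JSON, or CSV."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fmt = fmt.upper()
    if fmt == "TXT":
        # One same-name .txt file per image.
        for name, text in results.items():
            target = out / Path(name).with_suffix(".txt").name
            target.write_text(text, encoding="utf-8")
    elif fmt == "JSON":
        records = [{"image": n, "description": t} for n, t in results.items()]
        (out / "descriptions.json").write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    elif fmt == "CSV":
        with open(out / "descriptions.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["image", "description"])
            writer.writerows(results.items())
    else:
        raise ValueError(f"unsupported format: {fmt}")
```

`ensure_ascii=False` keeps Chinese descriptions readable in the JSON file instead of escaping them.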
Project Structure
ComfyUI_GLM4V_voltspark/
├── glm4v.py             # ComfyUI node core implementation
├── requirements.txt     # Python dependencies list
├── __init__.py          # ComfyUI plugin registration file
├── 使用说明.md           # Detailed usage documentation
├── README_CN.md         # Chinese documentation
├── README.md            # English documentation
└── Example/             # Example files and workflows
    ├── 单图反推-批量打标.json   # ComfyUI workflow example
    └── 单图反推-批量打标.png    # Workflow screenshot
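For readers unfamiliar with how ComfyUI discovers nodes, the sketch below shows the general shape of a node class and its registration. `INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, and the two `*_MAPPINGS` dictionaries follow ComfyUI's custom node conventions; the class name `ExampleDescribeNode`, the category string, and the placeholder method body are hypothetical and do not reproduce the contents of `glm4v.py` or `__init__.py`.

```python
class ExampleDescribeNode:
    """Hypothetical stand-in node; the plugin's real nodes live in glm4v.py."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this mapping to build the node's inputs and widgets.
        return {
            "required": {
                "image": ("IMAGE",),
                "prompt": ("STRING", {"multiline": True, "default": "describe this image"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "describe"  # name of the method ComfyUI calls
    CATEGORY = "GLM-4V"    # assumed menu category

    def describe(self, image, prompt):
        # Placeholder: a real node would run the model on the image here.
        return (f"[description for prompt: {prompt}]",)


# ComfyUI looks for these names in a custom node package's __init__.py.
NODE_CLASS_MAPPINGS = {"ExampleDescribeNode": ExampleDescribeNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExampleDescribeNode": "Example Describe (sketch)"}
```

Node outputs are returned as a tuple whose length matches `RETURN_TYPES`, which is why the method returns a one-element tuple.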
Troubleshooting
Common Issues & Solutions
Q: Application fails to start
A: Check Python version and dependency installation
python --version # Ensure Python 3.8+
pip install -r requirements.txt --upgrade
Q: Model download fails
A: Check network connection, try using mirror sources
# Set Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com
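The `export` command applies to Linux and macOS shells. As a cross-platform alternative, the same variable can be set from Python before any Hugging Face library is imported, since `huggingface_hub` reads `HF_ENDPOINT` when it is first loaded. The helper name `use_hf_mirror` is hypothetical.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT unless the user has already chosen one.

    Call this before importing huggingface_hub or transformers.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    return os.environ["HF_ENDPOINT"]
```

`setdefault` leaves an endpoint that is already configured in the environment untouched.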
Q: GPU memory insufficient
A: Use 4-bit quantized model or adjust batch size
- Select the `glmv4_4bit` model
- Reduce the maximum number of images processed
- Close other GPU applications
Q: Slow processing speed
A: Optimize settings to improve performance
- Ensure GPU acceleration is used
- Set unload policy to "Never"
- Check CUDA driver version
Log Analysis
The program displays detailed log information during runtime:
- Success: Green status, normal operation
- Warning: Yellow status, needs attention
- Error: Red status, needs handling
Contributing
We welcome community contributions! Ways to participate:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push the branch (`git push origin feature/amazing-feature`)
- Create a Pull Request
Development Environment Setup
# Install development dependencies
pip install -r requirements.txt
pip install jupyter matplotlib tqdm # Optional development tools
# Run tests
python -m pytest tests/ # If test files exist
# Code formatting
black . --line-length 88
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Zhipu AI - For providing the GLM-4V multimodal model
- Hugging Face - For model hosting and inference framework
- ComfyUI - For the powerful AI workflow platform
- All developers who contribute to the open source community
Contact Us
- Bug Reports: Issues Page
- Email Contact: [email protected]
Changelog
v0.3.42 (Latest Version)
- Complete GUI interface design
- Optimized batch processing performance
- Stable model loading mechanism
- Complete Chinese and English documentation
Coming Soon
- More model support
- Interface theme customization
- Advanced data analysis features
- Extended multilingual support
<div align="center"> <p><strong>⭐ If this project helps you, please give us a star! ⭐</strong></p> <p>Made with ❤️ by the Community</p> </div>