# Niutonian GLM-4.6V ComfyUI Nodes (Transformer Version)
This is the transformer-based implementation of Niutonian GLM-4.6V nodes for ComfyUI with extensive memory optimizations to prevent CUDA out-of-memory errors.
Version: v0.1
## Features
- Memory Optimized: Multiple strategies to reduce VRAM usage
- Quantization Support: 4-bit and 8-bit quantization via bitsandbytes
- Automatic Memory Management: CUDA cache clearing and efficient tensor handling
- Error Recovery: Graceful handling of OOM errors with helpful messages
- Niutonian Branding: Professional custom node package with consistent naming
## Nodes
### 1. Niutonian GLM46VLoader
Loads the GLM-4.6V-Flash model with memory optimizations.
Inputs:
- `device`: auto/cuda/cpu (default: auto)
- `torch_dtype`: auto/bfloat16/float16/float32 (default: bfloat16)
- `low_cpu_mem_usage`: Enable low CPU memory usage (default: True)
- `load_in_8bit`: Enable 8-bit quantization (default: False)
- `load_in_4bit`: Enable 4-bit quantization (default: True)
Outputs:
- `GLM_MODEL`: Model and processor for other nodes
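For reference, here is a minimal sketch of the kind of loading call this node wraps, using the standard transformers and bitsandbytes APIs. The specific Auto class is an assumption (the node may instantiate a concrete model class instead):

```python
# Minimal loading sketch; the Auto class choice is an assumption.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

MODEL_ID = "zai-org/GLM-4.6V-Flash"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # the loader's default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="sequential",  # see "Device Mapping" below
    low_cpu_mem_usage=True,
)
```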
### 2. Niutonian GLM46VDescriber
Describes images using the GLM-4.6V vision model.
Inputs:
- `glm_model`: Model from Niutonian GLM46VLoader
- `image`: Input image tensor
- `user_prompt`: Description prompt (default: "Describe this image in detail.")
- `max_tokens`: Maximum output tokens (default: 1024)
- `temperature`: Sampling temperature (default: 0.7)
Outputs:
- `output_text`: Clean description text
- `raw_output`: Raw model output with thinking tags
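At its core this is a chat-template generate call. A sketch, continuing from the loader sketch above; the message format follows the standard transformers vision chat template, and the `<think>` tag name is an assumption based on the "thinking tags" mentioned above:

```python
import re
import torch
from PIL import Image

image = Image.open("input.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe this image in detail."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=1024,
                         temperature=0.7, do_sample=True)

raw_output = processor.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
# Strip reasoning tags to get the clean output_text (tag name assumed).
output_text = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
```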
### 3. Niutonian GLM46VAgenticSampler
Advanced KSampler that uses GLM-4.6V to verify generated images.
Inputs:
- Standard KSampler inputs (model, seed, steps, cfg, etc.)
- `glm_model`: GLM model for verification
- `vae`: VAE for decoding latents
- `verification_prompt`: Prompt for image verification
- `max_retries`: Maximum retry attempts (default: 3)
Outputs:
- `latent`: Final latent representation
- `verified_image`: Decoded image
- `is_match`: Boolean indicating if the image matches the prompt
- `summary`: Analysis summary
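Conceptually, the node runs a verify-and-retry loop. A hypothetical outline, where `sample`, `decode`, and `verify` stand in for the real KSampler, VAE-decode, and GLM verification calls:

```python
def agentic_sample(sample, decode, verify, seed, max_retries=3):
    """Hypothetical outline of the agentic sampling loop."""
    for attempt in range(max_retries + 1):
        latent = sample(seed + attempt)    # re-sample with a shifted seed
        image = decode(latent)             # VAE decode to pixels
        is_match, summary = verify(image)  # GLM-4.6V: does it match the prompt?
        if is_match:
            break                          # stop once verification passes
    return latent, image, is_match, summary
```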
### 4. Niutonian GLM46VPromptGenerator
Intelligent prompt generator using GLM-4.6V vision model.
Inputs:
- `glm_model`: Model from Niutonian GLM46VLoader
- `mode`: Generation mode (create_from_image, refine_prompt, creative_variations, style_transfer)
- `base_prompt`: Base prompt text for refinement modes
- `style`: Target artistic style (photorealistic, artistic, cinematic, anime, etc.)
- `detail_level`: Level of detail (basic, detailed, very_detailed, ultra_detailed)
- `creativity`: Creativity factor (0.0-1.0)
- `max_tokens`: Maximum output tokens
- `reference_image`: Optional reference image
- `negative_elements`: Elements to avoid in prompts
Outputs:
- `positive_prompt`: Generated positive prompt
- `negative_prompt`: Generated negative prompt
- `analysis`: Analysis of prompt choices
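One plausible way such mode switching could be implemented is a small instruction-template table. The strings below are purely illustrative, not the node's actual prompts:

```python
# Illustrative instruction templates keyed by mode (hypothetical wording).
MODE_TEMPLATES = {
    "create_from_image": "Study the reference image and write a {detail_level} "
                         "text-to-image prompt in a {style} style.",
    "refine_prompt": "Refine this prompt while keeping its intent: {base_prompt}",
    "creative_variations": "Write a creative variation of: {base_prompt}",
    "style_transfer": "Rewrite this prompt in a {style} style: {base_prompt}",
}

def build_instruction(mode, base_prompt="", style="photorealistic",
                      detail_level="detailed"):
    return MODE_TEMPLATES[mode].format(
        base_prompt=base_prompt, style=style, detail_level=detail_level)
```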
## Memory Optimization Strategies
### 1. Quantization (Recommended)
Enable 4-bit or 8-bit quantization to significantly reduce VRAM usage:
- 4-bit: ~75% memory reduction, minimal quality loss
- 8-bit: ~50% memory reduction, negligible quality loss
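Both modes map onto standard `BitsAndBytesConfig` options. A sketch; the 4-bit parameters shown are common defaults, not necessarily the node's exact settings:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit: largest savings; NF4 with bf16 compute is a common choice.
four_bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 8-bit: smaller savings, closest to full-precision quality.
eight_bit = BitsAndBytesConfig(load_in_8bit=True)

# After loading, transformers can report the resulting footprint:
# print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```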
### 2. Device Mapping
- Uses `device_map="sequential"` for efficient GPU memory allocation
- Automatically reserves 15% of VRAM for other operations
- Falls back to CPU if GPU memory is insufficient
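The 15% headroom reservation can be expressed as a `max_memory` map passed alongside `device_map`. A sketch; the CPU offload budget shown is an arbitrary placeholder:

```python
import torch

def max_memory_with_headroom(fraction=0.85):
    # Cap each GPU at 85% of its VRAM, leaving 15% for other operations.
    budget = {
        i: int(torch.cuda.get_device_properties(i).total_memory * fraction)
        for i in range(torch.cuda.device_count())
    }
    budget["cpu"] = "32GiB"  # CPU offload budget (placeholder value)
    return budget

# e.g. from_pretrained(..., device_map="sequential",
#                      max_memory=max_memory_with_headroom())
```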
### 3. Memory Management
- Automatic CUDA cache clearing before/after operations
- Efficient tensor movement and cleanup
- Gradient checkpointing enabled when available
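A minimal sketch of the kind of cleanup performed around each operation:

```python
import gc
import torch

def free_cuda_memory():
    # Release Python-side references first, then cached CUDA blocks.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
```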
## Installation
1. Clone this repository into your ComfyUI `custom_nodes` directory:

   ```bash
   cd /path/to/ComfyUI/custom_nodes
   git clone https://github.com/Niutonian/comfyui_Niutonian_GLM_4_6V.git
   ```

2. Install dependencies:

   ```bash
   cd comfyui_Niutonian_GLM_4_6V
   pip install -r requirements.txt
   ```

3. Restart ComfyUI to load the new nodes.
## Usage Tips
### For Low VRAM Systems (8-12GB)
- Enable 4-bit quantization in Niutonian GLM46VLoader
- Use `torch_dtype="float16"`
- Set `low_cpu_mem_usage=True`
### For Medium VRAM Systems (16-24GB)
- Try 8-bit quantization first
- Fall back to 4-bit if needed
- Use `torch_dtype="float16"`
### For Medium-High VRAM Systems (24-32GB)
- Try 8-bit quantization first
- Fall back to 4-bit if needed
- Use `torch_dtype="bfloat16"`
### For High VRAM Systems (32GB+)
- Can run without quantization
- Use `torch_dtype="bfloat16"` for best performance
- Consider `torch_dtype="float16"` if bfloat16 causes issues
## Troubleshooting
### CUDA Out of Memory
- Enable 4-bit quantization
- Reduce `max_tokens` in Niutonian GLM46VDescriber
- Close other GPU applications
- Restart ComfyUI to clear memory
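The graceful OOM handling mentioned under Features typically looks like the sketch below (the exact message wording is illustrative; `model` and `inputs` are as in the describer sketch above):

```python
import torch

try:
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=1024)
except torch.cuda.OutOfMemoryError as e:
    torch.cuda.empty_cache()  # return cached blocks before surfacing the error
    raise RuntimeError(
        "CUDA out of memory: enable 4-bit quantization in the loader, "
        "reduce max_tokens, or close other GPU applications."
    ) from e
```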
### Model Loading Fails
- Check internet connection (model downloads from HuggingFace)
- Ensure sufficient disk space (~9GB for model)
- Try CPU device if GPU fails
- Check transformers version (>=5.0.0rc0 required)
### Slow Performance
- Ensure CUDA is available and working
- Use quantization (4-bit/8-bit) for faster inference
- Reduce image resolution if possible
- Check GPU utilization
### Grey or Missing Images
If an image is generated but appears grey or does not display properly:
- Reduce the image size to 1024x1024 or lower
- Try again with the smaller resolution
- This often resolves display issues with large images
## Testing
Run the memory test script to validate your setup:
```bash
python test_memory.py
```
This will test different quantization configurations and report memory usage.
## Requirements
- Python 3.8+
- PyTorch 2.0+
- transformers 5.0.0rc0+
- bitsandbytes 0.41.0+ (for quantization)
- CUDA-capable GPU (recommended)
- 8GB+ VRAM (with quantization) or 24GB+ VRAM (without quantization)
## Model Information
- Model: zai-org/GLM-4.6V-Flash
- Size: ~9B parameters
- Context: 128K tokens
- Vision: Supports image + text input
- License: Check model repository for licensing terms
## About Niutonian
This package is part of the Niutonian suite of AI tools, providing professional-grade implementations of cutting-edge AI models for creative workflows.
Version: v0.1
Release Date: January 5, 2026
Repository: Niutonian/comfyui_Niutonian_GLM_4_6V
## Version History
See CHANGELOG.md for detailed version history and changes.