ComfyUI Node: Molmo Vision-Language Model
Category
VLM Nodes/Molmo
Inputs
image IMAGE
prompt STRING
model_name
- MolmoE-1B (Efficient)
- Molmo-7B-D (Best 7B)
- Molmo-7B-O (Alternative 7B)
memory_mode
- Full Precision (45GB+ Required)
- 8-bit Quantized (25GB+ Required)
- 4-bit Quantized (15GB+ Required)
- 4-bit + CPU Offload (12GB+ Required)
max_new_tokens INT
temperature FLOAT
top_p FLOAT
top_k INT
use_autocast BOOLEAN
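temperature, top_p and top_k are the usual decoding controls for autoregressive text generation, and max_new_tokens caps how many tokens are produced. This page does not say how the node applies them. Assuming they are forwarded to a conventional sampler, a single decoding step could look like the sketch below, which applies them in the order temperature, top-k, top-p. `sample_next_token` is a hypothetical name, and the code is an illustration of the technique, not the VLM_nodes implementation.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: pick one token id from raw logits."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding (argmax).
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature scaling: < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the max logit for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables this filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose renormalised
    # cumulative probability reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Draw one token from the survivors, weighted by their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A generation loop would call a step like this at most max_new_tokens times, usually stopping earlier when an end-of-sequence token is drawn. Lower temperature, smaller top_k, or smaller top_p all make the output more deterministic.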
Outputs
STRING
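For orientation, the inputs and output listed above map naturally onto ComfyUI's custom-node convention (an `INPUT_TYPES` classmethod, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). The sketch below is a hypothetical reconstruction, not the VLM_nodes source: the option strings and category are copied from this page, while the class name, method name, and every default and min/max value are illustrative assumptions. It also includes a hypothetical `pick_memory_mode` helper that treats the "GB+ Required" figures as available GPU memory, which the page does not state explicitly.

```python
# Hypothetical sketch; option strings and category are taken from this page,
# all defaults, ranges and names are assumptions for illustration only.

MODEL_NAMES = [
    "MolmoE-1B (Efficient)",
    "Molmo-7B-D (Best 7B)",
    "Molmo-7B-O (Alternative 7B)",
]

# Memory modes, highest precision first, mapped to the minimum GB in their labels.
MEMORY_MODES = {
    "Full Precision (45GB+ Required)": 45,
    "8-bit Quantized (25GB+ Required)": 25,
    "4-bit Quantized (15GB+ Required)": 15,
    "4-bit + CPU Offload (12GB+ Required)": 12,
}


def pick_memory_mode(available_gb: float) -> str:
    """Return the highest-precision mode whose quoted requirement fits."""
    for mode, required_gb in MEMORY_MODES.items():  # dicts keep insertion order
        if available_gb >= required_gb:
            return mode
    raise ValueError(f"{available_gb} GB is below every quoted requirement")


class MolmoNodeSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "prompt": ("STRING", {"multiline": True, "default": "Describe this image."}),
                "model_name": (MODEL_NAMES,),
                "memory_mode": (list(MEMORY_MODES),),
                "max_new_tokens": ("INT", {"default": 256, "min": 1, "max": 4096}),
                "temperature": ("FLOAT", {"default": 0.7, "min": 0.0, "max": 2.0, "step": 0.05}),
                "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.05}),
                "top_k": ("INT", {"default": 50, "min": 0, "max": 200}),
                "use_autocast": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = ("STRING",)  # the single STRING output listed above
    FUNCTION = "generate"       # name of the method ComfyUI would call
    CATEGORY = "VLM Nodes/Molmo"

    def generate(self, image, prompt, model_name, memory_mode,
                 max_new_tokens, temperature, top_p, top_k, use_autocast):
        # Model loading and inference are deliberately left out of this sketch.
        raise NotImplementedError("inference is outside the scope of this sketch")
```

For example, with 16 GB available, `pick_memory_mode(16)` returns the 4-bit Quantized option, since the Full Precision and 8-bit modes quote higher requirements.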
Extension: VLM_nodes
Custom Nodes for Vision Language Models (VLM), Large Language Models (LLM), Image Captioning, Automatic Prompt Generation, Creative and Consistent Prompt Suggestion, and Keyword Extraction
Authored by gokayfem