ComfyUI Node: Molmo Vision-Language Model

Authored by gokayfem

481 stars

Category

VLM Nodes/Molmo

Inputs

image IMAGE
prompt STRING
model_name
  • MolmoE-1B (Efficient)
  • Molmo-7B-D (Best 7B)
  • Molmo-7B-O (Alternative 7B)
memory_mode
  • Full Precision (45GB+ Required)
  • 8-bit Quantized (25GB+ Required)
  • 4-bit Quantized (15GB+ Required)
  • 4-bit + CPU Offload (12GB+ Required)
max_new_tokens INT
temperature FLOAT
top_p FLOAT
top_k INT
use_autocast BOOLEAN
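
The memory_mode labels above each state a minimum amount of GPU memory. As a minimal sketch, assuming those figures are the only selection criterion, the hypothetical helper below (pick_memory_mode is not part of VLM_nodes) returns the highest-precision option that fits a given amount of VRAM. The thresholds are copied from the labels; actual usage will likely also depend on the chosen model_name.

```python
# Minimum GPU memory (in GB) for each memory_mode, copied from the option
# labels in the listing. Ordered from highest to lowest precision.
MEMORY_MODES = [
    ("Full Precision (45GB+ Required)", 45),
    ("8-bit Quantized (25GB+ Required)", 25),
    ("4-bit Quantized (15GB+ Required)", 15),
    ("4-bit + CPU Offload (12GB+ Required)", 12),
]


def pick_memory_mode(vram_gb):
    """Return the first (highest-precision) mode whose stated minimum fits.

    Hypothetical helper for illustration only. Returns None when even the
    smallest stated requirement exceeds the available memory.
    """
    for label, min_gb in MEMORY_MODES:
        if vram_gb >= min_gb:
            return label
    return None


if __name__ == "__main__":
    for vram in (48, 24, 12, 8):
        print(vram, "GB:", pick_memory_mode(vram))
```

With a 24 GB card, for example, this picks the plain 4-bit option, since the 8-bit label asks for 25 GB or more.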

Outputs

STRING
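
For orientation, the inputs and output listed above could be declared roughly as follows using ComfyUI's usual custom-node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). This is an illustrative sketch, not the VLM_nodes source: the class name MolmoNodeSketch and the method name run are hypothetical, widget defaults and ranges are left out because the listing does not state them, and the method body is a placeholder that loads no model.

```python
class MolmoNodeSketch:
    """Illustrative declaration only; not the actual VLM_nodes implementation."""

    CATEGORY = "VLM Nodes/Molmo"
    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"  # hypothetical method name

    @classmethod
    def INPUT_TYPES(cls):
        # Names, types, and dropdown choices are taken from the listing.
        # Defaults and min/max values are omitted because they are not stated.
        return {
            "required": {
                "image": ("IMAGE",),
                "prompt": ("STRING",),
                "model_name": ([
                    "MolmoE-1B (Efficient)",
                    "Molmo-7B-D (Best 7B)",
                    "Molmo-7B-O (Alternative 7B)",
                ],),
                "memory_mode": ([
                    "Full Precision (45GB+ Required)",
                    "8-bit Quantized (25GB+ Required)",
                    "4-bit Quantized (15GB+ Required)",
                    "4-bit + CPU Offload (12GB+ Required)",
                ],),
                "max_new_tokens": ("INT",),
                "temperature": ("FLOAT",),
                "top_p": ("FLOAT",),
                "top_k": ("INT",),
                "use_autocast": ("BOOLEAN",),
            }
        }

    def run(self, image, prompt, model_name, memory_mode, max_new_tokens,
            temperature, top_p, top_k, use_autocast):
        # Placeholder: a real node would load the selected Molmo model and
        # generate text for the image. ComfyUI expects a tuple that matches
        # RETURN_TYPES, so a single string is returned inside a tuple.
        return ("",)
```

A dropdown input is declared by passing a list of choices in place of a type name, which is how the model_name and memory_mode options would appear in the node UI.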

Extension: VLM_nodes

Custom nodes for Vision Language Models (VLM), Large Language Models (LLM), image captioning, automatic prompt generation, creative and consistent prompt suggestion, and keyword extraction.
