ComfyUI Extension: ComfyUI-VLM_Captions

Authored by 5x00

Created

Updated

3 stars

A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.

Custom Nodes (1)

README

ComfyUI-VLM-Captions

A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.

Installation

  • git clone this repository into Comfyui/custom_nodes/

Usage

The node accepts an image and a prompt as inputs to generate captions. The input image is automatically resized to 512 pixels to optimize performance and reduce costs. To generate a caption, provide a prompt such as "Create a concise description for the given image" in the text field. Be sure to replace the placeholder API key with your own to enable functionality.

Workflow Example

image Right click > Convert widget to input to convert conditioning text box into a node input