ComfyUI Extension: ComfyUI-VLM_Captions

Authored by 5x00

Created

Updated

4 stars

A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.

Custom Nodes (0)

    README

    ComfyUI-VLM-Captions

    A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.

    Installation

    • git clone this repository into Comfyui/custom_nodes/

    Usage

    The node accepts an image and a prompt as inputs to generate captions. The input image is automatically resized to 512 pixels to optimize performance and reduce costs. To generate a caption, provide a prompt such as "Create a concise description for the given image" in the text field. Be sure to replace the placeholder API key with your own to enable functionality.

    Workflow Example

    image Right click > Convert widget to input to convert conditioning text box into a node input