ComfyUI Extension: ComfyUI-VLM-Captions

Authored by 5x00

Created 2 years ago

Updated 2 years ago

3 stars

Run ComfyUI workflows without the setup

No installs, no CUDA version roulette, no GPU sitting idle on your bill. Bring a workflow and run it in the browser.

A simple ComfyUI node that let's you use ChatGPT 4o's VLM capabilities to generate captions/tags for images

Looking for a different extension?

Custom Nodes (1)

Image To Caption

README

ComfyUI-VLM-Captions

A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.

Installation

git clone this repository into Comfyui/custom_nodes/

Usage

The node accepts an image and a prompt as inputs to generate captions. The input image is automatically resized to 512 pixels to optimize performance and reduce costs. To generate a caption, provide a prompt such as "Create a concise description for the given image" in the text field. Be sure to replace the placeholder API key with your own to enable functionality.

Workflow Example

Right click > Convert widget to input to convert conditioning text box into a node input

Run ComfyUI workflows without the setup

No installs, no CUDA version roulette, no GPU sitting idle on your bill. Bring a workflow and run it in the browser.

Learn more