A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.
A simple ComfyUI node that let's you use Claude or ChatGPT 4o's VLM capabilities to generate captions/tags for images.
The node accepts an image and a prompt as inputs to generate captions. The input image is automatically resized to 512 pixels to optimize performance and reduce costs. To generate a caption, provide a prompt such as "Create a concise description for the given image" in the text field. Be sure to replace the placeholder API key with your own to enable functionality.
Right click > Convert widget to input to convert conditioning text box into a node input