ComfyUI Extension: AS_GeminiCaptioning Node

Authored by svetozarov

Created 9 months ago

Updated 8 months ago

1 stars

A ComfyUI node that combines an image with simple text parameters to create a prompt, sends it to the Google Gemini API via the google-generativeai SDK, and returns the generated text response along with the original prompt and an execution log

Custom Nodes (1)

AS_GeminiCaptioning

README

AS_LLM_nodes

This ComfyUI extension provides custom nodes for working with Google Gemini and OpenAI ChatGPT.

AS_GeminiCaptioning

User Guide

The AS_GeminiCaptioning node lets you generate a descriptive text prompt from a single image using the Google Gemini API. Supply your image and adjust any optional text fields to tailor the output.

Inputs

IMAGE (Required)
The image you want to describe (e.g., JPEG, PNG). Connect your image input.
PROMPT TYPE (Required)
Choose between the preset styles "SD1.5 – SDXL" or "FLUX" to select the base style of the prompt.
If you do not provide a custom reference, the selected style determines which default text is used.
APY KEY PATH (Required)
The file path to your API key for the Google Gemini API.
GEMINI MODEL (Required)
The model that will process your request. Possible options include:
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
  The default is Gemini 2.0 Flash.
PROMPT LENGTH (Optional)
An approximate word count for the final prompt. If empty, there is no length restriction.
PROMPT REFERENCE (Optional)
A sample text prompt format that serves as a reference. If empty, a default reference is used.
PROMPT STRUCTURE (Optional)
A guideline for organizing details in the prompt (e.g., building type, materials, location).
IGNORE (Optional)
Specific words or concepts to exclude from the prompt.
EMPHASIS (Optional)
Words or concepts to emphasize.
SAVE TO PATH (Optional)
A directory path for saving the generated text file.
TXT NAME (Optional)
Name for the .txt file. If empty and a path is provided, defaults to result.txt.

Outputs

RESULT PROMPT
The text response from Gemini.
REQUEST TEXT
The exact text payload sent to the API.
LOG
A log of execution steps and errors.

AS_MultimodalGemini

User Guide

The AS_MultimodalGemini node sends text plus up to three images to the Google Gemini API.

Inputs

TEXT_INPUT (Required)
A text string to be sent along with the images.
API_KEY_PATH (Required)
The file path to your Gemini API key.
GEMINI MODEL (Required)
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
  Default is Gemini 2.0 Flash.
IMAGE_1, IMAGE_2, IMAGE_3 (Optional)
Up to three images to attach.

Outputs

RESULT
The text returned by Gemini.
LOG
A log of steps and errors.

AS_ComfyGPT

User Guide

The AS_ComfyGPT node integrates with OpenAI’s ChatGPT. Provide the path to your API key, choose a model, and enter a prompt. The node returns ChatGPT’s reply.

Inputs

api_key_file (Required)
Path to a file containing your OpenAI API key.
model (Required)
Name of the OpenAI model (e.g., "gpt-4", "gpt-3.5-turbo").
prompt (Required)
The user prompt text to send to the GPT model.

Outputs

STRING
The response from ChatGPT.

Required Libraries

Pillow
requests
google-generativeai
openai