ComfyUI Extension: AS_GeminiCaptioning Node
A ComfyUI node that combines an image with simple text parameters to create a prompt, sends it to the Google Gemini API via the google-generativeai SDK, and returns the generated text response along with the original prompt and an execution log
Custom Nodes (1)
README
AS_LLM_nodes
This ComfyUI extension provides custom nodes for working with Google Gemini and OpenAI ChatGPT.
AS_GeminiCaptioning
User Guide
The AS_GeminiCaptioning node lets you generate a descriptive text prompt from a single image using the Google Gemini API. Supply your image and adjust any optional text fields to tailor the output.
Inputs
-
IMAGE (Required)
The image you want to describe (e.g., JPEG, PNG). Connect your image input. -
PROMPT TYPE (Required)
Choose between the preset styles "SD1.5 – SDXL" or "FLUX" to select the base style of the prompt.
If you do not provide a custom reference, the selected style determines which default text is used. -
APY KEY PATH (Required)
The file path to your API key for the Google Gemini API. -
GEMINI MODEL (Required)
The model that will process your request. Possible options include:- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
The default is Gemini 2.0 Flash.
-
PROMPT LENGTH (Optional)
An approximate word count for the final prompt. If empty, there is no length restriction. -
PROMPT REFERENCE (Optional)
A sample text prompt format that serves as a reference. If empty, a default reference is used. -
PROMPT STRUCTURE (Optional)
A guideline for organizing details in the prompt (e.g., building type, materials, location). -
IGNORE (Optional)
Specific words or concepts to exclude from the prompt. -
EMPHASIS (Optional)
Words or concepts to emphasize. -
SAVE TO PATH (Optional)
A directory path for saving the generated text file. -
TXT NAME (Optional)
Name for the.txt
file. If empty and a path is provided, defaults toresult.txt
.
Outputs
-
RESULT PROMPT
The text response from Gemini. -
REQUEST TEXT
The exact text payload sent to the API. -
LOG
A log of execution steps and errors.
AS_MultimodalGemini
User Guide
The AS_MultimodalGemini node sends text plus up to three images to the Google Gemini API.
Inputs
-
TEXT_INPUT (Required)
A text string to be sent along with the images. -
API_KEY_PATH (Required)
The file path to your Gemini API key. -
GEMINI MODEL (Required)
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
Default is Gemini 2.0 Flash.
-
IMAGE_1, IMAGE_2, IMAGE_3 (Optional)
Up to three images to attach.
Outputs
-
RESULT
The text returned by Gemini. -
LOG
A log of steps and errors.
AS_ComfyGPT
User Guide
The AS_ComfyGPT node integrates with OpenAI’s ChatGPT. Provide the path to your API key, choose a model, and enter a prompt. The node returns ChatGPT’s reply.
Inputs
-
api_key_file (Required)
Path to a file containing your OpenAI API key. -
model (Required)
Name of the OpenAI model (e.g., "gpt-4", "gpt-3.5-turbo"). -
prompt (Required)
The user prompt text to send to the GPT model.
Outputs
- STRING
The response from ChatGPT.
Required Libraries
- Pillow
- requests
- google-generativeai
- openai