ComfyUI Extension: ComfyUI Gemini Nodes
A collection of custom nodes for integrating Google Gemini API with ComfyUI, providing powerful AI capabilities for text generation, image generation, and video analysis. Nodes: Gemini Text API, Gemini Image Editor, Gemini Image Gen Advanced, Gemini Video Captioner.
Custom Nodes (0)
README
ComfyUI Gemini Nodes
A comprehensive collection of custom nodes for integrating Google Gemini API with ComfyUI, providing powerful AI capabilities for text generation, structured output, image generation, and video analysis.
π Table of Contents
- Features
- Installation
- Configuration
- Node Documentation
- Usage Examples
- Troubleshooting
- License
- Credits
Features
Core Capabilities
- Text Generation: Advanced text generation with all Gemini models
- Structured Output: JSON schema-based responses and data extraction
- Image Generation: Multi-reference image generation with batch processing
- Video Analysis: Intelligent video captioning and analysis
- JSON Processing: Extract, parse, and manipulate JSON data
- API Debugging: All processing nodes now output complete API request/response for debugging
Installation
- Clone this repository into your ComfyUI custom nodes directory:
cd ComfyUI/custom_nodes
git clone https://github.com/jqy-yo/comfyui-gemini-nodes.git
- Install required dependencies:
cd comfyui-gemini-nodes
pip install -r requirements.txt
- Restart ComfyUI
Configuration
API Key Setup
Set up your Google Gemini API key in one of these ways:
- Environment Variable (Recommended):
export GOOGLE_API_KEY="your-api-key-here"
- Node Input: Enter directly in the node's API key field
Model Selection
All nodes support flexible model input. Enter any valid Gemini model name:
- Text/Structured:
gemini-2.0-flash
,gemini-1.5-pro
- Image Generation:
imagen-3.0-generate-001
,gemini-2.0-flash-exp
- Video Analysis:
gemini-2.0-flash
,gemini-1.5-pro
Node Documentation
π€ Gemini Text API
Generate text responses using any Gemini model with flexible model selection and advanced generation parameters.
Input Parameters:
prompt
(STRING): The input text promptapi_key
(STRING): Your Google API key (can use environment variable GOOGLE_API_KEY)model
(STRING): Model name (default: "gemini-2.0-flash")- Examples: gemini-2.5-flash-lite, gemini-2.0-flash, gemini-1.5-pro
temperature
(FLOAT, 0.0-1.0, default: 0.7): Controls randomness in generation- 0.0: Most deterministic/focused
- 1.0: Most creative/varied
max_output_tokens
(INT, 64-8192, default: 1024): Maximum response lengthseed
(INT, 0-2147483647, default: 0): Random seed for reproducibilitysystem_instructions
(STRING, optional): System-level instructions to guide the model's behaviortop_p
(FLOAT, 0.0-1.0, default: 0.95): Nucleus sampling thresholdtop_k
(INT, 1-100, default: 64): Top-k sampling parameterapi_version
(ENUM, optional): API version selection (auto/v1/v1beta/v1alpha)
Output:
response
(STRING): Generated text responseapi_request
(STRING): Complete API request sent to Gemini (JSON format)api_response
(STRING): Complete API response from Gemini (JSON format)
Usage Examples:
- Basic Text Generation:
Prompt: "Write a haiku about artificial intelligence"
Model: gemini-2.0-flash
Temperature: 0.8
Output: A creative haiku poem about AI
- Technical Documentation:
Prompt: "Explain the implementation of a binary search tree"
Model: gemini-1.5-pro
Temperature: 0.3
System Instructions: "Provide detailed technical explanations with code examples in Python"
Output: Comprehensive BST explanation with Python code
- Creative Writing:
Prompt: "Continue this story: The door creaked open revealing..."
Model: gemini-2.5-flash-lite
Temperature: 0.9
Max Output Tokens: 2048
Output: Creative story continuation
Best Practices:
- Use lower temperature (0.1-0.3) for factual/technical content
- Use higher temperature (0.7-0.9) for creative content
- Set system instructions for consistent behavior across prompts
- Use seeds for reproducible outputs in production
π Gemini Structured Output
Generate responses in a specific JSON structure using JSON Schema validation.
Input Parameters:
prompt
(STRING): Input prompt describing what to generateapi_key
(STRING): Your Google API keymodel
(STRING, default: "gemini-2.0-flash"): Model nameoutput_mode
(ENUM):json_schema
: Use custom JSON schemaenum
: Generate from predefined options
schema_json
(STRING): JSON Schema definition (for json_schema mode)temperature
(FLOAT, 0.0-1.0, default: 0.7): Generation temperaturemax_output_tokens
(INT, 64-8192, default: 1024): Maximum tokensseed
(INT): Random seed for reproducibilitysystem_instructions
(STRING, optional): System-level instructionsenum_options
(STRING, optional): JSON array of enum options (for enum mode)property_ordering
(STRING, optional): Comma-separated property ordertop_p
(FLOAT, optional): Nucleus sampling parametertop_k
(INT, optional): Top-k sampling parameter
Output:
structured_output
(STRING): Formatted JSON outputraw_json
(STRING): Raw JSON responsedebug_request_sent
(STRING): Complete API request sent to Gemini (JSON format)debug_response_received
(STRING): Complete API response from Gemini (JSON format)
Usage Examples:
- Product Information Schema:
{
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "number"},
"in_stock": {"type": "boolean"},
"categories": {
"type": "array",
"items": {"type": "string"}
},
"specifications": {
"type": "object",
"properties": {
"weight": {"type": "string"},
"dimensions": {"type": "string"}
}
}
},
"required": ["product_name", "price", "in_stock"]
}
- Enum Mode Example:
Output Mode: enum
Enum Options: ["positive", "negative", "neutral"]
Prompt: "Analyze the sentiment of this review: 'Great product!'"
Output: {"selection": "positive"}
- Complex Nested Structure:
{
"type": "object",
"properties": {
"article": {
"type": "object",
"properties": {
"title": {"type": "string"},
"authors": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"}
}
}
},
"sections": {
"type": "array",
"items": {
"type": "object",
"properties": {
"heading": {"type": "string"},
"content": {"type": "string"}
}
}
}
}
}
}
}
Best Practices:
- Keep schemas focused and not overly complex
- Use clear property names and descriptions
- Test schemas with sample data first
- Use lower temperature (0.1-0.3) for consistent structured output
- Property ordering helps maintain consistent output format
π Gemini JSON Extractor
Extract specific information from text into JSON format using field definitions.
Input Parameters:
prompt
(STRING): Extraction instructions describing what to extractapi_key
(STRING): Your Google API keymodel
(STRING, default: "gemini-2.0-flash"): Model nameextract_fields
(STRING): Field definitions in simple formattemperature
(FLOAT, 0.0-1.0, default: 0.3): Lower values for accurate extractionseed
(INT): Random seed for reproducibilityinput_text
(STRING, optional): Text to extract from (can be combined with prompt)system_instructions
(STRING, optional): Default: "Extract the requested information from the provided text. Be accurate and concise."
Field Definition Format:
field_name: type
field_name2: type[]
field_name3: type?
Supported Types:
string
: Text valuesnumber
/float
: Numeric valuesinteger
/int
: Whole numbersboolean
/bool
: True/false valuesstring[]
: Array of strings?
suffix: Optional field (not required)
Output:
extracted_json
(STRING): Raw JSON outputformatted_output
(STRING): Human-readable format with field labels
Usage Examples:
- Article Metadata Extraction:
Extract Fields:
title: string
author: string
publish_date: string
categories: string[]
word_count: integer
is_featured: boolean
Input Text: "Breaking News: AI Advances in Healthcare by Dr. Smith..."
Output: {
"title": "Breaking News: AI Advances in Healthcare",
"author": "Dr. Smith",
"publish_date": "2024-01-15",
"categories": ["AI", "Healthcare", "Technology"],
"word_count": 1250,
"is_featured": true
}
- Product Review Analysis:
Extract Fields:
product_name: string
rating: number
pros: string[]
cons: string[]
recommended: boolean
price_mentioned: string?
Prompt: "Extract review information"
Input Text: "[Product review text...]"
- Contact Information Parsing:
Extract Fields:
full_name: string
email: string
phone: string?
company: string?
job_title: string?
address: string?
Prompt: "Extract contact details from the business card text"
Best Practices:
- Use low temperature (0.1-0.3) for accurate extraction
- Provide clear field names that describe the data
- Mark optional fields with '?' suffix
- Use arrays (type[]) for multiple values
- Combine with downstream nodes to process extracted data
π Gemini Field Extractor
Extract specific fields from JSON data using path notation. This node does NOT use the Gemini API - it's a pure JSON processing tool.
Input Parameters:
json_input
(STRING): JSON data to extract from (supports raw JSON or markdown-wrapped)field_path
(STRING): Path to the field using dot notationdefault_value
(STRING): Value returned if field not foundoutput_format
(ENUM):auto
: Automatically determine formatstring
: Convert to stringjson
: Format as JSONnumber
: Extract as numberboolean
: Convert to booleanlist
: Format as list/array
array_handling
(ENUM, optional):all
: Return entire arrayfirst
: Return first elementlast
: Return last elementjoin
: Join array elements
join_separator
(STRING, default: ", "): Separator when joining arrays
Path Notation Examples:
- Simple field:
name
,email
,id
- Nested object:
user.profile.age
,settings.theme.color
- Array by index:
items[0]
,users[2].name
- Negative index:
items[-1]
(last item) - All array items:
users[*].email
,products[*].price
- Complex paths:
data.users[0].orders[*].items[0].price
Output:
extracted_value
(STRING): The extracted value in specified formatformatted_output
(STRING): Detailed extraction information with type and structureextraction_success
(BOOLEAN): True if extraction succeeded, False otherwise
Usage Examples:
- Extract User Email:
JSON Input: {"user": {"name": "John", "email": "[email protected]"}}
Field Path: user.email
Output: [email protected]
- Extract All Product Prices:
JSON Input: {
"products": [
{"name": "Item1", "price": 10.99},
{"name": "Item2", "price": 25.50}
]
}
Field Path: products[*].price
Array Handling: join
Join Separator: ", "
Output: "10.99, 25.50"
- Extract Nested Configuration:
JSON Input: {
"config": {
"database": {
"host": "localhost",
"port": 5432,
"credentials": {
"username": "admin"
}
}
}
}
Field Path: config.database.credentials.username
Output: "admin"
- Extract with Default Value:
JSON Input: {"name": "Product"}
Field Path: price
Default Value: "0.00"
Output: "0.00" (field not found, using default)
Best Practices:
- Test path syntax with sample JSON first
- Use meaningful default values
- Choose appropriate array handling for your use case
- Chain with other nodes for complex data processing
- Validate JSON input format before extraction
π οΈ Gemini JSON Parser
Parse and manipulate JSON data. This node does NOT use the Gemini API - it's a pure JSON processing utility.
Input Parameters:
json_input
(STRING): JSON data to process (supports raw JSON or markdown-wrapped)operation
(ENUM): Operation to performvalidate
: Check if JSON is valid and show structure infoformat
: Pretty-print with indentationminify
: Remove unnecessary whitespace for compact storageextract_keys
: Get all key paths in the JSONget_type
: Analyze type structure of the JSONcount_items
: Count elements by type (objects, arrays, strings, etc.)
indent
(INT, 0-8, default: 2): Indentation spaces for formattingsort_keys
(BOOLEAN, default: False): Sort object keys alphabetically
Output:
result
(STRING): Operation resultinfo
(STRING): Detailed operation informationsuccess
(BOOLEAN): True if operation succeeded
Usage Examples:
- Validate JSON:
Operation: validate
Input: {"name": "test", "values": [1, 2, 3]}
Result: β
Valid JSON
Info:
Type: Object
Properties: 2
Keys: name, values
- Format JSON:
Operation: format
Indent: 4
Sort Keys: true
Input: {"b":1,"a":{"nested":true}}
Output:
{
"a": {
"nested": true
},
"b": 1
}
- Minify JSON:
Operation: minify
Input: {
"name": "test",
"value": 123
}
Output: {"name":"test","value":123}
Info: Minified: 45 β 28 bytes (62.2%)
- Extract All Keys:
Operation: extract_keys
Input: {
"user": {
"name": "John",
"settings": {
"theme": "dark"
}
}
}
Output: [
"user",
"user.name",
"user.settings",
"user.settings.theme"
]
- Get Type Structure:
Operation: get_type
Input: {
"users": [{"name": "John", "age": 30}],
"count": 1
}
Output: {
"users": [{"name": "string", "age": "number"}],
"count": "number"
}
- Count Items:
Operation: count_items
Output: {
"total_keys": 5,
"total_values": 3,
"objects": 2,
"arrays": 1,
"strings": 2,
"numbers": 1,
"booleans": 0,
"nulls": 0
}
Best Practices:
- Use validate to check JSON before processing
- Use minify before storing or transmitting JSON
- Use extract_keys to understand complex JSON structure
- Use format with sort_keys for consistent JSON output
- Chain with Field Extractor for targeted data extraction
π¨ Gemini Image Editor
Generate images with optional reference images using Gemini's image generation models.
Input Parameters:
prompt
(STRING): Detailed image generation promptapi_key
(STRING): Your Google API keymodel
(STRING): Model name- Default: "models/gemini-2.0-flash-preview-image-generation"
- Alternatives: "imagen-3.0-generate-001", "gemini-2.5-flash", "models/gemini-2.0-flash-exp"
temperature
(FLOAT, 0.0-2.0, default: 1.0): Generation creativitymax_retries
(INT, 1-5, default: 3): API retry attemptsbatch_size
(INT, 1-8, default: 1): Number of images to generateseed
(INT, optional): Random seed for reproducibilityimage1
toimage4
(IMAGE, optional): Reference images for style/content guidanceapi_version
(ENUM, optional): API version (auto/v1/v1beta/v1alpha)
Output:
image
(IMAGE): Generated images batchAPI Respond
(STRING): API response informationapi_request
(STRING): Complete API request sent to Gemini (JSON format)api_response
(STRING): Complete API response from Gemini (JSON format)
Features:
- Automatic image padding to minimum 1024x1024 with white borders
- Async parallel processing for batch generation
- Built-in retry logic with exponential backoff
- Error image generation on failure (black image with error text)
- Support for up to 4 reference images
Usage Examples:
- Simple Image Generation:
Prompt: "A serene Japanese garden with cherry blossoms"
Model: imagen-3.0-generate-001
Batch Size: 4
Output: 4 variations of Japanese garden images
- Style Transfer with Reference:
Prompt: "Transform this into a watercolor painting style"
Image1: [Original photo]
Temperature: 0.8
Output: Watercolor style version of the input image
- Product Visualization:
Prompt: "Modern minimalist product photography on white background"
Image1: [Product sketch]
Model: gemini-2.0-flash-preview-image-generation
Batch Size: 3
Output: 3 professional product photos based on sketch
- Character Design Variations:
Prompt: "Create variations of this character in different poses"
Image1: [Character reference]
Image2: [Style reference]
Temperature: 1.2
Batch Size: 6
Output: 6 character variations
Best Practices:
- Provide detailed, specific prompts for better results
- Use reference images to guide style and composition
- Lower temperature (0.5-0.8) for consistency
- Higher temperature (1.2-1.5) for more variation
- Images smaller than 1024x1024 are automatically padded
- Use batch generation for multiple variations
- Set seed for reproducible results in production
π Gemini Image Gen Advanced
Advanced multi-slot image generation system for batch processing multiple image/prompt combinations.
Input Parameters:
inputcount
(INT, 1-100, default: 1): Number of input slots to useapi_key
(STRING): Your Google API key (supports env var GEMINI_API_KEY)model
(STRING): Model name- Default: "models/gemini-2.0-flash-preview-image-generation"
- Alternatives: "imagen-3.0-generate-001", "gemini-2.5-flash"
temperature
(FLOAT, 0.0-2.0, default: 1.0): Generation creativitymax_retries
(INT, 1-5, default: 3): API retry attempts per slotprompt_1
(STRING, required): First generation promptimage_1
(IMAGE, optional): First reference imageseed
(INT, optional): Random seed for reproducibilityretry_indefinitely
(BOOLEAN, optional): Keep retrying on failureapi_version
(ENUM, optional): API version selection
Dynamic Inputs (based on inputcount):
input_image_X
(IMAGE): Reference image for slot Xinput_prompt_X
(STRING): Generation prompt for slot X- Where X ranges from 1 to inputcount
Features:
- Asynchronous parallel processing of all slots
- Progress bar with real-time updates
- Automatic image padding to 1024x1024 minimum
- Batch result aggregation
- Error handling with fallback images
- Memory-efficient processing
Output:
images
(IMAGE): All generated images in a single batchAPI_responses
(STRING): Detailed API response informationapi_request
(STRING): Complete API requests sent to Gemini (JSON format)api_response
(STRING): Complete API responses from Gemini (JSON format)
Usage Examples:
- Batch Product Variations:
Input Count: 5
Slot 1: "Red leather handbag" + [reference image]
Slot 2: "Blue leather handbag" + [reference image]
Slot 3: "Black leather handbag" + [reference image]
Slot 4: "Brown leather handbag" + [reference image]
Slot 5: "White leather handbag" + [reference image]
Output: 5 handbag variations in different colors
- Scene Variations:
Input Count: 10
Model: imagen-3.0-generate-001
Slot 1-10: Different time-of-day prompts for same scene
- "Sunrise over the mountains"
- "Morning light on the mountains"
- "Noon sunshine on the mountains"
- "Golden hour mountain view"
- "Sunset behind the mountains"
- etc.
Output: 10 images showing time progression
- Style Exploration:
Input Count: 8
Base Image: [Portrait photo]
Slot 1: "Oil painting style"
Slot 2: "Watercolor style"
Slot 3: "Pencil sketch style"
Slot 4: "Digital art style"
Slot 5: "Anime style"
Slot 6: "Pop art style"
Slot 7: "Impressionist style"
Slot 8: "Photorealistic style"
Output: 8 different artistic interpretations
- Marketing Asset Generation:
Input Count: 20
Batch processing for social media content:
- Different angles of products
- Various backgrounds
- Multiple lighting conditions
- Different compositions
Output: 20 marketing-ready images
Best Practices:
- Plan slot allocation for systematic variations
- Use consistent seed across slots for controlled variation
- Monitor progress bar for large batches
- Group similar prompts for better GPU utilization
- Use retry_indefinitely cautiously (may increase costs)
- Consider memory limits when setting high inputcount
- Batch related images together for workflow efficiency
π₯ Gemini Video Generator (Veo)
Generate high-quality videos using Google's Veo models with text-to-video and image-to-video capabilities.
Input Parameters:
prompt
(STRING): Detailed description of the video to generate- Default: "A cinematic drone shot of a red convertible driving along a coastal road at sunset"
api_key
(STRING): Your Google API key (requires Veo access)model
(ENUM): Video generation modelveo-3.0-generate-preview
: Highest quality Veo 3 modelveo-3.0-fast-generate-preview
: Faster Veo 3 variantveo-2.0-generate-001
: Veo 2 model
aspect_ratio
(ENUM): Video dimensions- Options: "16:9", "9:16", "1:1", "4:3", "3:4"
- Default: "16:9"
person_generation
(ENUM): Control human representationdefault
: Standard behaviorallow
: Explicitly allow person generationdont_allow
: Prevent person generation
max_wait_minutes
(FLOAT, 1.0-30.0, default: 5.0): Maximum time to wait for generationpoll_interval_seconds
(INT, 2-30, default: 5): How often to check generation statusnegative_prompt
(STRING, optional): Elements to exclude from the videoinitial_image
(IMAGE, optional): Starting image for video generationsave_path
(STRING, optional): Path to save the generated video
Output:
video_path
(STRING): Path to the generated video filepreview_frame
(IMAGE): A preview frame extracted from the videogeneration_info
(STRING): Details about the generated videoapi_request
(STRING): Complete API request sent to Gemini (JSON format)api_response
(STRING): Complete API response from Gemini (JSON format)
Features:
- Text-to-video generation with detailed prompts
- Image-to-video generation from initial frames
- Multiple aspect ratios for different platforms
- Negative prompting to exclude unwanted elements
- Asynchronous generation with progress tracking
- Automatic video download and storage
- Preview frame extraction for ComfyUI display
- SynthID watermarking (automatic)
Video Specifications:
- Duration: 8 seconds
- Resolution: 720p (1280x720 or equivalent based on aspect ratio)
- Format: MP4 with native audio
- Storage: Videos stored for 2 days on Google servers
- Watermark: Includes SynthID for AI-generated content identification
Usage Examples:
- Cinematic Scene Generation:
Prompt: "A sweeping aerial view of a misty mountain range at dawn, clouds rolling through valleys, golden sunlight breaking through"
Model: veo-3.0-generate-preview
Aspect Ratio: 16:9
Output: 8-second cinematic landscape video
- Product Showcase:
Prompt: "360-degree rotation of a luxury watch on a black velvet turntable, dramatic lighting highlighting the metallic finish"
Negative Prompt: "blurry, low quality, distorted"
Model: veo-3.0-generate-preview
Aspect Ratio: 1:1
Output: Professional product video
- Image-to-Video Animation:
Prompt: "Bring this portrait to life with subtle movements, blinking, and breathing"
Initial Image: [Portrait photo]
Model: veo-3.0-fast-generate-preview
Person Generation: allow
Output: Animated portrait video
- Social Media Content:
Prompt: "Time-lapse of a busy coffee shop, people coming and going, steam rising from cups, warm cozy atmosphere"
Model: veo-2.0-generate-001
Aspect Ratio: 9:16
Output: Vertical video for social media
Best Practices:
- Use detailed, specific prompts for better results
- Include style keywords (cinematic, documentary, anime, etc.)
- Use negative prompts to avoid unwanted elements
- Consider aspect ratio based on target platform
- Allow sufficient generation time (typically 2-5 minutes)
- Save videos locally as they're only stored for 2 days
Important Notes:
- Requires API access to Veo models (may need special permissions)
- Generation is asynchronous and may take several minutes
- Videos include SynthID watermark for transparency
- Person generation may be restricted in certain regions
- Maximum prompt length: 1,024 tokens
π¬ Gemini Video Captioner
Generate intelligent captions and descriptions for videos using Gemini's multimodal capabilities.
Input Parameters:
api_key
(STRING): Your Google API keymodel
(STRING): Model name- Default: "gemini-2.0-flash"
- Alternatives: "gemini-2.5-flash-lite", "gemini-1.5-pro"
frames_per_second
(FLOAT, 0.1-10.0, default: 1.0): Frame sampling ratemax_duration_minutes
(FLOAT, 0.1-45.0, default: 2.0): Maximum video duration to processprompt
(STRING): Analysis instructions- Default: "Describe this video scene in detail. Include any important actions, subjects, settings, and atmosphere."
process_audio
(ENUM ["false", "true"], default: "false"): Include audio analysistemperature
(FLOAT, 0.0-1.0, default: 0.7): Generation temperaturemax_output_tokens
(INT, 50-8192, default: 1024): Maximum caption lengthtop_p
(FLOAT, 0.0-1.0, default: 0.95): Nucleus samplingtop_k
(INT, 1-100, default: 64): Top-k samplingseed
(INT): Random seed for reproducibilityvideo_path
(STRING, optional): Path to video fileimage
(IMAGE, optional): Image sequence/batch as video framesapi_version
(ENUM, optional): API version selection
Features:
- Automatic video format conversion to WebM
- Intelligent frame sampling based on FPS setting
- File size optimization (under 30MB limit)
- Support for both video files and image sequences
- Audio processing capability (model-dependent)
- Frame extraction with timestamps
- Progress tracking for long videos
Output:
caption
(STRING): Generated video description/captionsampled_frame
(IMAGE): Representative frame from the videoraw_json
(STRING): Raw JSON response (for structured output mode)api_request
(STRING): Complete API request sent to Gemini (JSON format)api_response
(STRING): Complete API response from Gemini (JSON format)
Video Processing Details:
- Maximum file size: 30MB (automatically compressed if needed)
- Supported formats: MP4, AVI, MOV, WebM, WMV, FLV, MPG, MPEG
- Frame sampling: Based on frames_per_second parameter
- Duration limits: 2 minutes (Gemini 1.0), 45 minutes (Gemini 1.5+)
Usage Examples:
- Basic Video Description:
Video Path: /path/to/video.mp4
Model: gemini-2.0-flash
Prompt: "Describe what happens in this video"
FPS: 1.0
Output: Detailed scene-by-scene description
- Technical Analysis:
Prompt: "Analyze the cinematography techniques, camera movements, and visual composition"
FPS: 2.0
Max Duration: 5.0
Temperature: 0.3
Output: Technical cinematography analysis
- Action Detection:
Prompt: "List all actions performed by people in this video with timestamps"
FPS: 5.0
Process Audio: true
Output: Timestamped action list with audio cues
- Content Moderation:
Prompt: "Identify any potentially inappropriate content, violence, or safety concerns"
Model: gemini-1.5-pro
FPS: 3.0
Temperature: 0.1
Output: Safety analysis report
- Tutorial Summarization:
Prompt: "Create a step-by-step summary of this tutorial video"
Max Duration: 10.0
Max Output Tokens: 2048
Output: Structured tutorial steps
- Sports Analysis:
Prompt: "Analyze the gameplay, tactics, and key moments in this sports clip"
FPS: 10.0
Process Audio: true
Output: Detailed sports commentary
Best Practices:
- Use lower FPS (0.5-1.0) for slow-paced content
- Use higher FPS (5.0-10.0) for action sequences
- Enable audio processing for videos with important sound
- Set appropriate max_duration based on video length
- Use lower temperature for factual descriptions
- Use higher temperature for creative interpretations
- For long videos, consider splitting into segments
- Test with different prompts for specific analysis needs
Usage Examples
Example 1: Extract Product Information
1. GeminiStructuredOutput β Define product schema
2. Input: Product description text
3. Output: Structured JSON with name, price, features
4. GeminiFieldExtractor β Extract just the price
5. Use price in downstream nodes
Example 2: Batch Image Generation with Data
1. GeminiJSONExtractor β Extract image descriptions from text
2. GeminiFieldExtractor β Get descriptions array
3. GeminiImageGenAdvanced β Generate images for each description
4. Output: Batch of generated images
Example 3: Video Analysis Pipeline
1. Load Video β GeminiVideoCaptioner
2. Caption β GeminiJSONExtractor (extract key moments)
3. Key moments β GeminiStructuredOutput (timeline format)
4. Output: Structured video timeline
Example 4: Complex Data Processing
1. API Response β GeminiJSONParser (validate)
2. If valid β GeminiFieldExtractor (extract nested data)
3. Extracted data β GeminiStructuredOutput (reformat)
4. Output: Cleaned, restructured data
Best Practices
For Structured Output
- Keep schemas simple and focused
- Use required fields sparingly
- Test schemas with sample data first
- Provide clear property descriptions
For Field Extraction
- Use specific paths to avoid ambiguity
- Set meaningful default values
- Test path syntax with sample JSON
- Use array handling options appropriately
For Image Generation
- Provide detailed, specific prompts
- Use reference images when possible
- Keep batch sizes reasonable for memory
- Use negative prompts to improve quality
For Video Analysis
- Keep videos under 30MB
- Use clear, specific analysis prompts
- Consider frame sampling for long videos
- Combine with structured output for data extraction
Troubleshooting
API Debugging
All processing nodes now include api_request
and api_response
outputs for debugging:
- api_request: Complete request sent to Gemini API (JSON format)
- api_response: Complete response from Gemini API (JSON format)
Use these outputs to:
- Debug API communication issues
- Understand exact request format
- Analyze response structure
- Track token usage and costs
- Identify rate limiting issues
Common Issues
API Key Errors:
- Verify key is valid and active
- Check key has required permissions
- Ensure no extra spaces in key
Model Access Issues:
- Some models require specific access levels
- Check regional availability
- Verify model name spelling
JSON/Schema Errors:
- Validate JSON syntax before input
- Check schema follows JSON Schema spec
- Use simpler schemas if errors persist
Memory Issues:
- Reduce batch sizes
- Lower image resolutions
- Process videos in segments
Field Extraction Issues:
- Verify JSON is valid
- Check field path syntax
- Test with simpler paths first
Requirements
- ComfyUI
- Python 3.8+
- google-genai
- Pillow (PIL)
- numpy
- torch
- opencv-python
- moviepy
- aiohttp
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
This project is based on code from ComfyUI_Fill-Nodes by filliptm. Special thanks for the original implementation.
Enhancements include:
- Structured output support
- JSON processing capabilities
- Field extraction system
- Flexible model selection
- Improved error handling
Acknowledgments
- Original code by filliptm
- Based on ComfyUI_Fill-Nodes
- Maintained by jqy-yo
Support
For issues, questions, or contributions, please open an issue on GitHub.