ComfyUI Extension: ComfyUI-JM-Gemini-API
A custom node for ComfyUI that generates images using Google's Gemini API, supporting both text-to-image and image-to-image generation.
ComfyUI-JM-Gemini-API
Custom nodes for ComfyUI that generate images and videos using Google's Gemini API, supporting text-to-image, image-to-image, text-to-video, and image-to-video generation.
Features
Image Generation
- Support for multiple Gemini image models:
  - gemini-3-pro-image-preview (default, with 2K resolution)
  - gemini-2.5-flash-image
- Text-to-image generation
- Image-to-image generation (single or multiple input images)
- Image editing mode (single image input)
- Configurable aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- Resolution control (1K, 2K, 4K) - only for gemini-3-pro-image-preview
- Support up to 10 input images
- Automatic image saving to ComfyUI output directory
Video Generation
- Support for Gemini Veo video models:
  - veo-3.1-generate-preview (default)
  - veo-3.1-fast-generate-preview
  - veo-3.0-generate-001
  - veo-3.0-fast-generate-001
- Text-to-video generation
- Image-to-video generation (animate a single image)
- First-last frame interpolation (Veo 3.1 only)
- Negative prompt support
- Configurable aspect ratios (16:9, 9:16)
- Resolution control (720p, 1080p)
- Duration control (4, 6, 8 seconds)
- Automatic video saving to ComfyUI output directory
Important Limitations:
- 1080p resolution: Only supports 8-second duration for Veo 3.1 models
- First-last frame interpolation: Only available for Veo 3.1 models with 8-second duration
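The two limitations above can be checked before a request is sent. The following is a minimal sketch, not the node's actual code: the function name `check_video_settings` and the `has_last_image` flag are hypothetical, and the README does not say how 1080p behaves on Veo 3.0 models, so the sketch only applies that rule to Veo 3.1.

```python
from typing import List

# Model names as listed in this README.
VEO_31_MODELS = {"veo-3.1-generate-preview", "veo-3.1-fast-generate-preview"}


def check_video_settings(model: str, resolution: str, duration: int,
                         has_last_image: bool = False) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    is_veo_31 = model in VEO_31_MODELS

    # Rule 1: on Veo 3.1 models, 1080p is only listed with an 8-second duration.
    if is_veo_31 and resolution == "1080p" and duration != 8:
        problems.append("1080p requires duration=8 on Veo 3.1 models")

    # Rule 2: first-last frame interpolation needs Veo 3.1 and 8 seconds.
    if has_last_image:
        if not is_veo_31:
            problems.append("last_image is only supported by Veo 3.1 models")
        if duration != 8:
            problems.append("interpolation requires duration=8")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every conflicting setting at once.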
Installation
- Clone or download this repository to your ComfyUI custom_nodes directory:
cd custom_nodes
git clone https://github.com/synthetai/ComfyUI-JM-Gemini-API.git
- Install required dependencies:
cd ComfyUI-JM-Gemini-API
pip install -r requirements.txt
- Restart ComfyUI
Requirements
- Python 3.8+
- ComfyUI
- google-genai >= 0.2.0
- Pillow >= 10.0.0
- torch
- torchvision
- numpy
- httpx[socks] (for proxy support)
Proxy Configuration (Optional)
If you need to use a proxy to access the Gemini API (common for users in China), you can set environment variables:
HTTP/HTTPS Proxy
export HTTP_PROXY="http://your-proxy:port"
export HTTPS_PROXY="http://your-proxy:port"
SOCKS5 Proxy
export HTTP_PROXY="socks5://your-proxy:port"
export HTTPS_PROXY="socks5://your-proxy:port"
Or set them inline on the command that starts ComfyUI:
HTTP_PROXY="socks5://127.0.0.1:1080" HTTPS_PROXY="socks5://127.0.0.1:1080" python main.py
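To confirm which proxy settings the Python process actually sees, a small check like the one below may help. It is only a sketch using the standard library; the helper name `describe_proxies` is hypothetical and is not part of this extension.

```python
import os
from urllib.parse import urlsplit


def describe_proxies(env=None):
    """Summarise HTTP_PROXY / HTTPS_PROXY as (scheme, host, port) tuples."""
    env = os.environ if env is None else env
    summary = {}
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        value = env.get(name)
        if not value:
            continue  # variable unset or empty: no proxy for this scheme
        parts = urlsplit(value)
        summary[name] = (parts.scheme, parts.hostname, parts.port)
    return summary


if __name__ == "__main__":
    print(describe_proxies())
```

A `socks5` scheme in the output means the SOCKS extra from the Requirements list (httpx[socks]) is needed.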
Usage
Getting Gemini API Key
- Visit Google AI Studio
- Sign in with your Google account
- Create a new API key
- Copy the API key for use in the node
Node Parameters
Required Inputs:
- gemini_api_key: Your Gemini API key (string)
- prompt: Text prompt describing the image you want to generate (multiline text)
- model: Choose between:
  - gemini-3-pro-image-preview (default, supports 1K/2K/4K resolution)
  - gemini-2.5-flash-image (faster, aspect ratio only)
- aspect_ratio: Image aspect ratio (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
- resolution: Image resolution (1K, 2K, 4K)
  - Note: Only effective for the gemini-3-pro-image-preview model
  - Default: 2K
Optional Inputs:
- image1 ~ image10: Up to 10 optional image inputs for image-to-image generation
- Connect output from Load Image node
- Can be left empty for text-to-image generation
Outputs:
- image: Generated image (ComfyUI IMAGE tensor format)
- Can be connected to Preview Image or Save Image nodes
- Automatically saved to ComfyUI output directory
Usage Examples
1. Text-to-Image Generation
- Add "JM Gemini Image Generator" node to your workflow
- Enter your Gemini API key
- Write your prompt
- Select model and aspect ratio
- Leave all image inputs empty
- Connect output to Preview Image node
- Run the workflow
2. Image Editing (Single Image)
- Add Load Image node and load your image
- Add "JM Gemini Image Generator" node
- Connect Load Image output to image1 input
- Enter editing instructions in prompt (e.g., "Add a sunset background")
- Configure model and parameters
- Run the workflow
3. Image-to-Image (Multiple Images)
- Add multiple Load Image nodes
- Connect them to image1, image2, etc.
- Enter a prompt describing how to combine/transform the images
- Run the workflow
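The three usage patterns above differ only in how many image inputs are connected. A rough sketch of that decision follows. The function `pick_mode` and the labels `"edit"` and `"img2img"` are hypothetical; only `"text2img"` appears in this README (in the example output filename).

```python
def pick_mode(images):
    """Classify a request by how many of image1..image10 are connected.

    `images` is a sequence in which unconnected inputs are None.
    """
    connected = [img for img in images if img is not None]
    if len(connected) > 10:
        raise ValueError("at most 10 input images are supported")
    if not connected:
        return "text2img"   # no images: prompt-only generation
    if len(connected) == 1:
        return "edit"       # hypothetical label for single-image editing
    return "img2img"        # hypothetical label for multi-image input
```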
Model Differences
gemini-3-pro-image-preview
- Supports resolution parameter (1K, 2K, 4K)
- Default resolution: 2K
- Supports single image editing
- Higher quality output
gemini-2.5-flash-image
- Faster generation
- Uses aspect ratio only (no resolution parameter)
- Good for quick iterations
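One way to picture the difference is as request options in which the resolution is simply dropped for models that ignore it. This is an illustrative sketch only: `build_image_options` and the dictionary keys are hypothetical and do not reflect the node's internals or the Gemini API's field names.

```python
# Models that honour the resolution parameter, per this README.
RESOLUTION_MODELS = {"gemini-3-pro-image-preview"}


def build_image_options(model, aspect_ratio, resolution="2K"):
    """Return a plain dict of options, omitting resolution where unsupported."""
    options = {"model": model, "aspect_ratio": aspect_ratio}
    if model in RESOLUTION_MODELS:
        options["resolution"] = resolution  # 1K, 2K, or 4K; default 2K
    return options
```

For example, `build_image_options("gemini-2.5-flash-image", "16:9", "4K")` yields a dict with no `resolution` key, mirroring the note that the setting has no effect on that model.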
Troubleshooting
Common Issues
- "Gemini API key is required"
  - Make sure you've entered a valid API key
- "No images were generated"
  - Check your prompt is clear and descriptive
  - Try a different aspect ratio or resolution
  - Verify your API key is valid and has sufficient quota
- "Failed to generate image"
  - Check your internet connection
  - Verify API key permissions
  - Check Gemini API service status
- Image quality issues
  - For gemini-3-pro-image-preview, try using different resolutions (1K, 2K, or 4K)
  - Make your prompt more detailed and specific
Output Directory
Generated images are automatically saved to the ComfyUI/output/ directory.
- Filename format: {model}_{mode}_{timestamp}.png
- Example: gemini3pro_text2img_1234567890.png
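The filename pattern can be reproduced with a one-line format string. In this sketch the helper `output_filename` is hypothetical, the timestamp is assumed to be integer Unix seconds (the example value has ten digits), and how the node shortens a model name to a tag such as `gemini3pro` is not stated here, so the tag is passed in directly.

```python
import time


def output_filename(model_tag, mode, timestamp=None):
    """Build a name following {model}_{mode}_{timestamp}.png."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: whole Unix seconds
    return f"{model_tag}_{mode}_{timestamp}.png"
```

Calling `output_filename("gemini3pro", "text2img", 1234567890)` reproduces the example name shown above.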
License
MIT License
Credits
Developed by JM
Based on Google's Gemini API
Support
For issues and feature requests, please visit the GitHub repository
Video Node Usage
Node: JM Gemini Video Generator
Required Inputs:
- gemini_api_key: Your Gemini API key (string, encrypted input)
- prompt: Text prompt describing the video you want to generate (multiline text)
Optional Inputs:
- negative_prompt: Describe what you don't want in the video (multiline text)
- model: Choose video model:
  - veo-3.1-generate-preview (default, highest quality)
  - veo-3.1-fast-generate-preview (faster generation)
  - veo-3.0-generate-001 (stable version)
  - veo-3.0-fast-generate-001 (stable + fast)
- aspect_ratio: Video aspect ratio (16:9 or 9:16, default: 16:9)
- resolution: Video resolution (720p or 1080p, default: 720p)
- duration: Video duration in seconds (4, 6, or 8, default: 8)
- first_image: First frame image (optional, for image-to-video or interpolation)
- last_image: Last frame image (optional, only for Veo 3.1 interpolation)
Output:
- video_path: Path to the generated video file (STRING)
Video Generation Modes
1. Text-to-Video
- Add "JM Gemini Video Generator" node
- Enter API key and prompt
- Leave both image inputs empty
- Configure model and parameters
- Run workflow - video will be saved to output directory
2. Image-to-Video
- Add Load Image node and load your image
- Connect it to first_image input
- Enter prompt describing the motion/animation
- Run workflow
3. First-Last Frame Interpolation (Veo 3.1 only)
- Add two Load Image nodes
- Connect first image to first_image input
- Connect last image to last_image input
- Select a Veo 3.1 model (veo-3.1-generate-preview or veo-3.1-fast-generate-preview)
- Set duration to 8 seconds (required for interpolation mode)
- Enter prompt describing the transition
- Run workflow - the model will generate smooth interpolation between frames
Important Notes:
- Video generation can take several minutes. The node will poll the API every 10 seconds with a 20-minute timeout.
- 1080p resolution is only supported with 8-second duration for Veo 3.1 models
- First-last frame interpolation requires a Veo 3.1 model and an 8-second duration
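The polling behaviour described in the first note (a check every 10 seconds, giving up after 20 minutes) follows a common pattern, sketched below. This is not the node's code: `wait_until_done` is a hypothetical helper, and `is_done` stands in for whatever status check the node performs against the API. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def wait_until_done(is_done, interval=10.0, timeout=20 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Call is_done() every `interval` seconds until it returns True.

    Returns the number of checks performed.
    Raises TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    checks = 0
    while True:
        checks += 1
        if is_done():
            return checks
        if clock() >= deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        sleep(interval)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments during a long generation.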
Changelog
Version 1.1.0
- Added video generation support with JM Gemini Video Generator node
- Support for Veo 3.1 and Veo 3.0 models
- Text-to-video generation
- Image-to-video animation
- First-last frame interpolation (Veo 3.1 only)
- Refactored code structure with separate node files and shared utilities
Version 1.0.0
- Initial release
- Support for gemini-3-pro-image-preview and gemini-2.5-flash-image
- Text-to-image generation
- Image-to-image generation (up to 10 images)
- Configurable aspect ratios and resolutions