ComfyUI-ExternalAPI-Helpers
A collection of powerful custom nodes for ComfyUI that connect your local workflows to closed-source AI models via their APIs. Use Google's Gemini, Imagen, and Veo, OpenAI's GPT-Image-1, and Black Forest Labs' FLUX models directly within ComfyUI.
Key Features
- FLUX Kontext Pro & Max: Image-to-image transformations using the FLUX models via the Replicate API.
- Flux.2 (Replicate): Generate images using the latest FLUX.2 models (Pro, Max, Dev) via Replicate.
- Gemini Chat: Google's powerful multimodal AI. Ask questions about an image, generate detailed descriptions or create prompts for other models. Supports thinking budget controls for applicable models. Now supports audio input.
- Gemini Segmentation: Generate segmentation masks for objects in an image using Gemini.
- Gemini Speaker Diarization: Separate audio into different speaker tracks using Gemini.
- GPT Image Edit: OpenAI's `gpt-image-1` for prompt-based image editing and inpainting. Simply mask an area and describe the change you want to see.
- OpenAI LLM: Access OpenAI's powerful language models (GPT-4, GPT-5, o1, etc.) for text generation and reasoning.
- OpenAI Text-to-Speech: Generate high-quality speech using OpenAI's TTS models.
- Google Imagen Generator & Edit: Create and edit images with Google's Imagen models, with support for Vertex AI.
- Nano Banana: A creative image generation node using a specialized Gemini model.
- Veo Video Generator: Generate high-quality video clips from text prompts using Google's Veo model via Vertex AI or the Gemini API.
- ElevenLabs TTS: Generate high-quality speech from text using ElevenLabs' diverse range of voices and models.
- Gemini TTS: Create speech from text using Google's Gemini models.
🚀 Installation
1. Navigate to your ComfyUI installation directory.
2. Go into the `custom_nodes` folder: `cd ComfyUI/custom_nodes/`
3. Clone this repository: `git clone https://github.com/Aryan185/ComfyUI-ExternalAPI-Helpers.git`
4. Install the required Python packages. Navigate into the newly cloned directory (`cd ComfyUI-ExternalAPI-Helpers`) and use pip to install the dependencies: `pip install -r requirements.txt`
5. Restart ComfyUI. After restarting, you should find the new nodes in the "Add Node" menu.
🔑 Prerequisites: API Keys
All nodes in this collection require API keys to function.
- FLUX Nodes (Replicate): You will need a Replicate API Token.
- Gemini, Imagen, Nano Banana, Gemini TTS, Gemini Diarization, and Veo (Gemini API) Nodes: You will need a Google AI Studio API Key.
- OpenAI Nodes (GPT Image Edit, OpenAI LLM, OpenAI TTS): You will need an OpenAI API Key.
- ElevenLabs TTS Node: You will need an ElevenLabs API Key.
- Vertex AI Nodes (Imagen Edit, Veo Vertex AI): You will need a Google Cloud Project ID, a service account with appropriate permissions, and the location for the resources.
You can paste your key directly into the `api_key` field on the corresponding node. For Vertex AI nodes, you will need to provide the project ID, location, and the path to your service account JSON file.
📚 Node Guide
Flux Kontext Pro / Max
These nodes allow you to transform an input image based on a text prompt. They are ideal for applying artistic styles or making significant conceptual changes to an existing image.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to transform.
  - `prompt`: A text description of the desired output (e.g., "A vibrant Van Gogh painting", "Make this a 90s cartoon").
  - `replicate_api_token`: Your API token from Replicate.
  - `aspect_ratio`: The desired output aspect ratio. `match_input_image` is highly recommended to preserve the original composition.
  - `output_format`: `jpg` or `png`.
  - `safety_tolerance`: Adjust the content safety filter level.
- Output:
  - `image`: The generated image.
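For reference, the same transformation can be reproduced outside ComfyUI with the Replicate Python client. The sketch below is illustrative only, not the node's actual code; it assumes `pip install replicate`, a `REPLICATE_API_TOKEN` in the environment, and the input names currently documented for the `black-forest-labs/flux-kontext-pro` model (check the model page on Replicate for the authoritative schema).

```python
# Minimal, illustrative sketch of the underlying Replicate call.
import replicate

output = replicate.run(
    "black-forest-labs/flux-kontext-pro",
    input={
        "prompt": "A vibrant Van Gogh painting",
        "input_image": open("source.png", "rb"),  # image-to-image source
        "aspect_ratio": "match_input_image",      # preserve the original composition
        "output_format": "png",
        "safety_tolerance": 2,
    },
)

# Depending on the client version, `output` is a URL string or a file-like
# FileOutput object pointing at the generated image.
print(output)
```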
Flux.2 (Replicate)
Generate images using Black Forest Labs' FLUX.2 models via the Replicate API.
- Category: `image/generation`
- Inputs:
  - `prompt`: The text prompt for image generation.
  - `api_key`: Your Replicate API token.
  - `model`: Choose between `flux-2-max`, `flux-2-pro`, or `flux-2-dev`.
  - `aspect_ratio`: The desired aspect ratio for the generated image.
  - `output_format`: `webp`, `jpg`, or `png`.
  - `output_quality`: Quality of the output image (0-100).
  - `image_1` to `image_5` (Optional): Input images for image-to-image or control tasks.
- Output:
  - `image`: The generated image.
Gemini Chat
A versatile node for text generation and image/audio analysis. Use it to understand an image's content, analyze audio, or generate creative text for other nodes.
- Category: `text/generation`
- Inputs:
  - `prompt`: The text prompt or question you want to ask the model.
  - `model`: The Gemini model to use (e.g., `gemini-2.5-pro`, `gemini-2.5-flash`).
  - `temperature`: Controls the creativity of the output.
  - `thinking`: Enables the model's thinking/reasoning process.
  - `seed`: Seed for reproducibility.
  - `api_key`: Your API key from Google AI Studio.
  - `system_instruction` (Optional): Provide context or rules for how the model should behave.
  - `thinking_budget` (Optional): Token budget for thinking.
  - `image` (Optional): An input image for the model to analyze.
  - `audio` (Optional): An input audio clip for the model to analyze.
- Output:
  - `response`: The text generated by the Gemini model.
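Outside ComfyUI, an equivalent request can be made with the google-genai SDK (`pip install google-genai`). This is a minimal sketch; the node's internal parameter mapping may differ, and the file name and prompt are placeholders.

```python
# Minimal sketch of a multimodal Gemini request with thinking controls.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

image_bytes = open("input.png", "rb").read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        "Describe this image as a detailed prompt for an image model.",
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
    ],
    config=types.GenerateContentConfig(
        temperature=0.7,
        system_instruction="You write concise, vivid prompts.",
        thinking_config=types.ThinkingConfig(thinking_budget=1024),  # thinking budget
    ),
)

print(response.text)
```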
Gemini Segmentation
This node uses a Gemini model to generate segmentation masks for specified objects within an image.
- Category: `image/generation`
- Inputs:
  - `image`: The source image for segmentation.
  - `segment_prompt`: A text description of the objects to segment (e.g., "the car", "all people").
  - `model`: The Gemini model to use.
  - `temperature`: Controls randomness.
  - `thinking`: Enable thinking process.
  - `seed`: Seed for reproducibility.
  - `api_key`: Your API key from Google AI Studio.
  - `thinking_budget` (Optional): Token budget for thinking.
- Output:
  - `mask`: A black and white mask of the segmented objects.
Gemini Speaker Diarization
Separate audio into different speaker tracks using Gemini.
- Category: `audio/diarise`
- Inputs:
  - `audio`: The input audio to process.
  - `num_speakers`: The expected number of speakers.
  - `model`: The Gemini model to use.
  - `api_key`: Your API key from Google AI Studio.
  - `seed`: Seed for reproducibility.
  - `temperature`: Controls randomness.
  - `thinking` (Optional): Enable thinking process.
  - `thinking_budget` (Optional): Token budget for thinking.
- Output:
  - `speaker_1` to `speaker_4`: Audio tracks for up to 4 separated speakers.
GPT Image Edit
This node uses OpenAI's API to perform powerful, prompt-based inpainting and editing.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to edit.
  - `mask` (Optional): A black and white mask. The model will edit the white area of the mask.
  - `prompt`: A description of the edit to perform.
  - `api_key`: Your API key from OpenAI.
  - `...other_params`: Various quality and formatting options for the OpenAI API.
- Output:
  - `image`: The edited image.
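The equivalent call against the OpenAI Images API looks roughly like the sketch below (`pip install openai`). File names and the prompt are placeholders; see OpenAI's documentation for the exact mask convention used by `gpt-image-1`.

```python
# Minimal sketch of a prompt-based edit with gpt-image-1.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

result = client.images.edit(
    model="gpt-image-1",
    image=open("source.png", "rb"),
    mask=open("mask.png", "rb"),  # marks the region to change
    prompt="Replace the masked area with a bouquet of sunflowers",
)

# gpt-image-1 returns base64-encoded image data.
with open("edited.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```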
OpenAI LLM
Access OpenAI's powerful language models for text generation and reasoning.
- Category: `text/generation`
- Inputs:
  - `prompt`: The text prompt.
  - `model`: The OpenAI model to use (e.g., `gpt-4.1`, `o1`, `gpt-5`).
  - `temperature`: Controls randomness.
  - `reasoning_effort`: Effort level for reasoning models.
  - `api_key`: Your OpenAI API key.
  - `max_output_tokens`: Maximum number of tokens to generate.
  - `system_instruction` (Optional): System level instructions.
  - `image` (Optional): Input image for multimodal models.
- Output:
  - `response`: The generated text response.
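As a rough point of reference, a similar request via the OpenAI Responses API is sketched below (`pip install openai`). The node's exact parameter mapping may differ; the prompt text is a placeholder.

```python
# Minimal sketch of a text request with reasoning effort and a token cap.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

resp = client.responses.create(
    model="gpt-5",
    instructions="You are a terse prompt engineer.",  # system-level instruction
    input="Write a one-sentence prompt for a cozy cyberpunk alley at dusk.",
    reasoning={"effort": "medium"},                   # honored by reasoning models
    max_output_tokens=300,
)

print(resp.output_text)
```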
OpenAI Text-to-Speech
Generate high-quality speech from text using OpenAI's TTS models.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to convert to speech.
  - `model`: The TTS model to use (e.g., `gpt-4o-mini-tts`, `tts-1`).
  - `voice`: The voice to use (e.g., `alloy`, `echo`).
  - `response_format`: Output audio format.
  - `speed`: Speaking speed.
  - `api_key`: Your OpenAI API key.
  - `instructions` (Optional): Instructions for the model (supported by some models).
- Output:
  - `audio`: The generated audio.
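A minimal sketch of the corresponding OpenAI speech request is shown below (`pip install openai`); the input text and output path are placeholders, and the node itself returns audio as a ComfyUI audio object rather than a file.

```python
# Minimal sketch of an OpenAI text-to-speech request streamed to a file.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Your ComfyUI workflow has finished rendering.",
    response_format="mp3",
    speed=1.0,
) as response:
    response.stream_to_file("speech.mp3")
```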
Google Imagen Generator
Generate images from a text prompt using Google's Imagen models.
- Category: `image/generation`
- Inputs:
  - `prompt`: A text description of the image to generate.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The Imagen model to use.
  - `...other_params`: Options for number of images, aspect ratio, and image size.
- Output:
  - `images`: The generated image(s).
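For orientation, an Imagen request via the google-genai SDK looks roughly like the sketch below (`pip install google-genai`). The model ID is an assumption; use whichever Imagen model the node exposes.

```python
# Minimal sketch of an Imagen text-to-image request.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

result = client.models.generate_images(
    model="imagen-3.0-generate-002",  # assumed model id
    prompt="A watercolor lighthouse at dawn",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="16:9",
    ),
)

# Each generated image exposes its raw bytes.
with open("imagen.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```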
Google Imagen Edit (Vertex AI only)
Perform advanced image editing, inpainting, outpainting, and background swapping using Imagen on Google's Vertex AI platform.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to edit.
  - `mask`: A mask defining the area to edit.
  - `prompt`: A description of the desired edit.
  - `project_id`: Your Google Cloud Project ID.
  - `location`: The Google Cloud location for the model.
  - `service_account`: Path to your Google Cloud service account JSON file.
  - `edit_mode`: The type of edit to perform (e.g., inpainting, outpainting).
  - `...other_params`: Controls for negative prompt, seed, and steps.
- Output:
  - `edited_images`: The edited image(s).
Nano Banana
A creative image generation node that can take a combination of text and up to five images as input.
- Category: `image/generation`
- Inputs:
  - `api_key`: Your API key from Google AI Studio.
  - `prompt` (Optional): A text prompt.
  - `image_1` to `image_5` (Optional): Up to five source images.
  - `...other_params`: Controls for aspect ratio, temperature, top_p, and seed.
- Output:
  - `image`: The generated image.
Veo Video Generator (Vertex AI)
Generate short, high-quality video clips from a text description using Google's Veo model on Vertex AI.
- Category: `video/generation`
- Inputs:
  - `prompt`: A text description of the video to generate.
  - `project_id`: Your Google Cloud Project ID.
  - `location`: The Google Cloud location for the model.
  - `service_account`: Path to your Google Cloud service account JSON file.
  - `...other_params`: Controls for negative prompt, aspect ratio, audio generation, and seed.
- Output:
  - `frames`: The generated video frames, output as an image batch.
Veo Video Generator (Gemini API)
Generate videos using Google's Veo 2.0 model via the Gemini API. Supports text-to-video and image-to-video.
- Category: `video/generation`
- Inputs:
  - `prompt`: A text description of the video.
  - `image` (Optional): An input image for image-to-video generation.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The Veo model to use (e.g., `veo-2.0-generate-001`).
  - `aspect_ratio`: Desired aspect ratio (16:9 or 9:16).
  - `duration_seconds`: Duration of the video (e.g., 5-8 seconds).
  - `...other_params`: Controls for negative prompt and seed.
- Output:
  - `frames`: The generated video frames.
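Veo requests through the Gemini API are asynchronous; the sketch below (google-genai SDK, `pip install google-genai`) shows the general pattern of starting a generation and polling the operation until it completes. It is an approximation of the workflow, not the node's implementation, and the prompt and output path are placeholders.

```python
# Minimal sketch of an asynchronous Veo generation via the Gemini API.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="A slow dolly shot through a misty pine forest at sunrise",
    config=types.GenerateVideosConfig(aspect_ratio="16:9", number_of_videos=1),
)

while not operation.done:          # poll the long-running operation
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```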
ElevenLabs TTS
Generate speech from text using the ElevenLabs API.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to convert to speech.
  - `api_key`: Your API key from ElevenLabs.
  - `voice_id`: The ID of the voice to use for generation.
  - `model_id`: The ElevenLabs model to use.
  - `output_format`: The desired output audio format.
  - `stability`: Controls the stability and variability of the generated speech.
  - `similarity_boost`: Enhances the similarity of the generated speech to the chosen voice.
  - `speed`: Adjusts the speaking rate.
  - `style`: Controls the expressiveness of the speech.
  - `use_speaker_boost`: A boolean to enable or disable speaker boost.
  - `seed`: A seed for ensuring reproducible results.
- Output:
  - `audio`: The generated audio waveform and sample rate.
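The underlying ElevenLabs text-to-speech endpoint can also be called directly; the sketch below uses plain `requests` against the public REST API. The voice ID, model ID, and voice settings are illustrative, and the node itself may go through the ElevenLabs SDK instead.

```python
# Minimal sketch of an ElevenLabs text-to-speech request.
import requests

VOICE_ID = "YOUR_VOICE_ID"  # placeholder voice ID
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

resp = requests.post(
    url,
    headers={"xi-api-key": "YOUR_ELEVENLABS_KEY"},
    json={
        "text": "Rendering complete.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0.0,
            "use_speaker_boost": True,
        },
    },
    timeout=120,
)
resp.raise_for_status()

with open("speech.mp3", "wb") as f:  # default output format is MP3
    f.write(resp.content)
```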
Gemini TTS
Generate speech from text using Google's Gemini TTS models.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to be converted into speech.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The specific Gemini model to use for generation.
  - `voice_id`: The prebuilt voice to use for the output.
  - `temperature`: Controls the randomness and creativity of the output.
  - `seed`: A seed for ensuring reproducible results.
  - `system_prompt` (Optional): A system-level instruction to guide the model's behavior.
- Output:
  - `audio`: The generated audio waveform and sample rate.
Acknowledgements
- The ComfyUI team for creating such a flexible and powerful platform.
- Google, OpenAI, and Black Forest Labs for developing these incredible models.
- Replicate for providing easy API access to a wide range of models.