ComfyUI Extension: ComfyUI-ExternalAPI-Helpers

Authored by Aryan185

Created

Updated

3 stars

ComfyUI node for Flux Kontext Pro and Max models from Replicate

Custom Nodes (0)

    README

    ComfyUI-ExternalAPI-Helpers

    A collection of powerful custom nodes for ComfyUI that connect your local workflows to closed-source AI models via their APIs. Use Google's Gemini, Imagen, Veo, OpenAI's GPT-Image-1, and Black Forest Labs' FLUX models directly within ComfyUI.

    Key Features

    • FLUX Kontext Pro & Max: Image-to-image transformations using the FLUX models via the Replicate API.
    • Gemini Chat: Google's powerful multimodal AI. Ask questions about an image, generate detailed descriptions or create prompts for other models. Supports thinking budget controls for applicable models.
    • Gemini Segmentation: Generate segmentation masks for objects in an image using Gemini.
    • GPT Image Edit: OpenAI's gpt-image-1 for prompt-based image editing and inpainting. Simply mask an area and describe the change you want to see.
    • Google Imagen Generator & Edit: Create and edit images with Google's Imagen models, with support for Vertex AI.
    • Nano Banana: A creative image generation node using a specialized Gemini model.
    • Veo Text-to-Video: Generate high-quality video clips from text prompts using Google's Veo model via Vertex AI.
    • ElevenLabs TTS: Generate high-quality speech from text using ElevenLabs' diverse range of voices and models.
    • Gemini TTS: Create speech from text using Google's Gemini models.
    • Seamless Integration: All nodes are designed to work seamlessly with standard ComfyUI inputs (IMAGE, MASK, STRING) and outputs, allowing you to chain them into complex and creative workflows.
    • Secure & Simple: Simply provide your API key in the node's input field to get started.

    🚀 Installation

    1. Navigate to your ComfyUI installation directory.

    2. Go into the custom_nodes folder:

      cd ComfyUI/custom_nodes/
      
    3. Clone this repository:

      git clone https://github.com/Aryan185/ComfyUI-ExternalAPI-Helpers.git
      
    4. Install the required Python packages. Navigate into the newly cloned directory and use pip to install the dependencies:

      cd ComfyUI-ExternalAPI-Helpers
      pip install -r requirements.txt
      
    5. Restart ComfyUI. After restarting, you should find the new nodes in the "Add Node" menu.


    🔑 Prerequisites: API Keys

    All nodes in this collection require API keys to function.

    • FLUX Nodes (Replicate): You will need a Replicate API Token.
    • Gemini, Imagen, Nano Banana, and Gemini TTS Nodes: You will need a Google AI Studio API Key.
    • GPT Image Edit Node: You will need an OpenAI API Key.
    • ElevenLabs TTS Node: You will need an ElevenLabs API Key.
    • Vertex AI Nodes (Imagen Edit, Veo): You will need a Google Cloud Project ID, a service account with appropriate permissions, and the location for the resources.

    You can paste your key directly into the api_key field on the corresponding node. For Vertex AI nodes, you will need to provide the project ID, location, and path to your service account JSON file.


    📚 Node Guide

    Flux Kontext Pro / Max

    These nodes allow you to transform an input image based on a text prompt. They are ideal for applying artistic styles or making significant conceptual changes to an existing image.

    • Category: image/edit
    • Inputs:
      • image: The source image to transform.
      • prompt: A text description of the desired output (e.g., "A vibrant Van Gogh painting", "Make this a 90s cartoon").
      • replicate_api_token: Your API token from Replicate.
      • aspect_ratio: The desired output aspect ratio. match_input_image is highly recommended to preserve the original composition.
      • output_format: jpg or png.
      • safety_tolerance: Adjust the content safety filter level.
    • Output:
      • image: The generated image.

    Gemini Chat

    A versatile node for text generation and image analysis. Use it to understand an image's content or to generate creative text for other nodes.

    • Category: text/generation
    • Inputs:
      • prompt: The text prompt or question you want to ask the model.
      • image (Optional): An input image for the model to analyze.
      • api_key: Your API key from Google AI Studio.
      • model: The Gemini model to use (e.g., gemini-2.5-pro).
      • system_instruction (Optional): Provide context or rules for how the model should behave.
      • temperature: Controls the creativity of the output. Higher is more creative.
      • thinking: Enables the model's thinking/reasoning process (Gemini 2.5 Pro).
    • Output:
      • response: The text generated by the Gemini model.

    Gemini Segmentation

    This node uses a Gemini model to generate segmentation masks for specified objects within an image.

    • Category: image/generation
    • Inputs:
      • image: The source image for segmentation.
      • segment_prompt: A text description of the objects to segment (e.g., "the car", "all people").
      • api_key: Your API key from Google AI Studio.
      • model: The Gemini model to use.
      • ...other_params: Controls for temperature, thinking, and seed.
    • Output:
      • mask: A black and white mask of the segmented objects.

    GPT Image Edit

    This node uses OpenAI's API to perform powerful, prompt-based inpainting and editing.

    • Category: image/edit
    • Inputs:
      • image: The source image to edit.
      • mask (Optional): A black and white mask. The model will edit the white area of the mask.
      • prompt: A description of the edit to perform (e.g., "Add a small red boat on the water", "Remove the person on the left").
      • api_key: Your API key from OpenAI.
      • ...other_params: Various quality and formatting options for the OpenAI API.
    • Output:
      • image: The edited image.

    Note: If a mask is provided, the edits will be constrained to the masked region. If no mask is provided, the model will attempt to edit the entire image based on the prompt.

    Google Imagen Generator

    Generate images from a text prompt using Google's Imagen models.

    • Category: image/generation
    • Inputs:
      • prompt: A text description of the image to generate.
      • api_key: Your API key from Google AI Studio.
      • model: The Imagen model to use.
      • ...other_params: Options for number of images, aspect ratio, and image size.
    • Output:
      • images: The generated image(s).

    Google Imagen Edit (Vertex AI only)

    Perform advanced image editing, inpainting, outpainting, and background swapping using Imagen on Google's Vertex AI platform.

    • Category: image/edit
    • Inputs:
      • image: The source image to edit.
      • mask: A mask defining the area to edit.
      • prompt: A description of the desired edit.
      • project_id: Your Google Cloud Project ID.
      • location: The Google Cloud location for the model.
      • service_account: Path to your Google Cloud service account JSON file.
      • edit_mode: The type of edit to perform (e.g., inpainting, outpainting).
      • ...other_params: Controls for negative prompt, seed, and steps.
    • Output:
      • edited_images: The edited image(s).

    Nano Banana

    A creative image generation node that can take a combination of text and up to five images as input.

    • Category: image/generation
    • Inputs:
      • api_key: Your API key from Google AI Studio.
      • prompt (Optional): A text prompt.
      • image_1 to image_5 (Optional): Up to five source images.
      • ...other_params: Controls for aspect ratio, temperature, top_p, and seed.
    • Output:
      • image: The generated image.

    Veo Text-to-Video (Vertex AI)

    Generate short, high-quality video clips from a text description using Google's Veo model on Vertex AI.

    • Category: video/generation
    • Inputs:
      • prompt: A text description of the video to generate.
      • project_id: Your Google Cloud Project ID.
      • location: The Google Cloud location for the model.
      • service_account: Path to your Google Cloud service account JSON file.
      • ...other_params: Controls for negative prompt, aspect ratio, audio generation, and seed.
    • Output:
      • frames: The generated video frames, output as an image batch.

    ElevenLabs TTS

    Generate speech from text using the ElevenLabs API.

    • Category: audio/generation
    • Inputs:
      • text: The text to convert to speech.
      • api_key: Your API key from ElevenLabs.
      • voice_id: The ID of the voice to use for generation.
      • model_id: The ElevenLabs model to use.
      • output_format: The desired output audio format.
      • stability: Controls the stability and variability of the generated speech.
      • similarity_boost: Enhances the similarity of the generated speech to the chosen voice.
      • speed: Adjusts the speaking rate.
      • style: Controls the expressiveness of the speech.
      • use_speaker_boost: A boolean to enable or disable speaker boost.
      • seed: A seed for ensuring reproducible results.
    • Output:
      • audio: The generated audio waveform and sample rate.

    Gemini TTS

    Generate speech from text using Google's Gemini TTS models.

    • Category: audio/generation
    • Inputs:
      • text: The text to be converted into speech.
      • api_key: Your API key from Google AI Studio.
      • model: The specific Gemini model to use for generation.
      • voice_id: The prebuilt voice to use for the output.
      • temperature: Controls the randomness and creativity of the output.
      • seed: A seed for ensuring reproducible results.
      • system_prompt (Optional): A system-level instruction to guide the model's behavior.
    • Output:
      • audio: The generated audio waveform and sample rate.

    Acknowledgements