ComfyUI Extension: ComfyUI-GeminiImageToPrompt

Authored by santiagosamuel3455

1 star

Image description prompt system

Custom Nodes (0)

    README

    ComfyUI-GeminiImageToPrompt

    Advanced Prompt Generation System for Audiovisual Content

    3 Integrated Nodes to Optimize Multimodal Content Creation

    1. Gemini Text to Cinematic Prompt Node (Google API)

    Function: Transforms basic text descriptions into high-quality cinematic prompts using Google's Gemini model.

    Capabilities: Enriches narrative, stylistic, and technical details (e.g., lighting, camera angles, atmosphere). Well suited to generating videos or images with a cinematic focus.

    Application: Use it as a starting point for projects that need professional prompts without writing them from scratch.

    2. Gemini Image to Prompt Node (Multimodal Analysis)

    Function: Analyzes input images to automatically generate descriptive prompts optimized for video conversion.

    Capabilities: Extracts key visual elements (colors, objects, composition, artistic style). Translates the analysis into technical instructions for image-to-video flows.

    Application: Avoid manual prompt writing when working with reference images or existing assets.

    3. Deepseek R1 Node + KlingAI Text/Image to Video (Free Version)

    Feature: Generates refined prompts for text-to-video or image-to-video content, leveraging KlingAI technology with a free account.

    Advantages: No credit consumption, which suits users on a budget. Supports multimodal input (text and image) for hybrid workflows.

    Application: Create detailed video scripts from short descriptions or images while maintaining quality and visual consistency.

    Recommended Workflow:

    Text to Cinematic Prompt: Use Gemini to define a rich narrative.
    Image to Technical Prompt: Analyze visual references with Gemini.
    Final Video Generation: Export the optimized prompts to KlingAI/Deepseek R1 for video production.

    Key Benefit: Combines Gemini's strengths (creativity and analysis) with KlingAI's accessibility, reducing manual effort and operational costs.
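To make the first node more concrete, here is a minimal sketch of how a text-to-cinematic-prompt node could be laid out using ComfyUI's usual custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). It is not the extension's actual code: the class name, the instruction text, and the injected `generate` callable are all assumptions made for illustration. In a real node, `generate` would wrap a call to the Gemini API; here it defaults to a pass-through so the sketch runs without a network connection or API key.

```python
from typing import Callable, Optional

# Hypothetical instruction text; the README does not show the real wording.
CINEMATIC_INSTRUCTION = (
    "Rewrite the description below as a single cinematic prompt. "
    "Add lighting, camera angle, and atmosphere details."
)


def build_cinematic_request(description: str) -> str:
    """Combine the fixed instruction with the user's short description."""
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return f"{CINEMATIC_INSTRUCTION}\n\nDescription: {description}"


class TextToCinematicPromptSketch:
    """Illustrative ComfyUI-style node; not the extension's actual class."""

    CATEGORY = "prompt/sketch"   # menu location (assumed)
    RETURN_TYPES = ("STRING",)   # the node outputs one string
    RETURN_NAMES = ("prompt",)
    FUNCTION = "run"             # name of the method ComfyUI calls

    def __init__(self, generate: Optional[Callable[[str], str]] = None):
        # `generate` stands in for a model call (assumption). The default
        # simply echoes the request so the sketch works offline.
        self.generate = generate or (lambda request: request)

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "description": ("STRING", {"multiline": True, "default": ""}),
            }
        }

    def run(self, description: str):
        request = build_cinematic_request(description)
        # ComfyUI expects a tuple matching RETURN_TYPES.
        return (self.generate(request).strip(),)
```

Injecting the generator keeps the node logic testable on its own; swapping in a real model call would only change what is passed as `generate`.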

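The recommended workflow (text to cinematic prompt, optional image analysis, then a final refined prompt for video generation) can be sketched as a small pipeline. This is only an illustration under stated assumptions: `run_prompt_workflow`, `PromptBundle`, and the three callables are hypothetical names, and each callable stands in for a model-backed stage (Gemini for the first two, the Deepseek R1 stage for `refine`). The README does not specify how the stages exchange data, so the merge format below is an assumption as well.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PromptBundle:
    cinematic: str             # step 1 output
    reference: Optional[str]   # step 2 output, None when no image is given
    final: str                 # step 3 output, ready for a video tool


def run_prompt_workflow(
    description: str,
    text_to_prompt: Callable[[str], str],
    refine: Callable[[str], str],
    image: Any = None,
    image_to_prompt: Optional[Callable[[Any], str]] = None,
) -> PromptBundle:
    # Step 1: expand the short description into a cinematic prompt.
    cinematic = text_to_prompt(description).strip()

    # Step 2 (optional): describe a reference image, if one was supplied.
    reference = None
    if image is not None and image_to_prompt is not None:
        reference = image_to_prompt(image).strip()

    # Step 3: merge both prompts (assumed format) and refine the result.
    if reference is None:
        merged = cinematic
    else:
        merged = f"{cinematic}\nReference: {reference}"
    return PromptBundle(cinematic, reference, refine(merged).strip())
```

Calling `run_prompt_workflow("a fox in snow", text_to_prompt=..., refine=...)` with simple lambdas is enough to try the flow offline; the `final` field is the text that would be pasted into or exported to the video generator.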