NVIDIA Captioner for ComfyUI
A powerful ComfyUI node for generating rich, detailed captions for images using NVIDIA's vision models. This node allows batch processing of images with customizable prompts and supports various captioning styles.
Features
- 🖼️ Batch process multiple images with a single click
- 🎨 Multiple captioning styles (detailed, concise, product-focused, etc.)
- ⚡ Optimized for performance with rate limiting
- 🔄 Built-in caching to avoid reprocessing the same images (see the sketch below)
- 📊 Progress tracking for batch operations
- 🔍 Case-insensitive image filtering
- 🎭 Support for custom system prompts
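The built-in cache works roughly as sketched below. This is a minimal illustration, not the node's actual implementation: the cache file name (`caption_cache.json`) and helper names are hypothetical, and it assumes the cache keys on a content hash of each image so renamed files are still recognized.

```python
import hashlib
import json
from pathlib import Path

CACHE_PATH = Path("caption_cache.json")  # hypothetical cache file

def image_key(path: Path) -> str:
    """Key on image content rather than filename, so a renamed
    or moved file is still treated as already captioned."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def load_cache() -> dict:
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

def save_cache(cache: dict) -> None:
    CACHE_PATH.write_text(json.dumps(cache, indent=2))

cache = load_cache()
for image in Path("images").glob("*.png"):  # example directory
    key = image_key(image)
    if key in cache:
        continue  # use_cache=True behavior: skip already-captioned images
    # cache[key] = caption_image(image, ...)  # see the API sketch under Usage
save_cache(cache)
```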
Installation
1. Navigate to your ComfyUI `custom_nodes` directory:

   ```bash
   cd ComfyUI/custom_nodes
   ```

2. Clone this repository:

   ```bash
   git clone https://github.com/theshubzworld/ComfyUI-NvidiaCaptioner.git
   ```

3. Install the required dependencies:

   ```bash
   pip install -r ComfyUI-NvidiaCaptioner/requirements.txt
   ```

4. Restart ComfyUI.
Usage
1. Add the NVIDIA Captioner node to your workflow from the `NVIDIA/Vision` category.
2. Configure the node settings:
   - **Image Directory**: Folder containing the images to process
   - **API Key**: Your NVIDIA API key
   - **Model**: The vision model to use
   - **Prompt Style**: One of several predefined captioning styles
   - **Use Cache**: Toggle to skip already-processed images
3. Connect the output to any text node, or save the captions to a file.
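Behind the scenes, each image is sent to NVIDIA's hosted vision API together with your prompt. The sketch below shows what such a request might look like, assuming the node targets NVIDIA's OpenAI-compatible endpoint at `https://integrate.api.nvidia.com/v1/chat/completions` and inlines the image as a base64 data URL; the exact endpoint, payload shape, and model name the node uses may differ.

```python
import base64
import requests

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed endpoint

def caption_image(path: str, api_key: str, model: str, prompt: str) -> str:
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            # Prompt text followed by the image as an inline data URL
            "content": f'{prompt} <img src="data:image/png;base64,{image_b64}" />',
        }],
        "max_tokens": 300,
        "temperature": 0.2,
        "top_p": 0.7,
    }
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```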
Node Configuration
Inputs
- `image_directory`: Path to the directory containing the images to process
- `api_key`: Your NVIDIA API key
- `model`: The vision model to use (default: `"nvidia/vision"`)
- `system_prompt_preset`: Predefined prompt styles
- `custom_system_prompt`: Custom system prompt (overrides the preset if provided)
- `prompt`: Instruction for the model
- `use_cache`: Skip already-processed images (default: `True`)
- `skip_existing_txt`: Skip images that already have a `.txt` caption file (default: `False`)
- `max_tokens`: Maximum tokens in the response (default: 300)
- `temperature`: Sampling temperature (default: 0.2)
- `top_p`: Nucleus sampling parameter (default: 0.7)
- `frequency_penalty`: Penalty on frequently repeated tokens (default: 0.0)
- `presence_penalty`: Penalty on tokens that have already appeared, encouraging new topics (default: 0.0)
- `max_retries`: Maximum retry attempts (default: 3)
- `retry_delay`: Delay between retries in seconds (default: 2.0)
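`max_retries` and `retry_delay` describe a simple retry loop around the API call. A minimal sketch of that behavior (the node may use a different policy, such as exponential backoff):

```python
import time

def with_retries(fn, max_retries: int = 3, retry_delay: float = 2.0):
    """Call fn(), retrying on failure up to max_retries extra times,
    sleeping retry_delay seconds between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts; surface the last error
            time.sleep(retry_delay)

# e.g. caption = with_retries(lambda: caption_image(path, key, model, prompt))
```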
Outputs
- `all_captions`: Concatenated captions for all processed images
- `last_caption`: Caption for the most recently processed image
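A common pattern, and what `skip_existing_txt` suggests the node does, is to write each caption next to its image as a `.txt` sidecar and concatenate everything into `all_captions`. A hedged sketch of how the two outputs might be assembled (helper names are illustrative):

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # matched case-insensitively

def collect_captions(image_dir: str, captioner, skip_existing_txt: bool = False):
    """Return (all_captions, last_caption), writing a .txt sidecar per image."""
    captions, last_caption = [], ""
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        txt_path = image.with_suffix(".txt")
        if skip_existing_txt and txt_path.exists():
            continue
        last_caption = captioner(image)      # e.g. with_retries(...)
        txt_path.write_text(last_caption)
        captions.append(f"{image.name}: {last_caption}")
    return "\n".join(captions), last_caption
```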
Example Workflow
- Load a batch of images using the Load Batch node
- Connect to the NVIDIA Captioner node
- Configure the captioning settings
- Save or use the generated captions in your workflow
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For issues and feature requests, please open an issue.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.