ComfyUI Extension: Together Vision Node

Authored by mithamunda

A custom ComfyUI node for generating AI-powered image descriptions using Together AI's Vision models (both free and paid versions). Features include customizable prompts, advanced generation parameters, and robust image handling with comprehensive error management.

    ComfyUI-TogetherVision

    A custom node for ComfyUI that generates detailed image descriptions using Together AI's Vision models. You can use either the paid or the free version of Together AI's Llama Vision models.

    Together Vision Node

    Features

    šŸ–¼ļø Image Description & Text Generation:

    • Generate detailed descriptions of images using state-of-the-art vision models
    • Toggle vision processing on/off for flexible usage
    • Use as a text-only LLM when vision is disabled

    šŸ¤– Multiple Models:

    • Paid Version: Llama-3.2-11B-Vision-Instruct-Turbo
    • Free Version: Llama-Vision-Free

    āš™ļø Customizable Parameters:

    • Temperature control
    • Top P sampling
    • Top K sampling
    • Repetition penalty

    šŸ”‘ Flexible API Key Management:

    • Direct input in the node
    • Environment variable through .env file

    šŸ“ Custom Prompting:

    • System prompt customization
    • User prompt customization

    Getting Started

    1. Get Together AI API Key

    1. Go to Together AI API Settings
    2. Sign up or log in to your Together AI account
    3. Click "Create API Key"
    4. Copy your API key for later use

    2. Installation

    1. Clone this repository into your ComfyUI custom_nodes directory:
    cd ComfyUI/custom_nodes
    git clone https://github.com/mithamunda/ComfyUI-TogetherVision.git
    
    2. Restart ComfyUI - it will automatically install the required dependencies from requirements.txt

    3. Set up your Together AI API key using one of these methods:

      • Option 1: Create a .env file in the node directory:
        TOGETHER_API_KEY=your_api_key_here
        
      • Option 2: Input your API key directly in the node
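
    A minimal sketch of how the two key sources can be combined, assuming the python-dotenv package and an api_key text field on the node (names here are illustrative, not necessarily the node's actual code):

    import os
    from dotenv import load_dotenv  # python-dotenv

    def resolve_api_key(node_api_key: str = "") -> str:
        """Prefer a key typed into the node, otherwise fall back to the .env file / environment."""
        if node_api_key.strip():
            return node_api_key.strip()
        load_dotenv()  # reads TOGETHER_API_KEY from a .env file if present
        key = os.getenv("TOGETHER_API_KEY", "")
        if not key:
            raise ValueError("No Together AI API key found: set it in the node or in a .env file.")
        return key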

    Usage

    1. Add the "Together Vision šŸ”" node to your workflow
    2. Configure Vision Mode:
      • Enable Vision (Default): Connect an image output to the node's image input
      • Disable Vision: Skip image input for text-only generation
    3. Select your preferred model (Paid or Free)
    4. Configure the parameters:
      • Temperature (0.0 - 2.0)
      • Top P (0.0 - 1.0)
      • Top K (1 - 100)
      • Repetition Penalty (0.0 - 2.0)
    5. Customize the prompts:
      • System prompt: Sets the behavior of the AI
      • User prompt: Specific instructions for image description or text generation

    Parameters

    | Parameter | Description | Default | Range |
    |-----------|-------------|---------|-------|
    | Vision Enable | Toggles vision processing | True | True/False |
    | Temperature | Controls randomness | 0.7 | 0.0 - 2.0 |
    | Top P | Nucleus sampling | 0.7 | 0.0 - 1.0 |
    | Top K | Top K sampling | 50 | 1 - 100 |
    | Repetition Penalty | Prevents repetition | 1.0 | 0.0 - 2.0 |
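
    As a rough illustration of how these parameters map onto a Together AI request (this uses the official together Python client; the node's actual call may differ in details, and the model identifiers shown are Together's published names for the two models above):

    from together import Together

    client = Together(api_key="your_api_key_here")

    response = client.chat.completions.create(
        model="meta-llama/Llama-Vision-Free",  # or "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo" (paid)
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write one sentence about vision-language models."},
        ],
        temperature=0.7,         # randomness (0.0 - 2.0)
        top_p=0.7,               # nucleus sampling (0.0 - 1.0)
        top_k=50,                # top-k sampling (1 - 100)
        repetition_penalty=1.0,  # discourages repeated tokens (0.0 - 2.0)
    )
    print(response.choices[0].message.content)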

    Image Resolution Limits

    The node automatically handles high-resolution images:

    • Images larger than 2048x2048 pixels will be automatically resized
    • Aspect ratio is preserved during resizing
    • High-quality LANCZOS resampling is used
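
    The resizing step can be sketched with Pillow roughly as follows (illustrative; the node's internal implementation may differ):

    from PIL import Image

    MAX_SIZE = 2048  # longest allowed edge, per the limits above

    def limit_resolution(image: Image.Image) -> Image.Image:
        """Downscale anything larger than 2048 px on its longest side, keeping the aspect ratio."""
        if max(image.size) <= MAX_SIZE:
            return image
        scale = MAX_SIZE / max(image.size)
        new_size = (round(image.width * scale), round(image.height * scale))
        return image.resize(new_size, Image.LANCZOS)  # high-quality LANCZOS resampling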

    For best results:

    1. Keep image dimensions under 2048 pixels
    2. Use ComfyUI's built-in resize nodes before this node
    3. For very large images, consider splitting them into sections

    Rate Limits

    Free Model (Llama-Vision-Free)

    • Limited to approximately 100 requests per day
    • Rate limit resets every 24 hours
    • Hourly limits may apply (typically 20-30 requests per hour)

    Paid Model (Llama-3.2-11B-Vision-Instruct-Turbo)

    • Higher rate limits based on your Together AI subscription
    • Better performance and reliability
    • Priority API access

    Handling Rate Limits

    When you hit a rate limit, try one of the following:

    1. Wait for the specified time (usually 1 hour for hourly limits)
    2. Switch to a different Together AI account
    3. Upgrade to the paid model for higher limits
    4. Consider batching your requests during off-peak hours

    Tips to Avoid Rate Limits

    1. Cache results for repeated images (see the sketch after this list)
    2. Use the paid model for production workloads
    3. Monitor your API usage through Together AI dashboard
    4. Space out your requests when possible
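
    For tip 1, a small in-memory cache keyed by the image bytes and prompt is enough to avoid re-sending identical requests. This is an illustrative sketch, not part of the node itself (describe_fn stands in for whatever function performs the API call):

    import hashlib

    _description_cache: dict[str, str] = {}

    def cached_describe(image_bytes: bytes, prompt: str, describe_fn) -> str:
        """Reuse a stored description when the same image/prompt pair was already processed."""
        key = hashlib.sha256(image_bytes + prompt.encode("utf-8")).hexdigest()
        if key not in _description_cache:
            _description_cache[key] = describe_fn(image_bytes, prompt)  # one API call per unique pair
        return _description_cache[key]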

    Operating Modes

    Vision Mode (Default)

    • Requires connected image input
    • Generates detailed image descriptions
    • Full vision + language capabilities

    Text-Only Mode

    • No image input required
    • Functions as a standard LLM
    • Useful for text generation and chat

    Error Handling

    The node includes comprehensive error handling and logging (a rough sketch follows the list below):

    • API key validation
    • Rate limit notifications
    • Image processing errors
    • API response errors
    • Vision mode validation
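
    In broad strokes, these checks amount to wrapping the API call so that failures come back as readable text instead of crashing the workflow. A hedged sketch (function and message wording are illustrative, not taken from the node's source):

    import logging

    logger = logging.getLogger("TogetherVision")

    def safe_generate(client, **kwargs) -> str:
        """Call the chat endpoint and convert common failures into user-facing messages."""
        try:
            response = client.chat.completions.create(**kwargs)
            return response.choices[0].message.content
        except Exception as exc:  # rate limits, invalid keys, network or response errors
            logger.error("Together Vision request failed: %s", exc)
            if "rate limit" in str(exc).lower():
                return "Rate limit reached - wait a while or switch to the paid model."
            return f"Together Vision error: {exc}"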

    Examples

    Here are some example prompts you can try:

    1. Vision Mode - Detailed Description:
    Describe this image in detail, including colors, objects, and composition.
    
    2. Vision Mode - Technical Analysis:
    Analyze this image from a technical perspective, including lighting, composition, and photographic techniques.
    
    3. Text-Only Mode - Creative Writing:
    Write a creative story about a magical forest.
    

    Contributing

    Contributions are welcome! Please feel free to submit a Pull Request.

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

    Acknowledgments

    • Together AI for providing the Vision API
    • ComfyUI community for the framework and support

    Support

    If you encounter any issues or have questions:

    1. Check the error logs in ComfyUI
    2. Ensure your API key is valid
    3. Check Together AI's service status
    4. Open an issue on GitHub

    Note: This node requires a Together AI account and API key. You can get one at Together AI's website.

    Automatic Mode Switching

    The node now automatically switches between Vision Mode and Text-Only Mode based on the presence of an image input connection. When an image is connected, the node will generate detailed image descriptions. When no image is connected, the node will function as a text generation model.

    Flexible Processing Modes

    • Image + Text Mode: When an image is connected, generates descriptions and responses about the image
    • Text-Only Mode: When no image is connected, functions as a text generation model
    • Seamlessly switches between modes based on input connections
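
    Conceptually, the switch comes down to how the user message is assembled: with an image connected, the prompt and a base64 data URL are sent together; without one, only the text prompt is sent. A rough sketch of that idea, assuming Together's OpenAI-compatible chat format with image_url content parts (the helper name is illustrative):

    import base64
    import io
    from PIL import Image

    def build_user_message(prompt: str, image: Image.Image | None = None) -> dict:
        """Include the image as a base64 data URL only when one is connected."""
        if image is None:
            return {"role": "user", "content": prompt}  # Text-Only Mode
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        data_url = "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode("utf-8")
        return {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }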