    ComfyUI OpenAI Compatible LLM Node

    A ComfyUI custom node that provides integration with OpenAI-compatible Large Language Model APIs, including OpenAI, local models, and other compatible endpoints. Supports both text-only and multimodal (text + image) interactions.

    Features

    • Multi-line prompt input: Large text area for complex prompts
    • Image input support: Optional image input for multimodal LLMs (GPT-4 Vision, etc.)
    • Configurable endpoint: Support for OpenAI API and other compatible services
    • Secure token input: API key/token field for authentication
    • Model selection: Specify which model to use (defaults to a vision-capable model)
    • Generation parameters: Control max tokens, temperature, and image detail level
    • Automatic image encoding: Converts ComfyUI images to base64 for API compatibility (see the sketch after this list)
    • Error handling: Comprehensive error reporting and fallback responses
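
    A minimal sketch of how that image encoding typically works for ComfyUI IMAGE tensors. This is illustrative, not the node's exact source; comfy_image_to_data_url is a hypothetical helper name:

      import base64
      import io

      import numpy as np
      from PIL import Image

      def comfy_image_to_data_url(image) -> str:
          # ComfyUI passes IMAGE inputs as float tensors shaped
          # [batch, height, width, channels] with values in [0, 1];
          # take the first image in the batch.
          array = (image[0].cpu().numpy() * 255.0).clip(0, 255).astype(np.uint8)
          pil_image = Image.fromarray(array)
          buffer = io.BytesIO()
          pil_image.save(buffer, format="PNG")
          encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
          # OpenAI-style vision APIs accept images as data URLs
          return f"data:image/png;base64,{encoded}"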

    Installation

    Method 1: ComfyUI Manager (Recommended)

    1. Install ComfyUI Manager if you haven't already
    2. Open ComfyUI Manager in your ComfyUI interface
    3. Search for "OpenAI Compatible LLM Node"
    4. Click Install

    Method 2: Manual Installation

    1. Navigate to your ComfyUI installation directory:

      cd /path/to/your/ComfyUI
      
    2. Clone this repository into the custom_nodes directory:

      cd custom_nodes
      git clone https://github.com/yourusername/ComfyUI-OpenAI-Compat-LLM-Node.git
      
    3. Install the required dependencies:

      cd ComfyUI-OpenAI-Compat-LLM-Node
      pip install -r requirements.txt
      
    4. Restart ComfyUI

    Method 3: Direct Download

    1. Download the latest release from the releases page
    2. Extract the archive to your ComfyUI/custom_nodes/ directory
    3. Install dependencies from the extracted directory:

      pip install -r requirements.txt
      
    4. Restart ComfyUI

    Usage

    1. After installation, restart ComfyUI
    2. In the ComfyUI interface, right-click to add a new node
    3. Navigate to LLM → OpenAI Compatible LLM
    4. Configure the node with your settings:
      • Prompt: Enter your text prompt (supports multi-line input)
      • Endpoint: API endpoint URL (default: OpenAI's endpoint)
      • API Token: Your API key/token
      • Image (optional): Connect an image from another node for multimodal analysis
      • Model: Model name (default: gpt-4-vision-preview for multimodal support)
      • Max Tokens: Maximum response length (default: 150)
      • Temperature: Creativity/randomness (0.0-2.0, default: 0.7)
      • Image Detail: Quality level for image processing (low/high/auto, default: auto)
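
    With these settings, the node issues a standard OpenAI chat-completions request. The sketch below shows the shape of such a request; image_data_url is a hypothetical placeholder for the encoded image, and the node's actual code may differ:

      import requests

      endpoint = "https://api.openai.com/v1/chat/completions"
      api_token = "sk-..."  # your API key
      # Placeholder for a base64-encoded image (see the encoding sketch above)
      image_data_url = "data:image/png;base64,..."

      # Text part of the message; the image part is appended only when an
      # image is connected to the node.
      content = [{"type": "text", "text": "Describe what you see in this image"}]
      content.append({
          "type": "image_url",
          "image_url": {"url": image_data_url, "detail": "auto"},  # low / high / auto
      })

      payload = {
          "model": "gpt-4-vision-preview",
          "messages": [{"role": "user", "content": content}],
          "max_tokens": 150,
          "temperature": 0.7,
      }

      response = requests.post(
          endpoint,
          headers={"Authorization": f"Bearer {api_token}"},
          json=payload,
          timeout=120,
      )
      print(response.json()["choices"][0]["message"]["content"])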

    Supported Endpoints

    This node works with any OpenAI-compatible API endpoint, including:

    • OpenAI API: https://api.openai.com/v1/chat/completions
    • Local models (via tools such as Ollama, text-generation-webui, etc.)
    • Cloud providers with OpenAI-compatible APIs
    • Self-hosted solutions

    Configuration Examples

    OpenAI API (Text + Vision)

    • Endpoint: https://api.openai.com/v1/chat/completions
    • Models:
      • Text-only: gpt-3.5-turbo, gpt-4, gpt-4-turbo
      • Vision: gpt-4-vision-preview, gpt-4-turbo (with vision)
    • API Token: Your OpenAI API key

    Local Ollama (Vision Models)

    • Endpoint: http://localhost:11434/v1/chat/completions
    • Models:
      • Text-only: llama2, mistral, codellama
      • Vision: llava, bakllava, llava-llama3
    • API Token: Leave empty for local usage

    Text Generation WebUI (with the multimodal extension)

    • Endpoint: http://localhost:5000/v1/chat/completions
    • Model: Your loaded vision-capable model
    • API Token: Set if authentication is enabled
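
    Before wiring any of these endpoints into a workflow, a quick smoke test can confirm the server is reachable. Many OpenAI-compatible servers also expose GET /v1/models; whether yours does is an assumption, so adjust for your setup:

      import requests

      base_url = "http://localhost:11434/v1"  # e.g. local Ollama
      token = ""  # leave empty for local usage

      headers = {"Authorization": f"Bearer {token}"} if token else {}
      resp = requests.get(f"{base_url}/models", headers=headers, timeout=10)
      resp.raise_for_status()
      # List the model ids the server reports as available
      for model in resp.json().get("data", []):
          print(model["id"])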

    Usage Examples

    Text-Only Generation

    1. Add the node to your workflow
    2. Set your prompt: "Explain the concept of machine learning"
    3. Leave the image input disconnected
    4. Use a text model like gpt-3.5-turbo

    Image Analysis

    1. Connect an image output from another node to the image input
    2. Set your prompt: "Describe what you see in this image"
    3. Use a vision model like gpt-4-vision-preview
    4. Adjust image detail level as needed

    Image + Text Prompt

    1. Connect an image and set a specific prompt
    2. Example: "What colors are prominent in this image and how do they affect the mood?"
    3. The model will analyze both the text and image together

    Requirements

    • ComfyUI
    • Python 3.8+
    • requests >= 2.32.3
    • Pillow >= 10.0.0
    • numpy >= 1.24.0
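
    These pins correspond to a requirements.txt along the following lines (the shipped file may differ slightly):

      requests>=2.32.3
      Pillow>=10.0.0
      numpy>=1.24.0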

    Node Inputs

    | Input | Type | Required | Default | Description |
    |-------|------|----------|---------|-------------|
    | prompt | STRING | Yes | "You are a helpful assistant." | The text prompt to send to the LLM |
    | endpoint | STRING | Yes | "https://api.openai.com/v1/chat/completions" | API endpoint URL |
    | api_token | STRING | Yes | "" | API authentication token |
    | image | IMAGE | No | None | Optional image input for multimodal analysis |
    | model | STRING | No | "gpt-4-vision-preview" | Model name to use (vision-capable by default) |
    | max_tokens | INT | No | 150 | Maximum tokens in response |
    | temperature | FLOAT | No | 0.7 | Sampling temperature (0.0-2.0) |
    | image_detail | STRING | No | "auto" | Image processing detail level (low/high/auto) |

    Node Outputs

    | Output | Type | Description |
    |--------|------|-------------|
    | response | STRING | The generated text response from the LLM |
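
    For reference, a ComfyUI node exposing these inputs and outputs would declare them roughly as follows. This is a sketch of ComfyUI's standard INPUT_TYPES pattern filled in with the defaults from the tables above, not the node's actual source; the class name, max_tokens bound, and step value are assumptions:

      class OpenAICompatibleLLM:
          @classmethod
          def INPUT_TYPES(cls):
              return {
                  "required": {
                      "prompt": ("STRING", {"multiline": True, "default": "You are a helpful assistant."}),
                      "endpoint": ("STRING", {"default": "https://api.openai.com/v1/chat/completions"}),
                      "api_token": ("STRING", {"default": ""}),
                  },
                  "optional": {
                      "image": ("IMAGE",),
                      "model": ("STRING", {"default": "gpt-4-vision-preview"}),
                      "max_tokens": ("INT", {"default": 150, "min": 1, "max": 4096}),
                      "temperature": ("FLOAT", {"default": 0.7, "min": 0.0, "max": 2.0, "step": 0.05}),
                      # Combo input rendered as a dropdown in the ComfyUI interface
                      "image_detail": (["low", "high", "auto"], {"default": "auto"}),
                  },
              }

          RETURN_TYPES = ("STRING",)
          RETURN_NAMES = ("response",)
          FUNCTION = "generate"
          CATEGORY = "LLM"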

    Error Handling

    The node includes comprehensive error handling:

    • Network connection errors
    • API authentication failures
    • Invalid JSON responses
    • Rate limiting and timeout issues
    • Missing response content

    Errors are returned as descriptive text strings for debugging.
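
    In practice that pattern looks something like the sketch below: exceptions are caught and turned into strings, so the workflow keeps running and the failure is visible in the node's output. The function and message wording here are illustrative:

      import requests

      def call_llm(endpoint, payload, headers):
          try:
              resp = requests.post(endpoint, json=payload, headers=headers, timeout=120)
              resp.raise_for_status()  # surfaces auth failures, rate limits, 5xx errors
              return resp.json()["choices"][0]["message"]["content"]
          except requests.exceptions.ConnectionError as exc:
              return f"Error: could not reach endpoint ({exc})"
          except requests.exceptions.Timeout:
              return "Error: request timed out"
          except requests.exceptions.HTTPError:
              return f"Error: API returned {resp.status_code}: {resp.text[:200]}"
          except (ValueError, KeyError, IndexError) as exc:
              return f"Error: unexpected response content ({exc})"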

    Contributing

    1. Fork the repository
    2. Create a feature branch
    3. Make your changes
    4. Add tests if applicable
    5. Submit a pull request

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

    Changelog

    v1.1.0

    • Added image input support for multimodal LLMs
    • Automatic base64 image encoding
    • Support for GPT-4 Vision and other vision models
    • Image detail level control
    • Updated dependencies (Pillow, numpy)

    v1.0.0

    • Initial release
    • Basic OpenAI-compatible API integration
    • Multi-line prompt support
    • Configurable endpoints and models
    • Comprehensive error handling