ComfyUI Extension: LM Studio Image to Text Node for ComfyUI

Authored by mattjohnpowell

Created about a year ago

Updated 13 days ago

19 stars

A custom node for ComfyUI that integrates LM Studio's vision models to generate text descriptions of images. It provides a flexible and customizable way to add image-to-text capabilities to your ComfyUI workflows, working with LM Studio's local API.

Custom Nodes (2)

README

LM Studio Nodes for ComfyUI

Author: Matt John Powell

This extension provides a suite of custom nodes for ComfyUI that deeply integrate LM Studio's capabilities using the official lmstudio Python SDK. It allows you to leverage locally run models for various generative tasks directly within your ComfyUI workflows.

The nodes offer functionalities including:

Unified Generation: Generate text from text prompts, image prompts, or both combined.
Image to Text: Generate detailed text descriptions of images using vision models.
Text Generation: Generate text based on a given prompt using language models, with streaming support.
Model Management: List available models, load models into memory with a TTL, and unload them.
Model Selection: Dynamically select models based on type (LLM/Vision) and text filters.
Setup Assistance: Helper node for SDK installation check, model listing, and download guidance.

Workflow Example

Here's an example of how the LM Studio nodes can be used in a ComfyUI workflow:

LM Studio Nodes Workflow (Ensure this image accurately reflects a workflow using the new nodes)

Features

Utilizes the official lmstudio Python SDK for robust connection and interaction.
Supports text-only, image-only (vision), and combined text+image inputs.
Identifies models using model_key strings (e.g., llama-3.2-1b-instruct).
Provides dedicated nodes for specific tasks (Image-to-Text, Text Generation).
Includes nodes for managing model memory (Load/Unload/List).
Offers a dynamic model selector node for easier workflow building.
Setup node provides guidance for installing the SDK and getting models.
Customizable system prompts for context setting.
Control over generation parameters like max_tokens, temperature, and seed.
Optional streaming output for the Text Generation node.
Debug mode for detailed console logging.
Automatic connection to the LM Studio server (no need to specify IP/Port in nodes).

Installation

Navigate to your ComfyUI custom_nodes directory.

Clone this repository:

cd /path/to/ComfyUI/custom_nodes
git clone [https://github.com/mattjohnpowell/comfyui-lmstudio-nodes.git](https://github.com/mattjohnpowell/comfyui-lmstudio-nodes.git) ComfyExpo-LMStudioNodes
# Using 'ComfyExpo-LMStudioNodes' as the directory name to avoid potential conflicts

Install Dependencies: Ensure the lmstudio package is installed in ComfyUI's Python environment:

# Activate your ComfyUI Python environment (if using venv, conda, etc.) first!
pip install lmstudio

Troubleshooting

LM Studio Version Compatibility

If you encounter errors like:

Error processing with LM Studio: Object missing required field 'bosToken' - at '$.promptTemplate.jinjaPromptTemplate'
Other SDK-related errors after upgrading LM Studio

This usually means your LM Studio SDK version is incompatible with your LM Studio version. Solution:

Quick Fix: Run the upgrade helper script:
```
python upgrade_lmstudio.py
```
Manual Fix: Upgrade the LM Studio SDK:
```
pip install lmstudio --upgrade
```
Restart ComfyUI completely after upgrading.

Backward Compatibility

The nodes now support backward compatibility with older workflow files that used:

model parameter instead of model_key
ip_address and port parameters for HTTP connections

When old parameters are detected, the nodes will:

Show a deprecation warning
Automatically use the legacy HTTP mode if ip_address and port are provided
Map model parameter to model_key for SDK mode

Recommendation: Update old workflows to use the new SDK-based parameters for better performance and reliability. (You might need to use a specific pip command depending on your ComfyUI setup, e.g., for portable builds) 4. Restart ComfyUI.

Usage

Add the nodes to your workflow by right-clicking the canvas and searching for their names (e.g., "LM Studio Unified (Expo)").

LM Studio Unified (Expo)

Handles text-only, image-only, or combined text+image generation.

Inputs:

model_key (STRING, required): The key of the LM Studio model to use (e.g., llama-3.2-1b-instruct or a vision model key). Default: llama-3.2-1b-instruct.
system_prompt (STRING, required): System prompt for the AI. Default: "You are a helpful AI assistant."
seed (INT, required): Seed for reproducibility (-1 for random). Default: -1.
image (IMAGE, optional): Input image (requires a vision-capable model_key).
text_input (STRING, optional): Text prompt. Default: "".
max_tokens (INT, optional): Max tokens for the response. Default: 1000.
temperature (FLOAT, optional): Generation temperature. Default: 0.7.
debug (BOOLEAN, optional): Enable console logging. Default: False.

Output:

Generated Text (STRING): The model's response.

(Note: At least one of image or text_input must be provided for the node to run).

LM Studio I2T (Expo)

Generates text descriptions for images using vision models.

Inputs:

model_key (STRING, required): The key of the vision-capable LM Studio model. Default: qwen2-vl-2b-instruct.
system_prompt (STRING, required): System prompt for the AI. Default: "This is a chat between a user and an assistant. The assistant is an expert in describing images, with detail and accuracy".
seed (INT, required): Seed for reproducibility (-1 for random). Default: -1.
image (IMAGE, required): The input image to be described.
user_prompt (STRING, optional): The prompt asking about the image. Default: "Describe this image in detail".
max_tokens (INT, optional): Max tokens for the response. Default: 1000.
temperature (FLOAT, optional): Generation temperature. Default: 0.7.
debug (BOOLEAN, optional): Enable console logging. Default: False.

Output:

Description (STRING): The generated text description.

LM Studio Text Gen (Expo)

Generates text based on a text prompt using language models.

Inputs:

model_key (STRING, required): The key of the LM Studio language model. Default: llama-3.2-1b-instruct.
system_prompt (STRING, required): System prompt for the AI. Default: "You are a helpful AI assistant.".
seed (INT, required): Seed for reproducibility (-1 for random). Default: -1.
prompt (STRING, optional): The input prompt for text generation. Default: "Generate a creative story:".
max_tokens (INT, optional): Max tokens for the response. Default: 1000.
temperature (FLOAT, optional): Generation temperature. Default: 0.7.
stream_output (BOOLEAN, optional): Stream response fragments (Note: final output in ComfyUI is still the complete text). Default: False.
debug (BOOLEAN, optional): Enable console logging. Default: False.

Output:

Generated Text (STRING): The generated text.

(Note: Requires a non-empty prompt to run).

LM Studio Model Mgr (Expo)

Manages models loaded via the LM Studio SDK.

Inputs:

action (COMBO, required): Action to perform (LIST, LOAD, UNLOAD). Default: LIST.
model_key (STRING, optional): The model key to load or unload. Required for LOAD/UNLOAD. Default: "".
model_type (COMBO, optional): Filter models by type (ALL, LLM, EMBEDDING). Default: ALL.
load_ttl (INT, optional): Time-to-live (seconds) for loaded models (how long to keep in memory after last use). Used with LOAD. Default: 3600.
debug (BOOLEAN, optional): Enable console logging. Default: False.

Output:

Result (STRING): Confirmation message or list of models.

LM Studio Model Sel (Expo)

Dynamically provides a model key based on filters, useful for connecting to other nodes.

Inputs:

model_type (COMBO, required): Type of model to list (LLM, Vision). Default: LLM.
filter_text (STRING, optional): Text to filter model names/keys by. Default: "".

Output:

Model Key (STRING): The model_key of the first matching model (sorted alphabetically), or a default if none match.

(Note: This node attempts to list models available via the SDK at workflow load time. Ensure LM Studio server is running when building workflows).

LM Studio Setup (Expo)

Provides helper actions related to SDK setup and model discovery.

Inputs:

action (COMBO, required): Action to perform (INSTALL SDK, GET MODEL, LIST MODELS). Default: LIST MODELS.
model_key (STRING, required): Model key relevant to the action (used for GET MODEL). Default: llama-3.2-1b-instruct.

Output:

Result (STRING): Status message, command guidance, or list of models.

(Note: INSTALL SDK attempts pip install lmstudio. GET MODEL provides instructions for using the lms CLI tool).

LM Studio Setup

Install and run LM Studio on your machine.
Download desired models within LM Studio (ensure you have vision models for image tasks).
Go to the "Server" tab (icon looks like <->) in LM Studio.
Select a model to load and click Start Server.
The LM Studio server must be running for these ComfyUI nodes to connect and function.

Notes

These nodes use the official lmstudio Python SDK, which handles the connection to your running LM Studio server automatically (typically localhost:1234). There's no need to input IP/Port in the nodes themselves.
Use the model_key (found in LM Studio, e.g., Org/ModelName-Format) as the identifier in the model_key inputs.
The seed input allows for reproducible outputs. Set to -1 for a random seed on each run.

SDK Usage

These nodes leverage the official lmstudio Python SDK, replacing the previous method of interacting via the OpenAI-compatible API endpoint directly. This provides more robust integration and access to SDK-specific features.

Troubleshooting

If you encounter any issues:

Enable the debug input (set to True) on the relevant node(s).
Check the ComfyUI console for error messages and detailed debug output from the nodes.
Verify that the LM Studio application is running and the Server has been started with a model loaded.
Ensure the model_key you are providing exists in your LM Studio library and is compatible with the node's task (e.g., a vision model for the I2T node).
Confirm the lmstudio Python package is correctly installed in your ComfyUI environment.

For further assistance, please open an issue on the GitHub repository: https://github.com/mattjohnpowell/comfyui-lmstudio-nodes

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built upon the ComfyUI framework.
Utilizes the official LM Studio Python SDK.
Inspired by LM Studio's capabilities and examples.