Custom nodes for ComfyUI that integrate LM Studio's vision and language models through its local API, adding image-to-text and text generation capabilities to your workflows.

This extension provides custom nodes for ComfyUI that integrate LM Studio's capabilities. It offers two main functionalities:

- **LM Studio Image To Text**: generates text descriptions of images using LM Studio's vision models.
- **LM Studio Text Generation**: generates text from a prompt using LM Studio's language models.
Both nodes are designed to work with LM Studio's local API, providing flexible and customizable ways to enhance your ComfyUI workflows.
Here's an example of how the LM Studio nodes can be used in a ComfyUI workflow:

*(example workflow screenshot)*
To install, clone this repository into your ComfyUI `custom_nodes` directory:

```bash
cd /path/to/ComfyUI/custom_nodes
git clone https://github.com/mattjohnpowell/comfyui-lmstudio-nodes.git
```
Add the "LM Studio Image To Text" node to your ComfyUI workflow. Connect an image output to the "image" input of the node.
Add the "LM Studio Text Generation" node to your ComfyUI workflow.
The `model` parameter doesn't have to be changed from its default value, but it's useful to set it explicitly when sharing workflows to ensure consistency. Adjust `ip_address` and `port` if your LM Studio instance is running on a different machine or port.
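For reference, here's roughly how those inputs map onto the OpenAI-compatible endpoint that LM Studio exposes. This is a sketch of the idea rather than the node's actual source; the values shown are LM Studio's defaults:

```python
# Sketch: combining the node's ip_address, port, and model inputs
# into a request against LM Studio's OpenAI-compatible local server.
from openai import OpenAI

ip_address = "localhost"  # default: LM Studio running on this machine
port = 1234               # LM Studio's default server port

client = OpenAI(
    base_url=f"http://{ip_address}:{port}/v1",
    api_key="lm-studio",  # placeholder; the local server doesn't validate it
)

response = client.chat.completions.create(
    model="moondream/moondream2-gguf",  # set explicitly for reproducible workflows
    messages=[{"role": "user", "content": "Describe a foggy mountain at dawn."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```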
This extension is adapted from LM Studio code examples. Here's the original image-to-text example:
```python
# Adapted from OpenAI's Vision example
from openai import OpenAI
import base64
import requests

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Ask the user for a path on the filesystem:
path = input("Enter a local filepath to an image: ")

# Read the image and encode it to base64:
base64_image = ""
try:
    image = open(path.replace("'", ""), "rb").read()
    base64_image = base64.b64encode(image).decode("utf-8")
except:
    print("Couldn't read the image. Make sure the path is correct and the file exists.")
    exit()

completion = client.chat.completions.create(
    model="moondream/moondream2-gguf",
    messages=[
        {
            "role": "system",
            "content": "This is a chat between a user and an assistant. The assistant is helping the user to describe an image.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
    max_tokens=1000,
    stream=True
)

for chunk in completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
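Inside the node, the image arrives as a ComfyUI tensor rather than a file on disk, so the file-reading step above is replaced by an in-memory conversion. Here's a minimal sketch of that conversion (not the node's actual source), assuming ComfyUI's standard IMAGE format of `[batch, height, width, channel]` floats in the 0 to 1 range:

```python
import base64
import io

import numpy as np
from PIL import Image

def tensor_to_data_url(image_tensor) -> str:
    """Convert the first image of a ComfyUI IMAGE batch (a torch tensor)
    into the base64 data URL format the API request above expects."""
    # Scale 0-1 floats to 8-bit values and drop the batch dimension.
    array = np.clip(255.0 * image_tensor[0].cpu().numpy(), 0, 255).astype(np.uint8)
    pil_image = Image.fromarray(array)

    # Encode as JPEG in memory and wrap in the data URL scheme.
    buffer = io.BytesIO()
    pil_image.save(buffer, format="JPEG")
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"
```

The resulting string slots directly into the `image_url` field of the request shown above.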
If you encounter any issues:

- Make sure LM Studio is running and its local server is started.
- Check that the `ip_address` and `port` inputs match your LM Studio server.
- Set the `debug` input to True for more detailed output.

For further assistance, please open an issue on the GitHub repository.
This project is licensed under the MIT License - see the LICENSE file for details.