ComfyUI_LocalLLMNodes
A custom node pack for ComfyUI that runs Large Language Models (LLMs) locally for Flux Kontext Dev prompt generation within your ComfyUI workflows.
Purpose: This pack is designed to simplify the creation of detailed, high-quality prompts for advanced image generation models like Flux Kontext Dev. It does this by using a locally running LLM (Large Language Model) to process:
- An English image description generated by nodes like Florence-2 or Janus-Pro.
- A simple, user-provided instruction (e.g., "Make it look like it's being used in a luxury spa").

The local LLM combines these inputs according to a selected preset to generate a single, complex prompt specifically tailored for Flux Kontext Dev.
Important Note: If the "Set Local GGUF LLM Service Connector 🐑" node does not appear in your node library after installation, ensure you have installed the `llama-cpp-python` library. This library is required for the GGUF connector node to function and may need to be installed manually with GPU support flags for optimal performance (see Installation).
This pack provides nodes to connect to and utilize local LLMs (like Llama, Phi, Gemma, Hermes in Hugging Face PyTorch format, or GGUF models) without needing external API calls. It integrates with prompt generation workflows, such as those using Florence-2, to simplify creating prompts for models like Flux Kontext Dev.
Key Features & Included Nodes
- Local LLM Execution (Hugging Face & GGUF): Run powerful LLMs directly on your machine (CPU or GPU) using standard Hugging Face models or efficient GGUF models.
- Set Local LLM Service Connector Node (HuggingFace): Select and configure your local Hugging Face format LLM model.
- Set Local GGUF LLM Service Connector Node: Select and configure your local GGUF format LLM model file. Includes a dropdown for device selection (CPU/GPU) and an `n_gpu_layers` slider.
- Local Kontext Prompt Generator Node: (Key Feature) Generates detailed image prompts by combining image descriptions and simple user instructions using a connected local LLM.
- User Preset Management: Add and remove custom prompt generation presets using dedicated nodes (`AddUserLocalKontextPreset`, `RemoveUserLocalKontextPreset`).
- VRAM Optimization Ready: Includes commented code examples for integrating quantization (4-bit/8-bit `bitsandbytes` for Hugging Face, or `n_gpu_layers` for GGUF) to reduce memory footprint.
- Simplified User Experience: Use simple requests (e.g., "make it look like it's being used in a luxury spa") which are translated into complex, Flux-ready prompts.
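To illustrate the idea behind the Local Kontext Prompt Generator, here is a minimal sketch that calls a local GGUF model directly through llama-cpp-python. It is not the node's actual implementation; the model path, system prompt, and parameters are placeholders you would adapt to your own setup.

```python
# Illustrative sketch only: merge an image description and a simple edit
# instruction into one detailed prompt using a local GGUF model.
from llama_cpp import Llama

llm = Llama(
    model_path="ComfyUI/models/LLM/your-model.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,
    n_gpu_layers=20,  # set to 0 for CPU-only execution
)

image_description = "A white ceramic mug on a wooden desk next to a laptop."
edit_instruction = "Make it look like it's being used in a luxury spa."

result = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "Combine the image description and the edit instruction "
                       "into one detailed prompt for an image editing model.",
        },
        {
            "role": "user",
            "content": f"Description: {image_description}\nInstruction: {edit_instruction}",
        },
    ],
    max_tokens=256,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
```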
Installation
- Navigate to your ComfyUI installation directory.
- Go to the `custom_nodes` folder.
- Clone this repository:
  ```
  git clone https://github.com/your_username/ComfyUI_LocalLLMNodes.git
  ```
- Install Dependencies: You can install dependencies using `pip` and the provided scripts or `requirements.txt`.
  - Option 1: Using Installation Scripts (Recommended for GPU Support):
    - Navigate to the `ComfyUI_LocalLLMNodes` directory:
      ```
      cd ComfyUI_LocalLLMNodes
      ```
    - Linux/macOS:
      ```
      chmod +x install_deps.sh
      ./install_deps.sh
      ```
    - Windows (Command Prompt):
      ```
      install_deps.bat
      ```
    - Windows (PowerShell):
      ```
      Set-ExecutionPolicy RemoteSigned -Scope CurrentUser  # If needed
      .\install_deps.ps1
      ```
    - These scripts install core dependencies and `llama-cpp-python` with CUDA support.
  - Option 2: Standard `pip install`:
    Note: For GPU acceleration with GGUF models, use the installation scripts or follow the manual steps below.
    ```
    cd ComfyUI_LocalLLMNodes
    pip install -r requirements.txt
    ```
Installing `llama-cpp-python` with GPU Support (CUDA) - Important for GGUF Nodes
To use your GPU for GGUF models via the "Set Local GGUF LLM Service Connector 🐑" node, install `llama-cpp-python` with CUDA support.
Manual Installation (if not using scripts):
- Ensure CUDA Toolkit is Installed: Match the version to your GPU drivers.
- Set Environment Variables and Install:
  - Linux/macOS:
    ```
    CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
  - Windows (Command Prompt):
    ```
    set CMAKE_ARGS=-DGGML_CUDA=on
    pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
  - Windows (PowerShell):
    ```
    $env:CMAKE_ARGS = "-DGGML_CUDA=on"
    pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
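After installing, you can sanity-check that GPU offload works with a short Python snippet. This is just a quick test, not part of the node pack; the model path is a placeholder, and with `verbose=True` the llama.cpp startup log should report how many layers were offloaded to the GPU.

```python
# Quick check that llama-cpp-python can offload layers to the GPU.
# Point model_path at any .gguf file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="ComfyUI/models/LLM/your-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # -1 asks llama.cpp to offload as many layers as possible
    verbose=True,      # startup log should show layers being offloaded to CUDA
)
print(llm.create_completion("Hello", max_tokens=8)["choices"][0]["text"])
```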
Usage
- Download a Local LLM:
  - Hugging Face Format: Place model files in `ComfyUI/models/LLM/YourModelName/`.
  - GGUF Format: Place the `.gguf` file in `ComfyUI/models/LLM/` (see the example folder layout at the end of this section).
- Restart ComfyUI to load the new nodes.
- Find the Nodes: Look for the new nodes under the category `Local LLM Nodes/LLM Connectors`.
- Use the Nodes:
  - For Hugging Face Models:
    - Add the "Set Local LLM Service Connector 🐑 (HuggingFace)" node.
    - Select your model directory from the dropdown.
  - For GGUF Models:
    - Add the "Set Local GGUF LLM Service Connector 🐑" node.
    - Select your model file (or directory) from the dropdown.
    - Use the `device` dropdown: choose `GPU` for GPU acceleration (requires CUDA setup) or `CPU` for CPU execution.
    - Adjust the `n_gpu_layers` slider to control how many layers are offloaded to the GPU.
  - Common Steps for Prompt Generation:
    - Add the "Local Kontext Prompt Generator 🐑" node.
    - Connect the output of the chosen "Set Local ... LLM Service Connector 🐑" node to the `llm_service_connector` input.
    - Provide Inputs:
      - Connect an image description (e.g., from Florence-2) to `image1_description`.
      - Provide a simple `edit_instruction` (e.g., "make it look like it's being used in a luxury spa").
      - Select a `preset` from the dropdown.
    - Connect the `kontext_prompt` output to your desired node (e.g., Flux Kontext Dev).
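For reference, the expected model locations described above look roughly like this (the model names are just examples):

```
ComfyUI/
└── models/
    └── LLM/
        ├── YourModelName/           # Hugging Face format: config, tokenizer, weights
        │   ├── config.json
        │   └── model.safetensors
        └── your-model.Q4_K_M.gguf   # GGUF format: a single .gguf file in models/LLM/
```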
Memory Optimization
Running large LLMs alongside large image models can strain GPU VRAM and system RAM.
- Hugging Face Models:
  - Use 4-bit/8-bit quantization with `bitsandbytes`. Uncomment and adjust the code in `local_llm_connector.py` (a quantization sketch is shown after this list).
- GGUF Models:
  - GGUF models are inherently quantized. Choose a more heavily quantized version (e.g., Q4_K_M rather than Q8_0) for lower memory usage.
  - Use the `n_gpu_layers` parameter in the GGUF connector node for GPU acceleration.
Requirements
- ComfyUI
- Python Libraries (see `requirements.txt`):
  - `transformers`
  - `torch`
  - `llama-cpp-python` (Optional, for GGUF support - install with GPU flags)
  - `bitsandbytes` (Optional, for Hugging Face quantization)
  - Other dependencies as listed
Acknowledgements
This node pack builds upon concepts from the excellent ComfyUI-MieNodes extension.