ComfyUI_LocalLLMNodes
A custom node pack for ComfyUI that runs Large Language Models (LLMs) locally for Flux Kontext Dev prompt generation within your ComfyUI workflows.
Purpose: This pack is designed to simplify the creation of detailed, high-quality prompts for advanced image generation models like Flux Kontext Dev. It does this by using a locally running LLM (Large Language Model) to process:
- An English image description generated by nodes like Florence-2 or Janus-Pro.
- A simple, user-provided instruction (e.g., "Make it look like it's being used in a luxury spa").

The local LLM combines these inputs according to a selected preset to generate a single, complex prompt specifically tailored for Flux Kontext Dev.
Important Note: If the "Set Local GGUF LLM Service Connector 🐑" node does not appear in your node library after installation, ensure you have installed the `llama-cpp-python` library. This library is required for the GGUF connector node to function and may need to be installed manually with GPU support flags for optimal performance (see Installation).
This pack provides nodes to connect to and utilize local LLMs (like Llama, Phi, Gemma, Hermes in Hugging Face PyTorch format, or GGUF models) without needing external API calls. It integrates with prompt generation workflows, such as those using Florence-2, to simplify creating prompts for models like Flux Kontext Dev.
Key Features & Included Nodes
- Local LLM Execution (Hugging Face & GGUF): Run powerful LLMs directly on your machine (CPU or GPU) using standard Hugging Face models or efficient GGUF models.
- Set Local LLM Service Connector Node (HuggingFace): Select and configure your local Hugging Face format LLM model.
- Set Local GGUF LLM Service Connector Node: Select and configure your local GGUF format LLM model file. Includes a dropdown for device selection (CPU/GPU) and an `n_gpu_layers` slider.
- Local Kontext Prompt Generator Node: (Key Feature) Generates detailed image prompts by combining image descriptions and simple user instructions using a connected local LLM.
- User Preset Management: Add and remove custom prompt generation presets using dedicated nodes (`AddUserLocalKontextPreset`, `RemoveUserLocalKontextPreset`).
- VRAM Optimization Ready: Includes commented code examples for integrating quantization (4-bit/8-bit `bitsandbytes` for Hugging Face, or `n_gpu_layers` for GGUF) to reduce memory footprint.
- Simplified User Experience: Use simple requests (e.g., "make it look like it's being used in a luxury spa") which are translated into complex, Flux-ready prompts.
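To illustrate the idea behind the Local Kontext Prompt Generator, here is a minimal sketch that calls a local GGUF model directly through llama-cpp-python. It is not the node's actual implementation; the model path, system prompt, and parameters are placeholders you would adapt to your own setup.

```python
# Illustrative sketch only: merge an image description and a simple edit
# instruction into one detailed prompt using a local GGUF model.
from llama_cpp import Llama

llm = Llama(
    model_path="ComfyUI/models/LLM/your-model.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,
    n_gpu_layers=20,  # set to 0 for CPU-only execution
)

image_description = "A white ceramic mug on a wooden desk next to a laptop."
edit_instruction = "Make it look like it's being used in a luxury spa."

result = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "Combine the image description and the edit instruction "
                       "into one detailed prompt for an image editing model.",
        },
        {
            "role": "user",
            "content": f"Description: {image_description}\nInstruction: {edit_instruction}",
        },
    ],
    max_tokens=256,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
```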
Installation
- Navigate to your ComfyUI installation directory.
- Go to the `custom_nodes` folder.
- Clone this repository:
  ```
  git clone https://github.com/your_username/ComfyUI_LocalLLMNodes.git
  ```
- Install Dependencies: You can install dependencies using `pip` and the provided scripts or `requirements.txt`.
  - Option 1: Using Installation Scripts (Recommended for GPU Support):
    - Navigate to the `ComfyUI_LocalLLMNodes` directory:
      ```
      cd ComfyUI_LocalLLMNodes
      ```
    - Linux/macOS:
      ```
      chmod +x install_deps.sh
      ./install_deps.sh
      ```
    - Windows (Command Prompt):
      ```
      install_deps.bat
      ```
    - Windows (PowerShell):
      ```
      Set-ExecutionPolicy RemoteSigned -Scope CurrentUser  # If needed
      .\install_deps.ps1
      ```
    - These scripts install core dependencies and `llama-cpp-python` with CUDA support.
  - Option 2: Standard `pip install`:
    Note: For GPU acceleration with GGUF models, use the installation scripts or follow the manual steps below.
    ```
    cd ComfyUI_LocalLLMNodes
    pip install -r requirements.txt
    ```
Installing `llama-cpp-python` with GPU Support (CUDA) - Important for GGUF Nodes
To use your GPU for GGUF models via the "Set Local GGUF LLM Service Connector 🐑" node, install `llama-cpp-python` with CUDA support.
Manual Installation (if not using scripts):
- Ensure CUDA Toolkit is Installed: Match the version to your GPU drivers.
- Set Environment Variables and Install:
  - Linux/macOS:
    ```
    CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
  - Windows (Command Prompt):
    ```
    set CMAKE_ARGS=-DGGML_CUDA=on
    pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
  - Windows (PowerShell):
    ```
    $env:CMAKE_ARGS = "-DGGML_CUDA=on"
    pip install llama-cpp-python --force-reinstall --no-cache-dir
    ```
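After installing, you can sanity-check that GPU offload works with a short Python snippet. This is just a quick test, not part of the node pack; the model path is a placeholder, and with `verbose=True` the llama.cpp startup log should report how many layers were offloaded to the GPU.

```python
# Quick check that llama-cpp-python can offload layers to the GPU.
# Point model_path at any .gguf file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="ComfyUI/models/LLM/your-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # -1 asks llama.cpp to offload as many layers as possible
    verbose=True,      # startup log should show layers being offloaded to CUDA
)
print(llm.create_completion("Hello", max_tokens=8)["choices"][0]["text"])
```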
Usage
- Download a Local LLM:
  - Hugging Face Format: Place model files in `ComfyUI/models/LLM/YourModelName/`.
  - GGUF Format: Place the `.gguf` file in `ComfyUI/models/LLM/` (see the example folder layout at the end of this section).
- Restart ComfyUI to load the new nodes.
- Find the Nodes: Look for the new nodes under the category `Local LLM Nodes/LLM Connectors`.
- Use the Nodes:
  - For Hugging Face Models:
    - Add the "Set Local LLM Service Connector 🐑 (HuggingFace)" node.
    - Select your model directory from the dropdown.
  - For GGUF Models:
    - Add the "Set Local GGUF LLM Service Connector 🐑" node.
    - Select your model file (or directory) from the dropdown.
    - Use the `device` dropdown: choose `GPU` for GPU acceleration (requires CUDA setup) or `CPU` for CPU execution.
    - Adjust the `n_gpu_layers` slider to control how many layers are offloaded to the GPU.
  - Common Steps for Prompt Generation:
    - Add the "Local Kontext Prompt Generator 🐑" node.
    - Connect the output of the chosen "Set Local ... LLM Service Connector 🐑" node to the `llm_service_connector` input.
    - Provide Inputs:
      - Connect an image description (e.g., from Florence-2) to `image1_description`.
      - Provide a simple `edit_instruction` (e.g., "make it look like it's being used in a luxury spa").
      - Select a `preset` from the dropdown.
    - Connect the `kontext_prompt` output to your desired node (e.g., Flux Kontext Dev).
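For reference, the expected model locations described above look roughly like this (the model names are just examples):

```
ComfyUI/
└── models/
    └── LLM/
        ├── YourModelName/           # Hugging Face format: config, tokenizer, weights
        │   ├── config.json
        │   └── model.safetensors
        └── your-model.Q4_K_M.gguf   # GGUF format: a single .gguf file in models/LLM/
```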
Memory Optimization
Running large LLMs alongside large image models can strain GPU VRAM and system RAM.
- Hugging Face Models:
  - Use 4-bit/8-bit quantization with `bitsandbytes`. Uncomment and adjust the code in `local_llm_connector.py` (a quantization sketch is shown after this list).
- GGUF Models:
  - GGUF models are inherently quantized. Choose a more heavily quantized version (e.g., Q4_K_M rather than Q8_0) for lower memory usage.
  - Use the `n_gpu_layers` parameter in the GGUF connector node for GPU acceleration.
Requirements
- ComfyUI
- Python Libraries (see `requirements.txt`):
  - `transformers`
  - `torch`
  - `llama-cpp-python` (Optional, for GGUF support - install with GPU flags)
  - `bitsandbytes` (Optional, for Hugging Face quantization)
  - Other dependencies as listed
Acknowledgements
This node pack builds upon concepts from the excellent ComfyUI-MieNodes extension.