ComfyUI Ollama Model Manager
Custom nodes for managing Ollama models in ComfyUI workflows. Load and unload models on-demand to optimize memory usage in constrained environments.
Features
- Auto-Fetch Models - Models load automatically when you connect nodes (no workflow execution needed!)
- Chat Completion - Full text generation with conversation history
- Dynamic Dropdowns - Model list updates instantly via ComfyUI API
- Type-Safe Connections - Client config passed between nodes
- Load/Unload Models - Control memory usage efficiently
- Beautiful Logging - Colored console output with JSON file logs
- Model Caching - Per-endpoint caching for better performance
- No CORS Issues - Backend API proxy eliminates browser restrictions
Installation
Recommended: ComfyUI-Manager
- Open ComfyUI-Manager
- Search for "Ollama Manager"
- Click Install
Manual Installation
```
cd ComfyUI/custom_nodes
git clone https://github.com/darth-veitcher/comfyui-ollama-model-manager
cd comfyui-ollama-model-manager

# Install dependencies (auto-detects uv or uses pip)
python install.py

# OR manually with uv (recommended)
uv pip install httpx loguru rich

# OR manually with pip
pip install httpx loguru rich
```
For portable ComfyUI installations:
```
# Windows Portable
ComfyUI\python_embeded\python.exe install.py

# Or manually
ComfyUI\python_embeded\python.exe -m pip install httpx loguru rich
```
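To confirm the dependencies landed in the interpreter ComfyUI actually uses (for portable installs, the embedded `python.exe` above), a quick import check like this sketch should succeed:

```python
# Quick sanity check: run with the same Python interpreter ComfyUI uses.
# If any import fails, the dependencies were installed into a different environment.
from importlib.metadata import version

import httpx   # noqa: F401
import loguru  # noqa: F401
import rich    # noqa: F401

for pkg in ("httpx", "loguru", "rich"):
    print(pkg, version(pkg))
```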
Quick Start Guide
Step 1: Add Ollama Client
- Add an Ollama Client node to your workflow
- Set `endpoint` to your Ollama server URL
  - Default: `http://localhost:11434`
  - Or use your remote server URL
Step 2: Add Model Selector
- Add an Ollama Model Selector node
- Connect the `client` output from Ollama Client to the `client` input
- Models auto-fetch immediately! No need to execute the workflow (see the sketch below)
- Select your desired model from the dropdown
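The dropdown's contents come from Ollama's standard model-listing endpoint (`GET /api/tags`). If the list looks wrong, you can query that endpoint directly; a minimal sketch with httpx, assuming the default local endpoint:

```python
# List the models an Ollama server reports - the same data the
# Model Selector dropdown is built from.
import httpx

endpoint = "http://localhost:11434"  # assumption: default local server

resp = httpx.get(f"{endpoint}/api/tags", timeout=10)
resp.raise_for_status()
print([m["name"] for m in resp.json().get("models", [])])
```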
Step 3: Load the Model
- Add an Ollama Load Model node
- Connect `client` from Model Selector
- The model dropdown auto-populates with available models
- Set `keep_alive` (the default `-1` keeps it loaded)
- Execute the workflow to load the model
Step 4: Generate Text with Chat
- Add an Ollama Chat Completion node
- Connect `client` from Model Selector (model auto-populates)
- Enter your prompt in the `prompt` field
- (Optional) Add a `system_prompt` to control behavior
- Execute to generate a response!
Example:
- prompt: "Write a haiku about programming"
- system_prompt: "You are a helpful assistant"
- response: Returns the generated text
- history: Returns the conversation (for multi-turn chat)
Step 5: Multi-Turn Conversations (Optional)
For conversations with memory:
- Connect the `history` output from one Chat Completion node to the `history` input of the next Chat Completion node
- Each response remembers the previous messages
Step 6: Unload When Done (Optional)
- Add an Ollama Unload Model node
- Connect it after your processing
- This frees up memory
Nodes Reference
Core Nodes
| Node | Description |
|------|-------------|
| Ollama Client | Creates a reusable Ollama connection config |
| Ollama Model Selector | Select a model, with auto-fetch on connection |
| Ollama Load Model | Loads a model into Ollama's memory |
| Ollama Chat Completion | Generates text with conversation history |
| Ollama Unload Model | Unloads a model to free memory |
Debug/Utility Nodes
| Node | Description |
|------|-------------|
| Ollama Debug: History | Formats conversation history as readable text for inspection |
| Ollama Debug: History Length | Returns the number of messages in conversation history |
Option Nodes (Composable Parameters)
| Node | Parameter | Range/Type | Default | Description |
|------|-----------|------------|---------|-------------|
| Temperature | temperature | 0.0-2.0 | 0.8 | Controls randomness (0=deterministic, 2=very random) |
| Seed | seed | INT | 42 | Random seed for reproducible generation |
| Max Tokens | max_tokens | 1-4096 | 128 | Maximum tokens to generate |
| Top P | top_p | 0.0-1.0 | 0.9 | Nucleus sampling threshold |
| Top K | top_k | 1-100 | 40 | Top-k sampling (Ollama-specific) |
| Repeat Penalty | repeat_penalty | 0.0-2.0 | 1.1 | Penalty for repetition (Ollama-specific) |
| Extra Body | extra_body | JSON | {} | Advanced parameters (num_ctx, num_gpu, etc.) |
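Chained option nodes ultimately collapse into a single options dictionary handed to Ollama. A rough illustration (the extension's exact key mapping is an assumption; the names below are Ollama's own option names, where `num_predict` corresponds to Max Tokens):

```python
# Illustrative merged options dict using Ollama's option names.
options = {
    "temperature": 0.8,
    "seed": 42,
    "num_predict": 128,   # "Max Tokens" in Ollama's API vocabulary
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1,
}
options.update({"num_ctx": 4096})  # Extra Body keys are merged in last
```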
Advanced Usage
The architecture provides a clean, composable workflow:
```
[Ollama Client] → [Model Selector] → [Load Model] → [Chat Completion] → [Unload Model]
      ↓                  ↓                ↓                 ↓
  (endpoint)       (pick model,      (load with       (generate text,
                    auto-refresh)     keep_alive)      track history)
```
Key Benefits:
- Reusable Client: Create one client, connect to multiple nodes
- Auto-refresh: Model Selector can refresh the list automatically
- Type Safety: Client connection passed between nodes
- Cleaner Workflows: Less redundant endpoint configuration
- Dynamic Dropdowns: Model list automatically populates after refresh
- Conversation Memory: History passed between chat nodes for multi-turn conversations
Example Workflow: Simple Chat
```
1. Ollama Client (endpoint: http://localhost:11434)
         ↓
2. Model Selector (model: "llama3.2", refresh: true)
         ↓
3. Load Model (keep_alive: "-1")
         ↓
4. Chat Completion (prompt: "Hello!")
         ↓
5. Unload Model
```
Example Workflow: Multi-Turn Conversation
```
1. [Client] → [Selector] → [Load] → [Chat 1: "My name is Alice"]
                                           ↓ (history)
                                    [Chat 2: "What's my name?"]
                                           ↓ (history)
                                    [Chat 3: "Tell me a joke"]
                                           ↓
2. Unload Model
```
Example Workflow: Chat with Options
```
[Client] → [Selector] → [Load Model]
                             ↓
        ┌────────────────────┴────────────────────┐
        │                    │                    │
[Temperature=0.7]        [Seed=42]         [MaxTokens=200]
        └────────────────────┬────────────────────┘
                             ↓ (merged options)
                     [Chat Completion]
                             ↓
              "Deterministic response"
```
Example Workflow: Advanced Parameters
```
[Temperature=0.8] → [TopK=50] → [RepeatPenalty=1.2] → [ExtraBody]
                                                           ↓
                                                  {"num_ctx": 4096}
                                                           ↓
                                                  [Chat Completion]
```
This pattern optimizes memory by unloading models when not needed, while maintaining full conversation context and precise control over generation parameters.
Configuration
Ollama Endpoint
Default: http://localhost:11434
Override by specifying a different endpoint in the "Refresh Model List" or "Load/Unload" nodes.
Keep Alive
Control how long models stay in memory:
- `-1` (default): Keep loaded indefinitely
- `5m`: Keep for 5 minutes
- `1h`: Keep for 1 hour
- `0`: Unload immediately
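For reference, `keep_alive` maps onto the same parameter in Ollama's HTTP API: posting to `/api/generate` with a model name and no prompt loads (or unloads) the model. A minimal sketch, assuming the default endpoint and an example model name:

```python
# Load a model and keep it resident, then unload it, via Ollama's API.
import httpx

endpoint = "http://localhost:11434"  # assumption: default endpoint
model = "llama3.2"                   # assumption: example model name

# keep_alive=-1 loads the model and keeps it in memory indefinitely
httpx.post(f"{endpoint}/api/generate",
           json={"model": model, "keep_alive": -1}, timeout=60)

# keep_alive=0 unloads it immediately
httpx.post(f"{endpoint}/api/generate",
           json={"model": model, "keep_alive": 0}, timeout=60)
```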
Chat Parameters
The Ollama Chat Completion node supports:
Required:
- `client` - Ollama client connection
- `model` - Model name (auto-populated from the selector)
- `prompt` - User message/question
Optional:
- `system_prompt` - Instructions to guide model behavior
- `history` - Previous conversation (for multi-turn chat)
- `options` - Generation parameters (temperature, seed, etc.)
- `format` - Output format: "none" (default, plain text) or "json" (structured JSON)
- `image` - Image input for vision models
Outputs:
- `response` - Generated text
- `history` - Updated conversation (connect it to the next chat node)
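Under the hood this corresponds closely to Ollama's `/api/chat` endpoint. A hedged sketch of an equivalent raw request (field names are Ollama's; the node's internals may differ):

```python
# Roughly what one chat turn looks like against Ollama's /api/chat endpoint.
import httpx

endpoint = "http://localhost:11434"  # assumption: default endpoint
history = []                         # prior turns: [{"role": ..., "content": ...}, ...]

messages = (
    [{"role": "system", "content": "You are a helpful assistant"}]
    + history
    + [{"role": "user", "content": "Write a haiku about programming"}]
)

resp = httpx.post(
    f"{endpoint}/api/chat",
    json={
        "model": "llama3.2",
        "messages": messages,
        "stream": False,
        "options": {"temperature": 0.8, "seed": 42},
    },
    timeout=120,
)
reply = resp.json()["message"]["content"]

# The updated history is what you would feed into the next turn.
history = messages + [{"role": "assistant", "content": reply}]
```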
Caching & Performance:
The chat node intelligently caches results to avoid unnecessary LLM calls:
- With Seed: When you provide a seed via the `OllamaOptionSeed` node, identical inputs are cached (like standard ComfyUI nodes). This prevents wasteful re-execution when re-running the same workflow.
- Without Seed: When no seed is provided, the node always re-executes to generate fresh, non-deterministic responses.
Example: Deterministic workflow with caching
```
[Seed=42] → [Chat Completion] → Output
                  ↓
           (Cached on re-run!)
```
This matches ComfyUI's standard behavior and significantly reduces API costs when iterating on workflows.
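For the curious, ComfyUI drives this kind of caching through a node's `IS_CHANGED` hook: if the returned value matches the previous run, the cached output is reused, while returning `NaN` (which never equals itself) forces re-execution. A simplified sketch of that pattern, not the extension's actual code:

```python
# Simplified illustration of ComfyUI's IS_CHANGED-based caching pattern.
class ChatCachingSketch:
    @classmethod
    def IS_CHANGED(cls, seed=None, **kwargs):
        if seed is not None:
            # Stable value: identical inputs + same seed hit the cache.
            return seed
        # NaN != NaN, so the node re-executes on every run when no seed is set.
        return float("nan")
```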
JSON Mode (Phase 3)
The format parameter enables structured output for workflows that need parseable data:
Example: Extract structured data
```
[Chat Completion]
 ├── format: "json"
 ├── prompt: "Extract person data: 'Alice is 30 years old'"
 └── system_prompt: "Return JSON with keys: name, age"

Output: {"name": "Alice", "age": 30}
```
When to use JSON mode:
- Data extraction workflows
- Structured output for downstream processing
- API integrations requiring JSON
- ComfyUI workflows that parse the response
Note: Set `format` to "json" to enable it; the model is then constrained to produce valid JSON output.
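Because the response is guaranteed to be parseable JSON in this mode, downstream code can feed it straight into a parser; a minimal example using the output above:

```python
# Parse the chat node's text output when format="json" is used.
import json

response = '{"name": "Alice", "age": 30}'  # the chat node's response string
person = json.loads(response)
print(person["name"], person["age"])  # -> Alice 30
```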
Debug Utilities (Phase 3)
Ollama Debug: History - Inspect conversation memory
```
[Chat History] → [Debug: History]
                       ↓
             Formatted Text Output:

=== Conversation History (3 messages) ===
[1] SYSTEM:
    You are helpful
[2] USER:
    Hello
[3] ASSISTANT:
    Hi there!
```
Ollama Debug: History Length - Count messages
```
[Chat History] → [History Length] → Output: 5 (messages)
```
Use cases:
- Debugging conversation flow
- Monitoring context length
- Workflow conditional logic based on message count
- Understanding what the model "remembers"
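The formatted output is plain text built from the history's role/content pairs. A small sketch of equivalent formatting logic (the node's own implementation may differ in details):

```python
# Format a conversation history list the way the debug output above looks.
def format_history(history):
    lines = [f"=== Conversation History ({len(history)} messages) ==="]
    for i, msg in enumerate(history, start=1):
        lines.append(f"[{i}] {msg['role'].upper()}:")
        lines.append(f"    {msg['content']}")
    return "\n".join(lines)

print(format_history([
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]))
```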
Logging
Logs are written to:
- Console: Colored output with timestamps
- File: `logs/ollama_manager.json` (14-day retention, compressed)
Example log output:
```
08:36:30 | INFO | refresh-abc123 | Refreshing model list from http://localhost:11434
08:36:30 | INFO | refresh-abc123 | Model list refreshed: 3 models available
08:36:31 | INFO | load-def456   | Loading model 'llava:latest' (keep_alive=-1)
08:36:32 | INFO | load-def456   | Model 'llava:latest' loaded successfully
```
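Loguru supports this setup directly via an extra sink with `serialize`, `retention`, and `compression`; a sketch of a comparable configuration (the extension's real settings live in `log_config.py` and may differ):

```python
# Comparable loguru sink: JSON-serialized records, 14-day retention,
# compressed when files are rotated or cleaned up.
from loguru import logger

logger.add(
    "logs/ollama_manager.json",
    serialize=True,        # write each record as a JSON line
    retention="14 days",   # keep logs for two weeks
    compression="zip",     # compress files on rotation/removal
)

logger.info("Refreshing model list from http://localhost:11434")
```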
Requirements
- Python ≥ 3.12
- httpx ≥ 0.28.1
- loguru ≥ 0.7.3
- rich ≥ 14.2.0
- Ollama running locally or remotely
Development
Project Structure
```
comfyui-ollama-model-manager/
├── __init__.py                      # ComfyUI entry point
├── install.py                       # Dependency installer (uv/pip auto-detect)
├── pyproject.toml                   # Package metadata & dependencies
├── src/
│   └── comfyui_ollama_model_manager/
│       ├── __init__.py              # Package init
│       ├── nodes.py                 # Model management nodes
│       ├── chat.py                  # Chat completion node
│       ├── types.py                 # Custom type definitions
│       ├── ollama_client.py         # API client (fetch, load, unload, chat)
│       ├── api.py                   # ComfyUI API routes
│       ├── state.py                 # Model cache
│       ├── log_config.py            # Logging setup
│       └── async_utils.py           # Async utilities
├── tests/                           # Pytest test suite (52 tests)
└── web/
    └── ollama_widgets.js            # Auto-fetch UI logic
```
Running Tests
```
# With uv (recommended)
uv run pytest

# Or with pip
pip install pytest pytest-asyncio
pytest
```
Troubleshooting
Nodes don't appear in ComfyUI
- Check that dependencies are installed: `pip list | grep -E "httpx|loguru|rich"`
- Restart ComfyUI completely
- Check ComfyUI console for error messages
- Verify Ollama is running: `curl http://localhost:11434/api/tags`
Import errors
If you see `ModuleNotFoundError`, install the dependencies manually:

```
pip install httpx loguru rich
```
Permission errors (Windows)
Close ComfyUI and run:
```
ComfyUI\python_embeded\python.exe -m pip install --upgrade httpx loguru rich
```
License
[Add your license here]