    ComfyUI-DreamOmni2-GGUF

    A ComfyUI custom node for DreamOmni2 GGUF multimodal models — powered directly by llama-cpp-python, no external executables required.

    ⚠️ Work in Progress — use at your own risk. <sub>(Or better: fork and help improve it.)</sub>


    ✨ Features

    • Run DreamOmni2 GGUF models natively inside ComfyUI.

    • Full image + text multimodal support through the llama-cpp-python backend (see the sketch after this list).

    • Accepts up to four image inputs simultaneously.

    • Outputs either:

      • conditioning embeddings (for generation workflows), or
      • text descriptions (for captioning or analysis).
    • Built-in seeded cache system for deterministic results across sessions.

    • No dependency on external binaries or CLI tools.
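
    For reference, the loading pattern behind this feature in llama-cpp-python looks roughly like the sketch below. It is a minimal illustration, not the node's exact internals: the file paths are placeholders from the Installation section, and the Qwen25VLChatHandler name comes from the Troubleshooting notes further down.

    import base64
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Qwen25VLChatHandler

    def to_data_uri(path: str) -> str:
        # Encode a local image as a data URI, a form the chat handler accepts.
        with open(path, "rb") as f:
            return "data:image/png;base64," + base64.b64encode(f.read()).decode()

    # Placeholder paths -- point these at the files from the Installation step.
    handler = Qwen25VLChatHandler(
        clip_model_path="ComfyUI/models/unet/DreamOmni2-Vlm-Model-7.6B-f16-mmproj.gguf"
    )
    llm = Llama(
        model_path="ComfyUI/models/unet/DreamOmni2-Vlm-Model-7.6B-Q5_K.gguf",
        chat_handler=handler,
        n_ctx=4096,
        n_gpu_layers=-1,  # offload all layers to the GPU if the wheel supports it
    )

    out = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": to_data_uri("input.png")}},
                {"type": "text", "text": "Describe this image."},
            ],
        }],
        max_tokens=256,
        seed=42,  # fixed seed for reproducible sampling
    )
    print(out["choices"][0]["message"]["content"])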


    🧩 Prerequisites

    Requires Python ≥ 3.12 and a working ComfyUI installation (≥ 0.3.66).

    Install dependencies (CPU or CUDA builds supported):

    pip install -r requirements.txt
    

    requirements.txt

    torch>=2.2.0
    numpy
    Pillow
    scikit-build-core
    llama-cpp-python>=0.3.16
    

    For CUDA acceleration, install a prebuilt llama-cpp-python CUDA wheel that matches your Python version and CUDA toolkit.

    Example (Windows CUDA 12.x):

    pip install llama_cpp_python-0.3.16+cu124-cp312-cp312-win_amd64.whl
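
    To verify that the installed wheel was actually built with GPU offload, llama-cpp-python exposes a small helper (a quick check, assuming a recent build):

    import llama_cpp

    # True when this build of llama.cpp was compiled with GPU offload support.
    print(llama_cpp.llama_supports_gpu_offload())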
    

    🧠 Installation

    1. Clone this repository into your ComfyUI custom_nodes directory:

      git clone https://github.com/rafacost/rafacost-comfy.git
      
    2. Download the DreamOmni2 GGUF and mmproj models from Hugging Face.

    3. Place them in:

      ComfyUI/models/unet/
      ├── DreamOmni2-Vlm-Model-7.6B-Q5_K.gguf
      ├── DreamOmni2-Vlm-Model-7.6B-f16-mmproj.gguf
      └── flux1-kontext-dev-Q5_K.gguf
      ComfyUI/models/loras/
      ├── DreamOmni2-7.6B-Edit-Lora.safetensors
      └── DreamOmni2-7.6B-Gen-Lora.safetensors
      
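    ComfyUI custom nodes usually enumerate these files through ComfyUI's folder_paths helper; below is a minimal sketch of how the model dropdown can be populated (illustrative only, not necessarily this node's exact code):

    import folder_paths  # ComfyUI's model-path helper, importable inside custom nodes

    # List the .gguf files under ComfyUI/models/unet/ for the node's Model dropdown.
    gguf_models = [
        name for name in folder_paths.get_filename_list("unet")
        if name.endswith(".gguf")
    ]
    print(gguf_models)
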

    ⚙️ Usage

    1. Launch ComfyUI.

    2. Add the DreamOmni2-VLM node under rafacostComfy / VLM.

    3. Configure the node:

      • Model – select your DreamOmni2 .gguf model.
      • MMProj Path – select your vision projection .gguf.
      • Images – connect up to four Load Image nodes.
      • Prompt – enter a description or instruction.
      • Seed – fixes the random seed for deterministic results (it also forms part of the cache key).
      • Use Cache – toggle to reuse previous generations (see the cache sketch below).
      • As Conditioning – output embeddings instead of raw text.
    4. Optionally connect a DreamOmni2 Output Node to visualize the generated text inside the workflow.
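
    Conceptually, the seeded cache boils down to a deterministic key over the node's inputs. The helper below is a hypothetical sketch (the real node may hash different fields), assuming the key covers seed, prompt, and image bytes:

    import hashlib
    import json

    def cache_key(seed: int, prompt: str, image_paths: list[str]) -> str:
        # Hash the seed and prompt, then fold in the raw image bytes, so
        # identical inputs map to the same cache entry across sessions.
        h = hashlib.sha256()
        h.update(json.dumps({"seed": seed, "prompt": prompt}, sort_keys=True).encode())
        for path in sorted(image_paths):
            with open(path, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
        return h.hexdigest()

    # Same seed + prompt + images -> same key, so the cached result is reused.
    print(cache_key(42, "Describe this image.", ["input.png"]))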


    🧪 Example

    Use the sample workflow provided in workflows/ or connect images manually. Outputs appear both in the ComfyUI graph and in the terminal console.


    ⚙️ Troubleshooting

    | Issue | Cause | Fix |
    | ----- | ----- | --- |
    | `ImportError: cannot import Qwen25VLChatHandler` | Missing or outdated llama-cpp-python | `pip install --upgrade "llama-cpp-python>=0.3.16"` |
    | Model loads but crashes | Out of memory | Try a lower quant (e.g. Q4_K) |
    | No output text | Prompt too short / token limit | Increase `max_tokens` |
    | Different outputs for the same seed | Cache disabled or model reset | Enable Use Cache |
    | "NoneType object" errors | Missing mmproj | Verify both GGUF files are present |
    | Low image quality | Low base-model GGUF quant | Try a higher quant for the base model (e.g. flux_kontext:Q8_0) |
    | Low prompt adherence | Low DreamOmni2 GGUF quant | Try a higher quant for DreamOmni2 (e.g. DreamOmni2-GGUF:Q8_0) |
    | Slow generation | CPU fallback | Compile llama-cpp-python with CUDA for GPU acceleration (see below) |
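
    For the last row, the usual way to build llama-cpp-python with CUDA support from source is the command below (Linux/macOS shell shown; on Windows set CMAKE_ARGS first via set or $env:):

    CMAKE_ARGS="-DGGML_CUDA=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python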


    🗒️ Notes

    • Requires ComfyUI ≥ 0.3.66.
    • Tested on Python 3.12 and Windows 11.
    • GPU recommended for large GGUFs.
    • Original model licensing applies — this node only provides an interface.

    📜 License

    Refer to the original model and llama.cpp licenses. This node does not modify or supersede any upstream licensing.