ComfyUI Extension: comfyui_pilcothink_VisionSLM

Authored by gpdev-Pilcothink

Created 2 months ago

Updated 18 days ago

1 star

Custom ComfyUI nodes to run SLM Vision models (DeepSeek-vl 1.3b chat, Qwen2.5-vl 3b, Gemma-3-4b-it) with optional RAG support.

Custom Nodes (0)

README

comfyUI_pilcothink_VisionSLM

Custom ComfyUI nodes to run SLM Vision models (DeepSeek-vl 1.3b chat, Qwen2-vl-2b-Instruct, Qwen2.5-vl 3b, Qwen3-vl (2b, 4b) [Instruct, Thinking], Gemma-3-4b-it) with optional RAG support. The SLM model Qwen3-0.6b is also supported.

Models are downloaded into Models/SLM_Vision/ when selected in the node.
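As a rough illustration of the download behavior described above, the sketch below maps a Hugging Face repo id to a folder under Models/SLM_Vision/ and downloads only when that folder is missing or empty. The function names (`local_model_dir`, `ensure_model`), the one-folder-per-model layout, and the `comfy_root` argument are assumptions made for this example and are not taken from the extension's code. The download step uses `snapshot_download` from the third-party `huggingface_hub` package; its `token` parameter is only needed for gated repositories.

```python
from pathlib import Path
from typing import Optional


def local_model_dir(comfy_root: str, repo_id: str) -> Path:
    """Map a Hugging Face repo id to a folder under Models/SLM_Vision/.

    Assumption: one sub-folder per model, named after the last part of the
    repo id. The extension's real layout may differ.
    """
    name = repo_id.split("/")[-1]
    return Path(comfy_root) / "Models" / "SLM_Vision" / name


def ensure_model(comfy_root: str, repo_id: str, token: Optional[str] = None) -> Path:
    """Return the local model folder, downloading it only if it is missing or empty."""
    target = local_model_dir(comfy_root, repo_id)
    if target.is_dir() and any(target.iterdir()):
        # Already present (downloaded earlier or placed there manually).
        return target

    # Imported lazily so the path helper above works without the package installed.
    from huggingface_hub import snapshot_download

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target), token=token)
    return target
```

Calling `ensure_model("/path/to/ComfyUI", "Qwen/Qwen2.5-VL-3B-Instruct")` would then either reuse an existing folder or fetch the files into it; the ComfyUI path here is a placeholder.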

LICENSE

- utils/backends/DeepSeek-vl: https://github.com/deepseek-ai/DeepSeek-VL (MIT License)

- utils/backends/qwen_vl_utils: https://github.com/QwenLM/Qwen3-VL (Apache-2.0 License)

Tips

(1) If you set the device option to CPU, only the float16 dtype works. Other dtypes will result in errors.

(2) To enable RAG, place your data as .txt files in the rag_doc folder.

(3) gemma-3-4b-it cannot be downloaded from Hugging Face without logging in. Either configure Hugging Face access separately, or download the model directly from its repository and place it in the Models folder.

(4) To unload the model from memory, switch off the 'Use Cache' option and run the model again.
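To make tip (2) more concrete, here is a minimal sketch of how .txt files in a rag_doc folder could be turned into retrieved context for a prompt. This README does not describe the extension's actual retrieval method, so everything below is an assumption for illustration: the function names, the fixed-size chunking, and the simple keyword-overlap scoring (used here in place of embedding-based retrieval).

```python
import re
from pathlib import Path


def load_chunks(folder, chunk_chars=500):
    """Read every .txt file in `folder` and split it into fixed-size chunks."""
    chunks = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for start in range(0, len(text), chunk_chars):
            piece = text[start:start + chunk_chars].strip()
            if piece:
                chunks.append((path.name, piece))
    return chunks


def tokenize(text):
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, chunks, top_k=2):
    """Return up to `top_k` chunks that share the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(piece)), name, piece) for name, piece in chunks]
    scored = [item for item in scored if item[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, piece) for _, name, piece in scored[:top_k]]


def build_prompt(query, chunks):
    """Prepend retrieved context to the question; fall back to the bare question."""
    hits = retrieve(query, chunks)
    if not hits:
        return query
    context = "\n".join(f"[{name}] {piece}" for name, piece in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A typical call would be `build_prompt("your question", load_chunks("rag_doc"))`, with the resulting string passed to the model as the text prompt.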

Thank you! Please share any suggestions for improvements in the Issues section.