ComfyUI Extension: comfyui_pilcothink_VisionSLM

Authored by gpdev-Pilcothink

Custom ComfyUI nodes to run SLM vision models (DeepSeek-VL 1.3B Chat, Qwen2-VL-2B-Instruct, Qwen2.5-VL 3B, Qwen3-VL 2B/4B [Instruct, Thinking], Gemma-3-4b-it) with optional RAG support. The text-only SLM Qwen3-0.6B is also supported.

<img width="2596" height="1712" alt="image" src="https://github.com/user-attachments/assets/74130f20-7717-4a9a-a8c8-91776bcbed59" />
<img width="3323" height="1048" alt="image" src="https://github.com/user-attachments/assets/34375235-325c-449a-8396-eec99ed19673" />

- Models are downloaded into Models/SLM_Vision/ when a model is selected in the node; a sketch of this auto-download flow follows below.
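
A minimal sketch of what that auto-download can look like, assuming huggingface_hub is used underneath. The helper name `ensure_model` and the example repo id are illustrative, not the extension's actual code:

```python
# Illustrative sketch only: downloads a Hugging Face repo into
# Models/SLM_Vision/ on first use. ensure_model and the example
# repo id are hypothetical, not this extension's real code.
from pathlib import Path

from huggingface_hub import snapshot_download

def ensure_model(repo_id: str, models_root: str = "Models/SLM_Vision") -> Path:
    """Return the local folder for repo_id, downloading it if missing."""
    target = Path(models_root) / repo_id.split("/")[-1]
    if not target.exists():
        snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target

model_dir = ensure_model("Qwen/Qwen2.5-VL-3B-Instruct")
```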

LICENSE

- utils/backends/DeepSeek-vl: https://github.com/deepseek-ai/DeepSeek-VL, MIT License
- utils/backends/qwen_vl_utils: https://github.com/QwenLM/Qwen3-VL, Apache-2.0 License

Tips

(1) If you choose the CPU device option, only float16 is supported; any other dtype will raise an error. A sketch of this check appears below.
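
A minimal sketch of the guard this tip implies, assuming the node maps dtype names to torch dtypes; the function and names here are hypothetical, not the node's actual code:

```python
# Hypothetical dtype check matching tip (1): on CPU, anything other
# than float16 is rejected before the model is loaded.
import torch

DTYPES = {
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
    "float32": torch.float32,
}

def resolve_dtype(device: str, dtype_name: str) -> torch.dtype:
    if device == "cpu" and dtype_name != "float16":
        raise ValueError("CPU mode only supports float16 in this extension.")
    return DTYPES[dtype_name]
```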

(2) You can enable RAG by placing your reference documents, as .txt files, in the rag_doc folder; a toy retriever illustrating the idea follows below.
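
To make the mechanism concrete, here is a deliberately simple keyword-overlap retriever over rag_doc/*.txt. This is an illustrative sketch; the extension's actual RAG pipeline may use embeddings or chunking instead:

```python
# Toy retriever: scores each rag_doc/*.txt file by word overlap with
# the query and returns the best matches. Illustrative only.
from pathlib import Path

def retrieve(query: str, rag_dir: str = "rag_doc", top_k: int = 2) -> list[str]:
    query_words = set(query.lower().split())
    scored = []
    for path in Path(rag_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        overlap = len(query_words & set(text.lower().split()))
        scored.append((overlap, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for overlap, text in scored[:top_k] if overlap > 0]
```

Retrieved passages would then be prepended to the prompt before the vision model runs.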

(3) Since gemma-3-4b-it is gated on Hugging Face and cannot be accessed without logging in, you will need to configure authentication separately, or download the model yourself from the repository and place it in the Models folder (see the sketch below).
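
One possible way to handle the gated repo with huggingface_hub, assuming you have accepted the Gemma license and have an access token (the token value below is a placeholder):

```python
# Assumes you accepted the gemma license on Hugging Face and have a
# token. Downloads the weights into the README's Models folder; the
# exact target path is an assumption, not the extension's code.
from huggingface_hub import login, snapshot_download

login(token="hf_...")  # or rely on the HF_TOKEN environment variable
snapshot_download(
    repo_id="google/gemma-3-4b-it",
    local_dir="Models/SLM_Vision/gemma-3-4b-it",
)
```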

(4) To unload the model from memory, switch off the 'Use Cache' option and run the node again; the sketch below shows the idea.
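
Internally, disabling the cache plausibly amounts to dropping the stored model reference so it can be garbage-collected and freeing GPU memory. This is a sketch of that idea, not the extension's actual code:

```python
# Illustrative cache handling for tip (4): with use_cache off, the
# cached reference is dropped so Python can reclaim it after the run.
import gc
import torch

_MODEL_CACHE: dict[str, object] = {}

def get_model(name: str, loader, use_cache: bool):
    model = _MODEL_CACHE.get(name)
    if model is None:
        model = loader()
    if use_cache:
        _MODEL_CACHE[name] = model
    else:
        _MODEL_CACHE.pop(name, None)
    return model

def free_memory() -> None:
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```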

Thank you

Please share any suggestions for improvements in the Issues section.