ComfyUI Node: Qwen3 VQA
Category
Comfyui_Qwen3-VL-Instruct
Inputs
text STRING
model
- Qwen3-VL-4B-Instruct-FP8
- Qwen3-VL-4B-Thinking-FP8
- Qwen3-VL-8B-Instruct-FP8
- Qwen3-VL-8B-Thinking-FP8
- Qwen3-VL-4B-Instruct
- Qwen3-VL-4B-Thinking
- Qwen3-VL-8B-Instruct
- Qwen3-VL-8B-Thinking
quantization
- none
- 4bit
- 8bit
keep_model_loaded BOOLEAN
temperature FLOAT
max_new_tokens INT
min_pixels INT
max_pixels INT
seed INT
attention
- eager
- sdpa
- flash_attention_2
source_path PATH
image IMAGE
Outputs
STRING
Extension: Comfyui_Qwen3-VL-Instruct
This is an implementation of Qwen3-VL-Instruct by ComfyUI, which includes, but is not limited to, support for text-based queries, video queries, single-image queries, and multi-image queries to generate captions or responses.
Authored by IuvenisSapiens
Run ComfyUI workflows in the Cloud!
No downloads or installs are required. Pay only for active GPU usage, not idle time. No complex setups and dependency issues
Learn more