ComfyUI_QwenVL_PromptCaption
Leverages Qwen 2.5/3 VL for prompt inversion & caption generation in ComfyUI
Important Note
❌ This plugin does not auto-download models. It can reuse the qwen_2.5_vl_7b.safetensors provided by ComfyOrg, or you can manually download other Qwen VL models.
Nodes
- Qwen 2.5 VL Caption: single-image prompt inversion
- Qwen 2.5 VL Batch Caption: batch image prompt inversion (folder input)
- Qwen 3 VL Caption: single-image prompt inversion
- Qwen 3 VL Batch Caption: batch image prompt inversion (folder input)

<img width="1294" height="875" alt="nodes1" src="https://github.com/user-attachments/assets/be0e7a0d-906e-4630-b920-72fc7dfe598f" />
Installation
a. Via ComfyUI Manager (coming soon)
b. Manual install:
- Copy the plugin folder to ComfyUI/custom_nodes/
- Update the dependency: transformers>=4.57.0
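If you want to confirm that your environment already meets the requirement, a quick check like the one below works (a minimal sketch; it assumes the packaging package is importable, which it is whenever transformers is installed):

```python
# Optional sanity check: confirm the installed transformers version meets the plugin's requirement.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
required = Version("4.57.0")

if installed >= required:
    print(f"transformers {installed}: OK")
else:
    print(f"transformers {installed}: please upgrade to >= {required}")
```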
Usage
- Download the model
- Edit the prompt templates (optional)
- Adjust the node inputs
- Click "Run"
Model Notes
- Model path: ComfyUI's text_encoders folder (place downloaded models there manually).
Reuse ComfyOrg Model
To reuse qwen_2.5_vl_7b.safetensors:
- Create a folder in ComfyUI/models/text_encoders
- Rename the model file to model.safetensors and move it into that folder
- Add the required config files from Qwen 2.5 VL's official Hugging Face repo (https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct); see the download sketch below

<img width="834" height="345" alt="nodes2" src="https://github.com/user-attachments/assets/80f9f42c-a71e-45ca-9b88-9c9c5567508c" />
✅ No extra disk usage – the model remains usable for ComfyUI's Qwen Image/Edit models.
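The config files can be copied by hand from the repo page, or fetched with a small script along these lines (a minimal sketch, assuming huggingface_hub is installed; the folder name Qwen2.5-VL-7B stands in for whatever folder you created in text_encoders). It pulls everything except the weight files, since the weights are the renamed model.safetensors you already have:

```python
# Sketch: fetch only the config/tokenizer files for Qwen 2.5 VL into the folder
# created under ComfyUI/models/text_encoders (folder name is an example -- adjust it).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-VL-7B-Instruct",
    local_dir="ComfyUI/models/text_encoders/Qwen2.5-VL-7B",
    allow_patterns=["*.json", "*.txt"],  # configs and tokenizer files only, no *.safetensors
)
```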
Direct Download
Download a Qwen 2.5/3 VL official repo from Hugging Face and place it in the text_encoders folder:
https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
Users in mainland China can also download from this cloud drive: https://pan.quark.cn/s/b3975e789c3c
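If you prefer scripting the download, a full snapshot can be pulled straight into text_encoders like this (a minimal sketch, assuming huggingface_hub is installed; the target folder name is an example):

```python
# Sketch: download a full Qwen VL repo into ComfyUI's text_encoders folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen3-VL-8B-Instruct",
    local_dir="ComfyUI/models/text_encoders/Qwen3-VL-8B-Instruct",  # example target path
)
```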
Custom Prompts
Edit prompts.txt in the plugin folder under custom_nodes/ (follow the existing format):
- Multiple prompts are supported
- The nodes use the last prompt that matches the selected language
VRAM & Precision Recommendations

| VRAM | Recommended Precision |
|---------|----------------------------------|
| 6-8GB | Qwen 2.5 VL 7B (4bit) / Qwen 3 VL 8B (4bit) / Qwen 3 VL 4B (8bit) |
| 10-16GB | Qwen 2.5 VL 7B (8bit) / Qwen 3 VL 8B (8bit) / Qwen 3 VL 4B (bf16) |
| 16GB+ | bf16 (full precision) |
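For reference, the 4bit/8bit options in the table correspond to the standard bitsandbytes quantized loading path in transformers. The sketch below only illustrates what that precision setting means and is not the plugin's internal code; it assumes bitsandbytes and accelerate are installed and uses Qwen 2.5 VL 7B as the example (for 8bit, load_in_8bit=True is the equivalent switch):

```python
# Illustration only: how 4-bit loading of Qwen 2.5 VL looks with transformers + bitsandbytes.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=quant_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```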
Parameter Notes
keep_model_loaded
- Set True to keep the model in VRAM across consecutive prompt-inversion tasks
- For batch nodes, False only unloads the model after all images are processed, so it does not affect performance during the run
max_side
- Pre-scales the image so its longer side matches this size (see the sketch below)
- Setting it too large slows processing down
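The pre-scaling is equivalent to something like the following (an illustrative sketch using Pillow, not the plugin's actual resizing code):

```python
# Illustration: scale an image so its longer side does not exceed max_side.
from PIL import Image

def prescale(image: Image.Image, max_side: int) -> Image.Image:
    w, h = image.size
    scale = max_side / max(w, h)
    if scale >= 1.0:
        return image  # image is already small enough
    return image.resize((round(w * scale), round(h * scale)), Image.Resampling.LANCZOS)
```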
save_path
- If save_path is not set, the node uses image_path to save its output