Describe a single image or every image in a directory using models such as Janus Pro, Florence2, or JoyCaption (coming soon), with a particular focus on building datasets for LoRA training.
ComfyUI-CaptionThis is a flexible tool for generating image captions. It currently supports Janus Pro and Florence2, and more models, such as JoyCaption, are planned. The tool aims to simplify image-to-image workflows and the preparation of datasets for LoRA or similar fine-tuning. It offers an intuitive way to describe individual images or batch-process entire directories.
Displaying the generated captions requires a ShowText or DisplayText node, which ComfyUI does not currently provide natively. In my examples I use MieNodes, a node pack I wrote and highly recommend. It is easy to install, has no dependencies, and includes many caption-related file operation nodes.
The process is the same as for Janus Pro.
Currently, all models are downloaded directly from Hugging Face, or from the hf-mirror mirror if you set the environment variable HF_ENDPOINT=https://hf-mirror.com. Alternatively, you can download the models manually and place them in the directories listed below.
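For the manual route, the sketch below fetches a model into the folder layout listed below. It assumes the `huggingface_hub` package is installed. The repository IDs in `REPO_IDS` and the helper names `target_dir` and `download` are assumptions for illustration, not part of this project, so check each model card before relying on them. `HF_ENDPOINT` is set before `huggingface_hub` is imported because the library reads it at import time.

```python
import os
from pathlib import Path

# Assumed mapping from local folder name to Hugging Face repo ID (hypothetical; extend as needed).
REPO_IDS = {
    "Janus-Pro-1B": "deepseek-ai/Janus-Pro-1B",
    "Janus-Pro-7B": "deepseek-ai/Janus-Pro-7B",
    "Florence-2-large": "microsoft/Florence-2-large",
}


def target_dir(comfy_root: str, folder: str) -> Path:
    """Return the folder a model is expected in: Janus Pro under Janus-Pro/, Florence-2 under LLM/."""
    subdir = "Janus-Pro" if folder.startswith("Janus-Pro") else "LLM"
    return Path(comfy_root) / "models" / subdir / folder


def download(comfy_root: str, folder: str, use_mirror: bool = False) -> Path:
    """Download one model snapshot into its expected folder and return that folder."""
    if use_mirror:
        # Only takes effect if huggingface_hub has not been imported yet in this process,
        # because the endpoint is read once at import time.
        os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
    from huggingface_hub import snapshot_download  # lazy import so HF_ENDPOINT is honoured

    dest = target_dir(comfy_root, folder)
    snapshot_download(repo_id=REPO_IDS[folder], local_dir=str(dest))
    return dest
```

For example, `download("ComfyUI", "Janus-Pro-1B", use_mirror=True)` would place the files in `ComfyUI/models/Janus-Pro/Janus-Pro-1B/`.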
Janus Pro: place the models in ComfyUI/models/Janus-Pro/ as follows:
  ComfyUI/models/Janus-Pro/Janus-Pro-1B/
  ComfyUI/models/Janus-Pro/Janus-Pro-7B/
Florence2: place the models in ComfyUI/models/LLM/ as follows:
  ComfyUI/models/LLM/Florence-2-base/
  ComfyUI/models/LLM/Florence-2-base-ft/
  ComfyUI/models/LLM/Florence-2-large/
  ComfyUI/models/LLM/Florence-2-large-ft/
  ComfyUI/models/LLM/Florence-2-base-PromptGen-v1.5/
  ComfyUI/models/LLM/Florence-2-large-PromptGen-v1.5/
  ComfyUI/models/LLM/Florence-2-base-PromptGen-v2.0/
  ComfyUI/models/LLM/Florence-2-large-PromptGen-v2.0/
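Because the models are expected in exactly these folders, a quick check of a manual layout can save a failed load. In the sketch below, the hypothetical `find_local_models` helper reports which expected folders exist. It treats a non-empty folder as "present", which is an assumption and does not prove that the download is complete.

```python
from pathlib import Path

# Folder names copied from the layout above, grouped by parent folder under ComfyUI/models/.
EXPECTED = {
    "Janus-Pro": ["Janus-Pro-1B", "Janus-Pro-7B"],
    "LLM": [
        "Florence-2-base", "Florence-2-base-ft",
        "Florence-2-large", "Florence-2-large-ft",
        "Florence-2-base-PromptGen-v1.5", "Florence-2-large-PromptGen-v1.5",
        "Florence-2-base-PromptGen-v2.0", "Florence-2-large-PromptGen-v2.0",
    ],
}


def find_local_models(comfy_root: str) -> dict:
    """Map "<parent>/<model>" to True if that folder exists and contains at least one entry."""
    models_dir = Path(comfy_root) / "models"
    status = {}
    for parent, names in EXPECTED.items():
        for name in names:
            folder = models_dir / parent / name
            # An empty folder usually means an interrupted or missing download.
            status[f"{parent}/{name}"] = folder.is_dir() and any(folder.iterdir())
    return status
```

For example, printing `[k for k, ok in find_local_models("ComfyUI").items() if ok]` lists the models that appear to be installed.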
Single Image Description: Generate detailed captions for an individual image using your chosen model. Upload an image and, optionally, provide a specific prompt or guiding question to enrich the output.
Batch Caption Generation: Automatically generate captions for every image in a specified directory. Each image's description is saved to its own .txt file, which streamlines dataset preparation.
Multi-Model Support: The tool is designed to support multiple captioning models, so you can choose the one that best fits your task. It currently supports Janus Pro and Florence2, and future updates will add more models and features.
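The Batch Caption Generation behaviour described above, with one .txt file per image, can be sketched in plain Python. In this sketch, the captioning model is passed in as a function. The `caption_directory` name, the extension list and the skip-existing default are assumptions for illustration, not the node's actual options.

```python
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions to caption (hypothetical; adjust to your dataset).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def caption_directory(folder: str, caption_fn: Callable[[Path], str],
                      overwrite: bool = False) -> List[Path]:
    """Write <stem>.txt next to each image in folder and return the caption files written."""
    written = []
    for image_path in sorted(Path(folder).iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip sub-folders and non-image files
        txt_path = image_path.with_suffix(".txt")
        if txt_path.exists() and not overwrite:
            continue  # keep captions that were already generated or edited by hand
        txt_path.write_text(caption_fn(image_path).strip(), encoding="utf-8")
        written.append(txt_path)
    return written
```

Reusing the image's base name (for example, `cat.png` paired with `cat.txt`) follows the sidecar-caption convention common in LoRA training tools. Skipping existing files by default means a re-run will not overwrite captions that were corrected by hand.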
Special thanks go to:
Building on these contributions, this project introduces a refined multi-model architecture that lets users select the most appropriate model for their specific needs.