Nodes to use Florence2 VLM for image tagging and captioning
MiaoshouAI Tagger for ComfyUI is an image captioning tool based on a fine-tuned version of the Microsoft Florence-2 model. It provides accurate, contextually relevant image tagging for your projects.
2024/11/05 v1.4 A new release to support Florence-2-base-PromptGen-v2.0 and Florence-2-large-PromptGen-v2.0.<br>
2024/09/28 v1.31 Fixed the configuration error related to this issue. Try deleting your existing model from the models\LLM folder and running again; the new configuration will be downloaded automatically. Alternatively, you can download the model from the Baidu drive folder.<br>
2024/09/07 v1.2 Updated to support Florence-2-large-PromptGen-v1.5. A random prompt widget was added to the Tagger node: if you want a different prompt every time, switch it to "always".<br>
2024/09/05 v1.1 Updated to support Florence-2-base-PromptGen-v1.5. Two new prompt modes were added, along with a new node for the Flux CLIP text encoder to make Flux model clips easier to use.
While current taggers like WD14 perform reasonably well, they often produce errors that require manual correction. MiaoshouAI/Florence-2-base-PromptGen is fine-tuned on Microsoft's latest Florence2 model using a curated dataset from Civitai images and tags. This ensures that the tagging results are more aligned with the typical prompts used for generating images, enhancing accuracy and relevance.
ComfyUI has emerged as one of the most popular node-based tools for Stable Diffusion users. It offers various nodes and models, such as LLaVA and Ollama Vision nodes, for generating image captions and passing them to text encoders. However, these vision models are not specifically trained for prompting and image tagging. By using MiaoshouAI Tagger, you can see a clear improvement in results.
Fine-tuned on selected high-quality Civitai images and clean tags to produce highly accurate and contextually relevant tags.
Node-Based System: Leverages the power of ComfyUI's node-based system to concatenate tagging nodes, combining description captioning and keyword tagging for optimal results.
Can be combined with other nodes, such as text encoding, to achieve excellent results for automatic image processing.
Provides the best results for image training captioning by using advanced tagging and description methods.
Clone this repository into the `ComfyUI/custom_nodes` folder.
Install the dependencies listed in requirements.txt; transformers 4.38.0 or later is required:
pip install -r requirements.txt
or, if you use the portable build (run this in the ComfyUI_windows_portable folder):
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-Miaoshouai-Tagger\requirements.txt
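If the install finishes but the node still fails to load, it may be worth confirming that the installed transformers meets the 4.38.0 minimum mentioned above. The following is a minimal sketch and not part of this repository: the helper names `parse_version`, `meets_minimum`, and `check_transformers` are hypothetical, and pre-release suffixes such as `rc1` are simply ignored. Run it with the same Python interpreter that ComfyUI uses.

```python
from importlib import metadata

# Minimum transformers version stated in the install notes.
MIN_TRANSFORMERS = (4, 38, 0)


def parse_version(text):
    """Turn '4.38.0' or '4.40.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Return True if the installed version string is at least the minimum."""
    # Pad short versions such as '5.0' with zeros before comparing.
    padded = parse_version(installed) + (0,) * len(minimum)
    return padded[: len(minimum)] >= minimum


def check_transformers():
    """Report whether transformers is installed and new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} satisfies the 4.38.0 minimum"
    return f"transformers {installed} is older than 4.38.0"


if __name__ == "__main__":
    print(check_transformers())
```

With the portable build, the script would be run through `python_embeded\python.exe` so that it inspects the same environment the dependencies were installed into.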
Use for single-image captioning
Combine a simple caption with a tag caption and save to output files
(Save the image and drag it into ComfyUI to try)
The model should download automatically the first time you use the node. If it does not, you can download it manually.
MiaoshouAI/Florence-2-base-PromptGen-v1.5
The downloaded model will be placed under the ComfyUI/LLM folder.
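For a manual download, one possible approach is sketched below. It rests on several assumptions: that the model name above is a Hugging Face Hub repository id, that the `huggingface_hub` package is installed, and that the node expects one subfolder per model named after the last segment of the repository id. The helpers `target_dir` and `download_model` are hypothetical names, and the LLM folder path should be adjusted to match your installation.

```python
from pathlib import Path

# Assumption: this identifier is a Hugging Face Hub repository id.
REPO_ID = "MiaoshouAI/Florence-2-base-PromptGen-v1.5"


def target_dir(llm_dir, repo_id=REPO_ID):
    """Build the destination folder for a model inside the LLM folder."""
    # Assumption: one subfolder per model, named after the repo's last segment.
    return Path(llm_dir) / repo_id.split("/")[-1]


def download_model(llm_dir, repo_id=REPO_ID):
    """Download every file of the repository into the destination folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    dest = target_dir(llm_dir, repo_id)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(dest))
    return dest


# Example call (needs network access), with the folder named above:
# download_model("ComfyUI/LLM")
```

If the node does not pick up the files, the folder layout assumed here is the first thing to check against what the automatic download produces on a working installation.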
If you want to use a new version of PromptGen, simply delete the model folder and relaunch the ComfyUI workflow. The model will be downloaded again automatically.
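Deleting the model folder can also be scripted. The sketch below is illustrative only: `remove_cached_model` is a hypothetical helper, and the folder and model names in the example call are assumptions that should match what is actually on disk. It refuses to delete anything that is not a direct child of the LLM folder, as a guard against a mistyped name.

```python
import shutil
from pathlib import Path


def remove_cached_model(llm_dir, model_name):
    """Delete one model folder; return True if a folder was removed."""
    llm_root = Path(llm_dir).resolve()
    model_dir = (llm_root / model_name).resolve()
    # Only allow direct children of the LLM folder to be deleted.
    if model_dir.parent != llm_root:
        raise ValueError(f"{model_name!r} is not a direct child of {llm_root}")
    if not model_dir.is_dir():
        return False  # nothing to remove
    shutil.rmtree(model_dir)
    return True


# Example call, using the folder and model named above:
# remove_cached_model("ComfyUI/LLM", "Florence-2-base-PromptGen-v1.5")
```

After the folder is removed, relaunching the workflow should trigger the automatic download described above.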
If you want to use the PromptGen model outside ComfyUI to batch-tag images, you can use this tag tool created by TTPlant. His program uses my model and works in a Windows environment. See the download link.