Nodes: MS kosmos-2 Interrogator, Save Image w/o Metadata, Image Scale Bounding Box. An implementation of Microsoft's kosmos-2 image-to-text transformer.
ComfyUI is the awesome Stable Diffusion GUI and backend.
Please note that this repository is currently a (learning) work in progress and may change at any time. It has been tested on Windows 10 only so far.
An implementation of Microsoft's kosmos-2 text-and-image-to-text transformer.
This node takes a prompt that can influence the output. For example, "Very detailed, an image of" yields more detail than just "An image of". kosmos-2 is quite impressive; it recognizes famous people and written text in an image:
kosmos-2 output: An image of Donald Trump giving the peace sign with the words "Make America Great Again" written next to him.
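For readers who want to see how such a prompt reaches the model outside of ComfyUI, here is a minimal sketch using the Hugging Face transformers classes for kosmos-2. It is not this node's code: the function names `build_prompt` and `describe_image`, the default values, and the optional `<grounding>` prefix (a kosmos-2 feature that may or may not be used by the node) are assumptions made for illustration.

```python
def build_prompt(prefix: str, grounding: bool = False) -> str:
    """Return the text prompt; '<grounding>' asks kosmos-2 to also tag entities."""
    prefix = prefix.strip()
    return f"<grounding>{prefix}" if grounding else prefix


def describe_image(image_path: str, prefix: str = "An image of",
                   model_id: str = "microsoft/kosmos-2-patch14-224",
                   device: str = "cpu", max_new_tokens: int = 128) -> str:
    """Hypothetical helper (not the node's implementation): caption one image."""
    # Heavy imports stay local so build_prompt works without them installed.
    from PIL import Image
    from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = Kosmos2ForConditionalGeneration.from_pretrained(model_id).to(device)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(prefix), images=image,
                       return_tensors="pt").to(device)
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        max_new_tokens=max_new_tokens,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # post_process_generation returns the cleaned caption and any tagged entities.
    caption, _entities = processor.post_process_generation(text)
    return caption
```

Calling `describe_image("photo.png", prefix="Very detailed, an image of")` would load the model and return a caption that starts with the given prefix; `model_id` can also point to a local folder containing the model files.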
On first start, the kosmos-2 model files are downloaded from Hugging Face. Please be patient: the model file is about 6 GB in size. There is a cpu/gpu selector, but be aware that the model will eat up about 6 GB of your precious VRAM in gpu mode!
Alternatively, the model can be downloaded manually. Place the files in a folder named "kosmos-2-patch14-224" under the ./ComfyUI/models/kosmos2 folder.
./ComfyUI/models/kosmos2/kosmos-2-patch14-224
must contain the following files:
added_tokens.json
config.json
generation_config.json
model.safetensors
preprocessor_config.json
sentencepiece.bpe.model
special_tokens_map.json
tokenizer.json
tokenizer_config.json
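After a manual download, a quick check like the following can confirm that nothing from the list above is missing. This is a small standalone sketch, not part of the node; the function name `missing_model_files` is made up for illustration.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "added_tokens.json",
    "config.json",
    "generation_config.json",
    "model.safetensors",
    "preprocessor_config.json",
    "sentencepiece.bpe.model",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir) -> list[str]:
    """Return the required file names that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./ComfyUI/models/kosmos2/kosmos-2-patch14-224")
    print("All files present." if not missing else f"Missing: {missing}")
```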
The kosmos2 base folder can also be configured in extra_model_paths.yaml.
See example outputs and workflows
Also see Moondream, Recognize Anything Model
With this custom save image node, you can preview or save an image and include or exclude the ComfyUI workflow metadata. It is derived from ComfyUI's built-in save image node. Note that you can always right-click the image to save it; that copy will also include the workflow if activated.
This node scales an input image to fit a given box size while retaining the aspect ratio. The image can also be padded to the full box size with an arbitrary color.
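The arithmetic behind fitting an image into a box can be sketched as follows. The function names are hypothetical, and centering the image inside the padded box is an assumption; the node may place the image differently.

```python
def fit_into_box(width: int, height: int, box_width: int, box_height: int):
    """Return the largest (w, h) that fits the box and keeps the aspect ratio."""
    # The smaller of the two scale factors guarantees both sides fit.
    scale = min(box_width / width, box_height / height)
    new_w = min(box_width, max(1, round(width * scale)))
    new_h = min(box_height, max(1, round(height * scale)))
    return new_w, new_h


def padding_offsets(new_w: int, new_h: int, box_width: int, box_height: int):
    """Return (left, top) offsets that center the scaled image in the box.

    Centering is an assumption made for this sketch.
    """
    return (box_width - new_w) // 2, (box_height - new_h) // 2


if __name__ == "__main__":
    w, h = fit_into_box(1920, 1080, 512, 512)
    print(w, h, padding_offsets(w, h, 512, 512))
```

For a 1920x1080 input and a 512x512 box, this gives a 512x288 image with 112 pixels of padding above and below.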
See example outputs and workflows
This node easily creates an inpainting version of any SD1.5 model on the fly. No need to have gigabytes of inpainting models lying on your drive. This is very useful for any kind of inpainting node, such as detailers. Make sure you have the original SD1.5 models from RunwayML in your models folder:
They are needed for the calculation.
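The text above does not spell out the calculation. One widely used recipe for deriving an inpainting model is the "add difference" merge: inpainting weights + (custom weights - base weights). Whether this node uses exactly that formula is an assumption; the sketch below only illustrates the idea on plain Python lists, with each weight stored as rows of input-channel values. The SD1.5 inpainting UNet has 9 input channels where the regular model has 4, so the difference is applied to the shared channels only and the extra ones are kept.

```python
def add_difference(inpaint: dict, base: dict, custom: dict) -> dict:
    """Return inpaint + (custom - base) per weight name (illustrative only).

    Each weight is a list of rows (one per output channel). Rows of the
    inpainting model may be longer than the others; extra entries are kept.
    """
    merged = {}
    for name, inpaint_rows in inpaint.items():
        if name not in base or name not in custom:
            # Weight exists only in the inpainting model: copy it unchanged.
            merged[name] = [row[:] for row in inpaint_rows]
            continue
        new_rows = []
        for i_row, b_row, c_row in zip(inpaint_rows, base[name], custom[name]):
            shared = min(len(i_row), len(b_row), len(c_row))
            row = [i_row[k] + (c_row[k] - b_row[k]) for k in range(shared)]
            row += i_row[shared:]  # keep the extra inpainting input channels
            new_rows.append(row)
        merged[name] = new_rows
    return merged
```

Real checkpoints hold tensors rather than lists, so an actual implementation would do the same arithmetic with a tensor library.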
Unzip or git clone this repository into the ComfyUI/custom_nodes folder and restart ComfyUI.