ComfyUI-LexTools

ComfyUI-LexTools is a Python-based image processing and analysis toolkit that uses machine learning models for semantic image segmentation, image scoring, and image captioning. The toolkit includes three primary components:

  1. ImageProcessingNode.py - Implements various image processing nodes such as:

    • ImageAspectPadNode: Expands (pads) the image to meet a specific aspect ratio. This is useful for enforcing a target aspect ratio without cropping or distorting the image.
      • Inputs:
        • Required: image (IMAGE), aspect_ratio (RATIO), invert_ratio (BOOLEAN), feathering (INT), left_padding (INT), right_padding (INT), top_padding (INT), bottom_padding (INT)
        • Optional: show_on_node (INT)
      • Output: Expanded Image.
    • ImageScaleToMin: Calculates the scale factor needed to bring an image's smallest dimension to 512 pixels. This is useful for scaling images up or down for faster processing, since it ensures that at least one dimension (width or height) is exactly 512 (see the sketch after this list).
      • Input: image (IMAGE)
      • Output: Scale value.
    • ImageRankingNode: Ranks the images based on specific criteria.
      • Input: score (INT), prompt (STRING), image_path (STRING), json_file_path (STRING)
      • Output: Ranked images.
    • ImageFilterByIntScoreNode and ImageFilterByFloatScoreNode: Filter images based on a threshold score. Currently, these nodes may throw errors if the downstream node in the workflow does not handle empty outputs.
      • Input: score (INT for ImageFilterByIntScoreNode and FLOAT for ImageFilterByFloatScoreNode), threshold (FLOAT), image (IMAGE)
      • Output: Filtered images.
    • ImageQualityScoreNode: Calculates a quality score for the image.
      • Input: aesthetic_score (INT), image_score_good (INT), image_score_bad (INT), ai_score_artificial (INT), ai_score_human (INT), weight_good_score (INT), weight_aesthetic_score (INT), weight_bad_score (INT), weight_AIDetection (INT), MultiplyScoreBy (INT), show_on_node (INT), weight_HumanDetection (INT)
      • Output: Quality score.
    • ScoreConverterNode: Converts the score to different data types.
      • Input: score (SCORE)
      • Output: Converted score.

    Additional nodes adapted from an existing GitHub project - these have been modified to improve performance and to add an option to keep the model loaded in RAM, which significantly reduces generation time:

    • CalculateAestheticScore: An optimized version of the original, with an option to keep the model loaded in RAM.
    • AestheticScoreSorter: Sorts the images by score.
    • AestheticModel: Loads the aesthetic model.
  2. ImageCaptioningNode.py - Implements nodes for image captioning and classification:

    • ImageCaptioningNode: Provides a caption for the image using the BLIP model.
      • Input: image (IMAGE)
      • Output: String caption.
    • FoodCategoryClassifierNode: Classifies food categories in images.
      • Input: image (IMAGE)
      • Output: Top 5 food categories with probabilities.
    • AgeClassifierNode: Classifies the age range in images.
      • Input: image (IMAGE)
      • Output: Top 5 age ranges with probabilities.
    • ArtOrHumanClassifierNode: Detects if an image is AI-generated or human-made.
      • Input: image (IMAGE), show_on_node (BOOL)
      • Output: Artificial and human probabilities.
    • DocumentClassificationNode: Classifies document types.
      • Input: image (IMAGE)
      • Output: Document type index and name.
    • NSFWClassifierNode: Classifies content safety levels.
      • Input: image (IMAGE), show_on_node (BOOL), threshold (FLOAT)
      • Output:
        • Classification report (STRING)
        • SFW Score (FLOAT)
        • NSFW Score (FLOAT)
        • Is SFW (BOOLEAN)
        • Is NSFW (BOOLEAN)
    • WatermarkDetectionNode: Detects watermarks in images using EfficientNet.
      • Input: image (IMAGE), show_on_node (BOOL), threshold (FLOAT)
      • Output:
        • Classification report (STRING)
        • Clean Score (FLOAT)
        • Watermark Score (FLOAT)
        • Is Clean (BOOLEAN)
        • Has Watermark (BOOLEAN)
  3. SegformerNode.py - Handles semantic segmentation of images:

    • SegformerNode: Performs semantic segmentation with multiple model options.
      • Input: image (IMAGE), model_name (STRING), normalize_mask (BOOL), binary_mask (BOOL), resize_mode (STRING), invert_mask (BOOL), show_preview (BOOL), return_individual_masks (BOOL), post_process (STRING), post_process_radius (INT), segment_groups (STRING)
      • Output: Segmented image, mask, info, and preview.
    • SegformerNodeMasks: Creates individual segment masks.
      • Input: image (IMAGE), segments_to_merge (STRING), model_name (STRING)
      • Output: Image, mask, and segment info.
    • SegformerNodeMergeSegments: Merges and processes segments with advanced options.
      • Input: image (IMAGE), segments_to_merge_str (STRING), model_name (STRING), normalize_mask (BOOL), binary_mask (BOOL), resize_mode (STRING), invert_mask (BOOL), show_preview (BOOL), blur_radius (INT), dilation_radius (INT), intensity (FLOAT), ceiling (FLOAT)
      • Output: Processed image, mask, info, and preview.
    • SeedIncrementerNode: Manages seed incrementation for workflows.
      • Input: seed (INT), IncrementAt (INT)
      • Output: Seed string, seed int, subseed string, subseed int.
    • StepCfgIncrementNode: Handles step and CFG (guidance scale) increments.
      • Input: seed (INT), cfg_start (INT), steps_start (INT), image_steps (INT), max_steps (INT)
      • Output: CFG and steps values.
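
To illustrate how these nodes plug into ComfyUI, here is a minimal sketch in the spirit of ImageScaleToMin, written against ComfyUI's standard custom-node interface (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). The class name, category string, and body are illustrative assumptions, not the actual code in ImageProcessingNode.py:

class ImageScaleToMinSketch:
    """Illustrative only: compute the factor that scales an image's smallest dimension to 512."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",)}}

    RETURN_TYPES = ("FLOAT",)
    FUNCTION = "get_scale"
    CATEGORY = "LexTools/ImageProcessing"  # assumed category, for illustration

    def get_scale(self, image):
        # ComfyUI passes images as tensors shaped [batch, height, width, channels].
        _, height, width, _ = image.shape
        # Scale factor that makes the smallest dimension exactly 512 px.
        scale = 512.0 / min(height, width)
        return (scale,)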

Requirements

The project requires the following Python libraries:

  • torch
  • transformers
  • Pillow (PIL)
  • matplotlib
  • numpy
  • scipy
  • huggingface_hub
  • torchvision

Installation

  1. Install the required Python packages:
pip install torch transformers pillow matplotlib numpy scipy huggingface_hub torchvision
  2. Clone this repository into your ComfyUI custom_nodes directory:
cd ComfyUI/custom_nodes
git clone https://github.com/SOELexicon/ComfyUI-LexTools.git
  3. Restart ComfyUI to load the new nodes.
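
As an optional sanity check (not part of the project's documented steps), you can confirm that the required libraries import cleanly before restarting ComfyUI:

python -c "import torch, transformers, PIL, matplotlib, numpy, scipy, huggingface_hub, torchvision"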

Usage

The nodes will appear in the ComfyUI interface under the "LexTools" category, organized into subcategories:

  • LexTools/ImageProcessing/Segmentation
  • LexTools/ImageProcessing/Classification
  • LexTools/ImageProcessing/Captioning
  • LexTools/Utilities
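
For reference, ComfyUI discovers custom nodes through the NODE_CLASS_MAPPINGS dictionary exported from an extension's __init__.py, and each node class's CATEGORY attribute determines where it appears in the menu. The sketch below shows that general pattern; the module paths and display names are assumptions for illustration, not the extension's actual mappings:

# Hypothetical registration sketch; the real mappings live in this extension's own __init__.py.
from .SegformerNode import SegformerNode
from .ImageCaptioningNode import ImageCaptioningNode

NODE_CLASS_MAPPINGS = {
    "SegformerNode": SegformerNode,
    "ImageCaptioningNode": ImageCaptioningNode,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "SegformerNode": "Segformer Segmentation",
    "ImageCaptioningNode": "Image Captioning (BLIP)",
}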

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.