ComfyUI Extension: Comfyui-QwenEditUtils

Authored by lrzjason


A collection of utility nodes for Qwen-based image editing in ComfyUI.


    Example

    <p align="center"> <img src="example.png" alt="Example Workflow" width="45%" /> <img src="result.png" alt="Result Image" width="45%" /> </p>

    You can find a complete ComfyUI workflow example in the qwen-edit-plus_example.json file. This workflow demonstrates how to use the TextEncodeQwenImageEditPlus node with two reference images to create an outfit transfer effect.

    Node

    TextEncodeQwenImageEditPlus 小志Jason(xiaozhijason)

    This node provides text encoding functionality with reference image support for Qwen-based image editing workflows. It allows you to encode prompts while incorporating up to 5 reference images for more controlled image generation.

    Inputs

    • clip: The CLIP model to use for encoding
    • prompt: The text prompt to encode
    • vae (optional): The VAE model for image encoding
    • image1 (optional): First reference image for image editing
    • image2 (optional): Second reference image for image editing
    • image3 (optional): Third reference image for image editing
    • image4 (optional): Fourth reference image for image editing
    • image5 (optional): Fifth reference image for image editing
    • enable_resize (optional): Enable automatic resizing of reference images for VAE encoding
    • enable_vl_resize (optional): Enable automatic resizing of reference images for VL encoding
    • llama_template (optional): Custom Llama template for image description and editing instructions

    Outputs

    • CONDITIONING: The encoded conditioning tensor
    • image1: The processed first reference image
    • image2: The processed second reference image
    • image3: The processed third reference image
    • image4: The processed fourth reference image
    • image5: The processed fifth reference image
    • LATENT: The encoded latent representation of the first reference image

    Behavior

    • Encodes text prompts using CLIP with optional reference image guidance
    • Supports up to 5 reference images for complex editing tasks
    • Automatically resizes reference images to the target dimensions required for VAE and VL encoding
    • Integrates with VAE models to encode reference images into latent space
    • Supports custom Llama templates for more precise image editing instructions
    • Processes images separately for VAE encoding (1024x1024) and VL encoding (384x384)
    • Returns individual processed images for more flexible workflow connections
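The dual-target resizing described above (a roughly 1024x1024 pixel budget for VAE encoding, 384x384 for VL encoding) can be sketched as an aspect-preserving scale to a fixed pixel area. This is an illustrative sketch, not the node's actual code; the snapping to multiples of 8 is an assumption based on common latent-stride requirements.

```python
import math

def resize_to_area(width: int, height: int, target: int, multiple: int = 8):
    """Scale (width, height) so the total pixel count is close to
    target*target, preserving aspect ratio and snapping each side
    to a multiple of `multiple`."""
    scale = math.sqrt((target * target) / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h

# VAE target (~1024x1024 pixel budget) vs VL target (~384x384)
vae_size = resize_to_area(1920, 1080, 1024)  # -> (1368, 768)
vl_size = resize_to_area(1920, 1080, 384)    # -> (512, 288)
```

Because the two budgets are computed independently, a single reference image can be sent to the VAE at high resolution while the vision-language encoder sees a much smaller copy, which is why the node exposes enable_resize and enable_vl_resize as separate toggles.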

    Key Features

    • Multi-Image Support: Incorporate up to 5 reference images into your text-to-image generation workflow
    • Dual Resize Options: Separate resizing controls for VAE encoding (1024px) and VL encoding (384px)
    • Individual Image Outputs: Each processed reference image is provided as a separate output for flexible connections
    • Latent Space Integration: Encode reference images into latent space for efficient processing
    • Qwen Model Compatibility: Specifically designed for Qwen-based image editing models
    • Customizable Templates: Use custom Llama templates for tailored image editing instructions
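On "Latent Space Integration": image VAEs in this family typically downsample spatially by a fixed stride, so the LATENT output is far smaller than the pixel image. The sketch below only illustrates that shape relationship; the 8x stride and 4-channel latent are generic assumptions about a typical VAE, not verified against this node's model.

```python
def latent_shape(batch: int, height: int, width: int,
                 stride: int = 8, channels: int = 4):
    """Shape of the latent tensor for an image batch under a VAE that
    downsamples by `stride` and emits `channels` feature maps."""
    return (batch, channels, height // stride, width // stride)

# Under these assumptions, a 1024x1024 reference image yields a
# 4-channel 128x128 latent: a 48x reduction in element count.
shape = latent_shape(1, 1024, 1024)
```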

    Installation

    1. Clone or download this repository into your ComfyUI's custom_nodes directory.
    2. Restart ComfyUI.
    3. The node will be available in the "advanced/conditioning" category.

    Usage

    1. Add the "TextEncodeQwenImageEditPlus 小志Jason(xiaozhijason)" node to your workflow.
    2. Connect a CLIP model to the clip input.
    3. Enter your text prompt in the prompt field.
    4. Optionally, connect up to 5 reference images to the image inputs.
    5. Configure the enable_resize, enable_vl_resize, and other options as needed.
    6. Connect the outputs to your image generation nodes:
      • Use the "conditioning" output for your sampler
      • Connect the individual image outputs (image1, image2, etc.) to nodes that need the processed reference images
      • Use the "latent" output for latent-based operations
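The wiring above can also be expressed in ComfyUI's API-format JSON, where each node carries a class_type and an inputs map, and links are [source_node_id, output_index] pairs. The fragment below is a hypothetical sketch: the class_type string, node IDs, and upstream loader nodes are assumptions for illustration; the authoritative example is qwen-edit-plus_example.json.

```python
import json

# Hypothetical API-format fragment: node "3" encodes the prompt with two
# reference images; its conditioning (output 0) would feed a sampler.
workflow = {
    "3": {
        "class_type": "TextEncodeQwenImageEditPlus",  # assumed internal name
        "inputs": {
            "prompt": "transfer the outfit from image2 onto image1",
            "clip": ["1", 0],     # CLIP loader node, output 0
            "vae": ["2", 0],      # VAE loader node, output 0
            "image1": ["4", 0],   # first LoadImage node
            "image2": ["5", 0],   # second LoadImage node
            "enable_resize": True,
            "enable_vl_resize": True,
        },
    },
}

payload = json.dumps(workflow)
```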

    Update Log

    v1.0.5

    • Updated node to support separate enable_vl_resize parameter
    • Modified return types to provide 5 individual IMAGE outputs instead of a single combined output
    • Improved image processing logic with separate handling for VAE and VL encoding
    • Enhanced documentation to accurately reflect node inputs and outputs
    • Fixed latent output handling to properly return first reference image latent

    v1.0.1

    • Initial release with basic text encoding and single image reference support

    Contact

    Sponsor me to support more open-source projects:

    <div align="center"> <table> <tr> <td align="center"> <p>Buy me a coffee:</p> <img src="https://github.com/lrzjason/Comfyui-In-Context-Lora-Utils/blob/main/image/bmc_qr.png" alt="Buy Me a Coffee QR" width="200" /> </td> <td align="center"> <p>WeChat:</p> <img src="https://github.com/lrzjason/Comfyui-In-Context-Lora-Utils/blob/main/image/wechat.jpg" alt="WeChat QR" width="200" /> </td> </tr> </table> </div>