# ComfyUI LongCat-Image Integration
This custom node integrates the LongCat-Image pipeline into ComfyUI, enabling text-to-image generation and image editing with the LongCat-Image models.
## Features
- Text-to-Image Generation: Generate high-quality images from text prompts using LongCat-Image models
- Image Editing: Edit existing images with instruction-based prompts using LongCat-Image-Edit models
- Chinese Text Support: Excellent Chinese text rendering capabilities
- Efficient: Only 6B parameters with competitive performance
## Installation

### 1. Install Dependencies

```bash
cd custom_nodes/comfyui_longcat_image
pip install -r requirements.txt
```
### 2. Install the LongCat-Image Package

```bash
pip install git+https://github.com/meituan-longcat/LongCat-Image.git
```
### 3. Download Models

Download the models using `huggingface-cli`:

```bash
pip install "huggingface_hub[cli]"

# For text-to-image
huggingface-cli download meituan-longcat/LongCat-Image --local-dir models/diffusion_models/LongCat-Image

# For image editing
huggingface-cli download meituan-longcat/LongCat-Image-Edit --local-dir models/diffusion_models/LongCat-Image-Edit

# For fine-tuning (optional)
huggingface-cli download meituan-longcat/LongCat-Image-Dev --local-dir models/diffusion_models/LongCat-Image-Dev
```
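If you prefer to script the downloads, the same repositories can be fetched from Python with `huggingface_hub`'s `snapshot_download`. The helper below is an illustrative sketch, not part of this extension:

```python
# Hypothetical helper: fetch the LongCat-Image checkpoints programmatically.
# Maps Hugging Face repo IDs to the local directories used in this README.
MODEL_DIRS = {
    "meituan-longcat/LongCat-Image": "models/diffusion_models/LongCat-Image",
    "meituan-longcat/LongCat-Image-Edit": "models/diffusion_models/LongCat-Image-Edit",
    "meituan-longcat/LongCat-Image-Dev": "models/diffusion_models/LongCat-Image-Dev",
}

def download_models(models=MODEL_DIRS):
    # Imported lazily so the mapping above can be inspected
    # even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in models.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Call `download_models()` to fetch all three checkpoints (the downloads are large, so expect this to take a while).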
## Available Nodes
### LongCat-Image Model Loader
Loads a LongCat-Image model for use with other nodes.
**Inputs:**

- `model_path`: Path to the model directory (e.g., "LongCat-Image" or "LongCat-Image-Edit")
- `dtype`: Data type for model weights (`bfloat16`, `float16`, `float32`)
- `enable_cpu_offload`: Enable CPU offload to save VRAM (`true`/`false`, default: `false`)
**Outputs:**

- `LONGCAT_PIPE`: Pipeline object for use with the generation nodes
#### Low VRAM Support

The model loader supports a low-VRAM mode via the `enable_cpu_offload` option:

- **Disabled (default)**: all model components are loaded to the GPU at once
  - Faster inference
  - Requires more VRAM (typically ~24 GB+)
- **Enabled**: model components are offloaded to the CPU when not in use
  - Slower inference (due to device transfers)
  - Requires only ~17-19 GB of VRAM
  - Prevents out-of-memory (OOM) errors on lower-end GPUs
When to use CPU offload:
- GPUs with less than 24GB VRAM
- When experiencing OOM errors
- When running multiple models simultaneously
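The guidance above boils down to a single threshold; the tiny helper below is a hypothetical illustration (the ~24 GB figure comes from this README's VRAM table, not from the extension's code):

```python
def should_enable_cpu_offload(free_vram_gb: float) -> bool:
    """Illustrative heuristic: standard mode wants ~24 GB of free VRAM,
    while CPU-offload mode fits in roughly 17-19 GB."""
    return free_vram_gb < 24.0

# Example: a 16 GB card should enable offload; a 48 GB card should not.
```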
### LongCat-Image Text to Image
Generates images from text prompts.
**Inputs:**

- `LONGCAT_PIPE`: Pipeline from the model loader
- `prompt`: Text description of the image to generate
- `negative_prompt`: Things to avoid in the generated image
- `width`: Image width (default: 1344)
- `height`: Image height (default: 768)
- `steps`: Number of inference steps (default: 50)
- `guidance_scale`: CFG scale (default: 4.5)
- `seed`: Random seed
- `enable_cfg_renorm`: Enable CFG renormalization (`true`/`false`)
- `enable_prompt_rewrite`: Enable built-in prompt rewriting (`true`/`false`)
**Outputs:**

- `IMAGE`: Generated image
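For reference, ComfyUI `IMAGE` sockets carry float tensors in `[batch, height, width, channels]` layout with values in `[0, 1]`. The sketch below illustrates that layout with NumPy (ComfyUI itself uses torch tensors of the same shape; the helper name is ours, not the extension's):

```python
import numpy as np

def to_comfy_layout(rgb_uint8: np.ndarray) -> np.ndarray:
    # Scale 0-255 uint8 pixels to float32 in [0, 1] and add a batch
    # dimension, matching the [B, H, W, C] layout of ComfyUI IMAGE outputs.
    img = rgb_uint8.astype(np.float32) / 255.0
    return img[None, ...]

# A blank frame at the node's default 1344x768 resolution.
sample = to_comfy_layout(np.zeros((768, 1344, 3), dtype=np.uint8))
```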
### LongCat-Image Edit
Edits images based on instruction prompts.
**Inputs:**

- `LONGCAT_PIPE`: Pipeline from the model loader (must be an edit model)
- `image`: Input image to edit
- `prompt`: Edit instruction
- `negative_prompt`: Things to avoid in the edited image
- `steps`: Number of inference steps (default: 50)
- `guidance_scale`: CFG scale (default: 4.5)
- `seed`: Random seed
**Outputs:**

- `IMAGE`: Edited image
## Example Workflows
Example workflow JSON files are provided in this directory:
- `example_workflow_t2i.json` - Text-to-image generation workflow
- `example_workflow_edit.json` - Image editing workflow
You can load these workflows in ComfyUI by dragging and dropping the JSON file onto the canvas.
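For orientation, a ComfyUI workflow file stores a `nodes` array and a `links` array (each link is `[link_id, source_node, source_slot, target_node, target_slot, type]`). The heavily truncated sketch below only illustrates that shape; the node `type` strings and widget values are placeholders, not this extension's actual identifiers, and real exports also carry positions, sizes, and other UI metadata:

```json
{
  "nodes": [
    {"id": 1, "type": "LongCatImageModelLoader", "widgets_values": ["LongCat-Image", "bfloat16", false]},
    {"id": 2, "type": "LongCatImageT2I"},
    {"id": 3, "type": "SaveImage"}
  ],
  "links": [
    [1, 1, 0, 2, 0, "LONGCAT_PIPE"],
    [2, 2, 0, 3, 0, "IMAGE"]
  ]
}
```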
### Text-to-Image

1. Add a **LongCat-Image Model Loader** node
   - Set `model_path` to "LongCat-Image"
2. Add a **LongCat-Image Text to Image** node
   - Connect the loader output to the pipeline input
   - Enter your prompt
   - Adjust settings as needed
3. Add a **Save Image** node to save the output
### Image Editing

1. Add a **LongCat-Image Model Loader** node
   - Set `model_path` to "LongCat-Image-Edit"
2. Add a **Load Image** node to load your input image
3. Add a **LongCat-Image Edit** node
   - Connect the loader output to the pipeline input
   - Connect the image to edit
   - Enter your edit instruction (e.g., "将猫变成狗" - "change the cat to a dog")
4. Add a **Save Image** node to save the output
## Model Information

| Model | Type | Description |
|-------|------|-------------|
| LongCat-Image | Text-to-Image | Final release model for out-of-the-box inference |
| LongCat-Image-Dev | Text-to-Image | Mid-training checkpoint, suitable for fine-tuning |
| LongCat-Image-Edit | Image Editing | Specialized model for image editing |
## Performance
- Parameters: 6B (highly efficient)
- Supported Resolutions: 768x1344 and variations
- Chinese Text Support: broad coverage of the Chinese character set for in-image text rendering
- Quality: Competitive with much larger models
### VRAM Requirements

| Mode | VRAM Required | Speed | When to Use |
|------|---------------|-------|-------------|
| Standard (CPU offload disabled) | ~24 GB+ | Faster | High-end GPUs (e.g., RTX 3090, 4090, A100) |
| Low VRAM (CPU offload enabled) | ~17-19 GB | Slower | Mid-range GPUs (e.g., RTX 3080, 4080) |
Note: The Low VRAM mode uses CPU offloading to transfer models between CPU and GPU as needed, reducing VRAM usage at the cost of slower inference speed.
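A rough sanity check on these figures: the weights of a 6B-parameter model in bfloat16 alone occupy about 11 GiB, with the text encoder, VAE, and activations accounting for the rest. This is back-of-the-envelope arithmetic, not a measurement:

```python
# Illustrative VRAM estimate for the diffusion model weights only.
params = 6e9           # 6B parameters
bytes_per_param = 2    # bfloat16 = 2 bytes per parameter
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.1f} GiB for weights alone")  # ~11.2 GiB
```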
## Tips
- For better results, use a strong LLM for prompt engineering
- The model has excellent Chinese text rendering capabilities
- Enable prompt rewriting for enhanced generation quality
- Default guidance scale of 4.5 works well for most cases
## License
LongCat-Image is licensed under Apache 2.0. See the LongCat-Image repository for more information.