ComfyUI Extension: ComfyUI-LongCatPlugin
ComfyUI nodes wrapping LongCat image generation and editing pipelines with text-to-image and multi-image edit flows using diffusers framework. (Description by CC)
ComfyUI-LongCatPlugin
Third-party implementation of LongCat for ComfyUI.
Overview
ComfyUI nodes wrapping LongCat image generation and editing pipelines (diffusers-based). Includes text-to-image and multi-image edit flows with prompt rewriting, aspect-aware sizing, and latent decoding for ComfyUI.
Features
- LongCatPipelineLoader: Loads the LongCat pipeline and exposes Model/CLIP/VAE components.
- TextEncodeLongCatImage / TextEncodeLongCatImageEdit: Encodes text prompts and reference images (for editing) into conditioning.
- LongCatSizePicker: Selects a supported target resolution and produces an empty latent of the matching size.
- LongCatImageResizer: Resizes input images to the nearest LongCat-supported resolution using one of several strategies (resize, pad, or crop).
- LongCatSampler: Handles the sampling process (denoising) using the LongCat transformer.
- LongCatImageSizeScale: Scales images to a target pixel area and rounds dimensions to multiples of 16.
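The scaling rule described for LongCatImageSizeScale (fit a target pixel area, then round each side to a multiple of 16) can be sketched as follows. This is an illustration only, not the node's actual code: the function name `scale_to_area`, the default target area of 1024 x 1024, and the round-to-nearest behaviour are all assumptions.

```python
import math


def scale_to_area(width: int, height: int,
                  target_area: int = 1024 * 1024,
                  multiple: int = 16) -> tuple[int, int]:
    """Return (new_width, new_height) covering roughly target_area pixels.

    The aspect ratio is preserved before rounding; each side is then
    snapped to the nearest multiple of `multiple` (hypothetical helper).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Uniform scale factor that maps the current area onto the target area.
    scale = math.sqrt(target_area / (width * height))

    def snap(value: float) -> int:
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width * scale), snap(height * scale)


print(scale_to_area(1920, 1080))  # (1360, 768)
```

Rounding after scaling means the final aspect ratio can drift slightly from the input; here 1920x1080 (16:9) becomes 1360x768.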
Implemented (What works today)
- LongCat transformer: `LongCatImageTransformer2DModel` implemented and used by the pipelines.
- Pipelines:
  - `LongCatImagePipeline` (text-to-image): prompt rewrite, tokenizer usage, latent packing, denoising loop, and VAE decode implemented.
  - `LongCatImageEditPipeline` (image editing): implemented with VL prompt handling, image latents, and edit-aware denoising.
- Nodes (ComfyUI):
  - `LongCatPipelineLoader` (basic loader for pipelines; returns (model, pipeline, vae) so other nodes can reference them)
  - `TextEncodeLongCatImage` (T2I text encoding; uses CLIP tokenizer/encoder)
  - `TextEncodeLongCatImageEdit` (Edit text & image encoding; partially implemented, some behavior remains to be fully wired to CLIP/VL models)
  - `LongCatSizePicker` (picks supported resolutions and returns empty latents)
  - `LongCatImageResizer` (resize/pad/crop strategies to fit LongCat resolutions)
  - `LongCatSampler` (wraps ComfyUI-supported sampler calls into the LongCat denoising loop)
  - `LongCatImageSizeScale` (image scaling node)
- Utilities: `longcat_image/utils/model_utils.py` provides functions like `pack_latents` / `unpack_latents`, `prepare_pos_ids`, `split_quotation`, `retrieve_timesteps`, and optimized scaling helpers.
- Training examples: scripts and example configs for LoRA, SFT, Edit, and DPO training under `train_examples/`.
- Tests: Minimal unit tests for nodes (`tests/test_nodes.py`) for CI / smoke check.
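To show what "latent packing" generally means, here is a minimal sketch that groups a 2D grid into 2x2 patches and flattens each patch into one token, then reverses the step. It assumes the 2x2 patching used by several diffusers transformer pipelines. The real `pack_latents` / `unpack_latents` helpers work on batched tensors and may use a different layout, so the names `pack_grid` and `unpack_grid` are hypothetical.

```python
def pack_grid(grid, patch=2):
    """Flatten an H x W grid into a row-major sequence of patch*patch tokens."""
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # One token holds the values of one patch, row by row.
            tokens.append([grid[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens


def unpack_grid(tokens, height, width, patch=2):
    """Inverse of pack_grid: rebuild the H x W grid from the token sequence."""
    grid = [[None] * width for _ in range(height)]
    patches_per_row = width // patch
    for index, token in enumerate(tokens):
        top = (index // patches_per_row) * patch
        left = (index % patches_per_row) * patch
        for offset, value in enumerate(token):
            grid[top + offset // patch][left + offset % patch] = value
    return grid


latent = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]
tokens = pack_grid(latent)
print(tokens[0])  # [0, 1, 4, 5]
assert unpack_grid(tokens, 4, 4) == latent
```

Packing turns a spatial latent into a token sequence a transformer can attend over; unpacking restores the spatial layout before VAE decoding.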
Installation
- Copy Folder: Copy the `comfyui_longcat` folder into your ComfyUI `custom_nodes` directory.
  - Example: `ComfyUI/custom_nodes/comfyui_longcat`
- Install Dependencies: Ensure your ComfyUI environment has the required packages: `pip install -r requirements.txt` (Note: `requirements.txt` is in the root of this repo, or check `setup.py` for the list).
- Model Weights: Place LongCat model weights in a directory accessible to ComfyUI.
Usage in ComfyUI
- Put this plugin folder into ComfyUI `custom_nodes`, then restart.
- Nodes appear under category `LongCat`.
- Workflow:
  - Load the model/pipeline with `LongCatPipelineLoader` (returns `model`, `clip` (pipeline), and `vae`).
  - Use `TextEncodeLongCatImage` for text-to-image or `TextEncodeLongCatImageEdit` when editing or using reference images; connect the `clip`/pipeline node to these encoders.
  - Use `LongCatSizePicker` to pick a supported resolution and create an empty latent (or use your own latents).
  - Use `LongCatImageResizer` to resize reference images to the nearest supported LongCat resolution (if needed).
  - Connect the `MODEL`, `CONDITIONING` (Positive/Negative), and `LATENT` to `LongCatSampler` to run the LongCat denoising process.
- Set `model_path` to your downloaded LongCat checkpoint; on CUDA you may enable `cpu_offload` to save VRAM.
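As a rough illustration of the "nearest supported resolution" step in the workflow, the sketch below picks the candidate whose aspect ratio is closest to the input image's. The resolution list is a placeholder invented for this example, and `nearest_resolution` is a hypothetical name; the plugin's actual supported sizes and matching rule may differ.

```python
import math

# Placeholder list for illustration only; not the plugin's real size table.
EXAMPLE_RESOLUTIONS = [
    (1024, 1024),
    (1344, 768),
    (768, 1344),
    (1152, 896),
    (896, 1152),
]


def nearest_resolution(width, height, candidates=EXAMPLE_RESOLUTIONS):
    """Return the candidate (w, h) whose aspect ratio best matches the input.

    Ratios are compared in log space so that 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    return min(candidates,
               key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(nearest_resolution(1920, 1080))  # (1344, 768)
print(nearest_resolution(600, 800))    # (896, 1152)
```

Once a target size is chosen, the image still has to be fitted to it by resizing, padding, or cropping, as the resizer node's strategies describe.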
Development
- Tests: `pytest tests/test_nodes.py`
- Key modules: `nodes.py`, `longcat_image/pipelines/*`, `longcat_image/models/longcat_image_dit.py`.
To-Do
Completed / in-progress items
- Implemented core transformer and pipelines for text-to-image and edit flows.
- Node wrappers for basic operation and VAE/text encoding/decoding.
- Training example scripts for LoRA, SFT, DPO, and Edit training flows.
Planned / Roadmap
- [ ] Finalize `LongCatPipelineLoader` to robustly load:
  - Diffusers-style directory checkpoints (separate transformer, vae, tokenizer, scheduler subfolders)
  - Single-file safetensors checkpoints and mapping to model components
  - CLIP and VAE loading and correct device placement
- [ ] Implement `TextEncodeLongCatImageEdit` fully (support multi-image & VL inputs and correct token/image merging).
- [ ] Add documentation: step-by-step ComfyUI usage, example flow screenshots, and a model download/prepare helper script.
- [ ] Add automated pipeline tests, including smoke tests and tests across dtype/device combos.
- [ ] Add Git LFS support for model weights and an example of safe weight handling.
- [ ] Add GitHub Actions CI to run linting and tests for PRs.
- [ ] Add `pre-commit` config to avoid accidental commits of binary or cache files.
- [ ] Add examples showing how to run training scripts for LoRA/SFT/DPO and how to apply saved checkpoints to the ComfyUI nodes.
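The first roadmap item distinguishes two checkpoint layouts. One possible way a loader could tell them apart before loading anything is sketched below. It is not part of the plugin: `detect_checkpoint_layout` is a hypothetical helper, and the subfolder names are taken from the roadmap text above.

```python
from pathlib import Path

# Subfolder names as listed in the roadmap item above.
EXPECTED_SUBFOLDERS = ("transformer", "vae", "tokenizer", "scheduler")


def detect_checkpoint_layout(path):
    """Classify a checkpoint path without loading any weights.

    Returns (layout, missing), where layout is "single_file",
    "diffusers_dir", or "unknown", and missing lists the expected
    subfolders that are absent from a directory checkpoint.
    """
    path = Path(path)
    if path.is_file() and path.suffix == ".safetensors":
        return "single_file", []
    if path.is_dir():
        missing = [name for name in EXPECTED_SUBFOLDERS
                   if not (path / name).is_dir()]
        return "diffusers_dir", missing
    return "unknown", []
```

Reporting the missing subfolders up front lets a loader fail with a clear message instead of a late error during component loading.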
Credits
- LongCat base model & pipelines: original LongCat project (LongCat team).
- diffusers (Hugging Face) for pipeline scaffolding.
- transformers (Hugging Face) for text and vision encoders.
- accelerate for optional CPU/GPU offload helpers.
- PyTorch for core tensor and model runtime.
- Pillow (PIL) and NumPy for image/tensor conversions.