ComfyUI Extension: ComfyUI-IPAdapterWAN

Authored by SirLatore



    ComfyUI-IPAdapter-WAN

    This extension adapts the InstantX IP-Adapter for SD3.5-Large to work with Wan 2.1 and other UNet-based video/image models in ComfyUI.

    Unlike the original SD3 version (which depends on joint_blocks from MMDiT), this version performs sampling-time identity conditioning by dynamically injecting into attention layers — making it compatible with models like Wan 2.1, AnimateDiff, and other non-SD3 pipelines.


    🚀 Features

    • 🔁 Injects identity embeddings during sampling via attention block patching
    • 🧠 Works with Wan 2.1 and other UNet-style models (no SD3/MMDiT required)
    • 🛠️ Built on top of ComfyUI's IPAdapter framework
    • 🎨 Enables consistent face/identity across frames in video workflows

    📦 Installation

    1. Clone the repo into your ComfyUI custom nodes directory:

       cd ComfyUI/custom_nodes
       git clone https://github.com/your-username/ComfyUI-IPAdapter-WAN.git

    2. Download the required model weights:

    (Note: This model is based on google/siglip-so400m-patch14-384. The rehosted version yields nearly identical results.)


    🧠 How It Works

    Wan models use a UNet structure instead of the DiT transformer blocks used in SD3. To make IPAdapter work with Wan:

    • The extension scans all attention blocks (modules with .to_q and .to_k) dynamically.

    • It injects IPAdapter's attention processors (IPAttnProcessor) directly into those blocks.

    • Identity embeddings are updated based on the current sampling timestep using a learned resampler.

    This means it works without requiring joint_blocks or specific architectural assumptions — making it plug-and-play for many custom models.
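
    In rough terms, the injection step looks like the sketch below. This is a minimal illustration rather than the extension's actual code: the processor attribute and IPAttnProcessor-style factory follow the diffusers attention convention, which is an assumption here, and the resampler signature is hypothetical.

       import torch.nn as nn

       def patch_attention_blocks(model: nn.Module, make_processor):
           """Scan every submodule; anything exposing both .to_q and .to_k is
           treated as an attention block and receives an IP-Adapter processor."""
           patched = []
           for name, module in model.named_modules():
               if hasattr(module, "to_q") and hasattr(module, "to_k"):
                   # Keep a handle to the original processor so the patch can be undone.
                   module._original_processor = getattr(module, "processor", None)
                   module.processor = make_processor(name)  # e.g. an IPAttnProcessor
                   patched.append(name)
           return patched

       def update_identity_embeds(resampler, image_embeds, timestep):
           # Re-evaluate the learned resampler as sampling progresses, so the
           # identity tokens handed to the processors can vary per timestep.
           return resampler(image_embeds, timestep)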


    🛠 Usage

    1. In ComfyUI, use the following nodes:

      • Load IPAdapter WAN Model

      • Apply IPAdapter WAN Model

    2. Connect the CLIP Vision embedding (from a face image) and your diffusion model to the adapter.

    3. Use a weight of ~0.5 as a good starting point.

    4. You can apply this in video workflows to maintain consistent identity across frames.
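
    The weight behaves like the familiar IP-Adapter scale: the identity branch's attention output is added on top of the regular attention output, multiplied by the weight. Whether this extension applies it exactly this way is an assumption; the snippet below is just a self-contained illustration of that blending, with made-up tensor shapes.

       import torch

       def blend_attention(base_attn: torch.Tensor, ip_attn: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
           # IP-Adapter-style blending: the identity contribution is scaled by
           # `weight` before being added to the ordinary attention output.
           return base_attn + weight * ip_attn

       out = blend_attention(torch.randn(1, 77, 1024), torch.randn(1, 77, 1024), weight=0.5)

    At weight 0 the face image has no effect, while values near 1.0 can overpower the prompt, which is why ~0.5 is a moderate starting point.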


    📁 Example Workflows

    Example .json workflows will be available soon in the workflows/ folder.


    ✅ Compatibility

    | Model            | Status              |
    | ---------------- | ------------------- |
    | Wan 2.1          | ✅ Works            |
    | AnimateDiff      | ✅ Works            |
    | SD3 / SDXL       | ❌ Use original repo |
    | Any UNet variant | ✅ Likely to work   |


    🔧 TODOs

    • Allow multiple adapters without conflict

    • Auto-detect model parameters (hidden size, num layers)

    • Convert .bin to safetensors format (a conversion sketch follows this list)

    • Add more workflows for different models
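
    For the safetensors item above, the conversion itself is straightforward with the safetensors library. A minimal sketch, assuming the checkpoint is a flat state dict of tensors; the file names here are placeholders:

       import torch
       from safetensors.torch import save_file

       # Load the pickled .bin checkpoint (from a trusted source only), then
       # re-save it in the safer, zero-copy safetensors format.
       state_dict = torch.load("ip-adapter-wan.bin", map_location="cpu")
       # save_file requires contiguous, non-shared tensors.
       state_dict = {k: v.contiguous() for k, v in state_dict.items()}
       save_file(state_dict, "ip-adapter-wan.safetensors")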


    🧑‍💻 Credits


    Feel free to contribute or suggest improvements via GitHub Issues or Pull Requests.