# ComfyUI-IPAdapter-WAN
This extension adapts the InstantX IP-Adapter for SD3.5-Large to work with Wan 2.1 and other UNet-based video/image models in ComfyUI.

Unlike the original SD3 version (which depends on `joint_blocks` from MMDiT), this version performs sampling-time identity conditioning by dynamically injecting into attention layers, making it compatible with models like Wan 2.1, AnimateDiff, and other non-SD3 pipelines.
## 🚀 Features
- 🔁 Injects identity embeddings during sampling via attention block patching
- 🧠 Works with Wan 2.1 and other UNet-style models (no SD3/MMDiT required)
- 🛠️ Built on top of ComfyUI's IPAdapter framework
- 🎨 Enables consistent face/identity across frames in video workflows
## 📦 Installation

- Clone the repo into your ComfyUI custom nodes directory:

  ```bash
  git clone https://github.com/your-username/ComfyUI-IPAdapter-WAN.git
  ```

- Download the IP-Adapter weights and place them in:

  ```
  ComfyUI/models/ipadapter/
  ```

- Download the CLIP Vision model and place it in:

  ```
  ComfyUI/models/clip_vision/
  ```

  (Note: this model is based on google/siglip-so400m-patch14-384. The rehosted version yields nearly identical results.)
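With both downloads in place, the relevant part of your ComfyUI tree should look roughly like this (the weight filenames are placeholders for whatever the downloaded files are named):

```
ComfyUI/
├── custom_nodes/
│   └── ComfyUI-IPAdapter-WAN/
└── models/
    ├── ipadapter/
    │   └── <ip-adapter weights file>
    └── clip_vision/
        └── <clip vision model file>
```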
## 🧠 How It Works

Wan models use a UNet structure instead of the DiT transformer blocks used in SD3. To make IPAdapter work with Wan:

- The extension dynamically scans for attention blocks (modules with `.to_q` and `.to_k`).
- It injects IPAdapter's attention processors (`IPAttnProcessor`) directly into those blocks.
- Identity embeddings are updated at each sampling timestep using a learned resampler.

This means it works without requiring `joint_blocks` or specific architectural assumptions, making it plug-and-play for many custom models. A minimal sketch of the scan-and-patch idea follows.
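The sketch below illustrates that scan-and-patch step. It is not the extension's actual code: the `make_processor` factory is a stand-in for however the extension constructs its `IPAttnProcessor` instances, and the `processor` attribute follows the diffusers attention-processor convention (an assumption).

```python
import torch.nn as nn

def find_attention_blocks(model: nn.Module):
    """Yield (name, module) for every module exposing .to_q and .to_k,
    i.e. the attention blocks that can be patched."""
    for name, module in model.named_modules():
        if hasattr(module, "to_q") and hasattr(module, "to_k"):
            yield name, module

def inject_ip_processors(model: nn.Module, make_processor) -> int:
    """Swap each attention block's processor for an IP-Adapter-aware one.

    `make_processor` is a factory returning an IPAttnProcessor-like object;
    the `processor` attribute follows the diffusers convention (assumption).
    Returns the number of blocks patched.
    """
    count = 0
    for name, block in find_attention_blocks(model):
        block.processor = make_processor(name, block)
        count += 1
    return count
```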
## 🛠 Usage

- In ComfyUI, add the following nodes (a hypothetical skeleton of the apply node is sketched after this list):
  - `Load IPAdapter WAN Model`
  - `Apply IPAdapter WAN Model`
- Connect the `CLIP Vision` embedding (from a face image) and your diffusion model to the adapter.
- Use a weight of ~0.5 as a good starting point.
- You can apply this in video workflows to maintain a consistent identity across frames.
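For orientation, here is a hypothetical skeleton of what the apply node could look like using ComfyUI's custom-node conventions. The class name, input names, and the `patch_ip_adapter` helper are illustrative assumptions, not the extension's actual API.

```python
# Hypothetical skeleton of the apply node; the patch_ip_adapter helper
# is an illustrative assumption, not this extension's actual API.
class ApplyIPAdapterWANModel:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("MODEL",),
                "ipadapter": ("IPADAPTER",),
                "clip_vision_output": ("CLIP_VISION_OUTPUT",),
                "weight": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 2.0, "step": 0.05}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "apply"
    CATEGORY = "ipadapter"

    def apply(self, model, ipadapter, clip_vision_output, weight):
        patched = model.clone()  # ComfyUI convention: never mutate the input model
        patch_ip_adapter(patched, ipadapter, clip_vision_output, weight)  # hypothetical helper
        return (patched,)
```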
## 📁 Example Workflows

Example `.json` workflows will be available soon in the `workflows/` folder.
## ✅ Compatibility

| Model            | Status               |
| ---------------- | -------------------- |
| Wan 2.1          | ✅ Works             |
| AnimateDiff      | ✅ Works             |
| SD3 / SDXL       | ❌ Use original repo |
| Any UNet variant | ✅ Likely to work    |
## 🔧 TODOs

- Allow multiple adapters without conflict
- Auto-detect model parameters (hidden size, number of layers)
- Convert `.bin` weights to the `safetensors` format (see the sketch after this list)
- Add more workflows for different models
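For the conversion TODO, this is a minimal sketch of what the `.bin` to `.safetensors` step could look like, assuming the checkpoint is a flat state dict of tensors (the function name and file paths are placeholders):

```python
import torch
from safetensors.torch import save_file

def convert_bin_to_safetensors(src: str, dst: str) -> None:
    """Load a .bin checkpoint and re-save it as .safetensors."""
    state_dict = torch.load(src, map_location="cpu", weights_only=True)
    # save_file requires contiguous tensors
    state_dict = {k: v.contiguous() for k, v in state_dict.items()}
    save_file(state_dict, dst)

# Example (placeholder filenames):
# convert_bin_to_safetensors("ip-adapter.bin", "ip-adapter.safetensors")
```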
## 🧑‍💻 Credits

- Adapted from: InstantX IPAdapter for SD3.5
- ComfyUI extension by: your name / handle here

Feel free to contribute or suggest improvements via GitHub Issues or Pull Requests.