ComfyUI Extension: FaceCLIP-ComfyUI

Authored by techzuhaib

FaceCLIP nodes for ComfyUI providing joint embeddings from aligned face images plus text prompts, and identity-preserving image synthesis using a fine-tuned SDXL UNet.

    FaceCLIP ComfyUI Custom Nodes (Slim Package)

    This repository provides slim custom nodes to use FaceCLIP inside ComfyUI:

    • FaceCLIP Encode (Image+Text): Produces FaceCLIP joint embeddings from an aligned face image + text prompt.
    • FaceCLIP SDXL Generate: Uses the FaceCLIP encoder + fine-tuned SDXL UNet to synthesize identity-preserving images from a face image and prompt.

    Underlying research: bytedance/FaceCLIP (Apache 2.0). This slim package vendors only configs and node logic; it downloads required weights on demand.

    Contents

    comfyui_faceclip/
      __init__.py
      faceclip_node.py
      generation_node.py
    configs/
      face_clip_l_14_config.yaml
      face_clip_g_14_config.yaml
    asset/
      0001_female.png        # example face (optional)
    tests/
      test_encode.py
      test_generate.py
    requirements.txt
    LICENSE
    README.md
    

    Installation (ComfyUI Server)

    From your ComfyUI root:

    cd custom_nodes
    # Clone your fork or this slim repo
    git clone https://github.com/<your_user>/FaceCLIP-ComfyUI.git FaceCLIP-ComfyUI
    cd FaceCLIP-ComfyUI
    pip install -r requirements.txt
    

    The first run downloads large checkpoints (FaceCLIP encoder ~3GB, open_clip weights ~1.7GB + ~10GB, UNet weights); ensure sufficient disk space and network bandwidth.
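
    If you want to confirm headroom before the first launch, a quick pre-flight check is enough (a minimal sketch; the ~20GB figure comes from the Performance & Requirements section below):

    import shutil

    # Checkpoints plus open_clip weights need roughly 20GB on first run
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < 20:
        raise RuntimeError(f"Only {free_gb:.1f} GB free; FaceCLIP downloads need ~20 GB")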

    Required Base Code

    This slim node package expects the original FaceCLIP code (the core/ module providing core.face_clip.face_clip) to be importable. You have two options:

    1. Clone the original repo alongside the slim nodes:
    cd custom_nodes
    git clone https://github.com/bytedance/FaceCLIP.git FaceCLIP
    git clone https://github.com/<your_user>/FaceCLIP-ComfyUI.git FaceCLIP_ComfyUI  # underscore: Python module names cannot contain dashes
    
    2. Vendor the required core/ subtree into this slim package (copy the core/face_clip directory and any dependencies).

    If you only clone the slim package, imports like from core.face_clip.face_clip import FaceCLIP_L_G_Wrapper will fail.
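
    A quick way to verify the base code is importable before launching ComfyUI (a minimal sketch; adjust the sys.path entry to wherever you cloned the original repo):

    import sys

    sys.path.insert(0, "FaceCLIP")  # path to the cloned bytedance/FaceCLIP checkout
    try:
        from core.face_clip.face_clip import FaceCLIP_L_G_Wrapper  # noqa: F401
        print("FaceCLIP core code found")
    except ImportError as exc:
        print(f"Base code missing: {exc}")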

    Nodes

    1. FaceCLIP Encode (Image+Text)

    Inputs:

    • image (IMAGE): (B,H,W,3) aligned/cropped face batch.
    • text (STRING): Prompt describing subject & scene.
    • device: Must be cuda (CPU is currently unsupported).

    Outputs:

    • faceclip_embeddings: Token embeddings combining L + bigG.
    • faceclip_pooled: Pooled embedding.
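
    For orientation, this is roughly the shape of a ComfyUI node with the interface above; the class name, return-type strings, and embedding dimensions are illustrative assumptions, not this package's actual definitions:

    import torch

    class FaceCLIPEncodeSketch:
        """Illustrative skeleton only, not the node class shipped in this package."""

        @classmethod
        def INPUT_TYPES(cls):
            return {
                "required": {
                    "image": ("IMAGE",),                      # (B,H,W,3) aligned face batch
                    "text": ("STRING", {"multiline": True}),  # subject & scene prompt
                    "device": (["cuda"],),                    # CPU is unsupported
                }
            }

        RETURN_TYPES = ("FACECLIP_EMBEDS", "FACECLIP_POOLED")  # assumed type names
        RETURN_NAMES = ("faceclip_embeddings", "faceclip_pooled")
        FUNCTION = "encode"
        CATEGORY = "FaceCLIP"

        def encode(self, image, text, device):
            # The real node runs the FaceCLIP L + bigG encoders here; the
            # placeholder tensors below only illustrate typical SDXL-style
            # shapes (768 + 1280 = 2048 token dim, 1280 pooled dim).
            b = image.shape[0]
            tokens = torch.zeros(b, 77, 2048)
            pooled = torch.zeros(b, 1280)
            return (tokens, pooled)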

    2. FaceCLIP SDXL Generate

    Inputs:

    • face_image: (B,H,W,3) aligned face.
    • prompt: Text prompt.
    • negative_prompt: Extra negatives appended to the built-in quality-suppression list.
    • num_images: Images per prompt.
    • width, height: Output resolution (multiples of 8; see the helper after this list).
    • seed: Random seed.
    • steps: Diffusion steps (default 30).
    • guidance: CFG scale (default 7.0).
    • device: cuda only.
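
    The multiples-of-8 constraint on width/height comes from SDXL working in a 1/8-resolution latent space; a tiny helper (illustrative) keeps arbitrary sizes valid:

    def snap_to_multiple_of_8(x: int) -> int:
        # SDXL latents are 1/8 the pixel resolution, so dims must divide by 8
        return max(8, (x // 8) * 8)

    width, height = snap_to_multiple_of_8(1023), snap_to_multiple_of_8(768)  # -> 1016, 768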

    Output:

    • images: Generated batch (N,H,W,3) float in [0,1].
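
    The output follows ComfyUI's standard IMAGE convention, so converting it for saving outside the graph is straightforward (a generic sketch, not an API of this package):

    import torch
    from PIL import Image

    def save_batch(images: torch.Tensor, prefix: str = "faceclip") -> None:
        # (N,H,W,3) float in [0,1] -> 8-bit PNG files
        arr = (images.clamp(0, 1) * 255).round().to(torch.uint8).cpu().numpy()
        for i, frame in enumerate(arr):
            Image.fromarray(frame).save(f"{prefix}_{i:04d}.png")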

    Testing

    Run from the slim repo root (requires a CUDA GPU):

    python tests/test_encode.py
    python tests/test_generate.py
    

    If CUDA is missing, the tests skip gracefully.
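
    "Skip gracefully" most likely amounts to an early-exit guard along these lines (illustrative, not the tests' literal code):

    import sys

    import torch

    if not torch.cuda.is_available():
        print("CUDA not available; skipping FaceCLIP tests.")
        sys.exit(0)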

    Usage Flow in ComfyUI

    1. Load a face image (crop and align it externally).
    2. Add FaceCLIP Encode (Image+Text), or use FaceCLIP SDXL Generate directly (it applies the encoder internally).
    3. For generation, feed in your face image and prompt, and adjust steps/guidance as needed.

    Performance & Requirements

    • GPU with bfloat16 support (Ampere+ recommended) for SDXL node.
    • VRAM: ≥16GB recommended for comfortable SDXL generation with FaceCLIP bigG branch.
    • Disk: ≥20GB for checkpoints & open_clip weights.
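
    You can confirm the bfloat16 assumption on your GPU before committing to the large downloads; torch.cuda.is_bf16_supported() is a standard PyTorch call:

    import torch

    # Ampere (SM 8.0) and newer GPUs report True; the SDXL node assumes this path
    print(torch.cuda.is_bf16_supported())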

    Limitations

    • CUDA only (mixed-precision and bfloat16 paths are assumed).
    • Multi-face batching is not tuned; works best with a single face per prompt.
    • No CPU fallback; implementing one would require dtype and performance adjustments.
    • Requires the original FaceCLIP repo or vendored core/ code to be present for Python imports.

    Attribution

    Original research & base code: https://github.com/bytedance/FaceCLIP
    License: Apache 2.0 (see LICENSE)
    The model weights are governed by their original licenses; ensure compliance.

    Roadmap (Optional Enhancements)

    • Add CPU or FP32 fallback.
    • Integrate FLUX / FaceT5 variant.
    • Provide face alignment preprocessing node.
    • Add image saving option directly in generation node.

    PRs welcome for these improvements.