ComfyUI Extension: ComfyUI-DSD

Authored by irreveloper

Created 5 months ago

Updated 5 months ago

An unofficial ComfyUI custom node package that integrates Diffusion Self-Distillation (DSD) for zero-shot customized image generation. DSD is a model for subject-preserving image generation that allows you to create images of a specific subject in novel contexts without per-instance tuning.

ComfyUI-DSD

Features

Subject-preserving image generation using the DSD model
Gemini API prompt enhancement
Direct model download from Hugging Face
Fine-grained control over generation parameters
Multiple image resizing options

Installation

Clone this repository into your ComfyUI custom_nodes folder:

cd ComfyUI/custom_nodes
git clone https://github.com/irreveloper/ComfyUI-DSD.git

Install the required dependencies from inside the cloned directory:

cd ComfyUI-DSD
pip install -r requirements.txt

Get the model files (two options):
- Option 1: Use the DSD Model Downloader node in ComfyUI to automatically download the model
- Option 2: Download manually from Hugging Face or Google Drive

The model files will be stored in:
- ComfyUI/models/dsd_model/transformer/ (for the transformer files)
- ComfyUI/models/dsd_model/pytorch_lora_weights.safetensors (for the LoRA file)

Restart ComfyUI
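
Before restarting, it can help to confirm that the model files ended up where the nodes expect them. The following is a minimal sketch and not part of this package: the function name missing_dsd_files and the comfy_root argument are hypothetical, and the only details taken from this README are the two storage paths listed above.

```python
from pathlib import Path


def missing_dsd_files(comfy_root):
    """Return the expected DSD model paths that are absent under comfy_root.

    Hypothetical helper: it only checks the two locations named in the README.
    """
    model_dir = Path(comfy_root) / "models" / "dsd_model"
    transformer_dir = model_dir / "transformer"
    lora_file = model_dir / "pytorch_lora_weights.safetensors"

    missing = []
    # The transformer directory should exist and contain at least one file.
    if not transformer_dir.is_dir() or not any(transformer_dir.iterdir()):
        missing.append(transformer_dir)
    # The LoRA weights are a single file next to the transformer directory.
    if not lora_file.is_file():
        missing.append(lora_file)
    return missing


# Usage: adjust "ComfyUI" to the root of your own installation.
for path in missing_dsd_files("ComfyUI"):
    print(f"missing: {path}")
```

If the loop prints nothing, both locations are populated. The check does not validate the contents of the files.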

Available Nodes

DSD Model Downloader: Automatically downloads the model from Hugging Face
- Supports downloading from custom repositories with the repo_id parameter
- Includes options for model precision (bfloat16, float16, float32)
- Provides memory optimization options (low_cpu_mem_usage, model_cpu_offload, sequential_cpu_offload)
- Optional Hugging Face token support via parameter or HF_TOKEN environment variable
DSD Model Loader: Loads a pre-downloaded model
- Supports custom model and LoRA paths
- Multiple precision options (bfloat16, float16, float32)
- Memory optimization options for different hardware configurations
DSD Model Selector: Helps select models from local directories
- Automatically finds models in the default ComfyUI model paths
- Verifies model existence and provides appropriate warnings
DSD Gemini Prompt Enhancer: Uses Google's Gemini API to enhance prompts for better image generation results
- The API key can be provided in two ways:
  - As an input parameter to the node (not recommended for sharing workflows)
  - Through the GEMINI_API_KEY environment variable (strongly recommended)
- Analyzes both the input image and text prompt to generate improved prompts
Note: To use the enhanced prompts, connect this node's output to the DSD Image Generator's prompt input and enable the use_gemini_prompt option. If no API key is provided, the original prompt will be used.
DSD Image Generator: Generates images with the DSD model
- Supports detailed parameter control:
  - Guidance scale (overall, image-specific, and text-specific)
  - Inference steps
  - Resolution control
  - Seed control (0 for random seed)
- Returns both the generated image and the reference image
- Displays progress during generation
DSD Resize Selector: Provides flexible image resizing options for the DSD Image Generator:
- resize_and_center_crop: Resizes and center crops the image (default behavior)
- center_crop: Simple center crop and resize
- pad: Preserves aspect ratio and adds padding to reach target size
- fit: Resizes to target dimensions without preserving aspect ratio
- Additional customization:
  - Interpolation method (LANCZOS, BICUBIC, BILINEAR, NEAREST)
  - Padding color (RGB values for pad mode)
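
The difference between the pad, fit, and resize_and_center_crop modes comes down to how the scale factor is chosen. The sketch below illustrates that geometry in plain Python. It is an assumption about how such modes commonly work, not the package's actual code: the function name resize_geometry and its return format are hypothetical, and center_crop is left out because the README does not say how it differs from resize_and_center_crop.

```python
def resize_geometry(src_w, src_h, dst_w, dst_h, mode):
    """Return (scaled_w, scaled_h, offset_x, offset_y) for a resize mode.

    The offsets say where the scaled image sits on the target canvas.
    Positive offsets mean padding; negative offsets mean the overflow
    on that axis is cropped away.
    """
    if mode == "fit":
        # Stretch to the target size, ignoring the aspect ratio.
        return dst_w, dst_h, 0, 0
    if mode == "pad":
        # Scale so the whole image fits inside the target, then pad.
        scale = min(dst_w / src_w, dst_h / src_h)
    elif mode == "resize_and_center_crop":
        # Scale so the image covers the target, then crop the overflow.
        scale = max(dst_w / src_w, dst_h / src_h)
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode}")

    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2


# Usage: a 1024x512 reference image going to a 512x512 target.
for mode in ("pad", "fit", "resize_and_center_crop"):
    print(mode, resize_geometry(1024, 512, 512, 512, mode))
```

With a wide reference image, pad keeps the full subject but adds bars above and below, while resize_and_center_crop fills the frame and drops the left and right edges.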

Basic Workflow

(Image: sample workflow)

Advanced Usage

Memory Optimization

The DSD model can be memory-intensive. Several options are available to optimize memory usage:

Precision: Use bfloat16 (default) for the best balance of speed and memory usage
CPU Offloading: Enable model_cpu_offload or sequential_cpu_offload for systems with limited VRAM
Resolution: Lower resolution and fewer inference steps can significantly reduce memory requirements
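
To see why the precision setting matters, the memory taken by the model weights alone can be estimated from the parameter count. This is a rough sketch: the helper name weight_memory_gib is hypothetical, the parameter count in the usage lines is a made-up figure for illustration rather than the size of the DSD model, and activations and other buffers are ignored.

```python
# Bytes used to store one parameter at each precision offered by the nodes.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_memory_gib(num_params, precision):
    """Estimate the memory in GiB needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# Usage with an illustrative (not measured) parameter count.
example_params = 1_000_000_000
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gib(example_params, precision):.2f} GiB")
```

Both 16-bit formats halve the weight footprint relative to float32, which is why switching between bfloat16 and float16 does not save further memory.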

Gemini API Integration

To set up the Gemini API integration:

Obtain a Gemini API key from Google AI Studio
Set it as an environment variable: GEMINI_API_KEY=your_key_here
Connect the DSD Gemini Prompt Enhancer to your workflow
Enable use_gemini_prompt on the DSD Image Generator
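
The key lookup and the fallback to the original prompt can be sketched as follows. This is an illustration under assumptions, not the node's implementation: the function names are hypothetical, the enhance argument is a stand-in for the real Gemini call, and giving the node input precedence over the environment variable is an assumption the README does not confirm.

```python
import os


def resolve_gemini_key(node_api_key=""):
    """Return the API key from the node input or GEMINI_API_KEY, else None.

    Assumption: a key typed into the node wins over the environment variable.
    """
    key = node_api_key.strip() or os.environ.get("GEMINI_API_KEY", "").strip()
    return key or None


def choose_prompt(original_prompt, enhance, node_api_key=""):
    """Return the enhanced prompt, or the original one when no key is set.

    `enhance` is a hypothetical callable (prompt, key) -> str standing in
    for the actual request to the Gemini API.
    """
    key = resolve_gemini_key(node_api_key)
    if key is None:
        return original_prompt
    return enhance(original_prompt, key)


# Usage with a dummy enhancer that makes no network request.
print(choose_prompt("a cat on a beach", lambda prompt, key: prompt + ", detailed"))
```

Keeping the key in the environment means a shared workflow file never contains it, which is the reason the environment variable is the recommended route.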

Custom Model Loading

If you have custom DSD models or want to use a different repository:

Use the DSD Model Downloader with a custom repo_id
Or manually download the model files and use DSD Model Loader with custom paths

Troubleshooting

Memory Issues: Use 16-bit precision (bfloat16 or float16) instead of float32, lower the resolution, or reduce the number of inference steps
Gemini API: Ensure you have a valid API key, either passed to the node or set via the GEMINI_API_KEY environment variable
Model Loading: If loading fails, use the DSD Model Downloader node to re-download the files
Import Errors: Make sure all dependencies in requirements.txt are installed in the Python environment that runs ComfyUI
CUDA Errors: If you encounter CUDA out-of-memory errors, enable model_cpu_offload or sequential_cpu_offload

Examples

Check the examples directory for sample workflows.