ComfyUI-Concept-Diffusion
ComfyUI custom node implementation of ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features.
These nodes let you generate high-quality saliency maps that precisely localize textual concepts within images, using the attention layers of diffusion transformers.
Features
- ConceptAttention Node: Extract concept embeddings from diffusion transformer attention layers
- Saliency Map Generation: Generate precise saliency maps for textual concepts
- Zero-shot Segmentation: Perform zero-shot semantic segmentation using concept attention
- Multi-concept Support: Handle multiple concepts simultaneously
- Video Support: Works with video generation models (CogVideoX)
- Visualization Tools: Overlay attention maps on original images
- Flexible Thresholding: Multiple threshold methods for saliency maps
Installation
1. Clone this repository into your ComfyUI custom_nodes folder:

   ```bash
   git clone https://github.com/Junst/ComfyUI-Concept-Diffusion.git
   cd ComfyUI-Concept-Diffusion
   ```

2. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Restart ComfyUI.
Nodes
ConceptAttentionNode
Main node for generating concept attention maps from diffusion models.
Inputs:
- model: Diffusion model (MODEL)
- clip: CLIP text encoder (CLIP)
- image: Input image (IMAGE)
- concepts: Comma-separated list of concepts (STRING)
- num_inference_steps: Number of inference steps (INT)
- seed: Random seed (INT, optional)

Outputs:
- concept_maps: Generated concept attention maps (CONCEPT_MAPS)
- visualized_image: Visualization of all concept maps (IMAGE)
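For reference, a ComfyUI node with this input/output signature is typically declared along the lines of the sketch below. The class attributes shown (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY) are standard ComfyUI conventions, but the defaults, category name, and method body are illustrative assumptions, not this repository's exact code.

```python
# Hypothetical sketch of a node declaration matching the inputs/outputs listed above.
class ConceptAttentionNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("MODEL",),
                "clip": ("CLIP",),
                "image": ("IMAGE",),
                "concepts": ("STRING", {"default": "person, sky, tree"}),
                "num_inference_steps": ("INT", {"default": 20, "min": 1, "max": 100}),
            },
            "optional": {
                "seed": ("INT", {"default": 0}),
            },
        }

    RETURN_TYPES = ("CONCEPT_MAPS", "IMAGE")
    RETURN_NAMES = ("concept_maps", "visualized_image")
    FUNCTION = "run"
    CATEGORY = "ConceptAttention"

    def run(self, model, clip, image, concepts, num_inference_steps, seed=0):
        # Extract concept attention maps from the DiT attention layers here.
        raise NotImplementedError
```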
ConceptSaliencyMapNode
Extract individual concept saliency maps and convert to masks.
Inputs:
- concept_maps: Concept attention maps (CONCEPT_MAPS)
- concept_name: Name of concept to extract (STRING)
- threshold: Threshold for mask generation (FLOAT)

Outputs:
- mask: Binary mask for the concept (MASK)
- saliency_image: Saliency map visualization (IMAGE)
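The core operation here is turning a continuous saliency map into a binary mask. A minimal NumPy sketch is shown below; the function name and the specific threshold methods ("fixed", "mean", "percentile") are illustrative assumptions and may not match the thresholding options implemented in this repository.

```python
import numpy as np

def saliency_to_mask(saliency: np.ndarray, threshold: float = 0.5, method: str = "fixed") -> np.ndarray:
    """Illustrative thresholding of a single-concept saliency map (H, W) into a binary mask."""
    # Normalize to [0, 1] so the threshold is scale-independent.
    s = saliency.astype(np.float32)
    s = (s - s.min()) / (s.max() - s.min() + 1e-8)
    if method == "mean":           # adaptive: threshold at the mean activation
        threshold = float(s.mean())
    elif method == "percentile":   # adaptive: keep the top 20% most salient pixels
        threshold = float(np.percentile(s, 80))
    return (s >= threshold).astype(np.float32)

# Example with a dummy map:
mask = saliency_to_mask(np.random.rand(64, 64), threshold=0.5)
```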
ConceptSegmentationNode
Perform zero-shot semantic segmentation using concept attention.
Inputs:
- concept_maps: Concept attention maps (CONCEPT_MAPS)
- image: Original image (IMAGE)
- concepts: List of concepts for segmentation (STRING)

Outputs:
- segmentation_mask: Segmentation mask (MASK)
- segmented_image: Colored segmentation result (IMAGE)
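Zero-shot segmentation of this kind usually reduces to a per-pixel argmax over the stacked concept saliency maps. The sketch below illustrates that idea with dummy data; the function names and the palette handling are assumptions, not this repository's exact implementation.

```python
import numpy as np

def segment_from_concept_maps(concept_maps: np.ndarray, palette: np.ndarray):
    """Illustrative zero-shot segmentation from stacked saliency maps.

    concept_maps: (num_concepts, H, W) saliency per concept
    palette:      (num_concepts, 3) RGB color per concept, values in [0, 1]
    """
    labels = np.argmax(concept_maps, axis=0)   # each pixel gets its most salient concept
    segmented = palette[labels]                # (H, W, 3) colored segmentation result
    return labels, segmented

# Example with dummy data: 3 concepts on a 64x64 map.
maps = np.random.rand(3, 64, 64)
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
label_map, seg_image = segment_from_concept_maps(maps, colors)
```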
ConceptAttentionVisualizerNode
Visualize concept attention maps overlaid on the original image.
Inputs:
- concept_maps: Concept attention maps (CONCEPT_MAPS)
- image: Original image (IMAGE)
- overlay_alpha: Transparency of overlay (FLOAT)

Outputs:
- visualized_image: Image with attention overlay (IMAGE)
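Overlaying works by colorizing the saliency map and alpha-blending it onto the original image. A minimal NumPy sketch is below; the simple blue-to-red colormap and the function name are illustrative assumptions rather than the node's actual rendering code.

```python
import numpy as np

def overlay_saliency(image: np.ndarray, saliency: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a saliency map (H, W) onto an RGB image (H, W, 3); both assumed in [0, 1]."""
    s = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    heat = np.stack([s, np.zeros_like(s), 1.0 - s], axis=-1)   # simple blue-to-red heatmap
    return (1.0 - alpha) * image + alpha * heat                # alpha-blend over the original

# Example with dummy data:
blended = overlay_saliency(np.random.rand(64, 64, 3), np.random.rand(64, 64), alpha=0.4)
```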
Usage Example
1. Load an image using the LoadImage node.
2. Load a diffusion model (Flux, SD3, etc.) using CheckpointLoaderSimple.
3. Connect the model, CLIP, and image to ConceptAttentionNode.
4. Specify concepts such as "person, car, tree, sky, building".
5. Use ConceptSaliencyMapNode to extract specific concept maps.
6. Use ConceptSegmentationNode for zero-shot segmentation.
7. Use ConceptAttentionVisualizerNode for visualization.
8. Save the results using SaveImage nodes.
Example Workflow
See example_workflow.json for a complete ComfyUI workflow example.
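If you want to queue that workflow headlessly, one option is ComfyUI's HTTP API. The sketch below assumes a local server on the default port 8188 and that example_workflow.json was exported in API format (e.g., via "Save (API Format)" in the UI); adjust the path and host as needed.

```python
import json
import urllib.request

# Load the bundled example workflow (must be in ComfyUI API format).
with open("example_workflow.json", "r") as f:
    workflow = json.load(f)

# Queue it on a locally running ComfyUI instance.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response contains the queued prompt_id
```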
Testing
Run the test script to verify the nodes work correctly:
```bash
python test_nodes.py
```
Technical Details
This implementation is based on the ConceptAttention paper, which shows that:
- Multi-modal diffusion transformers (DiTs) have rich representations
- Linear projections in the attention output space produce sharper saliency maps
- Concept embeddings can be extracted without additional training
- The method works for both image and video generation models
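Concretely, the saliency maps come from similarities between the attention output-space embeddings of the image patch tokens and those of the concept tokens. The PyTorch sketch below illustrates that projection; tensor names, shapes, and the softmax-over-concepts step are illustrative assumptions, not this repository's exact code.

```python
import torch

def concept_saliency_maps(img_out: torch.Tensor, concept_out: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Illustrative ConceptAttention-style saliency computation.

    img_out:     (num_patches, d)  attention output-space embeddings of image tokens
    concept_out: (num_concepts, d) attention output-space embeddings of concept tokens
    Returns:     (num_concepts, h, w) saliency maps.
    """
    scores = img_out @ concept_out.T        # (num_patches, num_concepts) similarities
    probs = torch.softmax(scores, dim=-1)   # let concepts compete per patch (useful for segmentation)
    return probs.T.reshape(concept_out.shape[0], h, w)

# Example with dummy tensors (64x64 latent patches, 5 concepts, hidden size 3072 as in Flux):
maps = concept_saliency_maps(torch.randn(64 * 64, 3072), torch.randn(5, 3072), 64, 64)
```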
Supported Models
- Flux (Flux1-dev, Flux1-schnell)
- Stable Diffusion 3/3.5
- CogVideoX (for video)
- Other DiT-based diffusion models
Based on
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
License
MIT License