ComfyUI reference implementation for T-GATE.
T-GATE can bring a 10%-50% speedup to different diffusion models, while only slightly reducing the quality of the generated images and maintaining the original composition.
The current implementation relies on some monkey patching. If any error occurs, make sure you have the latest version.
If my work helps you, consider giving it a star.
Some of my other projects that may help you.
TGate Apply(Deprecated)
node.use_cpu_cache
, reduce some GPU OOM problems.
TL;DR
: Improved performance and T-GATE only works where it needs to work.
TGateApply
to affect other places where the model is used, even if TGateApply
is turned off.
The examples directory has workflow examples. Images generated with and without T-GATE are in the assets folder.
| Origin result | T-GATE result |
| :---: | :---: |
| | |
The T-GATE result image comes from the workflow included in the example image.
AutomaticCFG is another ComfyUI plugin: Your CFG won't be your CFG anymore. It is turned into a way to guide the CFG/final intensity/brightness/saturation, and it adds a 30% speed increase.
env: T4-8G
| | Origin | T-GATE 0.5 | AutomaticCFG | T-GATE 0.35 | AutomaticCFG fastest |
| :---: | :---: | :---: | :---: | :---: | :---: |
| result | | | | | |
| speed | 4.59it/s | 5.68it/s | 5.62it/s | 6.13it/s | 6.13it/s |

T-GATE is the best option when you need to maintain the original composition. However, if you don't need to maintain composition, AutomaticCFG fastest brings about the same performance improvement.
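To make the comparison easier to read, the it/s figures above can be turned into relative speedups over the Origin run. This is only a small arithmetic sketch: the numbers are copied from the T4-8G table, and the helper name `relative_speedup` is made up for illustration and is not part of this plugin.

```python
# Sampling speeds in it/s, copied from the T4-8G table above.
speeds = {
    "Origin": 4.59,
    "T-GATE 0.5": 5.68,
    "AutomaticCFG": 5.62,
    "T-GATE 0.35": 6.13,
    "AutomaticCFG fastest": 6.13,
}


def relative_speedup(baseline: float, value: float) -> float:
    """Return the speedup of `value` over `baseline` as a percentage.

    Hypothetical helper for illustration only; not provided by ComfyUI_TGate.
    """
    return (value / baseline - 1.0) * 100.0


baseline = speeds["Origin"]
speedups = {
    name: relative_speedup(baseline, value)
    for name, value in speeds.items()
    if name != "Origin"
}

for name, pct in speedups.items():
    print(f"{name}: +{pct:.1f}% over Origin")
```

On this hardware the settings land between roughly 22% and 34% faster than the unmodified run, which is inside the 10%-50% range quoted at the top.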
git clone https://github.com/JettHu/ComfyUI_TGate
# that's all!
Load Checkpoint
and other MODEL loaders.
Load Checkpoint
and other MODEL loaders.
only_cross_attention
is false
, percentage of steps too. Defines at what percentage point of the generation to start using the T-GATE cache on latent self-attention.
This node is already deprecated and will be removed in a few versions.
Load Checkpoint
and other MODEL loaders.
only_cross_attention
is false
, percentage of steps too. Defines at what percentage point of the generation to start using the T-GATE cache on latent self-attention.

| Model | MACs | Param | Latency | Zero-shot 10K-FID on MS-COCO |
|-----------------------|----------|----------|---------|---------------------------|
| SD-1.5 | 16.938T | 859.520M | 7.032s | 23.927 |
| SD-1.5 w/ TGATE | 9.875T | 815.557M | 4.313s | 20.789 |
| SD-2.1 | 38.041T | 865.785M | 16.121s | 22.609 |
| SD-2.1 w/ TGATE | 22.208T | 815.433M | 9.878s | 19.940 |
| SD-XL | 149.438T | 2.570B | 53.187s | 24.628 |
| SD-XL w/ TGATE | 84.438T | 2.024B | 27.932s | 22.738 |
| Pixart-Alpha | 107.031T | 611.350M | 61.502s | 38.669 |
| Pixart-Alpha w/ TGATE | 65.318T | 462.585M | 37.867s | 35.825 |
| DeepCache (SD-XL) | 57.888T | - | 19.931s | 23.755 |
| DeepCache w/ TGATE | 43.868T | - | 14.666s | 23.999 |
| LCM (SD-XL) | 11.955T | 2.570B | 3.805s | 25.044 |
| LCM w/ TGATE | 11.171T | 2.024B | 3.533s | 25.028 |
| LCM (Pixart-Alpha) | 8.563T | 611.350M | 4.733s | 36.086 |
| LCM w/ TGATE | 7.623T | 462.585M | 4.543s | 37.048 |
The latency is tested on a 1080ti commercial card.
The MACs and Params are calculated by calflops.
The FID is calculated by PytorchFID.
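As a quick way to read the table above, the following sketch computes how much TGATE reduces MACs and latency for the four base models. The values are copied from the table; the `reduction_pct` helper and the `rows` layout are illustrative assumptions, not something shipped with this repository.

```python
# (model, MACs before [T], MACs with TGATE [T], latency before [s], latency with TGATE [s])
# Values copied from the benchmark table above.
rows = [
    ("SD-1.5", 16.938, 9.875, 7.032, 4.313),
    ("SD-2.1", 38.041, 22.208, 16.121, 9.878),
    ("SD-XL", 149.438, 84.438, 53.187, 27.932),
    ("Pixart-Alpha", 107.031, 65.318, 61.502, 37.867),
]


def reduction_pct(before: float, after: float) -> float:
    """Percentage saved when going from `before` to `after` (hypothetical helper)."""
    return (1.0 - after / before) * 100.0


reductions = {
    model: {
        "macs": reduction_pct(macs_before, macs_after),
        "latency": reduction_pct(lat_before, lat_after),
    }
    for model, macs_before, macs_after, lat_before, lat_after in rows
}

for model, r in reductions.items():
    print(f"{model}: MACs -{r['macs']:.1f}%, latency -{r['latency']:.1f}%")
```

For SD-1.5, for example, this works out to roughly 42% fewer MACs and about 39% lower latency on the tested card.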
comfy.samplers.sampling_function
, T-Gate does not perform correctly. Refer to
For Apple silicon users using the mps backend, some torch and macOS versions may cause problems. Refer to the issue comment.
Fixed on 2024.4.29: unable to properly remove T-Gate effects. The picture below shows the situation when the node is bypassed after apply.

| 2024.4.26-29 | Updated on 2024.4.29 |
| :---: | :---: |
| | |