ComfyUI Extension: ComfyUI-TF32-Enabler
Automatically enables TensorFloat-32 (TF32) acceleration for NVIDIA RTX 30/40/50 series GPUs in ComfyUI.
Custom Nodes (0)
README
ComfyUI TF32 Enabler
Automatically enables TensorFloat-32 (TF32) acceleration for NVIDIA RTX 30/40/50 series GPUs in ComfyUI.
Note: to use torch compile you have to disable cudamallochasync
๐ Performance Benefits
- 1.5-2x speedup for diffusion models on Ampere/Ada/Blackwell GPUs
- Minimal precision impact (maintains quality)
- Automatic activation on ComfyUI startup
- Zero configuration required
๐ Requirements
- NVIDIA GPU with compute capability >= 8.0:
- RTX 30 series (Ampere)
- RTX 40 series (Ada Lovelace)
- RTX 50 series (Blackwell)
- A100, A6000, etc.
- PyTorch with CUDA support
- ComfyUI
๐ฆ Installation
cd ComfyUI/custom_nodes
git clone https://github.com/marduk191/ComfyUI-TF32-Enabler.git
# Or download and extract the zip file
โ Verification
When ComfyUI starts, you should see:
============================================================
๐ ComfyUI TF32 Acceleration Enabled
============================================================
Matmul TF32: True
cuDNN TF32: True
CUDA Allocator: expandable_segments:True
GPU: NVIDIA GeForce RTX 5090
Compute Capability: 10.0
โ
torch.compile CUDA allocator fix applied
============================================================
๐งช Testing
Run the included test script to verify torch.compile works:
cd ComfyUI/custom_nodes/ComfyUI-TF32-Enabler
python test_torch_compile.py
๐ง Technical Details
This custom node enables:
torch.backends.cuda.matmul.allow_tf32 = Truetorch.backends.cudnn.allow_tf32 = TruePYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
TF32 uses 10-bit mantissa (vs FP32's 23-bit) while maintaining the same 8-bit exponent range, providing:
- Faster computation on tensor cores
- Same dynamic range as FP32
- Negligible quality loss for AI inference
The expandable segments allocator configuration resolves memory allocation issues when using torch.compile with CUDA operations.
๐ Benchmarks
Typical speedups on RTX 5090:
- SDXL: ~1.8x faster
- Flux: ~1.9x faster
- SD3: ~1.7x faster
๐ ๏ธ Compatibility
Works with all ComfyUI workflows and custom nodes. No conflicts expected.
๐ License
MIT License - See LICENSE file for details
๐ค Contributing
Issues and pull requests welcome!
๐ Links
Note: If your GPU doesn't support TF32 (older than RTX 30 series), this node will safely do nothing and won't cause errors.