ComfyUI Extension: RocM Ninodes
ROCM Optimized Nodes for ComfyUI - High-performance VAE decode and sampling nodes specifically tuned for AMD GPUs with ROCm support, particularly targeting gfx1151 architecture
Custom Nodes (0)
README
RocM Ninodes: ROCM Optimized Nodes for ComfyUI
RocM Ninodes is a comprehensive custom node collection that provides optimized operations specifically tuned for AMD GPUs with ROCm support, particularly targeting the gfx1151 architecture. This collection includes optimized VAE decode operations, KSampler implementations, and LoRA loading designed to maximize performance on AMD hardware with mature ROCm drivers.
π What We Do
RocM Ninodes transforms your AMD GPU experience in ComfyUI by providing:
- π― ROCm-Optimized Nodes: Custom implementations of VAE decode, KSampler, and LoRA loading specifically tuned for AMD GPUs
- β‘ Performance Boost: 15-78% faster generation times with better memory efficiency
- π‘οΈ Memory Management: Gentle memory cleanup optimized for mature ROCm drivers
- π§ Easy Integration: Drop-in replacements for standard ComfyUI nodes
- π Real-Time Monitoring: Built-in performance tracking and optimization recommendations
π§ Quantized Model Support
RocM Ninodes now includes comprehensive support for quantized models with automatic detection and optimization:
Supported Quantized Formats
- FP8 Models: Hardware-accelerated FP8 quantization (flux1-dev-fp8.safetensors)
- BFloat16: Native ROCm support with minimal overhead
- INT8/INT4: GGUF format support for WAN 2.2 models
- Automatic Detection: Detects quantized models from filename and dtype
Quantization-Specific Features
- π‘οΈ Compatibility Mode: Automatically disables aggressive optimizations for quantized models
- πΎ Smart Memory Management: Quantization-aware memory allocation (FP8: 50% vs FP32, INT8: 25% vs FP32)
- π Dtype Preservation: Prevents forced dtype conversions that break quantized models
- πΉ Adaptive Video Processing: Smaller chunk sizes and tile sizes for quantized models
- β‘ OOM Prevention: Lower default settings to prevent out-of-memory errors
User-Reported Issue Fixes
- Fixed OOM Errors: Lower default tile_size (512 vs 768) for better compatibility
- Fixed Quantized Model Breaking: Disabled batch optimization by default for quantized models
- Fixed Memory Management: Less aggressive cleanup for quantized models
- Fixed Video Processing: Adaptive chunk sizing based on frame count and available memory
Recommended Settings for Quantized Models
- Compatibility Mode: Enable for quantized models
- Tile Size: Use 512 (conservative) instead of 768
- Video Chunk Size: Use 2-4 frames instead of 8
- Batch Optimization: Disable for quantized models
- Memory Optimization: Disable aggressive cleanup
Our optimization approach focuses on three key areas:
1. ROCm-Specific Optimizations
- Environment Variables: Essential
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1for mature drivers - Memory Allocation: Optimized settings (256MB chunks, 0.8 threshold) for better fragmentation control
- Precision Handling: Automatic selection of optimal precision for AMD hardware
- Attention Mechanisms: ROCm-tuned attention optimizations for better performance
2. Gentle Memory Management
- Single-Pass Cleanup: Efficient memory clearing without performance penalties
- Smart Monitoring: Real-time memory usage tracking and optimization
- Fragmentation Control: Proactive memory management to prevent OOM errors
- Mature Driver Support: Optimized for current ROCm drivers and libraries
3. Hardware-Specific Tuning
- gfx1151 Architecture: Specifically optimized for AMD Radeon 8060S and similar GPUs
- Unified Memory: Leverages AMD's unified memory architecture for better performance
- Conservative Batching: Smart batching strategies optimized for AMD GPU characteristics
- Tile Size Optimization: Optimal tile sizes (768-1024) for gfx1151 memory bandwidth
π― Real-World Performance Results
Tested on GMTek Evo-X2 Strix Halo (gfx1151) with 128GB Unified RAM:
πΌοΈ Image Generation (Flux)
- 1024x1024 generation: 500s β 110s (78% improvement!)
π¬ Image-to-Video Generation (WAN 2.2 i2v)
- 320x320px, 2s: 163s β 139s (15% improvement!)
- 320x320px, 17 frames: 98.33s β 92.78s (5.6% improvement!)
- 480x480px, 2s: 202s (33 frames, 16fps) β
- 480x720px, 2s: 303s (33 frames, 16fps) β
- Video Quality: Fixed darker frames at chunk boundaries (v1.0.29) β
π Performance Metrics
- Memory efficiency: 50% reduction in attention memory requirements
- Stability: Significantly reduced OOM errors
- Scalability: Successfully handles up to 480x720px i2v generation
- Consistency: Stable performance across multiple runs (5.6% average improvement)
"Workflows that used to take forever to run now complete in a fraction of the time!" - Nino, GMTek Evo-X2 Owner
π¬ Detailed Benchmark Results (WAN 2.2 i2v, 320x320px, 17 frames)
Test Configuration:
- Model: WAN 2.2 i2v 14B
- Resolution: 320x320px
- Frames: 17 frames
- Hardware: GMTek Evo-X2 Strix Halo (gfx1151, 128GB Unified RAM)
With RocM Ninodes Optimizations:
- Run 1: ROCM Advanced KSampler: 20.77s | ROCM VAE Decode: 7.73s | Total: 92.78s
- Run 2: ROCM Advanced KSampler: 21.03s | ROCM VAE Decode: 7.41s | Total: 93.32s
- Average: 93.05s β‘
Without RocM Ninodes (Standard ComfyUI):
- Run 1: Standard KSampler: 22.06s | Standard VAE Decode: 7.48s | Total: 98.33s
- Run 2: Standard KSampler: 22.71s | Standard VAE Decode: 7.20s | Total: 104.01s
- Average: 101.17s π
Performance Improvement: 8.1% faster overall, 5.6% average improvement
π― Try It Now!
- Flux Image Generation - 78% performance improvement!
- WAN 2.2 Video Generation - 15% performance improvement!
π Key Features
- ROCM-Specific Optimizations: Tuned specifically for AMD GPUs with ROCm 6.4+
- gfx1151 Architecture Support: Optimized for Strix Halo and similar architectures
- Performance Monitoring: Built-in performance analysis and optimization recommendations
- Memory Management: Advanced VRAM optimization for AMD GPUs
- Precision Optimization: Automatic precision selection for optimal ROCm performance
Features
ROCMOptimizedVAEDecode
- Optimized for gfx1151: Tuned tile sizes and memory management for your specific GPU
- ROCm-specific optimizations: Disables TF32, enables fp16 accumulation, optimizes for AMD GPUs
- Smart precision handling: Automatically selects optimal precision (fp32 for gfx1151)
- Memory management: Conservative batching strategy for AMD GPUs
- Performance monitoring: Built-in timing and logging
ROCMOptimizedVAEDecodeTiled
- Advanced tiling: More control over tile sizes and overlaps
- Temporal support: Optimized for video VAEs
- ROCm optimizations: Same optimizations as the main decode node
ROCMOptimizedKSampler
- Optimized sampling: ROCm-tuned sampling algorithms for gfx1151
- Memory management: Better VRAM usage during sampling
- Precision optimization: Automatic fp32 selection for ROCm 6.4
- Attention optimization: Optimized attention mechanisms for AMD GPUs
- Performance monitoring: Built-in timing and logging
ROCMOptimizedKSamplerAdvanced
- Advanced control: More sampling parameters and options
- Step control: Start/end step management
- Noise control: Advanced noise handling options
- ROCm optimizations: Same optimizations as the main sampler
ROCMVAEPerformanceMonitor
- Device analysis: Shows your GPU information and current settings
- Performance tips: Provides specific recommendations for your hardware
- Optimal settings: Suggests best parameters for your setup
ROCMSamplerPerformanceMonitor
- Sampler analysis: Analyzes sampling performance and provides recommendations
- Optimal settings: Suggests best samplers and settings for your GPU
- Performance tips: Specific recommendations for sampling optimization
WindowsPaginationDiagnostic
- Error 1455 detection: Automatically detects Windows pagination errors
- Memory analysis: Checks system memory availability and usage
- Automatic fixes: Applies recommended environment variables and settings
- Step-by-step guidance: Provides detailed instructions for manual fixes
- Real-time monitoring: Shows current memory status and recommendations
π§ͺ Testing
Comprehensive Test Suite
The project includes a comprehensive test suite to ensure reliability and prevent regressions:
Error Prevention Tests
cd /path/to/ComfyUI/custom_nodes/rocm_ninodes
source /path/to/ComfyUI/.venv/bin/activate
python test_vae_error_scenarios.py
Test Coverage:
- β
AttributeError:
'dict' object has no attribute 'shape' - β
IndexError:
tuple index out of range - β
ValueError:
Expected numpy array with ndim 3 but got 4 - β VAE Decode Input Formats: 5D vs 4D tensor handling
- β Chunked Video Processing: Memory-safe chunking logic
- β Tensor Shape Conversions: 5Dβ4D conversion validation
- β Memory Calculation Edge Cases: Various tensor sizes
- β Error Recovery Scenarios: Malformed input handling
- β Performance Benchmarks: Decode timing tests
Test Results
Ran 9 tests in 0.032s
OK
Debug Data Collection
The nodes automatically collect debug data for optimization analysis:
- Location:
test_data/debug/wan_vae_input_debug_{timestamp}.pkl - Content: Tensor shapes, types, device info, and actual tensor data
- Usage: Run optimization tests and analyze performance
Performance Testing
# Run optimization tests
python test_vae_optimization.py
# Run error scenario tests
python test_vae_error_scenarios.py
# Debug VAE decode issues
python debug_vae_decode.py
Test Data Structure
test_data/
βββ debug/ # Raw debug data from workflows
βββ optimization/ # Optimization test results
βββ benchmarks/ # Performance benchmarks
βββ README.md # Test data documentation
π ComfyUI Installation with uv
Complete Setup Guide
Tested on Manjaro Linux with GMTek Evo-X2 Strix Halo (gfx1151, 128GB Unified RAM)
π§ Linux (Manjaro/Ubuntu/Arch/etc.)
- Install uv (if not already installed):
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc # or ~/.zshrc
- Clone and setup ComfyUI:
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create virtual environment with uv
uv venv
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
# Install ROCm PyTorch nightly for gfx1151
uv pip uninstall torch torchaudio torchvision
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision --upgrade
- Start ComfyUI with optimized flags:
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
uv run main.py --use-pytorch-cross-attention --highvram --cache-none
πͺ Windows (PowerShell)
- Install uv (if not already installed):
# Install uv via pip
pip install uv
# Or download from: https://github.com/astral-sh/uv/releases
- Clone and setup ComfyUI:
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create virtual environment with uv
uv venv
.venv\Scripts\Activate.ps1
# Install dependencies
uv pip install -r requirements.txt
# Install ROCm PyTorch nightly for gfx1151
uv pip uninstall torch torchaudio torchvision
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision --upgrade
- Start ComfyUI with optimized flags:
# Set environment variable
$env:TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL="1"
# Start ComfyUI
uv run main.py --use-pytorch-cross-attention --highvram --cache-none
Note for Windows users: ROCm support on Windows is limited. For best performance, consider using WSL2 with Ubuntu or dual-booting Linux.
π¦ Plugin Installation
Method 1: ComfyUI CLI (Recommended)
Install ComfyUI CLI first:
pip install comfy-cli
Then install the plugin:
comfy node install rocm-ninodes
Method 2: Manual Installation
Prerequisites
For gfx1151 (Strix Halo) users, follow these setup steps:
π§ Linux (Manjaro/Ubuntu/etc.)
- Install ROCm PyTorch nightly build:
# Uninstall regular/CUDA PyTorch first
uv pip uninstall torch torchaudio torchvision
# Install ROCm nightly for gfx1151
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision --upgrade
- Start ComfyUI with optimized flags:
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
uv run main.py --use-pytorch-cross-attention --highvram --cache-none
πͺ Windows (PowerShell)
- Install ROCm PyTorch nightly build:
# Uninstall regular/CUDA PyTorch first
pip uninstall torch torchaudio torchvision
# Install ROCm nightly for gfx1151
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision --upgrade
- Start ComfyUI with optimized flags:
# Set environment variable
$env:TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL="1"
# Start ComfyUI
python main.py --use-pytorch-cross-attention --highvram --cache-none
Note for Windows users: ROCm support on Windows is limited. For best performance, consider using WSL2 with Ubuntu or dual-booting Linux.
Method 3: Git Clone
π§ Linux/Mac:
cd ComfyUI/custom_nodes
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
cd ComfyUI-ROCM-Optimized-VAE
python install.py
πͺ Windows (PowerShell):
cd ComfyUI\custom_nodes
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
cd ComfyUI-ROCM-Optimized-VAE
python install.py
Method 4: Download ZIP
- Download the latest release from GitHub
- Extract to
ComfyUI/custom_nodes/ComfyUI-ROCM-Optimized-VAE/ - Run
python install.pyto verify installation
Windows users: Right-click the ZIP file β "Extract All" β Choose the ComfyUI/custom_nodes/ folder
Method 5: ComfyUI Manager (Future)
Coming soon - will be available through ComfyUI Manager
Post-Installation
- Restart ComfyUI to load the new nodes
- Verify Installation: Check that nodes appear in "RocM Ninodes" folder in the node panel:
- RocM Ninodes/VAE: VAE Decode, VAE Decode Tiled, VAE Performance Monitor
- RocM Ninodes/Sampling: KSampler, KSampler Advanced, Sampler Performance Monitor
- Test Performance: Use the Performance Monitor nodes to verify optimizations
π Plugin Updates
How to Update RocM Ninodes
π§ Linux (Manjaro/Ubuntu/etc.)
Method 1: Git Pull (Recommended)
cd ComfyUI/custom_nodes/ComfyUI-ROCM-Optimized-VAE
git pull origin main
Method 2: Fresh Install
# Remove old version
rm -rf ComfyUI/custom_nodes/ComfyUI-ROCM-Optimized-VAE
# Install latest version
cd ComfyUI/custom_nodes
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
πͺ Windows (PowerShell)
Method 1: Git Pull (For Existing Installations)
# Navigate to the plugin directory
cd ComfyUI\custom_nodes\ComfyUI-ROCM-Optimized-VAE
# Pull latest changes
git pull origin main
Method 2: Fresh Install (For New Installations)
# Navigate to custom_nodes directory
cd ComfyUI\custom_nodes
# Clone the repository
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
# Navigate into the plugin directory
cd ComfyUI-ROCM-Optimized-VAE
Method 3: Update Existing Installation (If git pull fails)
# Navigate to custom_nodes directory
cd ComfyUI\custom_nodes
# Remove old version
Remove-Item -Recurse -Force ComfyUI-ROCM-Optimized-VAE
# Clone fresh copy
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
# Navigate into the plugin directory
cd ComfyUI-ROCM-Optimized-VAE
After Updating
- Restart ComfyUI to load the updated nodes
- Check for new features in the node panel
- Test workflows to ensure compatibility
- Check the CHANGELOG for new features and fixes
Update Notifications
- GitHub Releases: Watch the repository for release notifications
- ComfyUI Manager: Future updates will be available through ComfyUI Manager
- Performance Updates: New optimizations are regularly added based on community feedback
π Quick Start - Optimized Workflow
Ready to test the optimizations? Download the pre-configured workflow:
π₯ Download Optimized Workflows
- Flux Image Generation - Complete Flux workflow with ROCM optimizations
- WAN 2.2 Video Generation - WAN 2.2 Image-to-Video workflow with ROCM optimizations
This workflow includes:
- β ROCM VAE Decode (optimized for gfx1151)
- β ROCM KSampler (with memory optimizations)
- β Performance Monitors (to track improvements)
- β Optimal Settings (tuned for Strix Halo)
How to use:
- Download the workflow JSON file
- Open in ComfyUI (drag & drop or File β Load)
- Install missing nodes via ComfyUI Manager (if prompted)
- Run and enjoy 78% faster generation! π
Usage
Basic Usage
- Replace your standard VAE Decode node with "ROCM VAE Decode"
- Replace your standard KSampler with "ROCM KSampler"
- Use the default settings (optimized for gfx1151)
- Enable "use_rocm_optimizations" for best performance
Advanced Usage
-
VAE Settings:
- Tile Size: 768-1024 works well for gfx1151 (default: 768)
- Overlap: 96-128 provides good quality (default: 96)
- Precision: "auto" selects optimal for your GPU
- Batch Optimization: Keep enabled for better memory usage
-
Sampler Settings:
- Precision: "auto" selects fp32 for gfx1151
- Memory Optimization: Keep enabled for better VRAM usage
- Attention Optimization: Keep enabled for faster sampling
- Samplers: Euler, Heun, dpmpp_2m work well with ROCm
- CFG: 7.0-8.0 is optimal for gfx1151
Performance Tips for gfx1151
- Use fp32 precision (automatically selected)
- Tile size 768-1024 for 1024x1024 images
- Enable all ROCm optimizations
- Use tiled decode for images larger than 1024x1024
Expected Performance Improvements
Based on gfx1151 architecture optimizations:
- VAE Decode: 15-25% faster, 20-30% better VRAM usage
- Sampling: 10-20% faster sampling with better memory management
- Overall Workflow: 20-40% faster end-to-end generation
- Memory efficiency: 25-35% better VRAM usage overall
- Stability: Reduced OOM errors with better memory management
- Quality: Maintained or improved output quality
Troubleshooting
π¨ Quick Fix for Common Windows Errors
If you see these errors:
fatal: couldn't find remote ref ComfyUI-ROCM-Optimized-VAEdoes not appear to be a git repositoryLe module Β« .venv Β» n'a pas pu Γͺtre chargΓ©
Quick Solution:
# 1. Navigate to ComfyUI directory
cd C:\ComfyUI
# 2. Activate virtual environment
.venv\Scripts\Activate.ps1
# 3. Navigate to custom_nodes
cd custom_nodes
# 4. Clone the plugin (if not already installed)
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
# 5. Navigate into the plugin directory
cd ComfyUI-ROCM-Optimized-VAE
# 6. Run the installer
python install.py
πͺ Windows Pagination Error Fixes (Error 1455)
If you encounter the error "Le fichier de pagination est insuffisant pour terminer cette opΓ©ration" (os error 1455):
π¨ Quick Fix (Recommended)
Use the new Windows Pagination Diagnostic node in ComfyUI:
- Add "Windows Pagination Diagnostic" node from "RocM Ninodes/Diagnostics"
- Connect it to your workflow
- Run it to automatically diagnose and fix the issue
Method 1: Environment Variable (Immediate Fix)
# Set environment variable before starting ComfyUI
$env:PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True,max_split_size_mb:512"
$env:PYTORCH_HIP_ALLOC_CONF = "expandable_segments:True"
python main.py
Method 2: Batch File Solution
Create a start_comfyui.bat file in your ComfyUI directory:
@echo off
set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512
set PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
python main.py
pause
Method 3: PowerShell Profile (Permanent)
Add to your PowerShell profile:
# Open PowerShell profile
notepad $PROFILE
# Add these lines:
$env:PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True,max_split_size_mb:512"
$env:PYTORCH_HIP_ALLOC_CONF = "expandable_segments:True"
Method 4: System Environment Variable (Permanent)
- Press
Win + R, typesysdm.cpl, press Enter - Click "Environment Variables"
- Under "User variables", click "New"
- Variable name:
PYTORCH_CUDA_ALLOC_CONF - Variable value:
expandable_segments:True,max_split_size_mb:512 - Click "New" again
- Variable name:
PYTORCH_HIP_ALLOC_CONF - Variable value:
expandable_segments:True - Click OK and restart ComfyUI
Method 5: Increase Windows Paging File (Most Effective)
- Press
Win + R, typesysdm.cpl, press Enter - Click "Advanced" tab β "Performance Settings" β "Advanced" tab
- Under "Virtual memory", click "Change"
- Uncheck "Automatically manage paging file size for all drives"
- Select your system drive (usually C:)
- Select "Custom size"
- Set Initial size:
16384MB (16 GB) - Set Maximum size:
32768MB (32 GB) - Click "Set", then "OK", then restart ComfyUI
Method 6: PowerShell Script (Advanced)
Create fix_pagination.ps1:
# Fix Windows pagination error 1455
Write-Host "Applying Windows pagination fixes..." -ForegroundColor Green
# Set environment variables
$env:PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True,max_split_size_mb:512"
$env:PYTORCH_HIP_ALLOC_CONF = "expandable_segments:True"
$env:PYTORCH_CUDA_MEMORY_POOL_TYPE = "expandable_segments"
# Check memory
$memory = Get-WmiObject -Class Win32_PhysicalMemory | Measure-Object -Property Capacity -Sum
$totalGB = [math]::Round($memory.Sum / 1GB, 2)
Write-Host "Total RAM: $totalGB GB" -ForegroundColor Yellow
if ($totalGB -lt 16) {
Write-Host "WARNING: Less than 16GB RAM detected. Consider increasing paging file." -ForegroundColor Red
}
Write-Host "Environment variables set. Starting ComfyUI..." -ForegroundColor Green
python main.py
Run with: powershell -ExecutionPolicy Bypass -File fix_pagination.ps1
π§ Advanced Windows Troubleshooting
Memory Issues on Windows:
# Check available memory
Get-WmiObject -Class Win32_PhysicalMemory | Measure-Object -Property Capacity -Sum
# Set additional memory management
$env:PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True,max_split_size_mb:512"
ROCm Installation Issues:
# Verify ROCm installation
rocm-smi
# Check PyTorch ROCm support
python -c "import torch; print(torch.cuda.is_available()); print(torch.version.hip)"
Git Issues on Windows:
# Fix line ending issues
git config --global core.autocrlf true
# Reset repository if corrupted
cd custom_nodes
rmdir /s ComfyUI-ROCM-Optimized-VAE
git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE
If you experience issues:
- Check the Performance Monitor node for recommendations
- Try reducing tile size if you get OOM errors
- Ensure you're using ROCm-compatible PyTorch
- Check that "use_rocm_optimizations" is enabled
ROCm Requirements:
- PyTorch with ROCm support (nightly build recommended)
- ROCm 6.4+ (you're using 6.4)
uv-Specific Issues
π§ Linux (Manjaro/Ubuntu/etc.)
-
uv not found after installation:
# Add to your shell profile (~/.bashrc or ~/.zshrc) echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc source ~/.bashrc -
Virtual environment not activating:
# Make sure you're in the ComfyUI directory cd ComfyUI source .venv/bin/activate -
PyTorch ROCm installation fails:
# Clear uv cache and retry uv cache clean uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision --upgrade -
Permission issues with uv:
# Install uv for current user only curl -LsSf https://astral.sh/uv/install.sh | sh
πͺ Windows (PowerShell)
-
uv not found after installation:
# Add uv to PATH or use full path $env:PATH += ";C:\Users\$env:USERNAME\.cargo\bin" # Or restart PowerShell after installation -
Virtual environment not activating:
# Make sure you're in the ComfyUI directory cd ComfyUI .venv\Scripts\Activate.ps1 -
PowerShell execution policy:
# If you get execution policy errors Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser -
Git not found:
# Install Git for Windows from: https://git-scm.com/download/win # Or use GitHub Desktop -
"fatal: couldn't find remote ref" error:
# This happens when trying to git pull before cloning # Solution: Clone the repository first cd ComfyUI\custom_nodes git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE -
"does not appear to be a git repository" error:
# This happens when the directory isn't a git repository # Solution: Clone the repository first cd ComfyUI\custom_nodes git clone https://github.com/iGavroche/rocm-ninodes.git ComfyUI-ROCM-Optimized-VAE -
Virtual environment activation fails:
# Make sure you're in the ComfyUI directory (not custom_nodes) cd C:\ComfyUI # Try activating the virtual environment .venv\Scripts\Activate.ps1 # If that fails, try this alternative & ".venv\Scripts\Activate.ps1"
Windows-Specific Issues
-
ROCm not working on Windows:
- ROCm has limited Windows support
- Consider using WSL2 with Ubuntu for better compatibility
- Or dual-boot Linux for optimal performance
-
PowerShell execution policy:
# If you get execution policy errors Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser -
Python path issues:
# Make sure Python is in your PATH python --version # If not found, add Python to PATH or use full path -
Git not found:
- Install Git for Windows from https://git-scm.com/download/win
- Or use GitHub Desktop for GUI-based cloning
Technical Details
Optimizations Applied:
- Memory Management: Conservative batching for AMD GPUs
- Precision: fp32 preferred over bf16 for gfx1151
- Tile Sizing: Optimized for gfx1151 memory bandwidth
- ROCm Settings: Disabled TF32, enabled fp16 accumulation
- Batch Processing: Improved batch size calculation
Architecture-Specific Tuning:
- gfx1151: Optimized tile sizes (768-1024)
- Memory: Conservative memory allocation
- Precision: fp32 for best ROCm performance
- Batching: AMD-optimized batch sizes
Testing
ROCM Ninodes includes a comprehensive test suite to ensure performance and correctness.
Quick Start
cd tests
./run_tests.sh
Test Categories
- Performance Tests: Validate timing targets (78% Flux improvement, 5.6% WAN improvement)
- Correctness Tests: Verify tensor shapes and data formats
- Integration Tests: Full ComfyUI workflow testing
- Mock Data Tests: Tests using synthetic data when real data unavailable
Data Capture
Enable debug mode to capture real workflow data for testing:
export ROCM_NINODES_DEBUG=1
# Run your ComfyUI workflows
# Data will be saved to test_data/captured/
Performance Targets
- Flux Checkpoint Load: <30s
- Flux VAE Decode: <10s
- WAN Sampling: <100s
- WAN VAE Decode: <10s
Documentation
- Testing Guide: Comprehensive testing documentation
- Architecture: System architecture and constraints
- Rules: Development rules and best practices
Contributing
Feel free to submit issues or pull requests to improve the optimizations for your specific use case.