ComfyUI — Egregora Audio Super‑Resolution
✨ High‑quality music audio enhancement for ComfyUI: FlashSR super‑resolution + Fat Llama spectral enhancement (GPU & CPU).
🧩 What is this?
This repository provides three production‑ready ComfyUI nodes focused on music audio enhancement:
- 🎧 Audio Super Resolution (FlashSR) — Fast, one‑step diffusion SR to 48 kHz, chunked with seamless overlap‑add. Auto‑downloads the three required weights on first run.
- 🎛️ Spectral Enhance (Fat Llama — GPU) — Iterative Soft Thresholding (IST) with CuPy/CUDA for clarity/restoration, with safe autoscaling.
- 🎛️ Spectral Enhance (Fat Llama — CPU/FFTW) — CPU fallback using FFTW; slower but portable.
Designed to be simple, robust, and practical for music workflows.
🚀 Quick Install (TL;DR)
# 1) Clone into your ComfyUI custom nodes folder
cd <ComfyUI>/custom_nodes
git clone https://github.com/lucasgattas/ComfyUI-Egregora-Audio-Super-Resolution.git
cd ComfyUI-Egregora-Audio-Super-Resolution
# 2) Install Python dependencies into ComfyUI's Python
python -m pip install -r requirements.txt
# 3) (Optional) Run the helper installer to fetch vendor deps
python install.py
# 4) Restart ComfyUI
💡 FlashSR weights are auto‑downloaded on first use via huggingface_hub. No manual step is required unless the machine is offline.
📦 Requirements
Common
- Python 3.10+ (ComfyUI’s embedded Python is fine)
- soundfile, numpy, torch, huggingface_hub (pulled by requirements.txt)
GPU Fat Llama
- NVIDIA GPU (Compute Capability ≥ 7 recommended)
- CuPy for CUDA 12: pip install cupy-cuda12x
- NVIDIA CUDA runtime + NVRTC DLLs (wheels in requirements.txt will install nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, etc.)
CPU/FFTW Fat Llama
- fat-llama-fftw (pulled by requirements.txt)
Optional
- FFmpeg in PATH for wider audio format support (system install or a ComfyUI FFmpeg node)
📁 Where things go
- Outputs (sidecar) → ComfyUI/output/audio/ (WAV/FLAC written by nodes)
- FlashSR weights → ComfyUI/models/audio/flashsr/
- Vendor (optional) → custom_nodes/.../deps/
You can override the models folder with the EGREGORA_MODELS_DIR env var. If you store the FlashSR repo elsewhere, set EGREGORA_FLASHSR_REPO to that path.
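As a rough illustration of how the override could resolve, here is a minimal Python sketch. The function name resolve_flashsr_dir and the assumption that audio/flashsr/ is appended beneath an overridden models root are hypothetical; the node's actual lookup may differ.

```python
import os
from pathlib import Path


def resolve_flashsr_dir(comfy_root, env=None):
    """Return the folder where FlashSR weights are expected (hypothetical helper)."""
    env = os.environ if env is None else env
    # EGREGORA_MODELS_DIR, when set, replaces the default <ComfyUI>/models root.
    override = env.get("EGREGORA_MODELS_DIR")
    models_root = Path(override) if override else Path(comfy_root) / "models"
    # Assumption: the audio/flashsr sub-folder layout is kept under the override.
    return models_root / "audio" / "flashsr"
```

Passing env explicitly makes it easy to check both cases without touching the real environment.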
🔧 Nodes & Settings
🎧 Audio Super Resolution (FlashSR)
Purpose: Reconstruct wide‑band detail at 48 kHz using a distilled one‑step SR model + SR vocoder, processed in chunks and stitched by overlap‑add.
Inputs
- chunk_seconds (float, default 5.12) — Window length per pass. FlashSR is trained/evaluated with 5.12 s; keep this unless you have a reason.
- overlap_seconds (float, default 0.50) — Cross‑fade between chunks. Increase (e.g., 0.8–1.0 s) if you hear seam clicks; higher overlap = more compute.
- device (auto/cuda/cpu) — Select GPU when available for speed; CPU is fine for testing.
- target_sr (auto/48000/44100/...) — The model natively outputs 48 kHz. If you need 44.1 kHz, best practice is to keep 48 kHz here and downsample after with a high‑quality resampler.
- output_format (wav/flac) — The node writes a sidecar file to output/audio/ with this container (WAV=PCM; FLAC=lossless, smaller).
- audio_path (string) — Local file input (leave empty if using the upstream AUDIO pipe).
- audio_url (string) — Download and process a remote audio file.
- flashsr_lowpass (bool) — Light anti‑aliasing on input; can help with very low‑SR or hissy sources. Disable if results feel too dull.
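To make the chunk_seconds / overlap_seconds interaction concrete, here is a minimal overlap-add sketch in NumPy. It is not the node's code: process_in_chunks, the linear cross-fade, and the weight normalization are illustrative choices, and fn stands in for the model while returning a chunk of the same length as its input.

```python
import numpy as np


def process_in_chunks(x, sr, fn, chunk_seconds=5.12, overlap_seconds=0.5):
    """Apply fn to overlapping chunks of a 1-D signal and cross-fade the seams.

    Assumption of this sketch: fn returns an array as long as its input.
    """
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    chunk = int(round(chunk_seconds * sr))
    overlap = int(round(overlap_seconds * sr))
    hop = chunk - overlap
    if hop <= 0:
        raise ValueError("overlap_seconds must be shorter than chunk_seconds")

    # Linear ramp strictly between 0 and 1; ramp + ramp[::-1] sums to 1.
    ramp = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
    out = np.zeros(n)
    weight = np.zeros(n)

    start = 0
    while True:
        end = min(start + chunk, n)
        w = np.ones(end - start)
        if overlap and start > 0:
            w[:overlap] = ramp         # fade in over the previous chunk's tail
        if overlap and end < n:
            w[-overlap:] = ramp[::-1]  # fade out into the next chunk
        out[start:end] += w * fn(x[start:end])
        weight[start:end] += w
        if end == n:
            break
        start += hop
    # Dividing by the summed weights keeps the level flat across seams.
    return out / weight
```

At 48 kHz the defaults give 245,760-sample chunks that advance by 221,760 samples, so a larger overlap means more chunks (more compute) for the same audio.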
Outputs
- AUDIO — { "waveform": [1,C,T] tensor, "sample_rate": int } for chaining in ComfyUI.
- Sidecar — WAV/FLAC saved to output/audio/ using your output_format.
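If you consume the AUDIO output in your own node, a small shape check can catch wiring mistakes early. The sketch below is hypothetical (describe_audio is not part of this repo); it only relies on the waveform exposing a .shape, as torch tensors do, and assumes the leading 1 is a batch dimension.

```python
def describe_audio(audio):
    """Validate an AUDIO dict and return (channels, duration_in_seconds)."""
    waveform = audio["waveform"]
    sample_rate = int(audio["sample_rate"])
    shape = tuple(waveform.shape)
    # Expected layout per the Outputs list: [1, C, T].
    if len(shape) != 3 or shape[0] != 1:
        raise ValueError(f"expected waveform shaped [1, C, T], got {list(shape)}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    _, channels, frames = shape
    return channels, frames / sample_rate
```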
Tips
- Defaults (5.12 s / 0.50 s) are strong. For difficult transitions use 0.8–1.0 s overlap.
- Keep the model at 48 kHz and resample at the very end of the chain for best fidelity.
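For the final 48 kHz to 44.1 kHz step, rational resamplers need integer up/down factors. A minimal sketch of that arithmetic follows; resample_factors is a hypothetical helper, and the SciPy call in the comment is one possible resampler, not a dependency of this repo.

```python
from math import gcd


def resample_factors(src_sr, dst_sr):
    """Return the smallest integer (up, down) pair with dst_sr / src_sr == up / down."""
    g = gcd(src_sr, dst_sr)
    return dst_sr // g, src_sr // g


up, down = resample_factors(48000, 44100)
# These factors could then be handed to a polyphase resampler, for example
# scipy.signal.resample_poly(samples, up, down), if SciPy is installed.
print(up, down)
```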
🎛️ Spectral Enhance (Fat Llama — GPU)
Purpose: Fast Iterative Soft Thresholding (IST) in the spectral domain using CuPy/CUDA to recover clarity and tame residual noise after SR.
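For intuition only, the core soft-thresholding operator behind IST-style methods can be sketched in a few lines. This is the generic textbook operator, not Fat Llama's code; how the package scales threshold_value against the spectrum is not documented here, so treat the numbers as illustrative.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each complex coefficient's magnitude by `threshold`, keeping its phase.

    Coefficients whose magnitude is at or below the threshold become zero.
    """
    out = []
    for z in coeffs:
        mag = abs(z)
        if mag > threshold:
            out.append(z * (1.0 - threshold / mag))
        else:
            out.append(0j)
    return out
```

A higher threshold zeroes more low-energy bins, which matches the cleaner-but-duller trade-off described for threshold_value.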
Inputs
- target_format (wav/flac) — Output container for the sidecar render.
- max_iterations (int, default 300) — More iterations = more detail but longer runtime. Typical: 200–400; push higher (e.g., 500) cautiously.
- threshold_value (float, default 0.60) — Soft‑threshold strength. Lower (0.50–0.55) = brighter/risk hiss; higher (0.65–0.75) = cleaner/risk dullness.
- target_bitrate_kbps (int, default 1411) — Used by Fat Llama to compute upscale factor.
- toggle_autoscale (bool, default true) — Keep enabled. Prevents clipping and level jumps.
- audio_path / audio_url — Optional file/URL input.
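The 1411 default lines up with the bitrate of CD-quality stereo PCM, which the arithmetic below reproduces. That correspondence is an inference from the number itself; how Fat Llama turns the bitrate into an upscale factor is not shown here, and pcm_bitrate_kbps is a hypothetical helper.

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


# 44.1 kHz, 16-bit, stereo: 44100 * 16 * 2 = 1,411,200 bit/s
cd_quality = pcm_bitrate_kbps(44100, 16, 2)
```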
Outputs
- AUDIO and a sidecar WAV/FLAC file.
Notes
- Normalization is always enabled internally for safety; the adaptive filter is disabled (too slow for modest gains).
- Requires CuPy and the CUDA runtime/NVRTC DLLs. Check your setup with:
python -c "import cupy as cp; cp.show_config()"
🎛️ Spectral Enhance (Fat Llama — CPU/FFTW)
Purpose: Same algorithm as GPU, but portable CPU/FFTW implementation.
Inputs
- target_format, max_iterations, threshold_value, target_bitrate_kbps, toggle_autoscale — Same semantics as GPU node.
- audio_path / audio_url — Optional file/URL input.
Outputs
- AUDIO and a sidecar WAV/FLAC file.
When to use
- No discrete NVIDIA GPU, or your environment can’t satisfy CuPy/CUDA requirements.
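If you script your setup, the GPU-or-CPU decision can be probed up front. The sketch below is a hypothetical helper (pick_fat_llama_backend and its "gpu"/"cpu" labels are not part of this repo); it only checks that CuPy imports and reports at least one CUDA device.

```python
import importlib.util


def pick_fat_llama_backend(find_spec=importlib.util.find_spec):
    """Return 'gpu' when CuPy is importable and sees a CUDA device, else 'cpu'."""
    if find_spec("cupy") is None:
        return "cpu"
    try:
        import cupy as cp
        return "gpu" if cp.cuda.runtime.getDeviceCount() > 0 else "cpu"
    except Exception:
        # Broken install, missing driver, or missing DLLs: fall back to CPU.
        return "cpu"
```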
🎼 Recommended Chains (Music‑centric)
- Quick & Safe → FlashSR → Fat Llama (GPU/CPU) mild → (optional) HQ downsample to 44.1 kHz.
- Problematic transitions → increase FlashSR overlap_seconds toward 0.8–1.0 s.
- Noisy material → enable flashsr_lowpass and/or raise Fat Llama threshold_value slightly.
🧰 Troubleshooting
FlashSR can’t import / no weights
- First run should auto‑download student_ldm.pth, sr_vocoder.pth, vae.pth into models/audio/flashsr/.
- Offline? Place those three files there manually.
- If you cloned the FlashSR repo separately, set EGREGORA_FLASHSR_REPO to its path.
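Before digging further, it can help to confirm which of the three weight files are actually on disk. A minimal sketch follows; missing_flashsr_weights is a hypothetical helper, and you pass it whatever folder your install uses.

```python
from pathlib import Path

# File names as listed in the troubleshooting notes.
FLASHSR_WEIGHTS = ("student_ldm.pth", "sr_vocoder.pth", "vae.pth")


def missing_flashsr_weights(weights_dir):
    """Return the expected weight file names that are absent from weights_dir."""
    folder = Path(weights_dir)
    return [name for name in FLASHSR_WEIGHTS if not (folder / name).is_file()]
```

For example, missing_flashsr_weights("ComfyUI/models/audio/flashsr") returns an empty list once all three files are in place.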
CuPy compilation error (e.g., vector_types.h missing)
- Install matching CUDA runtime + NVRTC wheels:
python -m pip install "nvidia-cuda-runtime-cu12==12.9.*" "nvidia-cuda-nvrtc-cu12==12.9.*" \
  nvidia-cublas-cu12 nvidia-cufft-cu12 nvidia-cusolver-cu12 nvidia-cusparse-cu12
python -m pip install cupy-cuda12x
python -c "import cupy as cp; cp.show_config()"
- On Windows‑portable Python, the node adds DLL search dirs at runtime. Restart ComfyUI after installing wheels.
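For readers debugging DLL lookups by hand, here is a rough sketch of what registering wheel-provided DLL folders can look like. It is not the node's implementation: both helper names are hypothetical, and the site-packages/nvidia/<component>/bin layout is an assumption about how the NVIDIA wheels unpack on Windows.

```python
import os
import sys
from pathlib import Path


def nvidia_dll_dirs(site_packages):
    """List bin/ folders shipped by nvidia-* wheels under a site-packages dir."""
    root = Path(site_packages) / "nvidia"
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/bin") if p.is_dir())


def register_dll_dirs(dirs):
    """Add folders to the Windows DLL search path; does nothing on other systems."""
    if sys.platform != "win32":
        return []
    # os.add_dll_directory returns a handle; keep it so the entry can be closed later.
    return [os.add_dll_directory(str(d)) for d in dirs]
```

Printing the result of nvidia_dll_dirs for your interpreter's site-packages folder is a quick way to see whether the runtime and NVRTC wheels landed where expected.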
Clicks at chunk boundaries
- Raise overlap_seconds (0.8–1.0 s). Longer overlaps increase compute but hide seams.
Output too bright/dull
- Fat Llama: lower threshold_value for more bite; raise it for smoother/cleaner output.
🔒 Licenses & Attribution
This repository (node code)
- © mrgattax/egregoralabs. Licensed under the MIT License (see LICENSE).
Weights & Third‑party projects (not included / auto‑downloaded)
- FlashSR_Inference and FlashSR weights belong to their respective authors. Check their repos / model cards for license & usage terms before commercial use.
- fat‑llama / fat‑llama‑fftw are third‑party packages. Refer to their PyPI/GitHub pages for license details.
⚠️ You are responsible for ensuring you have the rights to use the models/weights in your context (personal, research, commercial). When in doubt, review upstream licenses and model cards.
Acknowledgements
- FlashSR authors & contributors.
- Fat Llama authors & contributors.
- The ComfyUI community.
🧪 Dev Notes
- Env vars:
  - EGREGORA_MODELS_DIR → override models root (default: ComfyUI/models).
  - EGREGORA_FLASHSR_REPO → path to a local FlashSR repo (if not using the internal deps/).
- The FlashSR node writes sidecar WAV/FLAC files to output/audio/ automatically; you can still place a dedicated Save‑Audio node if that suits your workflow.
🤝 Contributing
PRs and issues are welcome! Please include:
- OS / Python / ComfyUI version
- Repro steps and a short audio sample (if possible)
📜 Changelog
- v0.1.0 — Initial release: FlashSR SR node, Fat Llama GPU/CPU.