ComfyUI Extension: ComfyUI-FapMixPlus
This is an audio processing script that applies soft limiting, optional loudness normalization, and optional slicing for transcription. It can also produce stereo-mixed outputs with optional audio appended to the end. The script organizes processed files into structured folders with sanitized filenames and retains original timestamps for continuity.
preFapMix.py
preFapMix.py is an audio processing script that applies soft limiting, optional loudness normalization, and optional slicing for transcription. It can also produce stereo-mixed outputs with optional audio appended to the end. The script organizes processed files into structured folders with sanitized filenames and retains original timestamps for continuity.
Features
- Soft Limiting: Reduces loud peaks in audio to prevent clipping.
- Optional Loudness Normalization: Adjusts audio levels to achieve consistent loudness.
- Conditional Slicing and Transcription: Options to slice and transcribe files in the left or right channels separately, or both channels together.
- Stereo Mixing with Optional Tone Appending: Optionally appends a custom tone (tones.wav) to the end of stereo-mixed audio.
- Organized Output Structure: Outputs are saved in structured folders with sanitized filenames.
- Timestamp Preservation: Maintains the original timestamps for all output files.
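One way timestamp preservation could work is sketched below. It assumes that "timestamps" means filesystem access and modification times, which the README does not confirm; the helper name copy_timestamps is hypothetical and not taken from preFapMix.py.

```python
import os


def copy_timestamps(source_path, target_path):
    """Copy access and modification times from source_path to target_path.

    Assumption: "original timestamps" refers to filesystem times.
    This helper is illustrative and not part of preFapMix.py.
    """
    source_stat = os.stat(source_path)
    # Nanosecond fields avoid the rounding that float seconds can introduce.
    os.utime(target_path, ns=(source_stat.st_atime_ns, source_stat.st_mtime_ns))
```

Calling this after each output file is written would give the processed file the same modification time as its input.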
Installation Requirements
- Python 3.x
- Pydub for audio processing:
  pip install pydub
- FFmpeg: Required by Pydub for handling audio files:
  sudo apt-get install ffmpeg
- fap: The transcription tool, assumed to be installed and accessible via the command line.
Usage
Command Line
Run the script from the command line with the following arguments:
python preFapMix.py --input-dir <input_directory> --output-dir <output_directory> [options]
Options
- --input-dir: Directory containing input audio files (required).
- --output-dir: Directory where processed files will be saved (required).
- --transcribe: Enables transcription for both left and right channels. Implies both --transcribe_left and --transcribe_right.
- --transcribe_left: Enables transcription only for the left channel.
- --transcribe_right: Enables transcription only for the right channel.
- --normalize: Enables loudness normalization on the audio.
- --tones: Appends the contents of tones.wav to the end of each stereo output file.
- --num-workers: Specifies the number of workers to use for transcription (default is 2).
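The options above could be declared with argparse roughly as follows. This is a sketch built only from the documented flags; the function names build_parser and parse_options are hypothetical, and the actual script may structure its argument handling differently.

```python
import argparse


def build_parser():
    """Declare the documented flags (illustrative, not the script's own code)."""
    parser = argparse.ArgumentParser(prog="preFapMix.py")
    parser.add_argument("--input-dir", required=True, help="Directory of input audio files")
    parser.add_argument("--output-dir", required=True, help="Directory for processed files")
    parser.add_argument("--transcribe", action="store_true", help="Transcribe both channels")
    parser.add_argument("--transcribe_left", action="store_true", help="Transcribe left channel only")
    parser.add_argument("--transcribe_right", action="store_true", help="Transcribe right channel only")
    parser.add_argument("--normalize", action="store_true", help="Enable loudness normalization")
    parser.add_argument("--tones", action="store_true", help="Append tones.wav to stereo outputs")
    parser.add_argument("--num-workers", type=int, default=2, help="Transcription workers")
    return parser


def parse_options(argv):
    """Parse argv and apply the documented rule that --transcribe implies both channels."""
    args = build_parser().parse_args(argv)
    if args.transcribe:
        args.transcribe_left = True
        args.transcribe_right = True
    return args
```

For example, parse_options(["--input-dir", "in", "--output-dir", "out", "--transcribe"]) yields a namespace in which both transcribe_left and transcribe_right are set.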
Workflow
- Pre-Processing:
  - Applies a soft limiter at -6 dB to control peaks.
  - If --normalize is enabled, normalizes loudness to -23 LUFS for consistency.
- Conditional Slicing and Transcription:
  - If --transcribe is enabled, slices audio files into smaller segments and transcribes each segment, generating .lab files.
  - With --transcribe_left or --transcribe_right, transcribes only files in the left or right folders, respectively.
- Stereo Mixing with Optional Tone Appending:
  - Combines left and right channels into a stereo file.
  - If --tones is enabled, appends tones.wav to the end of each stereo file.
- File Naming and Organization:
  - Names each sliced audio file with its original numeric name, followed by the first 12 words (or fewer) from its .lab file.
  - All filenames are sanitized for UTF-8 compliance.
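The pre-processing step can be illustrated with a small numeric sketch. It makes several assumptions: that -6 dB is the threshold where limiting begins (rather than an output ceiling), that samples are floats in the range -1.0 to 1.0, and that a tanh curve is an acceptable stand-in for whatever limiter the script actually uses. The loudness measurement itself is not shown; normalization_gain_db only computes the gain needed once a LUFS value has been measured.

```python
import math


def db_to_amplitude(level_db):
    """Convert a level in dB (relative to full scale) to a linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def soft_limit(samples, threshold_db=-6.0):
    """Pass samples below the threshold unchanged and compress the rest.

    Illustrative tanh soft knee; not the limiter used by preFapMix.py.
    Output magnitudes always stay below 1.0 (full scale).
    """
    threshold = db_to_amplitude(threshold_db)
    headroom = 1.0 - threshold
    limited = []
    for sample in samples:
        magnitude = abs(sample)
        if magnitude <= threshold:
            limited.append(sample)
        else:
            # Map the overshoot smoothly into the remaining headroom.
            squashed = threshold + headroom * math.tanh((magnitude - threshold) / headroom)
            limited.append(math.copysign(squashed, sample))
    return limited


def normalization_gain_db(measured_lufs, target_lufs=-23.0):
    """Gain in dB that moves a measured loudness to the target loudness."""
    return target_lufs - measured_lufs
```

Under these assumptions, a file measured at -18 LUFS would need a gain of -5 dB to reach the -23 LUFS target.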
Output Structure
The output structure is organized within <output_directory>/run_<timestamp> as follows:
- normalized/: Contains normalized versions of the input audio files.
- left/ and right/: Contain sliced (and optionally transcribed) audio files in the respective left and right channel folders.
- stereo/: Contains stereo-mixed files with optional tone appended to the end.
- transcribed-and-sliced/:
  - Root: Contains combined .lab files for each original input.
  - left/ and right/: Contain subfolders of sliced audio files and corresponding .lab files.
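The folder layout above could be created with a few lines of pathlib. This is a sketch: the timestamp format is an assumption (the README only shows run_<timestamp>), and create_run_folders is a hypothetical helper name.

```python
from datetime import datetime
from pathlib import Path

# Subfolders named in the output structure description.
RUN_SUBFOLDERS = (
    "normalized",
    "left",
    "right",
    "stereo",
    "transcribed-and-sliced/left",
    "transcribed-and-sliced/right",
)


def create_run_folders(output_dir, now=None):
    """Create <output_dir>/run_<timestamp> with the documented subfolders.

    The "%Y%m%d_%H%M%S" timestamp format is an assumption for illustration.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_dir) / f"run_{stamp}"
    for subfolder in RUN_SUBFOLDERS:
        (run_dir / subfolder).mkdir(parents=True, exist_ok=True)
    return run_dir
```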
Example Command
python preFapMix.py --input-dir ./my_audio_files --output-dir ./processed_audio --transcribe --normalize --tones --num-workers 3
This command will:
- Process the audio files in ./my_audio_files with soft limiting and loudness normalization.
- Slice and transcribe each file in the left and right channels.
- Mix each pair of left and right channels into a stereo file and append tones.wav to the end of each stereo output.
fapMixPlus
This project provides an end-to-end audio processing pipeline to automate the extraction, separation, slicing, transcription, and renaming of audio files. The resulting files are saved in a structured output directory with cleaned filenames and optional ZIP archives for easier distribution or storage.
Features
- Download Audio: Fetches audio files from a URL or uses local input files.
- Convert to WAV: Converts audio files to WAV format.
- Separate Vocals: Isolates vocal tracks from the WAV files.
- Slice Audio: Segments the separated vocal track for transcription.
- Transcribe: Generates transcriptions from audio slices.
- Sanitize and Rename Files: Creates sanitized filenames with a numerical prefix, limited to 128 characters.
- Generate ZIP Files: Compresses processed files into ZIP archives for easy storage and distribution.
Prerequisites
- Python 3.x
- Install required Python packages:
  pip install yt-dlp
- Fish Audio Preprocessor (fap) should be installed and available in the PATH.
Installing the Fish Audio Preprocessor (fap)
- Clone the Fish Audio Preprocessor repository:
  git clone https://github.com/fishaudio/audio-preprocess.git
- Navigate to the repository directory:
  cd audio-preprocess
- Install the package from the cloned repository:
  pip install -e .
  This step installs fap and makes it accessible as a command-line tool, which is essential for fapMixPlus.py to function correctly.
- Verify the installation by checking the version:
  fap --version
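A script that depends on fap could also check for it before starting any work. The sketch below uses only the standard library; tool_on_path is a hypothetical helper, and whether fapMixPlus.py performs such a check is not stated in this README.

```python
import shutil


def tool_on_path(name="fap"):
    """Return True if an executable called `name` can be found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not tool_on_path("fap"):
        raise SystemExit("fap was not found on the PATH; install audio-preprocess first.")
```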
Usage
Command-line Arguments
| Argument | Description |
|-----------------|----------------------------------------------------------------------|
| --url | URL of the audio source (YouTube or other supported link). |
| --output_dir | Directory for saving all outputs. Default is output/. |
| input_dir | Path to a local directory of input files (optional if --url used). |
Example Command
python fapMixPlus.py --url https://youtu.be/example_video --output_dir my_output
This command will download the audio from the URL, process it, and save the results in the my_output folder.
Output Structure
The output directory will contain a timestamped folder with the following structure:
output_<timestamp>/
├── wav_conversion/ # WAV-converted audio files
├── separation_output/ # Separated vocal track files
├── slicing_output/ # Sliced segments from separated audio
├── final_output/ # Final, sanitized, and renamed .wav and .lab files
├── zip_files/ # Compressed ZIP archives of processed files
ZIP File Details
In addition to organizing output files by processing stages, fapMixPlus can generate ZIP archives for convenience. Each ZIP file in the zip_files/ directory will contain a set of processed audio and transcription files, with names based on their content and timestamp. The ZIP filenames will follow this format:
output_<timestamp>.zip
Each ZIP file will include:
- The WAV and .lab files from final_output/, with sanitized filenames.
These ZIP files are ideal for transferring or archiving processed audio.
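Archiving the final files can be sketched with the standard zipfile module. The helper name zip_final_output and the choice of DEFLATE compression are assumptions for illustration; only the output_<timestamp>.zip naming and the .wav/.lab contents come from the description above.

```python
import zipfile
from pathlib import Path


def zip_final_output(final_dir, zip_dir, timestamp):
    """Pack the .wav and .lab files from final_dir into zip_dir/output_<timestamp>.zip.

    Illustrative sketch; not the code used by fapMixPlus.py.
    """
    final_dir = Path(final_dir)
    zip_dir = Path(zip_dir)
    zip_dir.mkdir(parents=True, exist_ok=True)
    archive_path = zip_dir / f"output_{timestamp}.zip"
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(final_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in {".wav", ".lab"}:
                # Store files flat, without the final_output/ directory prefix.
                archive.write(path, arcname=path.name)
    return archive_path
```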
Functionality Details
- Download Audio: Downloads audio from a URL, saving it in .m4a format.
- WAV Conversion: Converts audio to WAV using fap to-wav.
- Separation: Separates vocals from the WAV files using fap separate.
- Slicing: Segments the separated vocal track into smaller audio slices.
- Transcription: Uses fap transcribe to transcribe each slice.
- Sanitization and Renaming:
  - Extracts the first 10 words from each .lab file.
  - Replaces spaces with underscores, removes special characters, and limits to 128 characters.
  - Uses the numerical prefix alone if the .lab file has no valid content.
- ZIP File Creation:
  - After processing, the final .wav and .lab files are compressed into ZIP archives in zip_files/ for each session, making it easy to organize or share the output.
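The sanitization and renaming rules can be sketched as a single function. Several details are assumptions: the four-digit zero-padded prefix is inferred from the example filenames, "special characters" is taken to mean anything other than ASCII letters and digits, and the 128-character limit is applied to the name without its extension. The function name build_final_stem is hypothetical.

```python
import re


def build_final_stem(index, lab_text, max_words=10, max_length=128):
    """Build a sanitized filename stem from a slice index and its .lab text.

    Illustrative sketch of the documented rules; not the project's own code.
    """
    prefix = f"{index:04d}"
    words = lab_text.split()[:max_words]
    # Drop every character that is not an ASCII letter or digit (assumed rule).
    cleaned = [re.sub(r"[^A-Za-z0-9]", "", word) for word in words]
    cleaned = [word for word in cleaned if word]
    if not cleaned:
        # No usable transcription: fall back to the numerical prefix alone.
        return prefix
    stem = f"{prefix}_{'_'.join(cleaned)}"
    return stem[:max_length].rstrip("_")
```

With these assumptions, build_final_stem(1, "Hello, this is a sample transcription.") returns 0001_Hello_this_is_a_sample_transcription, and an empty .lab file for slice 2 yields 0002.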
Example File Names in Final Output
Final output files in final_output will be structured like:
- 0001_Hello_this_is_a_sample_transcription.wav
- 0001_Hello_this_is_a_sample_transcription.lab
Files without usable .lab content will retain the numerical prefix, e.g., 0002.wav and 0002.lab.