preFapMix.py is an audio processing script that applies soft limiting, optional loudness normalization, and optional slicing for transcription. It can also produce stereo-mixed outputs with optional audio appended to the end. The script organizes processed files into structured folders with sanitized filenames and retains original timestamps for continuity.
Optionally appends an audio file (tones.wav) to the end of stereo-mixed audio.
Install the required packages:
pip install pydub
sudo apt-get install ffmpeg
Run the script from the command line with the following arguments:
python preFapMix.py --input-dir <input_directory> --output-dir <output_directory> [options]
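The options accepted by this command could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual source: the flag names come from this README, while `build_parser` and `parse_args` are hypothetical helper names, and the way `--transcribe` switches on both channel flags is an assumption based on the option descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options this README lists for preFapMix.py (sketch only)."""
    parser = argparse.ArgumentParser(prog="preFapMix.py")
    parser.add_argument("--input-dir", required=True,
                        help="Directory containing input audio files")
    parser.add_argument("--output-dir", required=True,
                        help="Directory where processed files will be saved")
    parser.add_argument("--transcribe", action="store_true",
                        help="Transcribe both left and right channels")
    parser.add_argument("--transcribe_left", action="store_true",
                        help="Transcribe only the left channel")
    parser.add_argument("--transcribe_right", action="store_true",
                        help="Transcribe only the right channel")
    parser.add_argument("--normalize", action="store_true",
                        help="Enable loudness normalization")
    parser.add_argument("--tones", action="store_true",
                        help="Append tones.wav to each stereo output")
    parser.add_argument("--num-workers", type=int, default=2,
                        help="Number of transcription workers")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and apply the documented rule that --transcribe
    implies both --transcribe_left and --transcribe_right."""
    args = build_parser().parse_args(argv)
    if args.transcribe:
        args.transcribe_left = True
        args.transcribe_right = True
    return args
```

Calling `parse_args(["--input-dir", "in", "--output-dir", "out", "--transcribe"])` yields a namespace in which both channel flags are set and `num_workers` keeps its default of 2.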
--input-dir: Directory containing input audio files (required).
--output-dir: Directory where processed files will be saved (required).
--transcribe: Enables transcription for both left and right channels. Implies both --transcribe_left and --transcribe_right.
--transcribe_left: Enables transcription only for the left channel.
--transcribe_right: Enables transcription only for the right channel.
--normalize: Enables loudness normalization on the audio.
--tones: Appends the contents of tones.wav to the end of each stereo output file.
--num-workers: Specifies the number of workers to use for transcription (default is 2).
Pre-Processing:
If --normalize is enabled, normalizes loudness to -23 LUFS for consistency.
Conditional Slicing and Transcription:
If --transcribe is enabled, slices audio files into smaller segments and transcribes each segment, generating .lab files.
With --transcribe_left or --transcribe_right, transcribes only files in the left or right folders, respectively.
Stereo Mixing with Optional Tone Appending:
If --tones is enabled, appends tones.wav to the end of each stereo file.
File Naming and Organization:
.lab file.
The output structure is organized within <output_directory>/run_<timestamp> as follows:
normalized/: Contains normalized versions of the input audio files.
left/ and right/: Contain sliced (and optionally transcribed) audio files in respective left and right channel folders.
stereo/: Contains stereo-mixed files with optional tone appended to the end.
transcribed-and-sliced/:
  .lab files for each original input.
  left/ and right/: Contain subfolders of sliced audio files and corresponding .lab files.
python preFapMix.py --input-dir ./my_audio_files --output-dir ./processed_audio --transcribe --normalize --tones --num-workers 3
This command will:
Process the audio files in ./my_audio_files with soft limiting and loudness normalization.
Append tones.wav to the end of each stereo output.

This project provides an end-to-end audio processing pipeline to automate the extraction, separation, slicing, transcription, and renaming of audio files. The resulting files are saved in a structured output directory with cleaned filenames and optional ZIP archives for easier distribution or storage.
pip install yt-dlp
Fish Audio Preprocessor (fap) should be installed and available in the PATH.
Installing Fish Audio Preprocessor (fap):
Clone the Fish Audio Preprocessor repository:
git clone https://github.com/fishaudio/audio-preprocess.git
Navigate to the repository directory:
cd audio-preprocess
Install the package from the cloned repository:
pip install -e .
This step installs fap and makes it accessible as a command-line tool, which is essential for fapMixPlus.py to function correctly.
fap --version
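The same availability check can be made from Python before starting a run. The sketch below only tests whether an executable named fap is discoverable on the PATH; `tool_on_path` is a hypothetical helper name, and nothing here is taken from fapMixPlus.py itself.

```python
import shutil


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Report whether the fap command-line tool is reachable.
    status = "found" if tool_on_path("fap") else "not found on PATH"
    print(f"fap: {status}")
```

Failing early with a clear message is usually friendlier than letting a later pipeline stage crash when the command cannot be launched.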
| Argument | Description |
|-----------------|----------------------------------------------------------------------|
| --url | URL of the audio source (YouTube or other supported link). |
| --output_dir | Directory for saving all outputs. Default is output/. |
| input_dir | Path to a local directory of input files (optional if --url used). |
python fapMixPlus.py --url https://youtu.be/example_video --output_dir my_output
This command will download the audio from the URL, process it, and save the results in the my_output folder.
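For readers curious about the download step, the following is a minimal sketch using yt-dlp's Python API. It assumes the goal is an m4a audio stream saved under the output directory. The helper names `build_download_options` and `download_audio` are hypothetical, the output filename template is an assumption, and how fapMixPlus.py actually invokes yt-dlp is not shown in this README.

```python
from pathlib import Path


def build_download_options(output_dir: str) -> dict:
    """Build yt-dlp options that prefer an m4a audio-only stream."""
    return {
        # Prefer m4a audio; fall back to the best available audio stream.
        "format": "bestaudio[ext=m4a]/bestaudio",
        # Assumed naming scheme: <title>.<ext> inside output_dir.
        "outtmpl": str(Path(output_dir) / "%(title)s.%(ext)s"),
        # Download a single item even if the URL points into a playlist.
        "noplaylist": True,
    }


def download_audio(url: str, output_dir: str = "output") -> None:
    """Download the audio for `url` into `output_dir` (requires yt-dlp)."""
    import yt_dlp  # imported lazily so the options helper works without it

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(build_download_options(output_dir)) as ydl:
        ydl.download([url])
```

Keeping the options in a separate function makes them easy to inspect or adjust without touching the network.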
The output directory will contain a timestamped folder with the following structure:
output_<timestamp>/
├── wav_conversion/ # WAV-converted audio files
├── separation_output/ # Separated vocal track files
├── slicing_output/ # Sliced segments from separated audio
├── final_output/ # Final, sanitized, and renamed .wav and .lab files
├── zip_files/ # Compressed ZIP archives of processed files
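A layout like the one above could be created with a few lines of Python. This is a sketch under assumptions: the stage folder names are copied from the tree, but the timestamp format and the helper name `create_run_dirs` are hypothetical, since the README does not specify them.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Stage folders as listed in the output tree.
STAGE_DIRS = (
    "wav_conversion",
    "separation_output",
    "slicing_output",
    "final_output",
    "zip_files",
)


def create_run_dirs(base_dir: str, timestamp: Optional[str] = None) -> Path:
    """Create output_<timestamp>/ with one subfolder per processing stage."""
    # Assumed timestamp format; the real script may use a different one.
    stamp = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base_dir) / f"output_{stamp}"
    for name in STAGE_DIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Passing an explicit `timestamp` makes the result reproducible, which is convenient in tests.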
In addition to organizing output files by processing stages, fapMixPlus can generate ZIP archives for convenience. Each ZIP file in the zip_files/ directory will contain a set of processed audio and transcription files, with names based on their content and timestamp. The ZIP filenames will follow this format:
output_<timestamp>.zip
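An archive with this name could be produced using the standard zipfile module. The sketch below assumes that the .wav and .lab files in final_output/ are what gets bundled, and that the archive is written to zip_files/ inside the same run folder. `zip_final_output` is a hypothetical helper name, not a function from the script.

```python
import zipfile
from pathlib import Path


def zip_final_output(run_dir: str, timestamp: str) -> Path:
    """Bundle .wav and .lab files from final_output/ into zip_files/."""
    run_path = Path(run_dir)
    final_dir = run_path / "final_output"
    zip_dir = run_path / "zip_files"
    zip_dir.mkdir(parents=True, exist_ok=True)

    zip_path = zip_dir / f"output_{timestamp}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(final_dir.iterdir()):
            # Skip anything that is not a processed audio or transcript file.
            if path.suffix.lower() in {".wav", ".lab"}:
                # Store files flat, without the directory prefix.
                archive.write(path, arcname=path.name)
    return zip_path
```

Storing entries under their bare filenames keeps the archive flat, so extracting it yields the files directly rather than a nested folder.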
Each ZIP file will include:
.lab files from final_output/, with sanitized filenames.

Downloads the audio in .m4a format.
Converts the audio to WAV with fap to-wav.
Separates the vocal track with fap separate.
Uses fap transcribe to transcribe each slice.
.lab file.
.lab file.
.wav and .lab files are compressed into ZIP archives in zip_files/ for each session, making it easy to organize or share the output.

Final output files in final_output will be structured like:
0001_Hello_this_is_a_sample_transcription.wav
0001_Hello_this_is_a_sample_transcription.lab
Files without usable .lab content will retain the numerical prefix, e.g., 0002.wav and 0002.lab.
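The naming scheme can be illustrated with a short helper. This is a sketch, not the script's actual sanitizer: it assumes a four-digit prefix followed by the transcript's alphanumeric words joined with underscores. The name `build_stem` and the `max_words` cap are hypothetical, and non-ASCII characters are simply dropped here.

```python
import re


def build_stem(index: int, transcript: str, max_words: int = 10) -> str:
    """Build a filename stem such as 0001_Hello_this_is_a_sample."""
    prefix = f"{index:04d}"
    # Keep only runs of ASCII letters and digits; punctuation is discarded.
    words = re.findall(r"[A-Za-z0-9]+", transcript)[:max_words]
    if not words:
        # No usable transcript text: fall back to the numeric prefix alone.
        return prefix
    return prefix + "_" + "_".join(words)
```

With this helper, `build_stem(1, "Hello, this is a sample transcription.")` gives `0001_Hello_this_is_a_sample_transcription`, and an empty or whitespace-only transcript for index 2 gives `0002`. The same stem would then be used for both the .wav and the .lab file.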