This repository provides custom nodes for ComfyUI designed to process audio files, performing speaker diarization and integrating speaker data into whisper-transcribed segments. These nodes utilize the PyAnnote library for speaker identification and pandas for efficient data handling.
## Speaker Diarization

Description:
Performs speaker diarization on an input audio file, identifying and segmenting speech by different speakers.

Inputs:
- `AUDIO`: the audio file to process.
- `STRING`: a Hugging Face access token used to load the PyAnnote model (see the token note below).

Outputs:
- `List[Dict]`: a list of speaker segments, each containing `start`, `end`, and `speaker` information.

Usage Example:
After connecting an audio input, this node identifies where each speaker starts and ends in the audio file.
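For reference, output of this shape can be produced with pyannote.audio directly. The sketch below is illustrative only; the pipeline id, file name, and token handling are assumptions, not necessarily what the node does internally.

```python
# A minimal sketch using pyannote.audio directly; the pipeline id, file name,
# and token handling are assumptions, not the node's actual internals.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",  # assumed pretrained pipeline id
    use_auth_token="hf_...",             # your Hugging Face access token
)
diarization = pipeline("input.wav")

# Flatten PyAnnote's annotation into the List[Dict] shape described above.
speaker_segments = [
    {"start": turn.start, "end": turn.end, "speaker": speaker}
    for turn, _, speaker in diarization.itertracks(yield_label=True)
]
```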
## Whisper Segments to Speaker

Description:
Adds speaker labels to segments generated by a Whisper speech-to-text model, aligning them based on time overlaps with diarized segments.

Inputs:
- `whisper_alignment` (List[Dict]): Whisper-transcribed segments with `start` and `end` times.
- `speaker_segments` (List[Dict]): the output of the Speaker Diarization node, providing time-aligned speaker information.

Outputs:
- `whisper_alignment` (List[Dict]): the input segments, each annotated with a `speaker` label.

Usage Example:
Connect the output of the Speaker Diarization node and the Whisper-transcribed segments. This node aligns the speaker data with the transcription, producing a detailed transcript with speaker differentiation.
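As a rough illustration of the overlap-based alignment, consider the sketch below. It is a hedged approximation, not this repository's exact implementation, and the `fallback` label is an assumption.

```python
# Illustrative overlap-based alignment, not the node's exact code.
def assign_speakers(whisper_alignment, speaker_segments, fallback="UNKNOWN"):
    """Label each Whisper segment with the diarized speaker that overlaps
    it the most in time; segments with no overlap keep the fallback label."""
    for seg in whisper_alignment:
        best_speaker, best_overlap = fallback, 0.0
        for spk in speaker_segments:
            # Length of the intersection of the two time intervals.
            overlap = min(seg["end"], spk["end"]) - max(seg["start"], spk["start"])
            if overlap > best_overlap:
                best_overlap, best_speaker = overlap, spk["speaker"]
        seg["speaker"] = best_speaker
    return whisper_alignment
```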
Ensure the following libraries are installed: `pyannote.audio`, `pandas`, `numpy`, `torchaudio`, and `pydub`.

Install dependencies using:

```
pip install pyannote.audio pandas numpy torchaudio pydub
```
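A quick way to confirm the installation succeeded is to import each package:

```python
# Sanity check: every dependency should import without errors.
import pyannote.audio
import pandas
import numpy
import torchaudio
import pydub

print("All dependencies are available.")
```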
Place Node Files:
Add the provided Python file to the `custom_nodes` directory in your ComfyUI setup.

Register the Nodes:
ComfyUI automatically detects nodes from the `NODE_CLASS_MAPPINGS` dictionary. Ensure the structure includes:
```python
NODE_CLASS_MAPPINGS = {
    "Speaker Diarization": SpeakerDiarizationNode,
    "Whisper Segments to Speaker": WhisperDiarizationNode
}
```
Node Interface:
In the ComfyUI interface, locate the nodes under the Audio Processing category. Connect them as needed to process audio inputs and transcriptions.
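For orientation, a typical ComfyUI node class behind such a mapping looks roughly like the skeleton below; the input names, return type string, and method body are illustrative assumptions rather than this repository's actual code.

```python
# Hedged skeleton of a ComfyUI custom node; names and types are assumptions.
class SpeakerDiarizationNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "audio": ("AUDIO",),
                "hf_token": ("STRING", {"default": ""}),  # assumed input name
            }
        }

    RETURN_TYPES = ("LIST",)       # assumed custom type for List[Dict]
    FUNCTION = "diarize"
    CATEGORY = "Audio Processing"  # the category mentioned above

    def diarize(self, audio, hf_token):
        speaker_segments = []      # real diarization logic would go here
        return (speaker_segments,)
```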
To access the PyAnnote model, you need a Hugging Face token. Sign up or log in to Hugging Face and generate an access token from your account settings. Note that PyAnnote's pretrained pipelines are gated, so you may also need to accept the model's user conditions on its Hugging Face model page before the download will succeed.
This project is open-source. Refer to the LICENSE file for details.
For issues or feature requests, please submit them via the repository's issue tracker.
Happy audio processing! 🎧