ComfyUI Extension: Canary-ComfyUI
This node pack integrates the core capabilities of the Canary-1b-v2 model, providing three main features: it can transcribe audio in any of 25 supported languages into text in the same language, translate audio from 24 source languages directly into English, and translate English audio directly into one of the 24 other supported languages.
Custom Nodes (0)
README
Canary-ComfyUI
NVIDIA’s Canary is a state-of-the-art multilingual speech-to-text and speech-translation model (ASR + AST) offering punctuation and capitalization this ComfyUI custom node supports :
- canary-1b-v2,
- canary-1b-flash,
- canary-180m-flash.
Installation
Follow these steps to install and configure the nodes.
1. Clone the Repository
Navigate to your ComfyUI custom_nodes
directory and clone this repository:
# Example path: ComfyUI/custom_nodes/
cd /path/to/your/ComfyUI/custom_nodes/
git https://github.com/Juste-Leo2/Canary-ComfyUI.git
cd Canary-ComfyUI
2. Install Dependencies
Python environments within ComfyUI can be tricky. The recommended way to install the required nemo_toolkit
is by using uv
, which is included with recent versions of ComfyUI.
Open a terminal or command prompt and run the following command. You must replace path/to/your/python.exe
with the actual path to the Python executable used by ComfyUI.
- For the portable version of ComfyUI, this is typically
ComfyUI/python_embeded/python.exe
. - If you use a virtual environment (venv), activate it and use
python
.
# Command to run from the root of the Canary-ComfyUI folder
# (ComfyUI/custom_nodes/Canary-ComfyUI)
/path/to/your/python.exe -m uv pip install -r requirements.txt --no-deps --force-reinstall --index-strategy unsafe-best-match
This command uses uv
to install NeMo in a way that is less likely to cause conflicts with ComfyUI's existing packages.
3. Download the Model
- Go to the model's files page on Hugging Face: nvidia/canary-1b-v2
- Download the model file, which is named
canary-1b-v2.nemo
.
4. Place the Model
- Place the downloaded
canary-1b-v2.nemo
file inside theComfyUI/models/canary/
directory. - You may need to create the
canary
folder yourself if it doesn't exist.
The final path should look like this: ComfyUI/models/canary/canary-1b-v2.nemo
.
5. Restart ComfyUI
Restart ComfyUI completely. The new nodes should appear in the "Add Node" menu under the Canary-ComfyUI
category.
Usage
- Add the
Load Canary Model
node and selectcanary-1b-v2.nemo
. - Add an audio loading node (e.g.,
Load Audio
) - Connect the
CANARY_MODEL
andAUDIO
outputs to one of the three task nodes (Canary Transcription
,Canary Translate to English
, orCanary Translate from English
). - Select the desired languages and queue the prompt. The resulting text will be available as an output.
Roadmap
Here are some of the features and improvements planned for the future of this project:
- [ ] Timestamp Support
- [x] Support for main SOTA Canary Models
- [ ] Support canary-1b
- [ ] nodes fusion for simplified use
License
- The Python code in this repository is released under the Apache 2.0 License.
- The NVIDIA Canary-1b-v2 model is subject to its own license, the Creative Commons Attribution-NonCommercial 4.0 International. Please review its terms before use, especially regarding commercial applications.
Acknowledgements
- A big thank you to NVIDIA for creating and open-sourcing the Canary model.
- Thanks to the entire ComfyUI team for building such a flexible and powerful tool for the community.