ComfyUI Extension: Canary-ComfyUI
This node pack integrates the core capabilities of the Canary-1b-v2 model, providing three main features: it can transcribe audio in any of 25 supported languages into text in the same language, translate audio from 24 source languages directly into English, and translate English audio directly into one of the 24 other supported languages.
Canary-ComfyUI
NVIDIA's Canary is a state-of-the-art multilingual speech-to-text (ASR) and speech-translation (AST) model whose output includes punctuation and capitalization. This ComfyUI custom node supports:
- canary-1b-v2
- canary-1b-flash
- canary-180m-flash
 
Installation
Follow these steps to install and configure the nodes.
1. Clone the Repository
Navigate to your ComfyUI custom_nodes directory and clone this repository:
# Example path: ComfyUI/custom_nodes/
cd /path/to/your/ComfyUI/custom_nodes/
git clone https://github.com/Juste-Leo2/Canary-ComfyUI.git
cd Canary-ComfyUI
2. Install Dependencies
Python environments within ComfyUI can be tricky. The recommended way to install the required nemo_toolkit is by using uv, which is included with recent versions of ComfyUI.
Open a terminal or command prompt and run the following command. You must replace path/to/your/python.exe with the actual path to the Python executable used by ComfyUI.
- For the portable version of ComfyUI, this is typically ComfyUI/python_embeded/python.exe.
- If you use a virtual environment (venv), activate it and use python.
# Command to run from the root of the Canary-ComfyUI folder
# (ComfyUI/custom_nodes/Canary-ComfyUI)
/path/to/your/python.exe -m uv pip install -r requirements.txt --no-deps --force-reinstall --index-strategy unsafe-best-match
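This command uses uv to install NeMo in a way that is less likely to conflict with ComfyUI's existing packages: --no-deps installs only the packages listed in requirements.txt without pulling in their dependencies, --force-reinstall reinstalls them even if they are already present, and --index-strategy unsafe-best-match lets uv pick the best matching version across all configured package indexes.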
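To confirm that the install landed in the interpreter ComfyUI actually uses, a quick import check can help. The sketch below is a hypothetical helper (`check_module` is not part of this repository), and it assumes the toolkit is importable under the module name `nemo`. Because `--no-deps` skips dependency resolution, a failure here may simply point to a missing dependency rather than a failed install.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canary-ComfyUI): check that a module
# imports under a given Python executable.
# Usage: check_module /path/to/your/python.exe nemo
check_module() {
    py="$1"
    module="$2"
    if "$py" -c "import $module" >/dev/null 2>&1; then
        echo "OK: $module imports under $py"
    else
        echo "FAILED: $module does not import under $py" >&2
        return 1
    fi
}

# Example (replace the path with ComfyUI's Python executable):
# check_module /path/to/your/python.exe nemo
```

If the check fails, running the same `import` without the output redirection should show which package is missing.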
3. Download the Model
- Go to the model's files page on Hugging Face: nvidia/canary-1b-v2
- Download the model file, which is named canary-1b-v2.nemo.
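For a command-line download instead of the browser, something like the following sketch may work. It assumes the file sits on the repository's `main` branch, that the repository can be downloaded without logging in, and that `curl` is installed. `hf_file_url` is a hypothetical helper name, not something this project provides.

```shell
#!/bin/sh
# Hypothetical helper: build the direct-download URL for a file in a
# Hugging Face model repository.
# Assumption: the file lives on the "main" branch of the repository.
hf_file_url() {
    repo="$1"
    file="$2"
    printf 'https://huggingface.co/%s/resolve/main/%s\n' "$repo" "$file"
}

# Example (large download; -L follows redirects, -C - resumes a partial file):
# curl -L -C - -o canary-1b-v2.nemo "$(hf_file_url nvidia/canary-1b-v2 canary-1b-v2.nemo)"
```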
4. Place the Model
- Place the downloaded canary-1b-v2.nemo file inside the ComfyUI/models/canary/ directory.
- You may need to create the canary folder yourself if it doesn't exist.
The final path should look like this: ComfyUI/models/canary/canary-1b-v2.nemo.
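The placement step can be scripted. The sketch below uses a hypothetical helper, `place_model`, that creates the canary folder if needed, moves the downloaded file into it, and prints the resulting path. The arguments in the usage line are placeholders to replace with your own locations.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canary-ComfyUI): move a downloaded
# .nemo file into ComfyUI/models/canary/.
# Usage: place_model /path/to/your/ComfyUI /path/to/canary-1b-v2.nemo
place_model() {
    comfy_root="$1"
    model_file="$2"
    target_dir="$comfy_root/models/canary"

    if [ ! -f "$model_file" ]; then
        echo "Model file not found: $model_file" >&2
        return 1
    fi

    mkdir -p "$target_dir"           # create the canary folder if it is missing
    mv "$model_file" "$target_dir/"  # keep the original file name
    echo "$target_dir/$(basename "$model_file")"
}
```

The printed path should end in models/canary/canary-1b-v2.nemo, matching the final path given above.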
5. Restart ComfyUI
Restart ComfyUI completely. The new nodes should appear in the "Add Node" menu under the Canary-ComfyUI category.
Usage
- Add the Load Canary Model node and select canary-1b-v2.nemo.
- Add an audio loading node (e.g., Load Audio).
- Connect the CANARY_MODEL and AUDIO outputs to one of the three task nodes (Canary Transcription, Canary Translate to English, or Canary Translate from English).
- Select the desired languages and queue the prompt. The resulting text will be available as an output.
 
Roadmap
Here are some of the features and improvements planned for the future of this project:
- [ ] Timestamp support
- [x] Support for the main SOTA Canary models
- [ ] Support for canary-1b
- [ ] Node fusion for simplified use
 
License
- The Python code in this repository is released under the Apache 2.0 License.
 - The NVIDIA Canary-1b-v2 model is subject to its own license, the Creative Commons Attribution-NonCommercial 4.0 International. Please review its terms before use, especially regarding commercial applications.
 
Acknowledgements
- A big thank you to NVIDIA for creating and open-sourcing the Canary model.
 - Thanks to the entire ComfyUI team for building such a flexible and powerful tool for the community.