ComfyUI Extension: ComfyUI Custom Dia

Authored by nobrainX2

7 stars

This is a ComfyUI integration of the Dia TTS model. Many thanks to nari-labs for their fantastic work.

Custom Nodes (0)

    README

    ComfyUI Custom Dia

    This is a ComfyUI integration of the Dia TTS model.
    Many thanks to nari-labs for their fantastic work.

    Installation

    Download the .pth and .json files from Hugging Face
    Store them in any subfolder under /models/ — the path is not hardcoded, and the node allows you to define it manually.
    (Default path: /models/Dia/dia-v0_1.pth)
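
As a quick sanity check before loading the node, the sketch below looks for a .pth and a .json file in a chosen folder. The helper names `find_dia_files` and `is_ready` are hypothetical and not part of this extension; the folder passed in is whichever subfolder under /models/ you picked.

```python
from pathlib import Path


def find_dia_files(model_dir):
    """Return (weights, configs): the .pth and .json files directly inside model_dir."""
    folder = Path(model_dir)
    weights = sorted(folder.glob("*.pth"))
    configs = sorted(folder.glob("*.json"))
    return weights, configs


def is_ready(model_dir):
    """True when the folder holds at least one .pth and one .json file."""
    weights, configs = find_dia_files(model_dir)
    return bool(weights) and bool(configs)
```

Calling `is_ready("models/Dia")` from the ComfyUI root would then report whether both files are in place, assuming that is the folder you chose.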

    Modifications from the Original Repository

    The original Dia API has been slightly modified to support multi-channel audio inputs.
    This allows for stereo files or tensors provided directly by ComfyUI nodes.

    An extra node has been added to retime the output audio. See the example for usage.

    Please note that the pitch preservation option requires the librosa package. It is not listed in requirements.txt because it is optional.
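
Because librosa is optional, it can help to check whether it is installed before enabling pitch preservation. The following is a minimal sketch using only the standard library; `module_available` is a hypothetical helper, not something this extension provides.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if not module_available("librosa"):
    # Hint only: install it in the same Python environment that runs ComfyUI.
    print("librosa not found; run 'pip install librosa' to use pitch preservation.")
```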

    Usage

    This is an output node, meaning it can be used standalone and queued without connections.
    In that case, you may want to enable save_audio_file to automatically save the result into ComfyUI’s output folder.

    To use it in a pipeline, just connect the audio output to any compatible node.

    Speech Prompt

    • Use the text field to define your dialogue, e.g.:
    [S1] Hello.
    [S2] Hi there! (laughs)
    
    • Use [S1], [S2], etc. to switch speakers.
    • Insert nonverbal tags (e.g. (laughs), (sighs)) to enrich the audio.
    • A list of available tags is provided in the third (inactive) text field.
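
If the dialogue is generated by a script, the tag format above can be assembled programmatically. This is only an illustrative sketch: `build_prompt` is a hypothetical helper, and the resulting string is what you would paste into the text field.

```python
def build_prompt(turns):
    """Format (speaker_number, text) pairs as one "[S<n>] text" line per turn."""
    lines = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        lines.append(f"[S{speaker}] {text.strip()}")
    return "\n".join(lines)


prompt = build_prompt([
    (1, "Hello."),
    (2, "Hi there! (laughs)"),
])
print(prompt)
```

Nonverbal tags such as (laughs) are simply part of the text of a turn, so they need no special handling here.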

    Voice Cloning

    You can connect an audio tensor to the node's input to enable voice cloning.
    In this case, it is strongly recommended to provide a transcript of the input audio in the input_audio_transcript field to improve results.

    Troubleshooting and Side Effects

    As stated in the requirements.txt file, you will have to install two Python packages: descript-audio-codec and soundfile.

    Under certain circumstances, installing descript-audio-codec can automatically downgrade protobuf to 3.19.6, which can make some other nodes crash on startup. If that happens, upgrade protobuf by opening the ComfyUI terminal and running:

    pip install protobuf --upgrade
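
To see whether the downgrade happened, you can inspect the installed protobuf version from the same Python environment. The sketch below uses the standard library's `importlib.metadata`. The helper names are hypothetical, and treating any major version below 4 as "downgraded" is an assumption based on the 3.19.6 figure above, not a rule stated by this extension.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Extract the leading integer of a dotted version string, e.g. "3.19.6" -> 3."""
    return int(version_string.split(".")[0])


def looks_downgraded(version_string, minimum_major=4):
    """Assumption: a major version below `minimum_major` counts as downgraded."""
    return major_version(version_string) < minimum_major


current = installed_version("protobuf")
if current is None:
    print("protobuf is not installed in this environment")
elif looks_downgraded(current):
    print(f"protobuf {current} looks downgraded; consider upgrading it")
else:
    print(f"protobuf {current} looks fine")
```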