ComfyUI Extension: ComfyUI_CSM

Authored by billwuhao

3 stars

ComfyUI node for the Conversational Speech Model (CSM).

    ComfyUI Node for CSM

    CSM (Conversational Speech Model) is a model that supports multi-person conversations and voice cloning, and generates speech whose emotion follows the emotional changes in the conversation. It currently supports English only. For now, this node supports conversations with up to 10 speakers.

    Recording nodes can be inserted into the workflow to build multi-person conversations.

    It also supports automatic detection of audio watermarks and adding encrypted watermarks to audio.

    📣 Updates

    [2025-03-18] ⚒️: Released version v1.0.0.

    For detailed node usage, see the example workflows in example_workflows.

    Each prompt must begin with a number from 0 to 9, followed by a colon (either the half-width : or the full-width ：). Prompts and audio inputs must correspond one-to-one: for example, prompt1 goes with audio1.
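
The prompt format described above can be illustrated with a small parser. This is only a sketch under assumptions: `parse_prompt` and `pair_prompts_with_audio` are hypothetical helper names, not functions from ComfyUI_CSM, and the node's real parsing may differ. It accepts a single digit from 0 to 9, then a half-width or full-width colon, then the text, and it refuses to pair prompts with audio when the counts differ.

```python
import re

# Hypothetical helpers for illustration only; they are not part of ComfyUI_CSM.
# Pattern: one digit 0-9, a half-width ':' or full-width '\uFF1A' colon, then the text.
_PROMPT_RE = re.compile(r"^\s*([0-9])\s*[:\uFF1A]\s*(.+?)\s*$", re.DOTALL)


def parse_prompt(prompt: str) -> tuple[int, str]:
    """Split a prompt such as '0: Hello' into (number, text)."""
    match = _PROMPT_RE.match(prompt)
    if match is None:
        raise ValueError(f"prompt must look like '<0-9>: <text>', got {prompt!r}")
    return int(match.group(1)), match.group(2)


def pair_prompts_with_audio(prompts: list[str], audio_names: list[str]) -> list[tuple[int, str, str]]:
    """Pair each prompt with the audio input at the same position (one-to-one)."""
    if len(prompts) != len(audio_names):
        raise ValueError("prompts and audio inputs must correspond one-to-one")
    return [(*parse_prompt(p), a) for p, a in zip(prompts, audio_names)]
```

For example, `parse_prompt("0: Hello")` yields `(0, "Hello")`, while a prefix such as `10:` is rejected because only a single digit is allowed.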

    Installation

    cd ComfyUI/custom_nodes
    git clone https://github.com/billwuhao/ComfyUI_CSM.git
    cd ComfyUI_CSM
    pip install -r requirements.txt
    
    # Or, if you use ComfyUI's embedded Python (python_embeded):
    ./python_embeded/python.exe -m pip install -r requirements.txt
    

    Model Download

    • csm-1b: Download config.json and model.safetensors and place them in the ComfyUI/models/TTS/csm-1b directory.

    • moshiko-pytorch-bf16: Download tokenizer-e351c8d8-checkpoint125.safetensors and place it in the ComfyUI/models/TTS/moshiko-pytorch-bf16 directory.

    • SilentCipher: Download all models and place them in the ComfyUI/models/TTS/SilentCipher/44_1_khz/73999_iteration directory.

    • Llama-3.2-1B: Download everything except the original directory and place it in the ComfyUI/models/LLM/Llama-3.2-1B directory.
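
After downloading, it can help to confirm that everything landed in the directories listed above. The following is a minimal sketch, not a tool shipped with this node: `find_missing_models` is a hypothetical name, and the check only covers what the list states. Files named explicitly in the list are checked by name; for SilentCipher and Llama-3.2-1B, whose individual files are not listed, it only checks that the directory exists and is not empty.

```python
from pathlib import Path

# Paths relative to ComfyUI/models, copied from the download list above.
EXPECTED_FILES = [
    "TTS/csm-1b/config.json",
    "TTS/csm-1b/model.safetensors",
    "TTS/moshiko-pytorch-bf16/tokenizer-e351c8d8-checkpoint125.safetensors",
]
# The list does not name individual files for these, so only the
# directories are checked (they must exist and contain something).
EXPECTED_DIRS = [
    "TTS/SilentCipher/44_1_khz/73999_iteration",
    "LLM/Llama-3.2-1B",
]


def find_missing_models(models_root: str) -> list[str]:
    """Return the expected relative paths that are missing under models_root."""
    root = Path(models_root)
    missing = []
    for rel in EXPECTED_FILES:
        if not (root / rel).is_file():
            missing.append(rel)
    for rel in EXPECTED_DIRS:
        directory = root / rel
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel)
    return missing
```

Calling `find_missing_models("ComfyUI/models")` from the directory that contains ComfyUI returns an empty list when all five locations are populated, and otherwise the relative paths that still need attention.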

    Acknowledgements

    csm

    Thanks to the SesameAILabs team for their excellent work 👍.