A ComfyUI node for the Conversational Speech Model (CSM).

CSM (Conversational Speech Model) supports multi-speaker conversations and voice cloning, and generates speech whose emotion follows the emotional changes in the conversation. Unfortunately, it is currently available in English only. This node currently supports conversations with up to 10 speakers.

Recording nodes can be interspersed to create multi-speaker conversations.

It also supports audio watermarking: automatic watermark detection and embedding encrypted watermarks into audio.
[2025-03-18]: Released version v1.0.0.
Detailed example workflows for node usage: example_workflows
Each prompt must begin with a speaker number from 0 to 9, separated from the text by a `:` (or a full-width `：`). Prompts and audio must correspond one-to-one; for example, prompt1 corresponds to audio1.
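As a minimal sketch of the format above, the following hypothetical helper (not part of this node) splits a multi-speaker prompt into (speaker number, text) pairs, accepting either the half-width `:` or the full-width `：` separator:

```python
import re

def parse_prompt(prompt: str):
    """Split a multi-speaker prompt into (speaker_id, text) pairs.

    Each line must start with a speaker number 0-9 followed by a ':'
    or full-width '：' separator, e.g. "0: Hello there".
    """
    pairs = []
    for line in prompt.strip().splitlines():
        m = re.match(r"^([0-9])\s*[:：]\s*(.+)$", line.strip())
        if not m:
            raise ValueError(f"line does not match 'N: text' format: {line!r}")
        pairs.append((int(m.group(1)), m.group(2)))
    return pairs
```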
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_CSM.git
cd ComfyUI_CSM
pip install -r requirements.txt

# For the portable (python_embeded) build:
./python_embeded/python.exe -m pip install -r requirements.txt
```
csm-1b: Download `config.json` and `model.safetensors` and place them in the `ComfyUI/models/TTS/csm-1b` directory.

moshiko-pytorch-bf16: Download `tokenizer-e351c8d8-checkpoint125.safetensors` and place it in the `ComfyUI/models/TTS/moshiko-pytorch-bf16` directory.

SilentCipher: Download all models and place them in the `ComfyUI/models/TTS/SilentCipher/44_1_khz/73999_iteration` directory.

Llama-3.2-1B: Download everything except the `original` directory and place it in the `ComfyUI/models/LLM/Llama-3.2-1B` directory.
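To confirm the files above landed in the right place, a small hypothetical check like the following can be run from the ComfyUI root (the `MODELS` mapping mirrors the directories listed above; it is illustrative, not part of this node):

```python
from pathlib import Path

# Expected files per model subdirectory under ComfyUI/models
# (illustrative mapping based on the download instructions above).
MODELS = {
    "TTS/csm-1b": ["config.json", "model.safetensors"],
    "TTS/moshiko-pytorch-bf16": ["tokenizer-e351c8d8-checkpoint125.safetensors"],
}

def missing_files(comfy_root: str = ".") -> list:
    """Return paths of expected model files that are not present."""
    root = Path(comfy_root) / "models"
    missing = []
    for subdir, files in MODELS.items():
        for name in files:
            path = root / subdir / name
            if not path.is_file():
                missing.append(str(path))
    return missing
```

Running `missing_files()` after downloading should return an empty list; any entries it returns point to files still to be placed.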
Thanks to the SesameAILabs team for their excellent work.