ComfyUI Extension: ComfyUI_CSM
A ComfyUI node for the Conversational Speech Model (CSM).
ComfyUI Node for CSM
CSM (Conversational Speech Model) supports multi-speaker conversations and voice cloning, and generates speech whose emotion follows the emotional shifts in the conversation. It is currently available only in English. This node currently supports conversations with up to 10 speakers.
Recording nodes can be interspersed in the workflow to build multi-speaker conversations.
It also supports automatic audio watermark detection and embedding encrypted watermarks into audio.
📣 Updates
[2025-03-18]: Released version v1.0.0.
Detailed node usage example workflows: example_workflows
Each prompt must be preceded by a number from 0~9, separated from the text by a : or ： (full-width colon). Prompts and audio need to correspond one-to-one, for example, prompt1 corresponds to audio1.
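For example, a short two-speaker prompt (the speaker numbers and lines here are only illustrative) might look like this:

```
0: Hey, did you get a chance to try the new model?
1: I did! The voices sound surprisingly natural.
0: Great, let's record a longer conversation next.
```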
Installation
```
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_CSM.git
cd ComfyUI_CSM
pip install -r requirements.txt
```

For the portable (python_embeded) ComfyUI build:

```
./python_embeded/python.exe -m pip install -r requirements.txt
```
Model Download
- csm-1b: Download config.json and model.safetensors and place them in the ComfyUI/models/TTS/csm-1b directory.
- moshiko-pytorch-bf16: Download tokenizer-e351c8d8-checkpoint125.safetensors and place it in the ComfyUI/models/TTS/moshiko-pytorch-bf16 directory.
- SilentCipher: Download all models and place them in the ComfyUI/models/TTS/SilentCipher/44_1_khz/73999_iteration directory.
- Llama-3.2-1B: Download everything except the original directory and place it in the ComfyUI/models/LLM/Llama-3.2-1B directory.
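After downloading, the model folders should look roughly like this (assuming the default ComfyUI models path):

```
ComfyUI/models/
├── TTS/
│   ├── csm-1b/
│   │   ├── config.json
│   │   └── model.safetensors
│   ├── moshiko-pytorch-bf16/
│   │   └── tokenizer-e351c8d8-checkpoint125.safetensors
│   └── SilentCipher/
│       └── 44_1_khz/
│           └── 73999_iteration/   (all SilentCipher model files)
└── LLM/
    └── Llama-3.2-1B/              (everything except the original directory)
```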
Acknowledgements
Thanks to the SesameAILabs team for their excellent work.