ComfyUI Extension: ComfyUI-JM-MiniMax-API
A collection of ComfyUI custom nodes that integrate with MiniMax API services.
Custom Nodes (0)
README
ComfyUI-JM-MiniMax-API
A collection of ComfyUI custom nodes that integrate with MiniMax API services.
Features
Speech Nodes (JM-MiniMax-API/Speech)
- Text to Speech: Convert text to natural-sounding speech using MiniMax's advanced text-to-speech API
- Voice Cloning: Clone voices from audio samples
- Voice Design: Generate custom voices from text descriptions using AI-powered voice design
- Load Audio: Load and preview audio files for voice cloning
Video Nodes (JM-MiniMax-API/Video)
- Video Generation: Generate videos using MiniMax's unified video generation API (supports text-to-video, image-to-video, and subject-referenced video)
- Check Video Status: Check the status of video generation tasks
- Download Video: Download generated videos to local storage
Installation
- Clone this repository to your ComfyUI custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/yourusername/ComfyUI-JM-MiniMax-API.git
- Install dependencies:
pip install -r requirements.txt
- Restart ComfyUI
Usage
Load Audio Node
This node allows you to load and preview audio files for voice cloning.
Input Parameters:
- audio_path: Select an existing audio file from the input directory
- Upload Button: Click to upload a new audio file (supports .mp3, .wav, .m4a)
Output:
- audio_file: Absolute path to the selected audio file
Voice Cloning Node
This node uses MiniMax's voice cloning API to clone voices from audio samples.
Input Parameters:
- api_key: MiniMax API key
- group_id: MiniMax group ID
- audio_file: Path to the audio file to clone (connect from Load Audio node)
- voice_id: ID for the cloned voice (at least 8 characters, start with a letter, include numbers)
- need_noise_reduction: Whether to apply noise reduction (true/false)
- need_volume_normalization: Whether to normalize volume (true/false)
- preview_text: Optional text to preview the cloned voice (max 300 characters)
- model: TTS model to use for preview
- accuracy: Cloning accuracy threshold (0.0-1.0)
Output:
- voice_id: ID of the cloned voice (can be connected to TextToSpeech node)
Voice Design Node
This node uses MiniMax's voice design API to generate custom voices from text descriptions. Simply describe the voice characteristics you want, and the AI will create a unique voice for you.
Input Parameters:
- api_key: MiniMax API key
- prompt: Detailed description of the desired voice characteristics
- Example: "讲述悬疑故事的播音员,声音低沉富有磁性,语速时快时慢,营造紧张神秘的氛围。"
- Include: gender, age, emotion, speaking style, tone, usage scenario, etc.
- preview_text: Optional text for voice preview (max 200 characters)
- Example: "夜深了,古屋里只有他一人。窗外传来若有若无的脚步声..."
- custom_voice_id (optional): Custom voice ID for the generated voice
- If empty, a unique voice ID will be automatically generated
- Format: Can be any string identifier you prefer
Output:
- voice_id: Generated or custom voice ID (can be used in Text to Speech node)
- trial_audio: Path to preview audio file (if generated)
Voice ID Generation:
- Custom ID: If you provide a custom_voice_id, it will be used as-is
- Auto-generated: If left empty, format will be
voice_{timestamp}_{unique_id}
- Example auto-generated:
voice_1704067200_a1b2c3d4
Usage Tips:
- Be Specific: Describe gender, age, emotion, speaking style in detail
- Include Context: Mention the usage scenario (narrator, customer service, etc.)
- Voice Characteristics: Specify tone (deep, bright), pace (fast, slow), style (formal, casual)
- Preview: Use preview_text to test the generated voice
- Custom ID: Use meaningful custom_voice_id for easier management
Example Prompts:
- "专业新闻主播,女性,30岁左右,声音清晰标准,语调平稳,适合播报新闻"
- "儿童故事讲述者,温和亲切的女声,语速适中,充满关爱和耐心"
- "商务客服代表,男性,声音稳重专业,语调友好,适合电话客服"
Text to Speech Node
This node uses MiniMax's API to convert text to speech.
Input Parameters:
- api_key: MiniMax API key
- group_id: MiniMax group ID
- text: Text to convert (up to 5000 characters)
- model: TTS model selection (speech-02-hd, speech-02-turbo, etc.)
- voice_id: Voice selection (multiple options available)
- custom_voice_id (optional): Custom voice ID from voice cloning (overrides voice_id if provided)
- speed: Speech rate (0.5 to 2.0)
- volume: Volume (0.1 to 10.0)
- pitch: Pitch adjustment (-12 to 12)
- emotion: Emotion selection (happy, sad, etc.)
- subtitle_enable: Whether to generate subtitle file (true/false)
- filename_prefix: Prefix for output filenames
- language_boost (optional): Language enhancement
Output:
- audio_path: Absolute path to the generated audio file
- subtitle_path: Absolute path to the generated subtitle file (if subtitles enabled)
Output File Details:
- Audio File (audio_path):
- Format: MP3
- Filename format: prefix_YYYYMMDD-HHMMSS.mp3
- Subtitle File (subtitle_path):
- Format: JSON
- Filename format: prefix_subtitle_YYYYMMDD-HHMMSS.json
- Contains sentence-level timestamps accurate to milliseconds
Workflow Examples
Basic Text-to-Speech Workflow:
- Use Text to Speech node with a predefined voice
- Enter your API key, text, and configure voice parameters
- Run to generate speech audio
Voice Cloning Workflow:
- Use Load Audio node to upload or select an audio file
- Connect the output to a Voice Cloning node to clone the voice
- Connect the voice_id output to a Text to Speech node's custom_voice_id input
- Enter text and configure other parameters in the Text to Speech node
- Run the workflow to generate speech using the cloned voice
Voice Design Workflow:
- Use Voice Design node to create a custom voice from description
- Write a detailed prompt describing the voice characteristics you want
- Optionally add preview text to test the voice
- Run to get a custom voice_id and preview audio
- Connect the voice_id to a Text to Speech node to use the custom voice
Video Generation Node
This node uses MiniMax's unified video generation API to create videos from text prompts, images, or subject references.
Input Parameters:
- api_key: MiniMax API key
- model: Video generation model selection:
- T2V-01-Director, T2V-01: Text-to-video models (require text prompt)
- I2V-01-Director, I2V-01, I2V-01-live: Image-to-video models (require image input)
- S2V-01: Subject-referenced video model
- prompt: Video generation description (max 2000 characters, supports camera movement instructions)
- prompt_optimizer: Whether to auto-optimize prompt for better quality (true/false)
- image (optional): Image input for I2V models (required for I2V-01-Director, I2V-01, I2V-01-live)
- callback_url (optional): URL for status update callbacks
Model Usage Guidelines:
- Text-to-Video (T2V models): Only requires a text prompt. Image input is optional.
- Image-to-Video (I2V models): Requires both image input and optionally a text prompt. If prompt is empty, the model will auto-generate video content based on the image.
- Subject-Referenced (S2V models): Currently S2V-01 model for subject-referenced video generation.
Camera Movement Instructions:
When using T2V-01-Director or I2V-01-Director models, you can use movement instructions in the prompt:
- Movement: [左移], [右移], [推进], [拉远], [上升], [下降]
- Rotation: [左摇], [右摇], [上摇], [下摇]
- Zoom: [变焦推近], [变焦拉远]
- Other: [晃动], [跟随], [固定]
Image Requirements (for I2V models):
- Format: JPG/JPEG/PNG
- Aspect ratio: between 2:5 and 5:2
- Short side: minimum 300px
- File size: maximum 20MB
Output:
- task_id: Task ID for the video generation job
Video Workflow Examples
Text-to-Video Workflow:
- Use Video Generation node with T2V-01-Director or T2V-01 model
- Enter your API key and write a detailed text prompt
- Run to get a task_id
- Use Check Video Status node to monitor progress
- Once status is "success", use Download Video node to save the video
Image-to-Video Workflow:
- Load an image into ComfyUI (using Load Image node)
- Connect it to the Video Generation node
- Select an I2V model (I2V-01-Director recommended)
- Enter API key and optionally write a prompt with movement instructions
- Run to get a task_id
- Use Check Video Status node to monitor progress (it will automatically wait until completion)
- Once status is "success", use Download Video node with the file_id to save the video
License
MIT License
Check Video Status Node
This node checks the status of video generation tasks and waits until completion.
Input Parameters:
- api_key: MiniMax API key
- task_id: Task ID from Video Generation node
- check_interval: Check interval in seconds (default: 30, range: 10-300)
- max_wait_time: Maximum wait time in seconds (default: 1800/30 minutes, range: 5 minutes-2 hours)
Output:
- status: Task status (processing, success, failed)
- file_id: File ID of the generated video (required for downloading)
- video_url: Download URL for the generated video (may be empty)
- cover_image_url: URL for the video cover image
Features:
- Automatic polling: Continuously checks status until completion or failure
- Progress tracking: Shows elapsed time, remaining time, and attempt count
- Timeout protection: Prevents infinite waiting with configurable maximum wait time
- Smart intervals: Customizable check intervals to balance responsiveness and API usage
Download Video Node
This node downloads generated videos using the file_id from the status check.
Input Parameters:
- api_key: MiniMax API key
- file_id: File ID from Check Video Status node
- filename_prefix: Prefix for the downloaded video file
Output:
- video_path: Absolute path to the downloaded video file
Process:
- Uses file_id to retrieve download URL from MiniMax file API
- Downloads the video file with progress tracking
- Saves to ComfyUI output directory with timestamp