A ComfyUI custom node that integrates Mistral AI's Pixtral Large vision model, enabling powerful multimodal AI capabilities within ComfyUI. Pixtral Large is a 124B parameter model (123B decoder + 1B vision encoder) that can analyze up to 30 high-resolution images simultaneously.
A ComfyUI custom node that integrates Mistral AI's Pixtral Large vision model, enabling powerful multimodal AI capabilities within ComfyUI. Pixtral Large is a 124B parameter model (123B decoder + 1B vision encoder) that can analyze up to 30 high-resolution images simultaneously.
🖼️ Process up to 30 high-resolution images in a single request
🧠 Leverages Pixtral Large's 124B parameter architecture
📝 Generate detailed descriptions and analysis of images
📊 Support for documents, charts, and natural images
🌐 128K context window for extensive image processing
🔤 Multilingual capabilities including:
📚 Advanced OCR in multiple languages and scripts
🛠️ Customizable parameters for fine-tuned responses
cd ComfyUI/custom_nodes
https://github.com/ShmuelRonen/ComfyUI_pixtral_large.git
The extension adds three powerful nodes to ComfyUI:
Main node for image analysis using Pixtral Large.
Parameters:
prompt
: Your query about the image(s) - can be in any supported languageimages
: Input images to analyzeapi_key
: Your Mistral AI API keytemperature
: Response randomness (0.0-1.5)maximum_tokens
: Max response length (1-32768)top_p
: Nucleus sampling parameter (0.0-1.0)Use Cases:
Specialized node for combining multiple images into a batch for analysis.
Parameters:
inputcount
: Number of image inputs (2-30)Features:
Use Cases:
Advanced text output display node for viewing Pixtral Large results.
Parameters:
text
: Input text to display (automatically connected to Pixtral Large output)Features:
Use Cases:
graph LR
A[Load Image] --> B[Pixtral Large]
B --> C[Preview Text]
graph LR
A[Load Image 1] --> C[Multi Images Input]
B[Load Image 2] --> C
C --> D[Pixtral Large]
D --> E[Preview Text]
graph LR
A[Load Image 1] --> D[Multi Images Input]
B[Load Image 2] --> D
C[Load Image 3] --> D
D --> E[Pixtral Large]
E --> F[Preview Text]
Pixtral Large offers robust multilingual support for both input and output:
# Hebrew prompt example
prompt = "תאר את התמונה בעברית"
# Mixed language example
prompt = "Analyze this image and provide the response in Hebrew (עברית)"
api_key
fieldCommon error messages and solutions:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions: