ComfyUI Extension: ComfyUI-Rex-Omni
A powerful ComfyUI custom node for integrating Rex-Omni multimodal AI models. This node provides seamless integration of Rex-Omni's advanced computer vision and natural language processing capabilities into your ComfyUI workflows.
Custom Nodes (0)
README
ComfyUI-Rex-Omni
β οΈ Important Notice: Since I'm not very familiar with coding, this plugin was written by Claude. Some features are working, but there are still other issues that need to be fixed. If you encounter any problems, please submit an issue.
A powerful ComfyUI custom node for integrating Rex-Omni multimodal AI models. This node provides seamless integration of Rex-Omni's advanced computer vision and natural language processing capabilities into your ComfyUI workflows.
π Features
- Multimodal AI Integration: Seamlessly integrate Rex-Omni models into ComfyUI workflows
- Multiple Task Support: Object detection, text recognition, keypoint detection, and more
- Flexible Backend Options: Support for both Transformers and vLLM backends
- Real-time Visualization: Built-in visualization capabilities for detection results
- JSON Output: Structured JSON output for easy integration with other nodes
- Batch Processing: Efficient batch processing of multiple images
- Customizable Parameters: Fine-tune model behavior with temperature, max tokens, and more
π Requirements
- Python 3.8+
- ComfyUI
- PyTorch
- PIL (Pillow)
- NumPy
- Rex-Omni models
π Installation
Method 1: Git Clone (Recommended)
cd ComfyUI/custom_nodes/
git clone https://github.com/flybirdxx/ComfyUI-RexOmni.git
Method 2: Manual Installation
- Download the repository as a ZIP file
- Extract it to
ComfyUI/custom_nodes/ComfyUI-Rex-Omni/
- Ensure all dependencies are installed
Model Download
Before using this plugin, you need to download the Rex-Omni model:
Model URL: https://huggingface.co/IDEA-Research/Rex-Omni
Download Path: models/Rex-Omni/
Download Methods:
- Using Hugging Face CLI (Recommended):
# Install huggingface_hub
pip install huggingface_hub
# Download model to specified path
huggingface-cli download IDEA-Research/Rex-Omni --local-dir models/Rex-Omni
- Using Git LFS:
git lfs install
git clone https://huggingface.co/IDEA-Research/Rex-Omni models/Rex-Omni
- Manual Download:
- Visit the Hugging Face model page
- Download all files to the
models/Rex-Omni/
directory
Dependencies Installation
pip install torch torchvision pillow numpy huggingface_hub
π Usage
Node Overview
This custom node provides two main components:
- Rex-Omni Loader: Loads and configures Rex-Omni models
- Rex-Omni Detector: Performs inference and returns results
Basic Workflow
- Load Model: Use the
Rex-Omni Loader
node to load your desired Rex-Omni model - Process Image: Connect your image to the
Rex-Omni Detector
node - Configure Parameters: Set task type, confidence threshold, and other parameters
- Get Results: Receive detection results, visualizations, and JSON data
Node Parameters
Rex-Omni Loader
- Model Name: Select from available Rex-Omni models
- Backend: Choose between "transformers" or "vllm"
- Max Tokens: Maximum number of tokens for text generation (1-4096)
- Temperature: Sampling temperature for text generation (0.0-2.0)
Rex-Omni Detector
- Model: Connect the loaded model from Rex-Omni Loader
- Image: Input image for processing
- Task Type: Type of task to perform (detection, recognition, etc.)
- Confidence Threshold: Minimum confidence for detections (0.0-1.0)
- Max Results: Maximum number of results to return
- Show Labels: Whether to show labels in visualization
- Show Confidence: Whether to show confidence scores in visualization
Output Types
The detector node provides multiple output types:
- Detection Results: Structured detection data
- Visualization Image: Annotated image with detections
- JSON Data: Formatted JSON output for further processing
- Bounding Boxes: Extracted bounding box coordinates
- Text Results: Extracted text content
- Keypoints: Detected keypoint coordinates
π― Supported Tasks
- Object Detection: Detect and localize objects in images
- Text Recognition: Extract and recognize text from images
- Keypoint Detection: Detect human pose keypoints
- Image Classification: Classify images into categories
- Visual Question Answering: Answer questions about images
- Image Captioning: Generate descriptive captions for images
π Example Workflow
The following workflow demonstrates the Rex-Omni nodes in action, showcasing multiple computer vision tasks:
This example workflow shows:
- Object Detection: Detecting people in images with bounding boxes
- Localization: Precise pointing and localization tasks
- Keypoint Detection: Detailed human pose keypoint extraction
- OCR: Text recognition with polygon-based word detection
π Troubleshooting
Common Issues
- Model Not Found: Ensure models are placed in the correct directory
- CUDA Out of Memory: Reduce batch size or use CPU backend
- Import Errors: Check that all dependencies are installed correctly
π€ Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
π License
This project is licensed under the MIT License - see the LICENSE file for details.
π Acknowledgments
- ComfyUI for the amazing framework
- Rex-Omni for the multimodal AI models
- IDEA-Research team for developing the Rex-Omni model
- The open-source community for inspiration and support
π Support
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Join our community discussions
Note: This is a custom node for ComfyUI. Make sure you have ComfyUI properly installed and configured before using this node.