A centralized wrapper of all MediaPipe vision tasks for ComfyUI.
This repository aims to provide a complete and centralized implementation of all MediaPipe vision tasks, optimized for real-time use in ComfyUI.
These tools can be used for interactive AI art, responsive interfaces, motion tracking, advanced masking workflows, and more. They are well optimized for real-time usage (with comfystream), but are blazing fast for normal batch processing as well.
| Category | Available Tools |
|----------|-----------------|
| Face Analysis | Face detection, face mesh (478 points), blendshapes, head pose |
| Body Tracking | Pose estimation (33 landmarks), segmentation masks |
| Hand Analysis | Hand tracking (21 landmarks per hand), gesture recognition |
| Image Processing | Object detection, image segmentation, image embeddings |
| Creative Tools | Face stylization, interactive segmentation |
| Control Nodes | Use deltas from tracking landmarks to control other Comfy nodes |
Note: Holistic landmark detection currently uses the legacy API while we await the official Tasks API release.
The extension organizes MediaPipe functionality into these components:
Control nodes convert MediaPipe landmark tracking into ComfyUI parameters, enabling dynamic control of your workflows:
| Feature | Control Types | Example Applications |
|---------|---------------|----------------------|
| Face | Head pose (yaw/pitch/roll/position), blendshape expressions | Camera control, gaze-directed generation, emotion-based parameters |
| Hand | Landmark delta tracking, finger position/movement | UI control, gesture-based adjustments, pinch-to-zoom effects |
| Pose | Body landmark movement, joint tracking | Animation control, posture-based parameters |
The Head Pose Control nodes provide these specific controls:
MediaPipe's face landmark detection includes blendshape coefficients that can be used to control parameters based on facial expressions. There are 52 expression attributes, each of which can be mapped to INT/FLOAT outputs for precise control over generation parameters, or used as a trigger for workflow events.
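As a minimal sketch of the mapping idea (the helper function, ranges, and thresholds here are hypothetical, not the node's actual parameters):

```python
# Hypothetical helper: remap a 0..1 blendshape score to a parameter range.
def blendshape_to_float(score: float, out_min: float, out_max: float) -> float:
    clamped = max(0.0, min(1.0, score))
    return out_min + clamped * (out_max - out_min)

# e.g. drive CFG scale from "jawOpen" (a real MediaPipe blendshape name);
# the mapping range itself is illustrative
cfg_scale = blendshape_to_float(0.7, 1.0, 12.0)  # -> 8.7

# ...or threshold a score to trigger a workflow event
smile_triggered = blendshape_to_float(0.85, 0.0, 1.0) > 0.8
```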
Check out the examples directory for sample workflows demonstrating how to use the control nodes with different MediaPipe features.
Use ComfyUI-Manager, or install manually:
```bash
# Navigate to your ComfyUI custom_nodes directory
cd ComfyUI/custom_nodes

# Clone the repository
git clone https://github.com/your-username/ComfyUI-MediaPipe-Vision.git

# Enter the directory
cd ComfyUI-MediaPipe-Vision

# Install dependencies
pip install -r requirements.txt

# Restart ComfyUI
```
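After restarting, you can sanity-check that the core dependency is importable in ComfyUI's Python environment:

```python
# Quick dependency check: should print the installed MediaPipe version
import mediapipe as mp
print(mp.__version__)
```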
Note: GPU support varies by platform. On Linux, you can reference these instructions to enable GPU support.
Add the `Load ... Model (MediaPipe)` node for your task:

```
[Load Face Landmarker Model] → model_info → [Face Landmarker] ← image
        |
        ↓ landmarks
[Visualize Face Landmarks] ← original_image
        |
        ↓ visualization
[Preview]
```
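For reference, these nodes wrap the MediaPipe Tasks API. A minimal standalone sketch of the same face-landmarking step (the model and image paths are assumptions; `face_landmarker.task` is the stock model file from MediaPipe's model page):

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Build a face landmarker with blendshape output enabled
options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,
)
landmarker = vision.FaceLandmarker.create_from_options(options)

# Run detection on a single image
result = landmarker.detect(mp.Image.create_from_file("input.png"))

# Each detected face yields 478 normalized landmarks
for lm in result.face_landmarks[0][:3]:
    print(lm.x, lm.y, lm.z)
```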
A real-time control chain looks similar:

```
[Load Hand Landmarker Model] → model_info → [Hand Landmarker] ← webcam_image
        |
        ↓ landmarks
[Hand Landmark Delta Float Control] (index_finger_tip)
        |
        ↓ float_value
[Any Comfy Parameter]
```
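Conceptually, the delta control node tracks frame-to-frame movement of a chosen landmark and emits it as a float. A minimal sketch (the class and its scaling are illustrative, not the node's actual implementation; index 8 is MediaPipe's real index-finger-tip landmark index):

```python
INDEX_FINGER_TIP = 8  # MediaPipe hand landmark index for the index fingertip

class LandmarkDeltaControl:
    """Track frame-to-frame movement of one landmark and emit a float."""

    def __init__(self, scale: float = 1.0):
        self.prev = None
        self.scale = scale

    def update(self, landmarks) -> float:
        # landmarks: list of (x, y) normalized coordinates for one hand
        x, y = landmarks[INDEX_FINGER_TIP]
        if self.prev is None:
            self.prev = (x, y)
            return 0.0
        dx = x - self.prev[0]
        self.prev = (x, y)
        # horizontal movement -> signed float usable as any Comfy parameter
        return dx * self.scale
```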
Models are stored in `ComfyUI/models/mediapipe/<task_type>/`.
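For example, face and hand landmarker models might be laid out like this (the exact `<task_type>` folder and file names shown are assumptions):

```
ComfyUI/models/mediapipe/
├── face_landmarker/
│   └── face_landmarker.task
└── hand_landmarker/
    └── hand_landmarker.task
```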
Contributions are welcome! For bug reports or suggested improvements, please open an issue or submit a pull request.
Feature Requests Strongly Encouraged! This project provides a flexible infrastructure that can be adapted to many different use cases. While several basic capabilities are implemented, the project aims to address many more use cases and problems.
Please open an issue to discuss your needs even if you're not sure how to implement them. The MediaPipe framework is powerful and extensible, and this project aims to make that power accessible within ComfyUI for any computer vision application.