ComfyUI-IF_VideoDatasetMaker

<img src="https://count.getloli.com/get/@IFAIDATASET_comfy?theme=moebooru" alt=":IFAIDATASET_comfy" />

An advanced ComfyUI node that converts video files or YouTube links into well-structured training datasets for AI image generation models. This tool is based on the work done by zsxkib in their repository cog-create-video-dataset.

Overview

The IF_VideoDatasetMaker node allows you to:

  • Download YouTube videos or use local video files
  • Automatically segment videos into high-quality clips
  • Generate intelligent captions using Qwen-VL models that describe the visual content
  • Create a training-ready dataset with the proper file structure
  • Customize trigger words and captions to fit your specific needs

Perfect for creating training datasets for HyperNetworks, LoRAs, DreamBooth, or other fine-tuning approaches.

Features

  • Multi-source Input: Process YouTube links or local video files
  • Intelligent Scene Detection: Automatically extract the best-quality clips based on content changes (see the sketch after this list)
  • AI-powered Captioning: Generate detailed, useful captions with multimodal AI
  • Customizable Output: Control caption prefixes, suffixes, and trigger words
  • Structured Organization: Create a proper dataset structure for immediate training use
  • Automatic Compression: Generate a ready-to-share ZIP file of your dataset
  • Debugging Options: Additional debug information to help troubleshoot issues
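
The scene detection step runs inside the node, but for intuition, this is roughly how content-based detection works with the scenedetect package the node depends on (a minimal standalone sketch, not the node's actual code; "input.mp4" and the threshold are placeholders):

from scenedetect import ContentDetector, detect, split_video_ffmpeg

# Find cut points by measuring frame-to-frame content changes.
scene_list = detect("input.mp4", ContentDetector(threshold=27.0))

for i, (start, end) in enumerate(scene_list):
    print(f"Scene {i}: {start.get_timecode()} -> {end.get_timecode()}")

# Write one clip per detected scene using FFmpeg.
split_video_ffmpeg("input.mp4", scene_list)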

Installation

Prerequisites

  • ComfyUI installed and working
  • Python 3.8+ environment
  • FFmpeg installed on your system and available on your PATH (required for video processing)
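
You can confirm FFmpeg is reachable from your PATH with:

ffmpeg -version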

Installing the Node

  1. Clone this repository to your ComfyUI custom_nodes directory:
cd ./custom_nodes
git clone https://github.com/if-ai/ComfyUI-IF_DatasetMkr.git
  2. Install the required dependencies:
cd ./ComfyUI-IF_DatasetMkr
pip install -r requirements.txt

If you want to use AWQ quantization to save VRAM and get up to 3x faster inference, install triton and autoawq:

pip install triton
pip install --no-deps --no-build-isolation autoawq
  3. Restart ComfyUI

Dependencies

This node requires several packages for video processing, AI captioning, and file management:

  • yt-dlp - For YouTube video downloading (see the example after this list)
  • scenedetect - For intelligent clip extraction
  • python-slugify - For filename normalization
  • moviepy - For video editing and frame extraction
  • opencv-python - For image processing
  • qwen-vl-utils - For multimodal AI caption generation
  • ffmpeg-python - For advanced video processing
  • Hugging Face Transformers - For running the Qwen-VL models
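
All of these are installed from requirements.txt. As an illustration of the download step, fetching a video with yt-dlp's Python API looks roughly like this (a sketch, not the node's internal code; the format string and output template are assumptions):

import yt_dlp

# Prefer an MP4 video+audio combination; name the file after the video title.
opts = {
    "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",
    "outtmpl": "%(title)s.%(ext)s",
}

with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(["<your YouTube URL>"])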

Usage

  1. Add the IF_VideoDatasetMkr node to your ComfyUI workflow
  2. Configure the node with your desired settings:
    • Provide a YouTube URL or local video file path
    • Choose your captioning model
    • Set custom trigger words if needed
    • Select output directory and options
  3. Run the workflow to generate your dataset
  4. Use the resulting dataset ZIP for training your models

Node Parameters

| Parameter | Description |
|-----------|-------------|
| video_url | YouTube/video URL to process |
| video_file | Local video file path to process |
| trigger_word | Custom trigger word (optional) |
| autocaption | Enable/disable AI captioning |
| custom_caption | Static caption to use for all frames |
| autocaption_prefix | Text added before every generated caption |
| autocaption_suffix | Text added after every generated caption |
| output_dir | Custom output directory (defaults to the ComfyUI output folder) |
| model_variant | Qwen-VL model to use for captioning |
| model_offload | Toggle CPU offloading to save VRAM |
| hf_token | Hugging Face token for downloading models |
| profile | Captioning profile/persona to use |
| image_size | Resolution for processing frames |
| debug_mode | Enable additional debugging information |
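
For orientation, a hypothetical set of values might look like this (placeholders for illustration, not the node's defaults):

video_url: <YouTube URL, or leave empty to use video_file>
trigger_word: mytrigger
autocaption: enabled
autocaption_prefix: a video of
model_variant: <a Qwen-VL variant from the dropdown>
model_offload: enabled
debug_mode: disabled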

Output

The node generates a structured dataset with:

  • A videos folder containing all extracted clips
  • A captions folder with text files matching each clip's name
  • Caption files following the naming convention: {trigger_word}_{clip_number}.txt
  • A compressed ZIP archive for easy sharing
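
For example, with trigger word mytrigger, the generated structure might look like this (clip file names are illustrative; only the caption naming convention above is documented):

my_dataset/
├── videos/
│   ├── mytrigger_1.mp4
│   └── mytrigger_2.mp4
└── captions/
    ├── mytrigger_1.txt
    └── mytrigger_2.txt
my_dataset.zip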

Customization

Caption Profiles

The node comes with built-in caption profiles that control the style and content of generated descriptions. You can edit these profiles or create your own by modifying the profiles.json file.
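
The exact schema of profiles.json is defined by the node, but conceptually each entry pairs a profile name with captioning instructions, along these lines (hypothetical example, not the shipped file):

{
  "cinematic": "Describe the shot type, camera movement, subjects, and lighting in one concise paragraph.",
  "minimal": "Describe only the main subject and action in a single short sentence."
}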

Trigger Words

For training specialized models, you can set custom trigger words that will be included in the dataset. These can be used later to activate your trained model.
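
For instance, with trigger_word set to mytrigger, a generated caption file might read (hypothetical output):

mytrigger, a slow aerial shot of a coastline at sunset, waves rolling onto the beach

During training, the model learns to associate mytrigger with your dataset's style or subject.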

Troubleshooting

  • Video Download Issues: Ensure yt-dlp is up to date (see the command below) and that the URL is valid
  • FFmpeg Errors: Make sure FFmpeg is installed on your system and in your PATH
  • Caption Generation Errors: Check that you have enough VRAM or try a smaller model
  • Missing Clips: Try enabling debug mode to see detailed processing information
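
To update yt-dlp in the environment ComfyUI uses:

pip install -U yt-dlp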

Requirements

This is the complete requirements list for the node:

# Core dependencies
torch>=2.0.0
pillow>=10.0.0
numpy>=1.24.0
huggingface_hub>=0.26.0

# AutoAWQ with specific version - MUST be installed before transformers
autoawq==0.2.8
flash-attn>=2.0.0;platform_system!="Darwin"  # Optional for performance, excluded on MacOS

# Transformers - MUST be installed after AutoAWQ
transformers>=4.49.0
accelerate>=0.21.0

# Qwen model dependencies
tokenizers>=0.15.0
safetensors>=0.3.1
qwen-vl-utils[decord]>=0.0.8

# Video processing
opencv-python>=4.8.0
decord>=0.6.0
ffmpeg-python>=0.2.0
imageio_ffmpeg>=0.6.0
moviepy>=2.1.2
scenedetect>=0.6.2

# Downloading
yt-dlp>=2023.3.4

# Utilities
tqdm>=4.66.1
requests>=2.31.0
python-slugify>=8.0.1
psutil>=5.9.0
packaging>=23.1
aiohttp>=3.8.5
dotenv-python>=0.0.1

Support

If you find this tool useful, please consider supporting my work.

Thank You!

License

This project is licensed under the MIT License - see the LICENSE file for details.