Create video datasets straight from YouTube or a local video file path
An advanced ComfyUI node that converts video files or YouTube links into well-structured training datasets for AI image generation models. This tool is based on the work done by zsxkib in their repository cog-create-video-dataset.
The IF_VideoDatasetMaker node allows you to:
- download a video from a YouTube/video URL or load a local video file
- split the footage into individual clips
- auto-caption each clip with a Qwen-VL model, with an optional trigger word, prefix, and suffix
- save the clips and matching caption files as a structured dataset in your output directory
Perfect for creating training datasets for HyperNetworks, LoRAs, Dreambooth or other fine-tuning approaches.
cd ./custom_nodes
git clone https://github.com/if-ai/ComfyUI-IF_DatasetMkr.git
cd ./ComfyUI-IF_DatasetMkr
pip install -r requirements.txt
If you want to use AWQ to save VRAM and get up to 3x faster inference, you need to install triton and autoawq:
pip install triton
pip install --no-deps --no-build-isolation autoawq
This node requires several packages for video processing, AI captioning, and file management:
| Parameter | Description |
|-----------|-------------|
| video_url | YouTube/video URL to process |
| video_file | Local video file path to process |
| trigger_word | Custom trigger word (optional) |
| autocaption | Enable/disable AI captioning |
| custom_caption | Static caption to use for all frames |
| autocaption_prefix | Text to add before all generated captions |
| autocaption_suffix | Text to add after all generated captions |
| output_dir | Custom output directory (defaults to ComfyUI output folder) |
| model_variant | Qwen-VL model to use for captioning |
| model_offload | Toggle CPU offloading to save VRAM |
| hf_token | Hugging Face token for downloading models |
| profile | Captioning profile/persona to use |
| image_size | Resolution for processing frames |
| debug_mode | Enable additional debugging information |
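As an illustration, a typical run might use inputs like the ones below. The URL, trigger word, and model name are hypothetical placeholders rather than shipped defaults; check the node's own dropdowns for the exact `model_variant` options.

```python
# Hypothetical input values for IF_VideoDatasetMaker (illustrative only, not defaults).
params = {
    "video_url": "https://www.youtube.com/watch?v=XXXXXXXXXXX",  # leave empty to use video_file instead
    "video_file": "",                                # local path, used when no URL is given
    "trigger_word": "mystyle",                       # used in caption file names and captions
    "autocaption": True,                             # let the Qwen-VL model write captions
    "autocaption_prefix": "mystyle, ",               # text placed before each generated caption
    "autocaption_suffix": "",                        # text placed after each generated caption
    "output_dir": "",                                # empty -> ComfyUI output folder
    "model_variant": "Qwen2.5-VL-7B-Instruct-AWQ",   # assumed name; pick from the node's model list
    "model_offload": True,                           # offload weights to CPU to save VRAM
    "image_size": 512,                               # resolution used when processing frames
    "debug_mode": False,                             # extra logging when troubleshooting
}
```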
The node generates a structured dataset with:
- a `videos` folder containing all extracted clips
- a `captions` folder with one text file per clip, named `{trigger_word}_{clip_number}.txt` to match the clip it describes
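Assuming a trigger word of `mystyle`, the resulting layout looks roughly like this (clip count, exact subfolder names, and file extensions depend on your video and settings):

```
output_dir/
├── videos/
│   ├── mystyle_001.mp4
│   ├── mystyle_002.mp4
│   └── ...
└── captions/
    ├── mystyle_001.txt
    ├── mystyle_002.txt
    └── ...
```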
The node comes with built-in caption profiles that control the style and content of generated descriptions. You can edit these profiles or create your own by modifying the profiles.json file.
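As a rough sketch only (the actual schema may differ, so check the shipped profiles.json before editing), a custom profile entry could look something like:

```json
{
  "Cinematic": {
    "instruction": "Describe the clip like a cinematographer: lighting, camera movement, framing, mood, and the main subject."
  }
}
```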
For training specialized models, you can set custom trigger words that will be included in the dataset. These can be used later to activate your trained model.
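For example, with `trigger_word` set to `mystyle` and the trigger word placed in `autocaption_prefix`, a caption file such as `captions/mystyle_001.txt` might contain text along these lines (the generated description itself will vary):

```
mystyle, a person walking down a rain-soaked street at night, neon signs reflecting off the wet pavement
```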
This is a comprehensive list of requirements for the node:
# Core dependencies
torch>=2.0.0
pillow>=10.0.0
numpy>=1.24.0
huggingface_hub>=0.26.0
# AutoAWQ with specific version - MUST be installed before transformers
autoawq==0.2.8
flash-attn>=2.0.0;platform_system!="Darwin" # Optional for performance, excluded on MacOS
# Transformers - MUST be installed after AutoAWQ
transformers>=4.49.0
accelerate>=0.21.0
# Qwen model dependencies
tokenizers>=0.15.0
safetensors>=0.3.1
qwen-vl-utils[decord]>=0.0.8
# Video processing
opencv-python>=4.8.0
decord>=0.6.0
ffmpeg-python>=0.2.0
imageio_ffmpeg>=0.6.0
moviepy>=2.1.2
scenedetect>=0.6.2
# Downloading
yt-dlp>=2023.3.4
# Utilities
tqdm>=4.66.1
requests>=2.31.0
python-slugify>=8.0.1
psutil>=5.9.0
packaging>=23.1
aiohttp>=3.8.5
dotenv-python>=0.0.1
If you find this tool useful, please consider supporting my work.
Thank You!
This project is licensed under the MIT License - see the LICENSE file for details.