ComfyUI Extension: ComfyUI-VideoHelperSuite
Nodes related to video workflows
Custom Nodes (458)
- ACEPlusFFTConditioning~
- ACEPlusFFTProcessor~
- ACEPlusFFTLoader~
- ACEPlusLoraConditioning~
- ACEPlusLoraProcessor~
- AddNoise
- Adaptive Projected Guidance
- Audio Adjust Volume
- Audio Concat
- Audio Merge
- BasicGuider
- BasicScheduler
- BetaSamplingScheduler
- ByteDance First-Last-Frame to Video
- ByteDance Image Edit
- ByteDance Image
- ByteDance Reference Images to Video
- ByteDance Image to Video
- ByteDance Seedream 4
- ByteDance Text to Video
- Case Converter
- CFGGuider
- Load Checkpoint With Config (DEPRECATED)
- Load Checkpoint
- Save Checkpoint
- Load CLIP
- CLIPMergeAdd
- CLIPMergeSimple
- CLIPMergeSubtract
- CLIPSave
- CLIP Set Last Layer
- CLIP Text Encode (Prompt)
- CLIPTextEncodeFlux
- CLIPTextEncodeHunyuanDiT
- CLIP Text Encode for Lumina2
- CLIPTextEncodeSD3
- CLIP Vision Encode
- Load CLIP Vision
- Combine Hooks [2]
- Combine Hooks [4]
- Combine Hooks [8]
- ConditioningAverage
- Conditioning (Combine)
- Conditioning (Concat)
- Conditioning (Set Area)
- Conditioning (Set Area with Percentage)
- ConditioningSetAreaPercentageVideo
- ConditioningSetAreaStrength
- Cond Set Default Combine
- Conditioning (Set Mask)
- Cond Set Props
- Cond Set Props Combine
- ConditioningSetTimestepRange
- ConditioningStableAudio
- Timesteps Range
- ConditioningZeroOut
- Context Windows (Manual)
- Apply ControlNet (OLD)
- Apply ControlNet
- Apply Controlnet with VAE
- ControlNetInpaintingAliMamaApply
- Load ControlNet Model
- Create Hook Keyframe
- Create Hook Keyframes From Floats
- Create Hook Keyframes Interp.
- Create Hook LoRA
- Create Hook LoRA (MO)
- Create Hook Model as LoRA
- Create Hook Model as LoRA (MO)
- Create Video
- CropMask
- Load ControlNet Model (diff)
- Differential Diffusion
- DiffusersLoader
- DisableNoise
- DualCFGGuider
- DualCLIPLoader
- EasyCache
- Empty Audio
- EmptyHunyuanImageLatent
- EmptyHunyuanLatentVideo
- EmptyImage
- Empty Latent Audio
- EmptyLatentHunyuan3Dv2
- Empty Latent Image
- EmptySD3LatentImage
- ExponentialScheduler
- ExtendIntermediateSigmas
- FeatherMask
- FlipSigmas
- FluxDisableGuidance
- FluxGuidance
- FluxKontextImageScale
- Flux.1 Kontext [max] Image
- FluxKontextMultiReferenceLatentMethod
- Flux.1 Kontext [pro] Image
- Flux.1 Canny Control Image
- Flux.1 Depth Control Image
- Flux.1 Expand Image
- Flux.1 Fill Image
- Flux 1.1 [pro] Ultra Image
- FreeU
- FreeU_V2
- FreSca
- Google Gemini Image
- Gemini Input Files
- Google Gemini
- Get Image Size
- Get Video Components
- GLIGENLoader
- GLIGENTextBoxApply
- GrowMask
- Hunyuan3Dv2Conditioning
- Hunyuan3Dv2ConditioningMultiView
- HunyuanImageToVideo
- HunyuanRefinerLatent
- HypernetworkLoader
- Ideogram V1
- Ideogram V2
- Ideogram V3
- ImageAddNoise
- Batch Images
- ImageColorToMask
- ImageCompositeMasked
- Image Crop
- ImageFlip
- ImageFromBatch
- Invert Image
- Image Only Checkpoint Loader (img2vid model)
- ImageOnlyCheckpointSave
- Pad Image for Outpainting
- ImageRotate
- Upscale Image
- Upscale Image By
- ImageScaleToMaxDimension
- Image Stitch
- Convert Image to Mask
- Upscale Image (using Model)
- InpaintModelConditioning
- InvertMask
- Join Image with Alpha
- KarrasScheduler
- Kling Image to Video (Camera Control)
- Kling Camera Controls
- Kling Text to Video (Camera Control)
- Kling Dual Character Video Effects
- Kling Image to Video
- Kling Image Generation
- Kling Lip Sync Video with Audio
- Kling Lip Sync Video with Text
- Kling Video Effects
- Kling Start-End Frame to Video
- Kling Text to Video
- Kling Video Extend
- Kling Virtual Try On
- KSampler
- KSampler (Advanced)
- KSamplerSelect
- LaplaceScheduler
- LatentAdd
- LatentApplyOperation
- LatentApplyOperationCFG
- LatentBatch
- LatentBatchSeedBehavior
- Latent Blend
- Latent Composite
- LatentCompositeMasked
- LatentConcat
- Crop Latent
- LatentCut
- Flip Latent
- Latent From Batch
- LatentInterpolate
- LatentMultiply
- LatentOperationSharpen
- LatentOperationTonemapReinhard
- Rotate Latent
- LatentSubtract
- Upscale Latent
- Upscale Latent By
- LazyCache
- Load 3D
- Load 3D - Animation
- Load Audio
- Load Image
- Load Image (as Mask)
- Load Image (from Outputs)
- Load Image Dataset from Folder
- Load Image and Text Dataset from Folder
- LoadLatent
- Load Video
- Load LoRA
- LoraLoaderModelOnly
- Load LoRA Model
- Extract and Save Lora
- Plot Loss Graph
- Luma Concepts
- Luma Image to Image
- Luma Text to Image
- Luma Image to Video
- Luma Reference
- Luma Text to Video
- Mahiro is so cute that she deserves a better guidance function!! (。・ω・。)
- MaskComposite
- MaskPreview
- Convert Mask to Image
- MiniMax Hailuo Video
- MiniMax Image to Video
- MiniMax Text to Video
- ModelComputeDtype
- ModelMergeAdd
- ModelMergeAuraflow
- ModelMergeBlocks
- ModelMergeCosmos14B
- ModelMergeCosmos7B
- ModelMergeCosmosPredict2_14B
- ModelMergeCosmosPredict2_2B
- ModelMergeFlux1
- ModelMergeLTXV
- ModelMergeMochiPreview
- ModelMergeQwenImage
- ModelMergeSD1
- ModelMergeSD2
- ModelMergeSD3_2B
- ModelMergeSD35_Large
- ModelMergeSDXL
- ModelMergeSimple
- ModelMergeSubtract
- ModelMergeWAN2_1
- ModelPatchLoader
- ModelSamplingAuraFlow
- ModelSamplingContinuousEDM
- ModelSamplingContinuousV
- ModelSamplingDiscrete
- ModelSamplingFlux
- ModelSamplingSD3
- ModelSamplingStableCascade
- ModelSave
- Moonvalley Marey Image to Video
- Moonvalley Marey Text to Video
- Moonvalley Marey Video to Video
- ImageMorphology
- OpenAI ChatGPT Advanced Options
- OpenAI ChatGPT
- OpenAI DALL·E 2
- OpenAI DALL·E 3
- OpenAI GPT Image 1
- OpenAI ChatGPT Input Files
- OpenAI Sora - Video
- Cond Pair Combine
- Cond Pair Set Default Combine
- Cond Pair Set Props
- Cond Pair Set Props Combine
- PatchModelAddDownscale (Kohya Deep Shrink)
- Perp-Neg (DEPRECATED by PerpNegGuider)
- Pikadditions (Video Object Insertion)
- Pikaffects (Video Effects)
- Pika Image to Video
- Pika Scenes (Video Image Composition)
- Pika Start and End Frame to Video
- Pika Swaps (Video Object Replacement)
- Pika Text to Video
- PixVerse Image to Video
- PixVerse Template
- PixVerse Text to Video
- PixVerse Transition Video
- PolyexponentialScheduler
- Porter-Duff Image Composite
- Preview 3D
- Preview 3D - Animation
- Preview Any
- Preview Audio
- Preview Image
- Boolean
- Float
- Int
- String
- String (Multiline)
- QwenImageDiffsynthControlnet
- RandomNoise
- Rebatch Images
- Rebatch Latents
- Record Audio
- Recraft Color RGB
- Recraft Controls
- Recraft Creative Upscale Image
- Recraft Crisp Upscale Image
- Recraft Image Inpainting
- Recraft Image to Image
- Recraft Remove Background
- Recraft Replace Background
- Recraft Style - Digital Illustration
- Recraft Style - Infinite Style Library
- Recraft Style - Logo Raster
- Recraft Style - Realistic Image
- Recraft Text to Image
- Recraft Text to Vector
- Recraft Vectorize Image
- Regex Extract
- Regex Match
- Regex Replace
- RepeatImageBatch
- Repeat Latent Batch
- RescaleCFG
- ResizeAndPadImage
- Rodin 3D Generate - Detail Generate
- Rodin 3D Generate - Gen-2 Generate
- Rodin 3D Generate - Regular Generate
- Rodin 3D Generate - Sketch Generate
- Rodin 3D Generate - Smooth Generate
- Runway First-Last-Frame to Video
- Runway Image to Video (Gen3a Turbo)
- Runway Image to Video (Gen4 Turbo)
- Runway Text to Image
- SamplerCustom
- SamplerCustomAdvanced
- SamplerDPMAdaptative
- SamplerDPMPP_2M_SDE
- SamplerDPMPP_2S_Ancestral
- SamplerDPMPP_3M_SDE
- SamplerDPMPP_SDE
- SamplerER_SDE
- SamplerEulerAncestral
- SamplerEulerAncestralCFG++
- SamplerEulerCFG++
- SamplerLMS
- SamplerSASolver
- SamplingPercentToSigma
- SaveAnimatedPNG
- SaveAnimatedWEBP
- Save Audio (FLAC)
- Save Audio (MP3)
- Save Audio (Opus)
- SaveGLB
- Save Image
- SaveLatent
- Save LoRA Weights
- SaveSVGNode
- Save Video
- SDTurboScheduler
- Self-Attention Guidance
- Set CLIP Hooks
- SetFirstSigma
- Set Hook Keyframes
- Set Latent Noise Mask
- SetUnionControlNetType
- SkipLayerGuidanceDiT
- SkipLayerGuidanceDiTSimple
- SkipLayerGuidanceSD3
- SolidMask
- Split Audio Channels
- Split Image with Alpha
- SplitSigmas
- SplitSigmasDenoise
- Stability AI Audio Inpaint
- Stability AI Audio To Audio
- Stability AI Stable Diffusion 3.5 Image
- Stability AI Stable Image Ultra
- Stability AI Text To Audio
- Stability AI Upscale Conservative
- Stability AI Upscale Creative
- Stability AI Upscale Fast
- Compare
- Concatenate
- Contains
- Length
- Replace
- Substring
- Trim
- Apply Style Model
- Load Style Model
- SVD_img2vid_Conditioning
- Tangential Damping CFG
- TextEncodeHunyuanVideo_ImageToVideo
- ThresholdMask
- Train LoRA
- Trim Audio Duration
- TripleCLIPLoader
- Tripo: Convert model
- Tripo: Image to Model
- Tripo: Multiview to Model
- Tripo: Refine Draft model
- Tripo: Retarget rigged model
- Tripo: Rig model
- Tripo: Text to Model
- Tripo: Texture model
- unCLIPCheckpointLoader
- unCLIPConditioning
- Load Diffusion Model
- Load Upscale Model
- USOStyleReference
- VAE Decode
- VAE Decode Audio
- VAEDecodeHunyuan3D
- VAE Decode (Tiled)
- VAE Encode
- VAE Encode Audio
- VAE Encode (for Inpainting)
- VAE Encode (Tiled)
- Load VAE
- VAESave
- Google Veo 3 Video Generation
- Google Veo 2 Video Generation
- Audio to legacy VHS_AUDIO🎥🅥🅗🅢
- Meta Batch Manager 🎥🅥🅗🅢
- Repeat Images 🎥🅥🅗🅢
- Repeat Latents 🎥🅥🅗🅢
- Repeat Masks 🎥🅥🅗🅢
- Get Image Count 🎥🅥🅗🅢
- Get Latent Count 🎥🅥🅗🅢
- Get Mask Count 🎥🅥🅗🅢
- Load Audio (Path)🎥🅥🅗🅢
- Load Audio (Upload)🎥🅥🅗🅢
- Load Image (Path) 🎥🅥🅗🅢
- Load Images (Upload) 🎥🅥🅗🅢
- Load Images (Path) 🎥🅥🅗🅢
- Load Video (Upload) 🎥🅥🅗🅢
- Load Video FFmpeg (Upload) 🎥🅥🅗🅢
- Load Video FFmpeg (Path) 🎥🅥🅗🅢
- Load Video (Path) 🎥🅥🅗🅢
- Merge Images 🎥🅥🅗🅢
- Merge Latents 🎥🅥🅗🅢
- Merge Masks 🎥🅥🅗🅢
- Prune Outputs 🎥🅥🅗🅢
- Select Every Nth Image 🎥🅥🅗🅢
- Select Every Nth Latent 🎥🅥🅗🅢
- Select Every Nth Mask 🎥🅥🅗🅢
- Select Filename 🎥🅥🅗🅢
- Select Images 🎥🅥🅗🅢
- Select Latents 🎥🅥🅗🅢
- Select Latest 🎥🅥🅗🅢
- Select Masks 🎥🅥🅗🅢
- Split Images 🎥🅥🅗🅢
- Split Latents 🎥🅥🅗🅢
- Split Masks 🎥🅥🅗🅢
- Unbatch 🎥🅥🅗🅢
- VAE Decode Batched 🎥🅥🅗🅢
- VAE Encode Batched 🎥🅥🅗🅢
- Legacy VHS_AUDIO to Audio🎥🅥🅗🅢
- Video Combine 🎥🅥🅗🅢
- Video Info 🎥🅥🅗🅢
- Video Info (Loaded) 🎥🅥🅗🅢
- Video Info (Source) 🎥🅥🅗🅢
- VideoLinearCFGGuidance
- VideoTriangleCFGGuidance
- Vidu Image To Video Generation
- Vidu Reference To Video Generation
- Vidu Start End To Video Generation
- Vidu Text To Video Generation
- VoxelToMesh
- VoxelToMeshBasic
- VPScheduler
- WAN Context Windows (Manual)
- Wan Image to Image
- Wan Image to Video
- Wan Text to Image
- Wan Text to Video
- Webcam Capture
README
ComfyUI-VideoHelperSuite
Nodes related to video workflows
I/O Nodes
Load Video
Converts a video file into a series of images
- video: The video file to be loaded
- force_rate: Discards or duplicates frames as needed to hit a target frame rate. Disabled by setting to 0. This can be used to quickly match a suggested frame rate like the 8 fps of AnimateDiff.
- force_size: Allows for quick resizing to a number of suggested sizes. Several options allow you to set only width or height and determine the other from aspect ratio.
- frame_load_cap: The maximum number of frames which will be returned. This could also be thought of as the maximum batch size.
- skip_first_frames: How many frames to skip from the start of the video after adjusting for a forced frame rate. By incrementing this number by frame_load_cap, you can easily process a longer input video in parts, as sketched below.
- select_every_nth: Allows for skipping a number of frames without considering the base frame rate or risking frame duplication. Often useful when working with animated gifs.
A path variant of the Load Video node exists that allows loading videos from external paths
If Advanced Previews is enabled in the options menu of the web ui, the preview will reflect the current settings on the node.
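A minimal sketch of the chunked-processing pattern described for skip_first_frames, assuming a hypothetical driver loop outside ComfyUI (total_parts and the print call are illustrative, not node parameters):

# Process a long video in fixed-size parts by running Load Video once per part.
# Both values below map directly onto the node's widgets.
frame_load_cap = 16            # frames returned per run (the maximum batch size)
total_parts = 5                # hypothetical number of chunks to process

for part in range(total_parts):
    skip_first_frames = part * frame_load_cap
    # Queue a workflow with these two values set on Load Video; each run then
    # returns the next frame_load_cap frames of the source video.
    print(f"part {part}: skip_first_frames={skip_first_frames}, frame_load_cap={frame_load_cap}")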
Load Image Sequence
Loads all image files from a subfolder. Options are similar to Load Video.
- image_load_cap: The maximum number of images which will be returned. This could also be thought of as the maximum batch size.
- skip_first_images: How many images to skip. By incrementing this number by image_load_cap, you can easily divide a long sequence of images into multiple batches.
- select_every_nth: Allows for skipping a number of images between every returned frame.
A path variant of the Load Image Sequence node also exists.
Video Combine
Combines a series of images into an output video
If the optional audio input is provided, it will also be combined into the output video
- frame_rate: How many of the input frames are displayed per second. A higher frame rate means that the output video plays faster and is shorter (see the worked example after this list). This should usually be kept to 8 for AnimateDiff, or matched to the force_rate of a Load Video node.
- loop_count: How many additional times the video should repeat
- filename_prefix: The base file name used for output.
- You can save output to a subfolder: subfolder/video
- Like the builtin Save Image node, you can add timestamps: %date:yyyy-MM-ddThh:mm:ss% might become 2023-10-31T6:45:25
- format: The file format to use. Advanced information on configuring or adding additional video formats can be found in the Video Formats section.
- pingpong: Causes the input to be played back in reverse to create a clean loop.
- save_output: Whether the image should be put into the output directory or the temp directory.
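A quick worked example for the frame_rate option above (the frame count of 64 is arbitrary):

# 64 input frames at frame_rate 8 play for 8 seconds;
# the same 64 frames at frame_rate 16 play for 4 seconds, twice as fast.
frames = 64
for frame_rate in (8, 16):
    print(f"{frame_rate} fps -> {frames / frame_rate:.0f} s")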
Returns: a VHS_FILENAMES, which consists of a boolean indicating if save_output is enabled and a list of the full filepaths of all generated outputs in the order created. Accordingly, output[1][-1] will be the most complete output.
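A minimal Python sketch of consuming this return value downstream (the function name on_video_saved is illustrative, not part of the suite):

# vhs_filenames is the VHS_FILENAMES value described above:
# (save_output_enabled, [filepath_0, filepath_1, ...]) in creation order.
def on_video_saved(vhs_filenames):
    save_output_enabled, filepaths = vhs_filenames
    if save_output_enabled and filepaths:
        latest = filepaths[-1]     # output[1][-1]: the most complete output
        print(f"final video written to {latest}")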
Depending on the format chosen, additional options may become available, including
- crf: Describes the quality of the output video. A lower number gives a higher quality video and a larger file size, while a higher number gives a lower quality video with a smaller size. Scaling varies by codec, but visually lossless output generally occurs around 20.
- save_metadata: Includes a copy of the workflow in the output video which can be loaded by dragging and dropping the video, just like with images.
- pix_fmt: Changes how the pixel data is stored. yuv420p10le has higher color quality, but won't work on all devices.
Load Audio
Provides a way to load standalone audio files.
- seek_seconds: An optional start time for the audio file in seconds.
Latent/Image Nodes
A number of utility nodes exist for managing latents. For each, there is an equivalent node which works on images.
Split Batch
Divides the latents into two sets. The first split_index latents go to output A and the remainder to output B. If fewer than split_index latents are provided as input, all are passed to output A and output B is empty.
Merge Batch
Combines two groups of latents into a single output. The order of the output is the latents in A followed by the latents in B.
If the input groups are not the same size, the node provides options for rescaling the latents before merging.
Select Every Nth
The first of every select_every_nth input is passed and the remainder are discarded
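A minimal Python sketch of the batch semantics described in this section, treating a batch as an ordered list (illustrative only; the actual nodes operate on image/latent/mask batches):

# Split Batch: the first split_index items go to A, the remainder to B.
def split_batch(batch, split_index):
    return batch[:split_index], batch[split_index:]

# Merge Batch: output is A followed by B (rescaling of mismatched latents not shown).
def merge_batch(a, b):
    return a + b

# Select Every Nth: keep the first of every n items, discard the rest.
def select_every_nth(batch, n):
    return batch[::n]

frames = list(range(10))
a, b = split_batch(frames, 6)        # a == [0, 1, 2, 3, 4, 5], b == [6, 7, 8, 9]
merged = merge_batch(a, b)           # original order restored
kept = select_every_nth(frames, 3)   # [0, 3, 6, 9]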
Get Count
Outputs the number of images or latents in the input.
Duplicate Batch
Repeats the input a specified number of times to produce a larger batch.
Video Previews
Load Video (Upload), Load Video (Path), Load Images (Upload), Load Images (Path) and Video Combine provide animated previews.
Nodes with previews provide additional functionality when right clicked
- Open preview
- Save preview
- Pause preview: Can improve performance with very large videos
- Hide preview: Can improve performance, save space
- Sync preview: Restarts all previews for side-by-side comparisons
Advanced Previews
Advanced Previews must be manually enabled by clicking the settings gear next to Queue Prompt and checking the box for VHS Advanced Previews.
If enabled, videos which are displayed in the ui will be converted with ffmpeg on request. This has several benefits
- Previews for Load Video nodes will reflect the settings on the node such as skip_first_frames and frame_load_cap
- This makes it easy to select an exact portion of an input video and sync it with outputs
- It can use substantially less bandwidth if running the server remotely
- It can greatly improve browser performance by downsizing videos to the resolution at which they are displayed in the UI, which is particularly useful with animated gifs
- It allows for previews of videos that would not normally be playable in browser.
- Access can be limited to subdirectories of ComfyUI if VHS_STRICT_PATHS is set as an environment variable (see the sketch after this list).
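A minimal sketch of launching ComfyUI with VHS_STRICT_PATHS set, assuming a local install whose entry point is main.py; the value "1" and the launch command are assumptions, since the documentation only requires the variable to be present:

# Launch ComfyUI with VHS_STRICT_PATHS in its environment so that Advanced Previews
# (and path-based loading) are restricted to subdirectories of ComfyUI.
import os
import subprocess

env = dict(os.environ, VHS_STRICT_PATHS="1")   # value is an assumption; presence is what matters
subprocess.run(["python", "main.py"], env=env, check=True)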
This functionality is disabled by default since it comes with several downsides
- There is a delay before videos show in the browser. This delay can become quite large if the input video is long
- The preview videos are lower quality (The original can always be viewed with Right Click -> Open preview)
Video Formats
Those familiar with ffmpeg can add JSON files to the video_formats folders to add new output types to Video Combine. Consider the following example for av1-webm:
{
    "main_pass":
    [
        "-n", "-c:v", "libsvtav1",
        "-pix_fmt", "yuv420p10le",
        "-crf", ["crf", "INT", {"default": 23, "min": 0, "max": 100, "step": 1}]
    ],
    "audio_pass": ["-c:a", "libopus"],
    "extension": "webm",
    "environment": {"SVT_LOG": "1"}
}
Most configuration takes place in main_pass, which is a list of arguments that are passed to ffmpeg.
"-n"designates that the command should fail if a file of the same name already exists. This should never happen, but if some bug were to occur, it would ensure other files aren't overwritten."-c:v", "libsvtav1"designates that the video should be encoded with an av1 codec using the new SVT-AV1 encoder. SVT-AV1 is much faster than libaom-av1, but may not exist in older versions of ffmpeg. Alternatively, av1_nvenc could be used for gpu encoding with newer nvidia cards."-pix_fmt", "yuv420p10le"designates the standard pixel format with 10-bit color. It's important that some pixel format be specified to ensure a nonconfigurable input pix_fmt isn't used.
audio_pass contains a list of arguments which are passed to ffmpeg when audio is provided to Video Combine.
extension designates both the file extension and the container format that is used. If some of the above options are omitted from main_pass it can affect what default options are chosen.
environment can optionally be provided to set environment variables during execution. For av1 it's used to reduce the verbosity of logging so that only major errors are displayed.
input_color_depth affects the format in which pixels are passed to the ffmpeg subprocess. Current valid options are 8bit and 16bit. The latter produces higher quality output, but is experimental.
Fields can be exposed in the webui as a widget using a format similar to what is used in the creation of custom nodes. In the above example, the argument for -crf will be exposed as a format widget in the webui. Format widgets are a list of up to 3 terms
- The name of the widget that will be displayed in the web ui
- Either a primitive such as "INT" or "BOOLEAN", or a list of string options
- A dictionary of options
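For example (illustrative only, not taken from the suite's shipped format files), a choice of pixel formats could be exposed with an entry such as "-pix_fmt", ["pix_fmt", ["yuv420p", "yuv420p10le"], {"default": "yuv420p10le"}] in main_pass, where the second term is a list of string options rather than a primitive type, which would presumably render as a dropdown in the web ui.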