    Vid2vid Node Suite for ComfyUI

    A node suite for ComfyUI that lets you load an image sequence and generate a new image sequence with different styles or content.

    Original repo: https://github.com/sylym/stable-diffusion-vid2vid

    Install

    First, install ComfyUI.

    Then run:

    cd ComfyUI/custom_nodes
    git clone https://github.com/sylym/comfy_vid2vid
    cd comfy_vid2vid
    

    Next, download dependencies:

    python -m pip install -r requirements.txt
    

    For ComfyUI portable standalone build:

    #You may need to replace "..\..\..\python_embeded\python.exe" depending on where your python_embeded folder is located
    ..\..\..\python_embeded\python.exe -m pip install -r requirements.txt
    

    Usage

    All nodes are classified under the vid2vid category. For some workflow examples, check out:

    vid2vid workflow examples
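
    One plausible node chain, pieced together from the node descriptions below (the linked workflow examples are the authoritative reference): LoadImageSequence feeds the image (and optional mask) sequence; CheckpointLoaderSimpleSequence, optionally followed by LoraLoaderSequence, provides the model that TrainUnetSequence fine-tunes; the encoded latents can be passed through DdimInversionSequence and SetLatentNoiseSequence before KSamplerSequence generates the new sequence.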

    Nodes

    LoadImageSequence

    <img alt="Local Image" src="images/nodes/LoadImageSequence.png" width="518" height="264"/>

    Load an image sequence from a folder.

    Inputs:

    • None

    Outputs:

    • IMAGE

      • Image sequence
    • MASK_SEQUENCE

      • The alpha channel of the image sequence, which is used as the mask sequence.

    Parameters:

    • image_sequence_folder

      • Select the folder that contains the image sequence. The node only lists folders inside the input folder.
      • The folder should only contain images of the same size.
    • sample_start_idx

      • The start index of the image sequence. Images are sorted by file name.
    • sample_frame_rate

      • The sampling interval of the image sequence. If the value is 2, the node samples every second image.
    • n_sample_frames

      • The number of images in the output sequence. The number of images in image_sequence_folder must be greater than or equal to sample_start_idx - 1 + n_sample_frames * sample_frame_rate (see the sketch after this list).
      • If you want to use the node CheckpointLoaderSimpleSequence to generate a sequence of pictures, set n_sample_frames >= 3.
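
    As a concrete illustration of how these three sampling parameters interact, here is a minimal Python sketch (not the node's actual code; it assumes the start-then-stride sampling that the descriptions above suggest):

    sample_start_idx = 1     # 1-based index of the first image
    sample_frame_rate = 2    # take every 2nd image
    n_sample_frames = 3      # length of the output sequence

    # Indices of the images that would be sampled from the sorted file list:
    indices = [sample_start_idx - 1 + i * sample_frame_rate
               for i in range(n_sample_frames)]                              # -> [0, 2, 4]

    # Minimum number of images the folder must contain (the constraint above):
    min_images = sample_start_idx - 1 + n_sample_frames * sample_frame_rate  # -> 6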

    LoadImageMaskSequence

    <img alt="Local Image" src="images/nodes/LoadImageMaskSequence.png" width="518" height="264"/>

    Load a mask sequence from a folder.

    Inputs:

    • None

    Outputs:

    • MASK_SEQUENCE
      • Image mask sequence

    Parameters:

    • image_sequence_folder

      • Select the folder that contains the image sequence. The node only lists folders inside the input folder.
      • The folder should only contain images of the same size.
    • channel

      • The channel of the image sequence that will be used as a mask.
    • sample_start_idx

      • The start index of the image sequence. Images are sorted by file name.
    • sample_frame_rate

      • The sampling interval of the image sequence. If the value is 2, the node samples every second image.
    • n_sample_frames

      • The number of images in the output sequence. The number of images in image_sequence_folder must be greater than or equal to sample_start_idx - 1 + n_sample_frames * sample_frame_rate.

    VAEEncodeForInpaintSequence

    <img alt="Local Image" src="images/nodes/VAEEncodeForInpaintSequence.png" width="518" height="264"/>

    Encode the input image sequence into a latent vector using a Variational Autoencoder (VAE) model, and attach the image mask sequence to the latent vector. A rough sketch of the idea follows this node's parameters.

    Inputs:

    • pixels: IMAGE

      • Image sequence that will be encoded.
    • vae: VAE

      • VAE model that will be used to encode the image sequence.
    • mask_sequence: MASK_SEQUENCE

      • Image mask sequence that will be added to the latent vector. The number of images and masks must be the same.

    Outputs:

    • LATENT
      • The latent vector with the image mask sequence attached. The image mask sequence in the latent vector will only take effect when using the node KSamplerSequence.

    Parameters:

    • None
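
    The "mask attached to the latent" idea can be sketched roughly as follows (a minimal, hypothetical sketch; the function and key names are placeholders, not this node's real implementation): each frame is encoded to a latent, and its mask is downscaled to latent resolution and stored alongside the samples, so that a mask-aware sampler such as KSamplerSequence can restrict denoising to the masked region of every frame.

    import torch

    def encode_for_inpaint_sequence(vae, pixels, mask_sequence):
        # pixels: [frames, H, W, C] in 0..1, mask_sequence: [frames, H, W]
        latents = vae.encode(pixels)                          # e.g. [frames, 4, H/8, W/8]
        masks = torch.nn.functional.interpolate(
            mask_sequence.unsqueeze(1),                       # add a channel dimension
            size=latents.shape[-2:], mode="nearest")          # match latent resolution
        return {"samples": latents, "noise_mask_sequence": masks}  # key name is hypothetical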

    DdimInversionSequence

    <img alt="Local Image" src="images/nodes/DdimInversionSequence.png" width="518" height="264"/>

    Generate a noise vector by inverting the input latent vector with DDIM inversion. Usually used to improve the temporal consistency of the output image sequence. A sketch of the underlying inversion step follows this node's parameters.

    Inputs:

    • samples: LATENT

      • The latent vector that will be inverted.
    • model: MODEL

      • Full model that will be used to invert the latent vector.
    • clip: CLIP

      • CLIP model that will be used to invert the latent vector.

    Outputs:

    • NOISE
      • The noise vector that will be used to generate the image sequence.

    Parameters:

    • steps
      • The number of steps to invert the latent vector.
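
    Conceptually, DDIM inversion runs the deterministic DDIM update in reverse: the clean latent is pushed step by step toward higher noise levels, so that later sampling from the resulting noise vector stays close to the original sequence. A minimal sketch of one inversion step (standard DDIM math with placeholder names, not this node's exact code):

    import torch

    def ddim_invert_step(model, x_t, t, t_next, alphas_cumprod, cond):
        eps = model(x_t, t, cond)                                # predicted noise at step t
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # estimate of the clean latent
        return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps  # latent at the noisier step t_next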

    SetLatentNoiseSequence

    <img alt="Local Image" src="images/nodes/SetLatentNoiseSequence.png" width="518" height="264"/>

    Add a noise vector to a latent vector.

    Inputs:

    • samples: LATENT

      • The latent vector to which the noise will be added.
    • noise: NOISE

      • The noise vector that will be added to the latent vector.

    Outputs:

    • LATENT
      • The latent vector with noise. The noise vector in the latent vector will only take effect when using the node KSamplerSequence.

    Parameters:

    • None

    CheckpointLoaderSimpleSequence

    <img alt="Local Image" src="images/nodes/CheckpointLoaderSimpleSequence.png" width="518" height="264"/>

    Load a checkpoint model into a UNet3DConditionModel. Usually used to generate an image sequence with temporal continuity.

    Inputs:

    • None

    Outputs:

    • ORIGINAL_MODEL

      • Model for fine-tuning, not for inference
    • CLIP

      • The CLIP model
    • VAE

      • The VAE model

    Parameters:

    • ckpt_name
      • The name of the checkpoint model. The model should be in the models/checkpoints folder.

    LoraLoaderSequence

    <img alt="Local Image" src="images/nodes/LoraLoaderSequence.png" width="518" height="264"/>

    Same function as the LoraLoader node, but acts on a UNet3DConditionModel. Used after the CheckpointLoaderSimpleSequence node and before the TrainUnetSequence node. The model input and output are both of type ORIGINAL_MODEL.


    TrainUnetSequence

    <img alt="Local Image" src="images/nodes/TrainUnetSequence.png" width="518" height="400"/>

    Fine-tune the incoming model using the latent vector and context, then convert the model to inference mode. A minimal sketch of the fine-tuning objective follows this node's parameter list.

    Inputs:

    • samples: LATENT

      • The latent vector that will be used to fine-tune the incoming model.
    • model: ORIGINAL_MODEL

      • The model that will be fine-tuned.
    • context: CONDITIONING

      • The context used for fine-tuning the input model; it typically consists of words or sentences describing the subject in the latent vector and its behavior.

    Outputs:

    • MODEL
      • The fine-tuned model. This model is ready for inference.

    Parameters:

    • seed
      • The seed used in model fine-tuning.
    • steps
      • The number of steps to fine-tune the model. If steps is 0, the model will not be fine-tuned.
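
    A plausible reading of this fine-tuning step is the standard noise-prediction objective used when training diffusion models; a minimal sketch with diffusers-style placeholder names (not the node's actual code):

    import torch

    def finetune(unet, scheduler, latents, context, steps, optimizer):
        for _ in range(steps):
            t = torch.randint(0, scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
            noise = torch.randn_like(latents)
            noisy = scheduler.add_noise(latents, noise, t)           # forward diffusion
            pred = unet(noisy, t, encoder_hidden_states=context).sample
            loss = torch.nn.functional.mse_loss(pred, noise)         # predict the added noise
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()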

    KSamplerSequence

    <img alt="Local Image" src="images/nodes/KSamplerSequence.png" width="518" height="518"/>

    Same function as the KSampler node, but with added support for the noise vector and image mask sequence.


    Limits

    • UNet3DConditionModel has high GPU memory demands. If you encounter an out-of-memory error, try reducing n_sample_frames. However, n_sample_frames must be greater than or equal to 3.

    • Some custom nodes do not support processing image sequences. The nodes listed below have been tested and are working properly: