
    <div align="center"><h1>ComfyUI MUSUBI-TUNNER WAN LORA TRAINER</h1></div>

    Important note: The sprites used for the training tests are property of real artirsts (under copyright), so this is not only a copy of the world, it can be a tool for artists too.

✅ LORA STRENGTH FIXED!

Update version 1.02:

• Raised the max_train_epochs limit to 512.
• avr_loss in the latest update is looking great (128 epochs, 30 images, ~5000 steps, network dropout 0.2, other settings at their defaults): (loss-curve screenshot)
• Character deconstructed into pieces: (screenshot)
• Result (text-to-video generation): (video frame)

Also playing with image sequences: (screenshot)

Result at 0.5 strength: (screenshot)

    Update version 1.01:

• Set good default learning parameters (a solid starting point to see quick results just by loading the nodes/workflow).
• Added a number-of-CPUs-per-process argument.
• Added the max_train_epochs argument to avoid the internal step limit of 2048.
• Fixed scale_weight_norms.
• Updated example workflow.
• Updated pictures.
• To update, delete your custom node folder and clone it again (see the commands below).
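
A minimal sketch of that update procedure from the portable root, assuming the repository lives at https://github.com/jaimitoes/ComfyUI_Wan2_1_lora_trainer (URL inferred from the extension name; adjust to the actual repo):

```bat
REM Run from the ComfyUI portable root. Repo URL is inferred, not confirmed.
cd ComfyUI\custom_nodes
rmdir /s /q ComfyUI_Wan2_1_lora_trainer
git clone https://github.com/jaimitoes/ComfyUI_Wan2_1_lora_trainer.git
```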

    Features:

• ComfyUI LoRA trainer.
• Adaptation of the musubi-tuner library by kohya-ss, from https://github.com/kohya-ss/musubi-tuner.
• Training runs in a subprocess.
• Code mods for full compatibility with the latest version of ComfyUI.
• Adds 6 nodes for training.
• Example workflow included in the custom node folder.
• Made for the ComfyUI Windows portable package.
• Automated process for the creation of the TOML file, latent caching, text caching & the training run (the nodes are triggered in this order to do the complete process in one shot); see the config sketch after this list.
• You can skip the latent and text caches if you need to restart the training (provided the data and the VAE, CLIP Vision, and text models have not changed).
• For more info about setting parameters up correctly, check the Wan doc at https://github.com/kohya-ss/musubi-tuner/blob/main/docs/wan.md
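
For reference, this is roughly the kind of dataset config TOML that musubi-tuner consumes and that the nodes generate for you (field names follow the musubi-tuner dataset docs; paths and values here are illustrative only):

```toml
# Illustrative dataset config; the trainer nodes create this file automatically.
[general]
resolution = [960, 544]      # training resolution
caption_extension = ".txt"   # caption file expected next to each image/video
batch_size = 1
enable_bucket = true

[[datasets]]
image_directory = "C:/datasets/my_lora/images"  # your images + .txt captions
cache_directory = "C:/datasets/my_lora/cache"   # empty folder, one per LoRA
num_repeats = 1
```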

About max_train_epochs: the effective number of training steps follows from several arguments, such as gradient accumulation, number of images, etc. Set it between 16 and 512 depending on how many images you want to train on. For a small set of 30 images, set it to 128 to train for more than 5000 steps. Keep network dropout in mind to avoid overfitting, along with dim and alpha. Everything is relative, but you will certainly find your own settings depending on your purpose. For the moment max_train_epochs is capped at 512, but if a bigger maximum is needed I can update it.
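
As a rough back-of-the-envelope (assuming batch size 1, num_repeats 1, and no gradient accumulation; repeats, buckets, and videos change the numbers):

```
steps_per_epoch ≈ num_images / (batch_size × gradient_accumulation_steps)
total_steps     ≈ max_train_epochs × steps_per_epoch

Example: 30 images, batch size 1  →  30 steps per epoch
         128 epochs × 30 steps    ≈  3840 steps
         (num_repeats > 1 pushes this toward the ~5000 steps mentioned above)
```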

    Instructions:

1. Clone this repository into your custom_nodes folder.
2. Install requirements.txt from custom_nodes\ComfyUI_Wan2_1_lora_trainer:

   ```
   ..\..\..\python_embeded\python.exe -m pip install -r requirements.txt
   ```

3. Run ComfyUI.
4. Enjoy training.

Regular run: if you use the regular .bat, you must bypass the compiler and memory settings; this is enough for 1.3B models (attention mode set to sdpa, default parameters already configured for immediate results). (screenshot)

Torch settings run: training 14B models is a heavy process, so installing torch >= 2.7.0 with CUDA 12.8 plus the Visual Studio Build Tools is highly recommended. After this you must create your own custom .bat file that loads the Visual Studio environment. For example, if you want a .bat that uses sage attention and also trains with the musubi compile settings, create it like this:

```bat
@echo off
REM Load Visual Studio Build Tools for the Wan subprocess environment
REM (optional; needed for full performance with the musubi compile settings and memory nodes)
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
REM Start ComfyUI
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast fp16_accumulation
pause
```

NOTE: this Visual Studio call is needed because the trainer runs in a new subprocess that inherits the ComfyUI environment; the Visual Studio environment must therefore already be loaded when ComfyUI starts for the subprocess to work.
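
A quick illustrative check that the environment actually carries over: after the `call vcvarsall.bat` line in the .bat above has run, the MSVC compiler should resolve in that same console, and any subprocess ComfyUI spawns will inherit the same PATH:

```bat
REM Illustrative check, run in the same console after vcvarsall.bat:
REM if the VS environment is loaded, this prints the path to cl.exe,
REM which the trainer subprocess will inherit via PATH.
where cl
```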

CLIP VISION: CLIP Vision is only set up for I2V models; when training T2V models, set clip to None.

Then connect the compiler and memory nodes (choose your desired attention mode): (screenshot)

If you don't have any of these modules, you can disconnect the musubi compile settings node.

• Image data input is not exclusive to videos! You can train with images only, as in the following example (point the path at your images and text captions): (screenshot)
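
For an image-only dataset, the expected layout is one caption text file per image (assuming the default .txt caption extension; names below are illustrative):

```
C:\datasets\my_lora\images\
    sprite_001.png
    sprite_001.txt    <- caption for sprite_001.png
    sprite_002.png
    sprite_002.txt
    ...
```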

• Point the cache path at an empty folder (use a different folder for each LoRA so you don't mix cache data; cleaner and probably faster).

Performance test with 312 images and default settings (sdpa): (screenshot)

And the results:

    https://github.com/user-attachments/assets/41b439ee-6e90-49ac-83dd-d1ba21fd1d63

For any concerns, don't hesitate to contact me.