# SD-Latent-Interposer

A small neural network to provide interoperability between the latents generated by the different Stable Diffusion models.

I wanted to see if it was possible to pass latents generated by the new SDXL model directly into SDv1.5 models without decoding and re-encoding them using a VAE first.

## Installation

To install it, simply clone this repo to your custom_nodes folder by running the following command from your `ComfyUI` folder:

```
git clone https://github.com/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer
```

Alternatively, you can download the [comfy_latent_interposer.py](https://github.com/city96/SD-Latent-Interposer/raw/main/comfy_latent_interposer.py) file to your `ComfyUI/custom_nodes` folder.

You may need to install the `huggingface_hub` package using the command `pip install huggingface-hub` inside your venv.

If you need the model weights for something else, they are [hosted on HF](https://huggingface.co/city96/SD-Latent-Interposer/tree/main) under the same Apache2 license as the rest of the repo. The current files are in the **"v4.0"** subfolder.

## Usage

Simply place it where you would normally place a VAE decode followed by a VAE encode. Set the denoise as appropriate to hide any artifacts while keeping the composition. See image below.

![LATENT_INTERPOSER_V3 1_TEST](https://github.com/city96/SD-Latent-Interposer/assets/125218114/849574b4-2565-4090-85d3-ae63ab425ee2)

Without the interposer, the two latent spaces are incompatible:

![LATENT_INTERPOSER_V3 1](https://github.com/city96/SD-Latent-Interposer/assets/125218114/13e2c01f-580e-4ecb-af1f-b6b21699127b)

### Local models

The node pulls the required files from the Hugging Face Hub by default. You can create a `models` folder and place the models there if you have a flaky connection or prefer to use it completely offline. The custom node will prefer local files over HF when available.
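A minimal sketch of that lookup order in Python, assuming the local `models` folder mirrors the HF repo layout (weights inside the `v4.0` subfolder) and using `hf_hub_download` from `huggingface_hub` as the fallback. The helper name `resolve_model_file` and the accepted folder layouts are assumptions for illustration; this is not the node's actual code.

```python
import os


def resolve_model_file(
    filename: str,
    models_dir: str,
    subfolder: str = "v4.0",
    repo_id: str = "city96/SD-Latent-Interposer",
) -> str:
    """Return a path to `filename`, preferring a local copy over the HF hub.

    Hypothetical helper, not the node's actual code. Assumes the local
    `models` folder either mirrors the HF repo (models/v4.0/<file>) or
    holds the file directly (models/<file>).
    """
    candidates = [
        os.path.join(models_dir, subfolder, filename),  # cloned HF repo layout
        os.path.join(models_dir, filename),  # file dropped straight into models/
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path

    # No local copy: download from the hub (or reuse its local cache).
    # Imported lazily so fully offline setups never need huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, subfolder=subfolder)
```

For example, `resolve_model_file("<weights file>", "ComfyUI/custom_nodes/SD-Latent-Interposer/models")` returns the local path when the file exists there and only reaches for the network otherwise; the real weight filenames are whatever the HF repo lists.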
The path should be: `ComfyUI/custom_nodes/SD-Latent-Interposer/models`

Alternatively, just clone the entire HF repo to it:

```
git clone https://huggingface.co/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer/models
```

### Supported Models

Model names:

| code | name                       |
| ---- | -------------------------- |
| `v1` | Stable Diffusion v1.x      |
| `xl` | SDXL                       |
| `v3` | Stable Diffusion 3         |
| `fx` | Flux.1                     |
| `ca` | Stable Cascade (Stage A/B) |

Available models:

| From | to `v1` | to `xl` | to `v3` | to `fx` | to `ca` |
|:----:|:-------:|:-------:|:-------:|:-------:|:-------:|
| `v1` |    -    |  v4.0   |  v4.0   |   No    |   No    |
| `xl` |  v4.0   |    -    |  v4.0   |   No    |   No    |
| `v3` |  v4.0   |  v4.0   |    -    |   No    |   No    |
| `fx` |  v4.0   |  v4.0   |  v4.0   |    -    |   No    |
| `ca` |  v4.0   |  v4.0   |  v4.0   |   No    |    -    |

## Training

The training code initializes most training parameters from the provided config file. The dataset should be a single `.bin` file per latent version, saved with `torch.save`. The tensor should be shaped `[batch, channels, height, width]`, with the batch dimension covering the whole dataset (e.g. 88000).

### Interposer v4.0

The training code currently initializes two copies of the model, one in the target direction and one in the opposite direction. The losses are defined based on this.

- `p_loss` is the main criterion for the primary model.
- `b_loss` is the main criterion for the secondary one.
- `r_loss` feeds the output of the primary model back through the secondary model and checks it against the source latent (basically a round trip through the two models).
- `h_loss` is the same as `r_loss` but for the secondary model.

All models were trained for 50000 steps with either batch size 128 (xl/v1) or 48 (cascade). The training was done locally on an RTX 3080 and a Tesla V100S.

![LATENT_INTERPOSER_V4_LOSS](https://github.com/city96/SD-Latent-Interposer/assets/125218114/3a0d8920-ed48-42f0-96c9-897263525efb)

### Older versions


### Interposer v3.1

This is basically a complete rewrite. Replaced the mediocre bunch of conv2d layers with something that looks more like a proper neural network. No VGG loss because I still don't have a better GPU.

Training was done on combined Flickr2K + DIV2K, with each image being processed into 6 1024x1024 segments. Padded with some of my random images for a total of 22,000 source images in the dataset.

I think I got rid of most of the XL artifacts, but the color/hue/saturation shift issues are still there. I actually saved the optimizer state this time, so I might be able to do 100K steps with visual loss on my P40s. Hopefully they won't burn up.

v3.0 was 500k steps at a constant LR of 1e-4; v3.1 was 1M steps using a CosineAnnealingLR to drop the learning rate towards the end. Both used AdamW.

![INTERPOSER_V3 1](https://github.com/city96/SD-Latent-Interposer/assets/125218114/daff0ae2-4739-4cef-ba54-ac1d156d3388)
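To make the v3.0/v3.1 schedule difference concrete, here is a small pure-Python sketch of both learning-rate curves. The cosine curve uses the closed form given in PyTorch's `CosineAnnealingLR` documentation (roughly what `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000_000)` on top of `torch.optim.AdamW` produces). The v3.1 starting LR of 1e-4 and the floor `eta_min=0.0` are assumptions; only v3.0's constant 1e-4 is stated above.

```python
import math


def constant_lr(step: int, base_lr: float = 1e-4) -> float:
    """v3.0-style schedule: the same LR for every step."""
    return base_lr


def cosine_annealing_lr(
    step: int, total_steps: int, base_lr: float = 1e-4, eta_min: float = 0.0
) -> float:
    """v3.1-style schedule: closed form of PyTorch's CosineAnnealingLR.

    Follows half a cosine period from base_lr (step 0) down to eta_min
    (step == total_steps). base_lr=1e-4 and eta_min=0.0 are assumed values.
    """
    progress = step / total_steps
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


# Print the two schedules side by side at a few points of a 1M-step run.
for step in (0, 250_000, 500_000, 750_000, 1_000_000):
    print(
        f"step {step:>9,}: constant {constant_lr(step):.3e}, "
        f"cosine {cosine_annealing_lr(step, 1_000_000):.3e}"
    )
```

The cosine schedule sits at half the base LR at the midpoint and spends its final stretch at very small learning rates, which is the "drop towards the end" behavior a constant LR cannot provide.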


### Interposer v1.1

This is the second release using the "spaceship" architecture. It was trained on the Flickr2K dataset and was continued from the v1.0 checkpoint.

Overall, it seems to perform a lot better, especially for real-life photos. I also investigated the odd v1->xl artifacts, but in the end they seem to be [inherent to the VAE decoder stage.](https://github.com/comfyanonymous/ComfyUI/issues/1116)

![loss](https://github.com/city96/SD-Latent-Interposer/assets/125218114/e890420f-cebd-4f88-b243-62560b8384e5)


### Interposer v1.0

Not sure why the training loss is so different; it might be due to the """highly curated""" dataset of 1000 random images from my Downloads folder that I used to train it. I probably should've just grabbed LAION.

I also trained a v1-to-v2 model, before realizing v1 and v2 shared the same latent space. Oh well.

![loss](https://github.com/city96/SD-Latent-Interposer/assets/125218114/f92c399b-a823-4521-b09b-8bdc3795f1ea)

![xl-to-v1_interposer](https://github.com/city96/SD-Latent-Interposer/assets/125218114/0d963bc5-570f-4ebe-95db-16e261f05e48)