ComfyUI Extension: ComfyUI-SpatialGen

Authored by manycore-maas

Scene Viewer of SpatialGen

    ComfyUI-SpatialGen

    <div align="center"> <video src="https://github.com/user-attachments/assets/c541377a-5eef-4c9e-90c5-737b66562589" controls width="400"></video> <p><i>SpatialGen generates multi-view images along a preset trajectory, then produces a fine-grained video with AnySplat and Wan2.2.</i></p> </div>

    Overview

    This repository provides the ComfyUI plugin for SpatialGen, a multi-view, multi-modal diffusion model that generates 3D indoor scenes conditioned on either a reference image or a textual description. ComfyUI-SpatialGen can generate multi-view images along a camera trajectory, together with the corresponding poses. You can further improve the final result with additional tools: in the provided workflow, AnySplat performs Gaussian reconstruction on the SpatialGen results, and the Wan2.2 video-generation model then refines and enhances the final output.
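To make the data flow concrete, here is a minimal Python sketch of how the three stages chain together. The functions `spatialgen`, `anysplat`, and `wan_refine` and the `Views` container are hypothetical stubs standing in for the ComfyUI nodes; they only pass placeholder strings along and do not call the real models.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical containers; the real nodes pass image tensors, poses, and videos.
@dataclass
class Views:
    images: List[str]  # stand-ins for generated images
    poses: List[int]   # stand-ins for camera poses


@dataclass
class Trace:
    stages: List[str] = field(default_factory=list)


def spatialgen(reference: str, poses: List[int], trace: Trace) -> Views:
    # Hypothetical stub: one generated view per camera pose.
    trace.stages.append("SpatialGen")
    return Views(images=[f"{reference}@{p}" for p in poses], poses=poses)


def anysplat(views: Views, trace: Trace) -> List[str]:
    # Hypothetical stub: Gaussian reconstruction rendered back to coarse frames.
    trace.stages.append("AnySplat")
    return [f"coarse({img})" for img in views.images]


def wan_refine(frames: List[str], trace: Trace) -> List[str]:
    # Hypothetical stub: video-to-video refinement that keeps the frame count.
    trace.stages.append("Wan2.2")
    return [f"refined({f})" for f in frames]


def run_pipeline(reference: str, poses: List[int]) -> Tuple[List[str], Trace]:
    trace = Trace()
    views = spatialgen(reference, poses, trace)
    coarse = anysplat(views, trace)
    final = wan_refine(coarse, trace)
    return final, trace


final, trace = run_pipeline("living_room.png", [0, 1, 2])
print(trace.stages)  # ['SpatialGen', 'AnySplat', 'Wan2.2']
print(final[0])      # refined(coarse(living_room.png@0))
```

Keeping each stage as a separate function mirrors the node graph: any stage can be swapped or skipped without touching the others.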

    Examples

    scene_00006

    <details> <summary>Click to view all results</summary>

    | Stage | Results |
    |--------|----------|
    | Scene Viewer & Camera Setup | (image) |
    | FLUX Text-to-Image Conditioned on Wireframe<br> <details><summary>prompt</summary> A spacious, elegant living room in traditional Chinese style, featuring a low-slung wooden sofa with soft beige cushions and embroidered silk throws facing a large wall-mounted screen displaying the time "10:39" in calligraphy-style font, flanked by dark wood shelves holding porcelain vases and incense burners; a round coffee table made of polished rosewood with a marble top holds tea sets, bamboo scrolls, and a potted orchid; beneath it lies a handwoven black-and-white rug with geometric patterns inspired by ancient Chinese textiles; two hanging lanterns made of red lacquer and translucent paper provide warm ambient lighting, complemented by recessed LED lights along the ceiling; on one wall hangs a framed ink painting of mountains and clouds; near a large window with wooden lattice frames and flowing linen curtains, a classic Chinese armchair in deep red fabric sits beside a side table with a bronze vase of plum blossoms; a glass door opens to a balcony with stone railings and potted bamboo, bathed in soft natural light, creating a serene, harmonious atmosphere blending nature, tradition, and refined craftsmanship.</details> | (image) |
    | SpatialGen Multi-view Generation | (image) |
    | AnySplat Multi-view Gaussian Reconstruction | <video src="https://github.com/user-attachments/assets/ac4a9d23-748c-44e1-beaf-364b64d3b480" controls width="400"></video> |
    | WAN2.2 Video2Video | <video src="https://github.com/user-attachments/assets/391bd2d7-8c89-4ab9-8140-43c21f3f6d1c" controls width="400"></video> |

    </details>

    scene_00030

    <details> <summary>Click to view all results</summary>

    | Stage | Results |
    |--------|----------|
    | Scene Viewer & Camera Setup | (image) |
    | FLUX Text-to-Image Conditioned on Wireframe <br> <details><summary>prompt</summary> A cozy, rustic countryside bedroom featuring a large wooden bed with white linen bedding, a hand-knitted brown wool throw blanket, and patterned cotton pillows centered in the room, positioned in front of a wall-mounted TV displaying a scenic mountain landscape, flanked by a low-profile wooden media console with natural wood grain finish, decorative ceramic vases, vintage books, and a handmade wooden sculpture; to the right of the bed is a built-in wardrobe with open shelves showcasing neatly folded flannel shirts, wool sweaters, and leather boots, adjacent to a solid oak sliding door cabinet; above the bed hangs a wrought iron chandelier with warm Edison bulbs; the floor is covered with wide-plank reclaimed wood and a soft woven jute rug beneath the bed; the walls are finished in soft beige plaster with exposed wooden beams, and the ceiling features recessed lighting and rustic crown molding; natural light filters through a window with burlap curtains, creating a warm, inviting atmosphere of simplicity, nature, and timeless charm.</details> | (image) |
    | SpatialGen Multi-view Generation | (image) |
    | AnySplat Multi-view Gaussian Reconstruction | <video src="https://github.com/user-attachments/assets/b256c80c-7fe2-439d-b4b3-8f2e39ca83cb" controls width="400"></video> |
    | WAN2.2 Video2Video | <video src="https://github.com/user-attachments/assets/237cb9d4-3bc6-4b36-b79c-f4b9c0e8fffb" controls width="400"></video> |

    </details>

    scene_00045

    <details> <summary>Click to view all results</summary>

    | Stage | Results |
    |--------|----------|
    | Scene Viewer & Camera Setup | (image) |
    | FLUX Text-to-Image Conditioned on Wireframe <br> <details><summary>prompt</summary> A tranquil, Japanese-style bedroom inspired by wabi-sabi aesthetics, featuring a low wooden bed with crisp white cotton bedding and a deep navy blue hemp throw blanket, centered beneath a large handwoven rattan pendant light; to the left is a simple wooden nightstand with a bamboo shelf, holding a ceramic tea cup, a small stone lantern, and dried grasses; beside it is a white sliding door closet with a round wooden knob; above the bed hangs a traditional macramé wall hanging made of natural fibers; to the right is a small round wooden side table with a bonsai plant; the floor is light oak wood with a soft tatami-like mat beneath the bed; the ceiling has a small hanging bamboo lamp; translucent gray curtains frame a narrow window, filtering soft morning light into the space, creating a peaceful, meditative atmosphere with natural materials, muted colors, and intentional minimalism. </details> | (image) |
    | SpatialGen Multi-view Generation | (image) |
    | AnySplat Multi-view Gaussian Reconstruction | <video src="https://github.com/user-attachments/assets/3ff4d6cb-d3bc-4aa6-892d-8589e0d4dfa6" controls width="400"></video> |
    | WAN2.2 Video2Video | <video src="https://github.com/user-attachments/assets/aa75b6bf-c36d-4833-b0f1-602e9503a743" controls width="400"></video> |

    </details>

    💡 Tips

    • The FLUX inference prompt is very important: it should describe a scene consistent with the indoor room layout.
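One rough way to sanity-check that consistency is to list which objects in the layout the prompt never mentions. The sketch below is a hypothetical helper, not part of the extension: it assumes you already have the layout's object labels as plain strings (how to extract them from the layout file is not covered in this README) and uses simple whole-word matching.

```python
import re
from typing import Iterable, List


def missing_layout_objects(prompt: str, layout_labels: Iterable[str]) -> List[str]:
    """Return the layout object labels that the prompt never mentions.

    Hypothetical helper: `layout_labels` (e.g. "sofa", "coffee table") is
    assumed to come from the scene layout by some other means.
    """
    text = prompt.lower()
    missing = []
    for label in layout_labels:
        # Whole-word match, tolerant of extra spaces between words,
        # so that "bed" does not match inside "bedding".
        words = [re.escape(w) for w in label.lower().split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        if not re.search(pattern, text):
            missing.append(label)
    return missing


prompt = ("A cozy, rustic bedroom with a large wooden bed, white linen bedding, "
          "and a built-in wardrobe with open shelves")
print(missing_layout_objects(prompt, ["bed", "wardrobe", "desk"]))  # ['desk']
```

A missing label is only a hint: synonyms ("closet" vs. "wardrobe") will not match, so treat the output as a checklist rather than a verdict.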

    Workflow description

    1. Use the Spatial Scene UI to interactively set up the initial camera and each camera's trajectory.
    2. Spatialgen P3D Render processes the related data: the wireframe image from the first camera, which conditions FLUX inference, and all camera poses, which condition SpatialGen inference.
    3. The FLUX inference workflow uses the wireframe image as a condition to generate the first indoor scene image.
    4. From this first indoor image, SpatialGen generates eight multi-view images conditioned on the camera poses; these serve as the eight initial anchors.
    5. Conditioned on each anchor, SpatialGen generates eight multi-view images that follow the corresponding camera trajectory.
    6. AnySplat performs Gaussian reconstruction on the four selected trajectories to produce a coarse video.
    7. Finally, Wan2.2 enhances the coarse video into a fine-grained, visually appealing one.
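The view counts in steps 4 to 6 can be tallied with a small sketch. It assumes that the eight views generated per anchor in step 5 are counted separately from the anchor itself, and the choice of which four trajectories go to AnySplat (here every other anchor) is hypothetical; the view labels are placeholders for the real images and poses.

```python
from typing import List, NamedTuple, Sequence

NUM_ANCHORS = 8           # step 4: eight anchor views
VIEWS_PER_TRAJECTORY = 8  # step 5: eight views generated per anchor


class ViewPlan(NamedTuple):
    trajectory_views: int       # all views generated in step 5
    anysplat_inputs: List[str]  # views from the selected trajectories (step 6)


def plan_views(selected: Sequence[int]) -> ViewPlan:
    """Count views through steps 4-6 for the given selected anchor indices."""
    if len(set(selected)) != len(selected):
        raise ValueError("selected anchors must be distinct")
    if not all(0 <= a < NUM_ANCHORS for a in selected):
        raise ValueError(f"anchor indices must lie in [0, {NUM_ANCHORS})")
    # Placeholder labels; the real workflow passes images and camera poses.
    trajectories = [
        [f"anchor{a}_view{v}" for v in range(VIEWS_PER_TRAJECTORY)]
        for a in range(NUM_ANCHORS)
    ]
    inputs = [view for a in selected for view in trajectories[a]]
    return ViewPlan(NUM_ANCHORS * VIEWS_PER_TRAJECTORY, inputs)


plan = plan_views([0, 2, 4, 6])  # hypothetical choice of four trajectories
print(plan.trajectory_views)      # 64
print(len(plan.anysplat_inputs))  # 32
```

Under these assumptions, step 5 yields 8 × 8 = 64 views, of which the four selected trajectories contribute 4 × 8 = 32 views to the Gaussian reconstruction.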

    Model Installation

    For detailed instructions, please refer to INSTALL.md.

    SpatialGen Scene View User Guide

    1. Upload an indoor scene layout file.
    2. Position the eight cameras appropriately.
    3. Select one of the eight cameras as the main camera and click Main Camera.
    4. Select a trajectory method and click Apply All.
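The README does not list the available trajectory methods. As a hypothetical illustration of what such a method computes, the sketch below implements an "orbit" trajectory in a top-down floor plane: starting at an anchor camera, it produces eight poses that rotate about a target point while always looking at it. The function name, the `(x, y, yaw)` pose format, and the 60° default sweep are assumptions, not the extension's actual implementation.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw in radians), top-down floor plane


def orbit_trajectory(anchor: Tuple[float, float], target: Tuple[float, float],
                     num_views: int = 8, sweep_deg: float = 60.0) -> List[Pose]:
    """Hypothetical 'orbit' method: rotate the anchor camera about `target`.

    The first pose sits at the anchor; every pose faces the target. The
    sweep is counterclockwise when viewed from above (y axis pointing up).
    """
    dx, dy = anchor[0] - target[0], anchor[1] - target[1]
    radius = math.hypot(dx, dy)
    if radius == 0:
        raise ValueError("anchor and target must differ")
    start = math.atan2(dy, dx)
    step = math.radians(sweep_deg) / (num_views - 1) if num_views > 1 else 0.0
    poses = []
    for i in range(num_views):
        theta = start + i * step
        x = target[0] + radius * math.cos(theta)
        y = target[1] + radius * math.sin(theta)
        # Yaw points from the camera back toward the target.
        yaw = math.atan2(target[1] - y, target[0] - x)
        poses.append((x, y, yaw))
    return poses


traj = orbit_trajectory(anchor=(2.0, 0.0), target=(0.0, 0.0))
print(len(traj))  # 8
print(traj[0])    # (2.0, 0.0, 3.141592653589793)
```

Keeping the target fixed means consecutive views share most of their content, which is generally what multi-view generation and Gaussian reconstruction benefit from.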

    💡 Tips

    • The eight multi-view images generated by SpatialGen start from the main camera position and proceed clockwise.
    • Ideally, place the main camera where it captures as much of the interior scene as possible, since this view provides the most useful information.
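The clockwise ordering in the first tip can be reproduced with a short sketch. It assumes that "clockwise" refers to the camera positions around the room as seen from above, in floor coordinates with x to the right and y up, and it measures angles around the centroid of the camera positions; the function name and camera labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def clockwise_from_main(cameras: Dict[str, Tuple[float, float]], main: str) -> List[str]:
    """Order camera names clockwise (seen from above), starting at `main`.

    Assumes top-down coordinates with y pointing up; angles are measured
    around the centroid of all camera positions.
    """
    cx = sum(x for x, _ in cameras.values()) / len(cameras)
    cy = sum(y for _, y in cameras.values()) / len(cameras)

    def angle(name: str) -> float:
        x, y = cameras[name]
        return math.atan2(y - cy, x - cx)

    main_angle = angle(main)

    def clockwise_offset(name: str) -> float:
        # Clockwise means decreasing angle; wrap the offset into [0, 2*pi).
        return (main_angle - angle(name)) % (2 * math.pi)

    return sorted(cameras, key=clockwise_offset)


cameras = {"N": (0.0, 1.0), "E": (1.0, 0.0), "S": (0.0, -1.0), "W": (-1.0, 0.0)}
print(clockwise_from_main(cameras, main="N"))  # ['N', 'E', 'S', 'W']
```

Four cameras keep the example short; the same function orders the eight cameras of the Scene Viewer.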

    Acknowledgements

    We would like to thank the following projects that made this work possible:
    FLUX.1 dev | Nunchaku | AnySplat | VGGT | Wan2.2 | OmniSR | Stand-In