An image generation based music visualizer integrated into comfyanonymous/ComfyUI as custom nodes.
Nothing fuzzy about it.
- **Audio Loader**: For loading an audio file using the librosa library.
- **Audio Feature Calculator**: For calculating the audio feature modifiers for each step in the given audio.
- **Prompt Sequence Builder**: For stacking multiple ComfyUI prompts into a Prompt Sequence.
- **Prompt Sequence Interpolator**: For calculating additional sub-prompts within each set of prompts in a Prompt Sequence to create a weighted interpolation transition between each.
- **Prompt Sequence Renderer**: For rendering a sequence of images with a variety of parameters from a Prompt Sequence.
- **Image Concatenator**: For combining multiple images into a single Tensor. Images are not visually combined and are still distinct.
- **Prompt Sequence Loader**: For loading a sequence of prompts from a JSON file fitting the specifications found here.

- **AUDIO** (`tuple[ndarray, float]`): For representing loaded audio files.
- **TENSOR_1D** (`Tensor[*]`): For representing a 1D Tensor of data.
- **PROMPT_SEQ** (`list[MbmPrompt]`): For representing multiple prompts (positive and negative) in an ordered sequence.

1. Run `.\.venv\Scripts\activate` from ComfyUI's root directory to activate ComfyUI's virtual environment.
2. Clone this repository into the `custom_nodes` directory by entering the directory and running: `git clone git@github.com:Sorcerio/MBM-Music-Visualizer.git MBM_MusicVisualizer`.
3. Enter the `MBM_MusicVisualizer` directory.
4. Run `pip install -r .\requirements.txt` to install this project's dependencies.

Nodes will be found in the `MBMnodes/` submenu inside ComfyUI.
📝 Note: Drag the example workflow file into ComfyUI to automatically load this flow!
Place any audio files you would like to load in the `audio/` directory.
You can always reload the webpage ComfyUI is running in to refresh the file list in the `Audio Loader` node.
The `Music Visualizer` node takes an `AUDIO` object as input and produces a set of Latent Images as output.
A `tqdm` progress bar will be shown in the console to display the current status of the visualization and how long it is expected to take.
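The progress reporting works like any `tqdm`-wrapped loop. A sketch of the pattern, with a hypothetical `render_frame` standing in for the actual per-frame latent generation:

```python
from tqdm import tqdm

def render_frame(index: int) -> int:
    """Hypothetical placeholder for the real per-frame image generation."""
    return index

# tqdm wraps the frame loop and prints a bar with an estimated time remaining.
frames = [render_frame(i) for i in tqdm(range(10), desc="Visualizing")]
```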
Upon completion of a visualization, the `Music Visualizer` will output the input FPS, a set of Latent Images that can be decoded using the ComfyUI-native Latent Decoder, and a set of Images showing relevant data from the run.
The FPS can be fed into any further video or GIF generating nodes.
The Latent Images, which are the output content of the visualization, should be converted and either saved individually or compiled into a video through another node.
The charts are pixel images that can be saved or modified as desired.
When testing your generations, consider bypassing the `Prompt Sequence Renderer` and its outputs.
(Alternatively, set the `image_limit` to `1` or higher to generate only a specific number of images.)
Doing so will still produce complete charts for most data sources, allowing you to preview the general flow of the visualization before you commit to image generation for all frames.
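The weighted interpolation transition that the Prompt Sequence Interpolator builds between consecutive prompts can be sketched with plain vectors. This is an illustrative assumption, not the node's actual implementation; the real node operates on ComfyUI conditioning tensors, with plain arrays standing in for prompt embeddings here:

```python
import numpy as np

def interpolate_prompts(a: np.ndarray, b: np.ndarray, steps: int) -> list:
    """Return `steps` sub-prompts blending linearly from `a` to `b`.

    Each sub-prompt is the weighted sum (1 - w) * a + w * b, with the
    weight w swept evenly from 0 (all `a`) to 1 (all `b`).
    """
    weights = np.linspace(0.0, 1.0, steps)
    return [(1.0 - w) * a + w * b for w in weights]

start = np.array([1.0, 0.0])  # stand-in embedding for the first prompt
end = np.array([0.0, 1.0])    # stand-in embedding for the next prompt
frames = interpolate_prompts(start, end, 5)
```

With 5 steps, the midpoint sub-prompt weights both prompts equally, which is what produces the smooth transition between frames of the visualization.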
Additional documentation can be found in the Documentation directory included in this repository.
As this is a hobby project, no promise is offered for implementation of Roadmap features. However, if you have the time and ability, feel free to submit a Pull Request.
- **Feature**: Add ability to use a hash (?) of a latent (image?) to generate a dummy (random?) audio input.
- **Feature**: Support option for Frame2Frame by using the previous frame as the latent input (img to latent + latent modifier) for the next frame.
- **Feature**: Add camera effects similar to Scene Weaver (possible without Img2Img?).
- **Feature**: Add ability to drag in audio and prompt sequence files to the loader.
- **Test**: Optimize by using the previous image for N number of frames if the delta is small enough. ("Delta Cull Threshold")