Data research, preparation, and manipulation nodes for model trainers and artists.
[Nov 12th 2024]
Note: Please upgrade, as this is a major update that supersedes all previous versions.
Install with comfy-cli (https://github.com/yoland68/comfy-cli):

```
comfy node registry-install ComfyUI-DataSet
```

Or clone https://github.com/daxcay/ComfyUI-DataSet.git and run

```
pip install -r requirements.txt
```

to install the dependencies.

Or use the Manager: inside ComfyUI, click the Manager button on the side, click Custom Nodes Manager, search for DataSet, and install this node. Restart ComfyUI and it should be good to go.
The `DataSet_Visualizer` node visualizes dataset captions by generating graphs that offer several perspectives on token usage. The word cloud represents token frequency with proportionally sized fonts, the network graph illustrates the relationships between tokens, and the frequency graph gives an exact count of how often each token appears in your captions.
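The frequency metric underlying these graphs amounts to counting comma-separated caption tokens across files. A minimal sketch (the function name `count_tokens` and the caption format are assumptions for illustration, not the node's actual code):

```python
from collections import Counter

def count_tokens(captions):
    """Count comma-separated tokens across a list of caption strings."""
    counter = Counter()
    for caption in captions:
        for token in caption.split(","):
            token = token.strip().lower()
            if token:
                counter[token] += 1
    return counter

captions = ["1girl, red hair, smile", "1girl, blue eyes, smile"]
freq = count_tokens(captions)
```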
The `DataSet_CopyFiles` node copies files from a source folder to a destination folder using one of two modes: `BlindCopy` and `CopyByDestinationFiles`.
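The difference between `BlindCopy` and `CopyByDestinationFiles` can be sketched with `shutil` (the function and parameter names here are illustrative, not the node's actual implementation):

```python
import shutil
from pathlib import Path

def copy_files(src_dir, dst_dir, blind=True):
    """BlindCopy copies everything; CopyByDestinationFiles copies only files
    whose base name (stem) already exists in the destination."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    existing = {p.stem for p in dst.iterdir() if p.is_file()}
    for f in src.iterdir():
        if f.is_file() and (blind or f.stem in existing):
            shutil.copy2(f, dst / f.name)
```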
- `BlindCopy`: copies all files from the source folder to the destination folder.
- `CopyByDestinationFiles`: copies a file from the source folder only if a matching file (based on the base name) is already present in the destination.

The `DataSet_TriggerWords` node extracts trigger words from captions. It identifies trigger words as tokens containing BOTH letters and numbers.
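That identification rule can be sketched with two regular expressions; `extract_trigger_words` is an illustrative name, not the node's actual code:

```python
import re

def has_both(token):
    """True if a token contains at least one letter and one digit."""
    return bool(re.search(r"[A-Za-z]", token) and re.search(r"\d", token))

def extract_trigger_words(caption, mode="trigger_word_only"):
    """Return trigger words, or the comma-delimited phrases containing them."""
    phrases = [p.strip() for p in caption.split(",")]
    if mode == "trigger_word_phrase":
        return [p for p in phrases if any(has_both(w) for w in p.split())]
    return [w for p in phrases for w in p.split() if has_both(w)]
```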
Inputs:
- (`STRING`, required): the contents of the text file(s) to be processed.
- (`['trigger_word_only', 'trigger_word_phrase']`, required): the extraction mode:
  - `'trigger_word_only'`: extracts individual trigger words only.
  - `'trigger_word_phrase'`: extracts the entire phrase (the text between two commas) that contains a trigger word.

Outputs:
- (`STRING`, list): the extracted trigger words or trigger-word-containing phrases.

The `DataSet_TextFilesLoad` node processes the basic attributes of text files. It can, for instance, extract file names, file names WITHOUT extensions, file paths, and file contents, which is useful for certain batched workflows. It takes a list of text file paths as input.
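A rough equivalent of that attribute extraction, using `pathlib` (a generic sketch, not the node's actual code):

```python
from pathlib import Path

def load_text_files(paths):
    """Collect name, stem, path, and contents for each .txt path."""
    names, stems, fullpaths, contents = [], [], [], []
    for p in map(Path, paths):
        if p.suffix == ".txt":
            names.append(p.name)
            stems.append(p.stem)
            fullpaths.append(str(p))
            contents.append(p.read_text(encoding="utf-8"))
    return names, stems, fullpaths, contents
```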
Inputs:
- (`STRING`, required): a list of file paths to the text files to be loaded. Only paths ending with `.txt` will be processed.

Outputs:
- (`STRING`, list): the names of the text files.
- (`STRING`, list): the names of the text files without their extensions.
- (`STRING`, list): the file paths of the text files.
- (`STRING`, list): the contents of the text files.

The next node is the same as above, but uses a directory path widget for input.
Inputs:
- (`STRING`, required): the directory path where the text files are located, specified as a string.

Outputs:
- (`STRING`, list): the names of the text files in the directory.
- (`STRING`, list): the names of the text files without their extensions.
- (`STRING`, list): the file paths of the text files in the directory.
- (`STRING`, list): the contents of the text files in the directory.

The `DataSet_TextFilesSave` node saves text file contents to a specified directory. It supports four modes: `Overwrite`, `Merge`, `SaveNew`, and `MergeAndSaveNew`.
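The four modes can be sketched as follows. This is a simplified illustration that assumes `_0001`-style suffixes for unique names; the node's actual naming scheme may differ:

```python
from pathlib import Path

def save_text(directory, name, content, mode="Overwrite"):
    """Save text content according to Overwrite/Merge/SaveNew/MergeAndSaveNew."""
    path = Path(directory) / name
    if mode == "Merge" and path.exists():
        content = path.read_text() + content          # append to the existing file
    elif mode in ("SaveNew", "MergeAndSaveNew") and path.exists():
        if mode == "MergeAndSaveNew":
            content = path.read_text() + content      # merge first...
        i = 1
        while (Path(directory) / f"{path.stem}_{i:04d}{path.suffix}").exists():
            i += 1
        path = Path(directory) / f"{path.stem}_{i:04d}{path.suffix}"  # ...then pick a unique name
    path.write_text(content)
    return path
```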
Inputs:
- (`STRING`, required): the names of the text files to be saved.
- (`STRING`, required): the contents of the text files to be saved.
- (`STRING`, required): the directory path where the text files will be saved.

Modes:
- `Overwrite`: overwrites existing files with the same name.
- `Merge`: appends content to existing files with the same name.
- `SaveNew`: if a file with the same name already exists, saves the new file under a unique name.
- `MergeAndSaveNew`: merges content with the existing file and, if a file with the same name already exists, saves the result as a new file under a unique name.

The `DataSet_FindAndReplace` node finds and replaces a text pattern within caption text files.
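In essence this is a literal (possibly multiline) substring replacement over each caption's contents; an illustrative sketch, not the node's actual code:

```python
def find_and_replace(contents, search_for, replace_with):
    """Apply a literal find-and-replace to each text file's contents."""
    return [text.replace(search_for, replace_with) for text in contents]

fixed = find_and_replace(["a photo of sks dog", "sks dog running"],
                         "sks dog", "zwx dog")
```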
Inputs:
- `TextFileContents` (`STRING`, required): the text file contents to be processed.
- `SearchFor` (`STRING`, default: `"search-text"`, required): the text pattern to search for within the `TextFileContents`. Supports multiline input.
- (`STRING`, default: `"replacement-text"`, required): the replacement text for the `SearchFor` pattern. Supports multiline input.

Outputs:
- (`STRING`, list): the modified contents of the text files.

The `DataSet_PathSelector` node is useful for identifying images in a sub-dataset that are missing caption text files present in a larger parent repository of image-text pairings. The node searches one directory for orphaned text/image files and finds the missing pair files with matching names in another directory.
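A hypothetical sketch of the matching logic: collect the stems of orphaned files in one directory, then look up files with the same stems and the required extensions in the other (names and signature are assumptions, not the node's code):

```python
from pathlib import Path

def find_missing_pairs(orphan_dir, orphan_exts, repo_dir, required_exts):
    """For each orphaned file, find same-stem files with required extensions in repo_dir."""
    orphan_exts = {e.strip() for e in orphan_exts.split(",")}
    required_exts = {e.strip() for e in required_exts.split(",")}
    stems = {p.stem for p in Path(orphan_dir).iterdir() if p.suffix in orphan_exts}
    return sorted(str(p) for p in Path(repo_dir).iterdir()
                  if p.stem in stems and p.suffix in required_exts)
```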
Inputs:
- (`STRING`, required): the sub-dataset directory missing pairings.
- (`STRING`, required): the extensions of the orphaned files, separated by commas (e.g., `.txt, .csv`).
- (`STRING`, required): the repository directory containing the complete text-image pairings.
- (`STRING`, required): the extensions of the required files to be added, separated by commas (e.g., `.txt, .csv`).

Outputs:
- (`STRING`, list): the names of the required files with their extensions.
- (`STRING`, list): the names of the required files without their extensions.
- (`STRING`, list): the full paths of the required files.

The `DataSet_ConceptManager` node adds or removes tokens within caption files and lets you place added tokens at designated positions within the caption.
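The position syntax (`"tag1 0"` means insert `tag1` at index 0) might work along these lines; this is an illustrative sketch only:

```python
def manage_concepts(caption, mode, concepts):
    """Add 'tag pos' pairs at the given indices, or remove the listed tags."""
    tokens = [t.strip() for t in caption.split(",")]
    if mode == "add":
        for item in concepts.split(","):
            *words, pos = item.strip().rsplit(" ", 1)   # e.g. "tag1 0" -> ("tag1", 0)
            tokens.insert(int(pos), " ".join(words))
    else:  # "remove"
        remove = {c.strip() for c in concepts.split(",")}
        tokens = [t for t in tokens if t not in remove]
    return ", ".join(tokens)
```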
Inputs:
- (`STRING`, required): the contents of the text file(s) to be processed.
- (`STRING`, required): the mode of operation: `'add'` to add tokens or `'remove'` to remove tokens.
- (`STRING`, required): the concepts to add or remove, formatted as text plus position (e.g., `"tag1 0, tag2 2"` for adding, `"tag1, tag2"` for removing).

Outputs:
- (`STRING`, list): the modified contents of the caption file(s).

The `DataSet_OpenAIChat` node uses the OpenAI GPT chat API to help you generate prompts.
"GPTo"
, "gpt-3.5-turbo"
, etc."https://api.openai.com/v1"
): the base URL for the API.The DataSet_LoadImage
node provides essential image file attributes for captioning with the DataSet_OpenAIChat
node. It leverages
Pillow and Numpy libraries.
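ComfyUI image nodes typically convert images with Pillow and NumPy along these lines; a generic sketch of the pattern, not this node's exact code:

```python
import numpy as np
from PIL import Image

def load_image(path):
    """Load an image as an RGB float32 array normalized to [0, 1]."""
    img = Image.open(path).convert("RGB")
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr, img.size  # (H, W, 3) array and (width, height)
```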
The `DataSet_SaveImage` node batch-saves images to a specified directory with optional PNG metadata. It also uses the Pillow and NumPy libraries.
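Optional PNG metadata is what Pillow's `PngInfo` text chunks provide; a minimal sketch (the function name is illustrative):

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def save_with_metadata(img, path, metadata):
    """Save a PIL image as PNG, embedding key/value text chunks."""
    info = PngInfo()
    for key, value in metadata.items():
        info.add_text(key, value)
    img.save(path, pnginfo=info)
```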
The `DataSet_OpenAIChatImage` node uses the OpenAI GPTo multi-modal vision API in a chat framework to caption images.
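OpenAI's vision-style chat endpoints accept the image inline as a base64 data URL inside the message content. A sketch of building such a request payload (the function name and prompt are placeholders, and no API call is made here):

```python
import base64

def build_vision_message(image_bytes, prompt):
    """Build an OpenAI-style chat message with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```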
"GPTo"
, "gpt-3.5-turbo"
, etc.The DataSet_OpenAIChatImageBatch
class extends the functionality of DataSet_OpenAIChatImage
to process batches of images with OpenAI's chat API for generating text catpions.
"GPTo"
, "gpt-3.5-turbo"
, etc.Buy me a coffee: https://buymeacoffee.com/daxtoncaylor
Support me on paypal: https://paypal.me/daxtoncaylor