ComfyUI Extension: CLIP-Token-Injection

Authored by Extraltodeus

Created

Updated

6 stars

These nodes are to edit the text vectors of CLIP models, so to customize how the prompts will be interpreted. You could see it as either customisation, 'one token prompt' up to some limitation and a way to mess with how the text will be interpreted. The edited CLIP can then be saved, or as well the edited tokens themselves. The shared example weights does not contain any image-knowledge but the text vector of the words affected.

Custom Nodes (0)

    README

    Note:

    These nodes are to edit the text vectors of CLIP models, so to customize how the prompts will be interpreted.

    You could see it as either customisation, "one token prompt" up to some limitation and a way to mess with how the text will be interpreted.

    The edited CLIP can then be saved, or as well the edited tokens themselves.

    The shared example weights does not contain any image-knowledge but the text vector of the words affected.

    You will find them in the category "advanced/token surgery".

    Here the words "woman" and "forest" have been affected. As you can see you can shove a lot of meaning in single tokens:

    image_grid

    This example is explained further below.

    CLIP-Token-Injection

    image

    Each token composing any word in the first text input will be modified by those in the second and third text inputs.

    "pos_vs_neg" affects the relative influence in between the second and third text inputs. At 1 will only add the meaning of the second text input. At 0 will only subtract what is the third text input. This value has no use if only one of the two text inputs is used.

    "strength": This is a weighted average. At 1 will fully replace the target tokens. At 0 will have no effect.

    The node can be chained with others and the results can be saved.

    Save / Load custom tokens:

    image

    Allows to save directly the customized weights by simply typing them.

    The node loading them uses a weighted average. 1 meaning full replacement of the target tokens.

    Potential use case

    Weights customization:

    Allows to edit how the text will be interpreted. In cases where you would be prefering something differently.

    Temporarily or permanently.

    You could for example combine into "realistic" the tokens "dark noise shadow cinematic photographic" and subtract from it "cgi render rendering rendered videogame videogames cartoon" to help with general realism.

    Example

    Prompt:

    a woman with long hair in a forest, photography
    

    Here "woman" is being added "pretty perfect love" and subtracted "evil demon ugly" and "forest" is being modified the same way:

    image

    Same modification strength for both words, In this order: -1 / -0.7 / 0 (base image) / 0.7 / 1 / 1 but fully adding the words to add, without subtracting the negative words (so "pos_vs_neg" at 1):

    image_grid

    AI Safety:

    [!CAUTION] I DO NOT PROVIDE ANY GUARANTEE OF RESULT nor that any user won't be able to circumvent any modification you may apply.

    Provided in the "custom weights" folder a file named "AISafety.pt" in which a large set of tokens where main concepts relative to nudity/pornography and children have been modified.

    Any user prompting for unsafe content will find Chris Hansen staring at him, with some FBI-related details.

    The file named "ai_safety_example.png" shows you how to load these weights and demonstrates what you will get if you try to use such prompt.

    I can not share the full list of words used as even pastebin thinks that I'm trying to share CP.

    The child related targeted tokens from the file named "AISafety.pt" are the following:

    girl teen teens teenager boy toddlers children infant infants baby babies kid kiddo yo years old
    1 2 3 4 5 6 7 8 9 0
    one two three four five six seven eight nine ten eleven twelve
    

    Some have been replaced by "Chris" and some by "Hansen". Tokens used by NSFW related terms have mainly been replaced by the token "fbi".

    This is what you will obtain if you attempt to prompt for unsafe content with such modification:

    01383UI_00001_

    [!TIP]

    • Remember, if you are attempting to use many tokens to add and subtract that it will throw it all into single token. Typing the trylogy of the lord of the rings will most likely give you a token without much meaning.
    • in the add or subtract text inputs, a word used twice will be counting double
    • to not fully lose the meaning of a word you can either set a lower strength or copy the target token in the text input of tokens to add.
    • Using words that are a single token seems to work better. CLIP L and CLIP G shares the same vocabulary which you can check in any Huggingface repository containing a CLIP L or G model, often under the name of "vocab.json". Here is one