ComfyUI Extension: Joy Caption Two - PixelaiLabs Edition

Authored by Pixelailabs

Created

Updated

1 stars

A fully automated ComfyUI custom node that uses Joy Caption Alpha Two to generate high-quality image captions. Perfect for: LoRA/model training dataset preparation, Image tagging and organization, NSFW content captioning, Batch processing large image collections

Custom Nodes (0)

    README

    Joy Caption Two - PixelaiLabs Edition

    <div align="center">

    Automated Image Captioning for ComfyUI with Joy Caption Alpha Two

    License: GPL v3 ComfyUI

    Created by PixelaiLabs.com | @aiconomist

    InstallationFeaturesUsageLicense

    </div>

    🌟 What is This?

    A fully automated ComfyUI custom node that uses Joy Caption Alpha Two to generate high-quality image captions. Perfect for:

    • LoRA/model training dataset preparation
    • Image tagging and organization
    • NSFW content captioning
    • Batch processing large image collections

    Why This Version?

    • 100% Automated - All models download automatically on first use
    • Joy Caption Quality - Uses the exact same LoRA adapter and process as the original
    • Clean Output - No "Here is a caption:" prefixes or numbered lists
    • Training Ready - Batch node outputs chronologically numbered files (1.png, 1.txt, 2.png, 2.txt, ...)
    • Easy Setup - Just install requirements and run!

    📦 Installation

    Step 1: Clone the Repository

    cd ComfyUI/custom_nodes
    git clone https://github.com/Pixelailabs/Joy_Caption_Two_PixelaiLabs.git
    

    Step 2: Install Requirements

    cd Joy_Caption_Two_PixelaiLabs
    pip install -r requirements.txt
    

    Step 3: Restart ComfyUI

    That's it! On first use, the node will automatically download:

    • SigLIP vision model (~1.5GB)
    • Joy Caption adapters (~2.5GB total):
      • Image adapter (86 MB)
      • Custom CLIP weights (1.71 GB)
      • LoRA adapter (671 MB) - This ensures clean caption output!
    • Your chosen LLM (~4-5GB when you select it)

    Total first-time download: ~6-8GB (one-time only!)


    ✨ Features

    | Feature | Description | |---------|-------------| | Joy Caption LoRA | Uses the official 671MB LoRA adapter for professional captions | | NSFW Capable | Handles adult content with image embeddings (LLM truly "sees" the image) | | Automated Setup | All models download automatically - no manual file placement | | Multiple Caption Styles | Descriptive, Training Prompt, Booru Tags, Art Critic, etc. | | Adjustable Length | From very short to very long captions | | Text Processing | Replace gender/age, hair, body type, remove tattoos/jewelry | | Batch Processing | Process entire folders with chronological naming for training | | Dual Outputs | Advanced node outputs positive + negative prompts |


    🚀 Usage

    Basic Workflow

    [Load Image] → [Simple LLM Caption Loader] → [Simple LLM Caption] → [Save Text]
                                                          ↓
                                                      Caption Text
    
    1. Add "Simple LLM Caption Loader" node

      • Select LLM: "AUTO-DOWNLOAD: Llama-3.1-8B-Lexi-Uncensored-V2-nf4" (recommended)
      • Models download automatically on first run
    2. Add "Simple LLM Caption" node

      • Connect pipeline from loader
      • Connect your image
      • Choose caption type and length
      • Optional: Add LoRA trigger, text replacements
    3. Output: Clean caption text ready to use!

    Batch Processing for Training

    Perfect for preparing LoRA/model training datasets:

    [Simple LLM Caption Loader] → [Simple LLM Caption Batch]
                                           ↓
                                  Chronologically numbered files
    

    Input folder:

    my_photos/
      ├── photo1.jpg
      ├── image_abc.png
      └── IMG_5678.jpg
    

    Output folder:

    training_data/
      ├── 1.png  ← photo1.jpg converted
      ├── 1.txt  ← caption for 1.png
      ├── 2.png  ← image_abc.png converted
      ├── 2.txt  ← caption for 2.png
      ├── 3.png  ← IMG_5678.jpg converted
      └── 3.txt  ← caption for 3.png
    

    Ready to use with Kohya, aitoolkit, or any training tool!


    🎨 Caption Types

    | Type | Description | |------|-------------| | Descriptive | Formal, detailed description | | Descriptive (Informal) | Casual, conversational description | | Training Prompt | Stable Diffusion style prompt | | MidJourney | MidJourney style prompt | | Booru Tags | Tag list (Danbooru/Gelbooru format) | | Art Critic | Detailed analysis of composition and style | | Social Media | Engaging caption for social posts |


    ⚙️ Nodes

    1. Simple LLM Caption Loader

    Loads the LLM and Joy Caption adapter models.

    Parameters:

    • llm_model - Choose from dropdown (AUTO-DOWNLOAD options shown first)
      • Recommended: "AUTO-DOWNLOAD: Llama-3.1-8B-Lexi-Uncensored-V2-nf4"
      • Only shows Llama-based models (Joy Caption compatible)
      • Automatically filters out incompatible models (Florence, CLIP, etc.)
    • use_4bit - Enable 4-bit quantization (recommended for 8GB VRAM)

    Note: Vision model (SigLIP) downloads automatically - no selection needed!

    2. Simple LLM Caption

    Generate captions for single images.

    Parameters:

    • caption_type - Style of caption
    • caption_length - Target length (any, short, medium, long, very long)
    • lora_trigger - Text to prepend to caption
    • Text replacements (gender/age, hair, body size)
    • Filters (remove tattoos, remove jewelry)

    Output: STRING (caption text)

    3. Simple LLM Caption Advanced

    Advanced options with positive/negative prompt outputs.

    Additional Parameters:

    • temperature / top_p - Control randomness
    • max_new_tokens - Maximum caption length
    • append_to_caption - Text to add at END of caption
    • negative_prompt - Manual negative prompt input

    Outputs:

    • positive_prompt (STRING) - Final caption
    • negative_prompt (STRING) - Your negative prompt

    4. Simple LLM Caption Batch

    Batch process folders with chronological naming for training.

    Parameters:

    • input_directory - Folder with images
    • output_directory - Where to save (REQUIRED)
    • All caption options from basic node

    Output: Status message


    🔧 System Requirements

    | Requirement | Minimum | Recommended | |-------------|---------|-------------| | VRAM | 8GB (with 4-bit) | 12GB+ | | RAM | 16GB | 32GB+ | | Storage | 10GB free | 20GB+ free | | Python | 3.10+ | 3.11 |


    📖 Credits & Attribution

    This custom node is built using:

    Core Technology

    Inspiration

    Models Used

    • SigLIP by Google - Vision backbone
    • Llama 3.1 by Meta - Language model

    📄 License

    This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

    Important Notes:

    • This is a derivative work based on Joy Caption Alpha Two (GPL-3.0)
    • Commercial use is allowed under GPL-3.0 terms
    • Any modifications must also be released under GPL-3.0
    • Please provide attribution when using or modifying this code

    🤝 Support


    🎯 Roadmap

    • [ ] Support for more LLM models
    • [ ] Custom LoRA adapter training
    • [ ] Multi-language captions
    • [ ] Video frame captioning

    <div align="center">

    Made with ❤️ by PixelaiLabs.com

    If you find this useful, please ⭐ star the repository!

    </div>