ComfyUI Extension: Joy Caption Two - PixelaiLabs Edition
A fully automated ComfyUI custom node that uses Joy Caption Alpha Two to generate high-quality image captions. Perfect for: LoRA/model training dataset preparation, Image tagging and organization, NSFW content captioning, Batch processing large image collections
Custom Nodes (0)
README
Joy Caption Two - PixelaiLabs Edition
<div align="center">Automated Image Captioning for ComfyUI with Joy Caption Alpha Two
Created by PixelaiLabs.com | @aiconomist
Installation • Features • Usage • License
</div>🌟 What is This?
A fully automated ComfyUI custom node that uses Joy Caption Alpha Two to generate high-quality image captions. Perfect for:
- LoRA/model training dataset preparation
- Image tagging and organization
- NSFW content captioning
- Batch processing large image collections
Why This Version?
- ✅ 100% Automated - All models download automatically on first use
- ✅ Joy Caption Quality - Uses the exact same LoRA adapter and process as the original
- ✅ Clean Output - No "Here is a caption:" prefixes or numbered lists
- ✅ Training Ready - Batch node outputs chronologically numbered files (1.png, 1.txt, 2.png, 2.txt, ...)
- ✅ Easy Setup - Just install requirements and run!
📦 Installation
Step 1: Clone the Repository
cd ComfyUI/custom_nodes
git clone https://github.com/Pixelailabs/Joy_Caption_Two_PixelaiLabs.git
Step 2: Install Requirements
cd Joy_Caption_Two_PixelaiLabs
pip install -r requirements.txt
Step 3: Restart ComfyUI
That's it! On first use, the node will automatically download:
- SigLIP vision model (~1.5GB)
- Joy Caption adapters (~2.5GB total):
- Image adapter (86 MB)
- Custom CLIP weights (1.71 GB)
- LoRA adapter (671 MB) - This ensures clean caption output!
- Your chosen LLM (~4-5GB when you select it)
Total first-time download: ~6-8GB (one-time only!)
✨ Features
| Feature | Description | |---------|-------------| | Joy Caption LoRA | Uses the official 671MB LoRA adapter for professional captions | | NSFW Capable | Handles adult content with image embeddings (LLM truly "sees" the image) | | Automated Setup | All models download automatically - no manual file placement | | Multiple Caption Styles | Descriptive, Training Prompt, Booru Tags, Art Critic, etc. | | Adjustable Length | From very short to very long captions | | Text Processing | Replace gender/age, hair, body type, remove tattoos/jewelry | | Batch Processing | Process entire folders with chronological naming for training | | Dual Outputs | Advanced node outputs positive + negative prompts |
🚀 Usage
Basic Workflow
[Load Image] → [Simple LLM Caption Loader] → [Simple LLM Caption] → [Save Text]
↓
Caption Text
-
Add "Simple LLM Caption Loader" node
- Select LLM: "AUTO-DOWNLOAD: Llama-3.1-8B-Lexi-Uncensored-V2-nf4" (recommended)
- Models download automatically on first run
-
Add "Simple LLM Caption" node
- Connect pipeline from loader
- Connect your image
- Choose caption type and length
- Optional: Add LoRA trigger, text replacements
-
Output: Clean caption text ready to use!
Batch Processing for Training
Perfect for preparing LoRA/model training datasets:
[Simple LLM Caption Loader] → [Simple LLM Caption Batch]
↓
Chronologically numbered files
Input folder:
my_photos/
├── photo1.jpg
├── image_abc.png
└── IMG_5678.jpg
Output folder:
training_data/
├── 1.png ← photo1.jpg converted
├── 1.txt ← caption for 1.png
├── 2.png ← image_abc.png converted
├── 2.txt ← caption for 2.png
├── 3.png ← IMG_5678.jpg converted
└── 3.txt ← caption for 3.png
Ready to use with Kohya, aitoolkit, or any training tool!
🎨 Caption Types
| Type | Description | |------|-------------| | Descriptive | Formal, detailed description | | Descriptive (Informal) | Casual, conversational description | | Training Prompt | Stable Diffusion style prompt | | MidJourney | MidJourney style prompt | | Booru Tags | Tag list (Danbooru/Gelbooru format) | | Art Critic | Detailed analysis of composition and style | | Social Media | Engaging caption for social posts |
⚙️ Nodes
1. Simple LLM Caption Loader
Loads the LLM and Joy Caption adapter models.
Parameters:
llm_model
- Choose from dropdown (AUTO-DOWNLOAD options shown first)- Recommended: "AUTO-DOWNLOAD: Llama-3.1-8B-Lexi-Uncensored-V2-nf4"
- Only shows Llama-based models (Joy Caption compatible)
- Automatically filters out incompatible models (Florence, CLIP, etc.)
use_4bit
- Enable 4-bit quantization (recommended for 8GB VRAM)
Note: Vision model (SigLIP) downloads automatically - no selection needed!
2. Simple LLM Caption
Generate captions for single images.
Parameters:
caption_type
- Style of captioncaption_length
- Target length (any, short, medium, long, very long)lora_trigger
- Text to prepend to caption- Text replacements (gender/age, hair, body size)
- Filters (remove tattoos, remove jewelry)
Output: STRING
(caption text)
3. Simple LLM Caption Advanced
Advanced options with positive/negative prompt outputs.
Additional Parameters:
temperature
/top_p
- Control randomnessmax_new_tokens
- Maximum caption lengthappend_to_caption
- Text to add at END of captionnegative_prompt
- Manual negative prompt input
Outputs:
positive_prompt
(STRING) - Final captionnegative_prompt
(STRING) - Your negative prompt
4. Simple LLM Caption Batch
Batch process folders with chronological naming for training.
Parameters:
input_directory
- Folder with imagesoutput_directory
- Where to save (REQUIRED)- All caption options from basic node
Output: Status message
🔧 System Requirements
| Requirement | Minimum | Recommended | |-------------|---------|-------------| | VRAM | 8GB (with 4-bit) | 12GB+ | | RAM | 16GB | 32GB+ | | Storage | 10GB free | 20GB+ free | | Python | 3.10+ | 3.11 |
📖 Credits & Attribution
This custom node is built using:
Core Technology
- Joy Caption by fpgaminer - Original Joy Caption implementation
- Joy Caption Alpha Two by fancyfeast - LoRA adapter and training
Inspiration
- ComfyUI_SLK_joy_caption_two by EvilBT - ComfyUI implementation reference
Models Used
- SigLIP by Google - Vision backbone
- Llama 3.1 by Meta - Language model
📄 License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Important Notes:
- This is a derivative work based on Joy Caption Alpha Two (GPL-3.0)
- Commercial use is allowed under GPL-3.0 terms
- Any modifications must also be released under GPL-3.0
- Please provide attribution when using or modifying this code
🤝 Support
- Issues: GitHub Issues
- YouTube: @aiconomist
- Website: PixelaiLabs.com
🎯 Roadmap
- [ ] Support for more LLM models
- [ ] Custom LoRA adapter training
- [ ] Multi-language captions
- [ ] Video frame captioning
<div align="center">
Made with ❤️ by PixelaiLabs.com
If you find this useful, please ⭐ star the repository!
</div>