ComfyUI Extension: PDF Tools - Advanced PDF Processing & OCR
Advanced PDF processing, OCR, image and text parsing, and smart image-crop nodes for ComfyUI. Features include multi-language OCR (Surya, PaddleOCR VL), AI vision analysis (Florence-2, LayoutLMv3), advanced PDF extraction with quality assessment, spread detection for scanned books, and comprehensive layout analysis. Supports 90+ languages with multiple output formats.
PDF Tools - ComfyUI Custom Node Package
Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI.
Important Notice: Package Split
The download functionality has moved to a separate package:
- PDF Tools (this package): PDF extraction, OCR, AI vision processing
- Download Tools (new package): gallery-dl and yt-dlp downloaders
If you need media download nodes, install the download-tools package separately:
cd ComfyUI/custom_nodes/download-tools
.\install.ps1
Quick Start
Installation
cd ComfyUI/custom_nodes/PDF_tools
.\install.ps1
Verify Installation
.\check_install.ps1
Start Using
- Restart ComfyUI
- Look for nodes under categories: PDF, OCR, Vision, Layout
- Start processing documents!
Available Nodes
PDF Extraction
- PDF Extractor v08/v09 - Advanced image extraction with quality assessment
  - Automatic spread detection for scanned books
  - Image quality scoring (sharpness, contrast, brightness)
  - Duplicate detection
  - Organize output by quality
  - JSON metadata export
- Simple PDF Extractor - Basic extraction without advanced features
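The extractor's actual scoring method is not documented in this README. As a rough illustration of what sharpness, contrast, and brightness scores can mean, the sketch below computes three common proxies on a grayscale image given as a list of rows: mean intensity (brightness), standard deviation (contrast), and variance of a 4-neighbour Laplacian (sharpness). The function name `quality_metrics` and the pure-Python input format are assumptions chosen so the example runs without dependencies; a real node would more likely use numpy or OpenCV.

```python
from statistics import mean, pstdev, pvariance


def quality_metrics(gray):
    """Return simple quality proxies for a grayscale image.

    gray: 2D list of intensities (0-255), at least 3x3 pixels.
    This is a hypothetical helper, not the package's implementation.
    """
    height, width = len(gray), len(gray[0])
    pixels = [value for row in gray for value in row]

    # 4-neighbour Laplacian response at every interior pixel.
    # Blurry images give small responses, so their variance is low.
    laplacian = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]

    return {
        "brightness": mean(pixels),        # average intensity
        "contrast": pstdev(pixels),        # spread of intensities
        "sharpness": pvariance(laplacian), # variance of Laplacian
    }


if __name__ == "__main__":
    flat = [[128] * 5 for _ in range(5)]
    checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
    print(quality_metrics(flat))
    print(quality_metrics(checker))
```

A uniform image scores zero for both contrast and sharpness, while a high-frequency pattern such as a checkerboard scores high on both. The raw values are not normalized, so any thresholds would need to be tuned per dataset.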
OCR (Optical Character Recognition)
- Surya OCR Layout Node - State-of-the-art multilingual OCR
  - 90+ languages supported
  - Layout-aware text extraction
  - High accuracy on complex documents
  - GPU-accelerated inference
- Surya Layout OCR Hybrid - Combined layout analysis + OCR
  - Single-step document processing
  - Preserves reading order
  - Handles multi-column layouts
- PaddleOCR VL Remote - Specialized for Chinese/CJK documents
  - Excellent for Asian language texts
  - Remote processing capabilities
  - Requires separate virtual environment (see PaddleOCR_VL_SETUP.md)
  - Runs as standalone service due to CUDA version conflicts
Layout Analysis
- Enhanced Layout Parser v06 - Advanced document understanding
  - Detects titles, paragraphs, tables, figures, lists
  - Hierarchical structure extraction
  - Reading order detection
  - Bounding box coordinates
- LayoutLMv3 Node - Microsoft's document AI model
  - Multi-modal document understanding
  - Form and receipt processing
  - Table structure recognition
AI Vision & Object Detection
- Florence2 Rectangle Detector - Microsoft Florence-2 vision model
  - Object detection with bounding boxes
  - Image captioning (simple & detailed)
  - Visual question answering
  - OCR and text detection
  - Region-specific descriptions
- Florence2 Cropper Node - Crop based on detections
  - Automatic image region extraction
  - Batch processing of detected objects
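To make "crop based on detections" concrete, here is a minimal sketch of turning detection boxes into crop rectangles. It assumes the detections arrive as a dict with parallel `bboxes` (`[x1, y1, x2, y2]` in pixels) and `labels` lists, which is the shape Florence-2's post-processed `<OD>` output commonly takes; the function name `crop_boxes` and its `padding` and `wanted_labels` parameters are hypothetical and not taken from the node's code.

```python
import math


def crop_boxes(detections, image_size, padding=0, wanted_labels=None):
    """Convert detections into integer crop rectangles clamped to the image.

    detections: {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]} (assumed shape)
    image_size: (width, height) in pixels
    Returns a list of (label, (left, upper, right, lower)) tuples.
    """
    width, height = image_size
    crops = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        if wanted_labels is not None and label not in wanted_labels:
            continue
        # Round outward so the crop never cuts into the detected region,
        # then add padding and clamp to the image bounds.
        left = max(0, math.floor(x1) - padding)
        upper = max(0, math.floor(y1) - padding)
        right = min(width, math.ceil(x2) + padding)
        lower = min(height, math.ceil(y2) + padding)
        if right > left and lower > upper:  # skip degenerate boxes
            crops.append((label, (left, upper, right, lower)))
    return crops


if __name__ == "__main__":
    sample = {
        "bboxes": [[10.5, 20.2, 50.7, 80.1], [90, 90, 120, 130]],
        "labels": ["person", "car"],
    }
    for label, box in crop_boxes(sample, (100, 100), padding=5):
        print(label, box)
```

Each resulting 4-tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to that method directly.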
Key Features
- Smart PDF Extraction - Quality scoring, spread detection, duplicate removal
- Multilingual OCR - 90+ languages with Surya, Chinese/Japanese with PaddleOCR
- Layout Understanding - Detect document structure (titles, paragraphs, tables)
- AI Vision Models - Florence-2 for object detection and image analysis
- Batch Processing - Process multiple documents efficiently
- GPU Acceleration - Fast inference with CUDA support
- Quality Assessment - Automatic image quality evaluation
- JSON Export - Structured metadata for all extractions
Usage Examples
Extract High-Quality Images from PDF
Node: PDF Extractor v08
├── Input PDF: "mybook.pdf"
├── Output Folder: "./extracted_images"
├── Options:
│   ├── ✓ quality_assessment (score each image)
│   ├── ✓ spread_detection (detect 2-page spreads)
│   ├── ✓ organize_by_quality (high/medium/low folders)
│   └── ✓ save_json_output (metadata file)
└── Result: Images sorted by quality with detailed metrics
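How the node decides what counts as a spread or as "high" quality is not specified here. One plausible approach, sketched below under stated assumptions, is an aspect-ratio test for spreads (a landscape image in a portrait book is probably two pages) and fixed score thresholds for the quality folders. The names `looks_like_spread`, `split_spread`, and `quality_bucket`, and all threshold values, are hypothetical.

```python
def looks_like_spread(width, height, min_ratio=1.3):
    """Guess whether an image is a two-page spread from its aspect ratio.

    min_ratio is an assumed threshold, not a value taken from the package.
    """
    return height > 0 and width / height >= min_ratio


def split_spread(width, height):
    """Return (left, upper, right, lower) boxes for the left and right pages."""
    middle = width // 2
    return (0, 0, middle, height), (middle, 0, width, height)


def quality_bucket(score, high=0.7, medium=0.4):
    """Map a normalized 0-1 quality score to a folder name (assumed cutoffs)."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    w, h = 3000, 2000
    if looks_like_spread(w, h):
        print(split_spread(w, h))
    print(quality_bucket(0.82), quality_bucket(0.5), quality_bucket(0.1))
```

Splitting at the exact midpoint is the simplest choice; scans with an off-center gutter would need a search for the darkest or lowest-variance vertical band instead.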
OCR a Scanned Document
Node: Surya OCR Layout Node
├── Input: "scanned_page.png"
├── Languages: ["en"] or ["en", "es", "fr"]
└── Output:
    ├── Extracted text with 95%+ accuracy
    ├── Bounding boxes for each word/line
    └── Layout information (columns, paragraphs)
Detect Objects in Images
Node: Florence2 Rectangle Detector
├── Input Image: "photo.jpg"
├── Task: <OD> (Object Detection)
└── Output:
    ├── Bounding boxes for detected objects
    ├── Labels (e.g., "person", "car", "dog")
    └── Confidence scores
Analyze Document Layout
Node: Enhanced Layout Parser v06
├── Input: PDF page or image
└── Output:
    ├── Regions: title, text, table, figure, list
    ├── Bounding box coordinates
    ├── Hierarchical structure
    └── Reading order
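Reading order is the least obvious of these outputs, so a small sketch may help. The version below is one simple heuristic, not the parser's actual algorithm: it groups bounding boxes into columns by horizontal position, then reads columns left to right and boxes top to bottom within each column. The function name `reading_order` and the `(x1, y1, x2, y2)` tuple format are assumptions.

```python
def reading_order(boxes):
    """Order (x1, y1, x2, y2) boxes column by column, top to bottom.

    A box joins an existing column when its horizontal center falls
    inside that column's x-range; otherwise it starts a new column.
    """
    columns = []  # each entry: {"x1": ..., "x2": ..., "boxes": [...]}
    for box in sorted(boxes, key=lambda b: b[0]):
        x1, _, x2, _ = box
        center = (x1 + x2) / 2
        for column in columns:
            if column["x1"] <= center <= column["x2"]:
                column["boxes"].append(box)
                column["x1"] = min(column["x1"], x1)
                column["x2"] = max(column["x2"], x2)
                break
        else:
            columns.append({"x1": x1, "x2": x2, "boxes": [box]})

    ordered = []
    for column in sorted(columns, key=lambda c: c["x1"]):
        ordered.extend(sorted(column["boxes"], key=lambda b: b[1]))
    return ordered


if __name__ == "__main__":
    left_top, left_bottom = (50, 100, 280, 200), (50, 300, 280, 400)
    right_top, right_bottom = (320, 100, 550, 200), (320, 300, 550, 400)
    shuffled = [right_bottom, left_top, right_top, left_bottom]
    for box in reading_order(shuffled):
        print(box)
```

This heuristic breaks down when an element such as a full-width title spans several columns, because it would absorb every box beneath it into one column; such elements need to be separated out first.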
System Requirements
- OS: Windows 10/11 (primary), Linux compatible
- Python: 3.10+ (included with ComfyUI)
- GPU: NVIDIA with CUDA recommended (CPU works but is slower)
- RAM: 8GB minimum, 16GB+ recommended for AI models
- Storage: 5-10GB for packages + models
Documentation
Main Guides
- INSTALLATION_GUIDE.md - Detailed setup instructions
- CODE_OVERVIEW.md - Understand the codebase structure
- LICENSE.md - Licensing terms and conditions
- CREDITS.md - Third-party libraries and acknowledgments
Additional Docs
- SURYA_OCR_NODE_GUIDE.md - Surya OCR detailed guide
- PaddleOCR_VL_SETUP.md - PaddleOCR separate environment setup
- PDF_LAYER_DETECTION_GUIDE.md - PDF layer analysis
- BATCH_PROCESSING_GUIDE.md - Batch workflow tips
Core Dependencies
Auto-installed with install.ps1:
- PyMuPDF (fitz) - PDF processing and rendering
- Pillow - Image processing and manipulation
- numpy - Array operations and numerical computing
- opencv-python - Computer vision operations
- transformers - Hugging Face AI models
- torch - PyTorch for deep learning
- surya-ocr - Advanced OCR engine
- paddleocr - Chinese/multilingual OCR (basic version)
- layoutparser - Document layout analysis
Note: PaddleOCR VL requires a separate virtual environment due to CUDA version conflicts. See PaddleOCR_VL_SETUP.md for setup instructions.
See requirements.txt for complete list.
Project Structure
PDF_tools/
├── nodes/                              # ComfyUI node implementations
│   ├── pdf_extractor_v08.py            # Advanced PDF extraction
│   ├── surya_ocr_layout_node.py        # Surya OCR
│   ├── eric-florence2-cropper-node.py  # Florence-2 vision
│   └── enhanced_layout_parser_v06.py   # Layout analysis
├── florence2_scripts/                  # Florence-2 AI vision models
├── sam2_scripts/                       # SAM2 segmentation models
├── tools/                              # Utility scripts
├── Docs/                               # Comprehensive documentation
└── __init__.py                         # Node registration
Troubleshooting
"Module not found" errors
Run the check script: .\check_install.ps1
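If PowerShell is unavailable, a similar check can be approximated in Python. The sketch below is an independent illustration, not a port of check_install.ps1: it looks up the import name for each dependency listed in this README and reports which ones cannot be found. The mapping from package name to import name reflects the usual names for these libraries (for example, PyMuPDF imports as `fitz` and opencv-python as `cv2`); the function name `missing_packages` is hypothetical.

```python
import importlib.util

# pip package name -> import name (they differ for several packages).
REQUIRED_MODULES = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "transformers": "transformers",
    "torch": "torch",
    "surya-ocr": "surya",
    "paddleocr": "paddleocr",
    "layoutparser": "layoutparser",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the sorted pip names whose import name cannot be located."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Run it with the same Python interpreter that ComfyUI uses; otherwise the result describes a different environment.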
"CUDA out of memory"
- Close other GPU applications
- Process fewer pages at once
- Use CPU mode (slower but works)
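"Process fewer pages at once" usually means splitting the page range into fixed-size batches so that only one batch is resident on the GPU at a time. A minimal sketch follows; the helper name `batched_pages` and the batch size are assumptions, and the processing step is left as a placeholder.

```python
from itertools import islice


def batched_pages(page_numbers, batch_size):
    """Yield lists of at most batch_size page numbers, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(page_numbers)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    for batch in batched_pages(range(7), batch_size=3):
        # Placeholder: run OCR or vision inference on this batch only.
        print("processing pages", batch)
```

If memory still grows between batches in a PyTorch workflow, calling `torch.cuda.empty_cache()` after each batch releases cached blocks back to the driver, though it does not free tensors that are still referenced.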
OCR accuracy issues
- Ensure image is high resolution (300+ DPI)
- Check language settings match document
- Try different OCR nodes for comparison
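The 300 DPI guideline translates directly into pixel dimensions. PDF page sizes are measured in points (72 per inch), so rendering at a target DPI means scaling by `dpi / 72`. The sketch below works through that arithmetic; the helper names `zoom_for_dpi` and `pixel_size` are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def zoom_for_dpi(dpi):
    """Scale factor that renders a PDF page at the requested DPI."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page of the given size in points at this DPI."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


if __name__ == "__main__":
    # US Letter is 8.5 x 11 inches, i.e. 612 x 792 points.
    print(zoom_for_dpi(300))
    print(pixel_size(612, 792, 300))
    print(pixel_size(612, 792, 72))
```

With PyMuPDF, the zoom factor can be applied as `page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))`; recent versions also accept `page.get_pixmap(dpi=300)` directly. A page image much smaller than the 300 DPI size is a likely cause of poor OCR results.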
PDF extraction produces no images
- Verify PDF contains raster images (not just text)
- Check PDF isn't encrypted or password-protected
- Try Simple PDF Extractor for troubleshooting
See INSTALLATION_GUIDE.md for more troubleshooting.
Best Practices
- High-Quality Inputs - Use 300+ DPI scans for best OCR results
- Enable Quality Assessment - Let the tool filter low-quality extractions
- Batch Process - Process multiple documents in one workflow
- Export Metadata - Save JSON outputs for downstream processing
- GPU Acceleration - Use CUDA for up to roughly 10x faster inference with AI models compared with CPU
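As an illustration of the "Export Metadata" practice, the sketch below builds one JSON record per extracted image and flags exact duplicates by content hash. The record fields (`file`, `sha256`, `quality_score`, `duplicate_of`) and the function name `build_metadata` are hypothetical; the schema the extractor actually writes may differ.

```python
import hashlib
import json


def build_metadata(images):
    """Build JSON-serializable records for extracted images.

    images: iterable of (filename, raw_bytes, quality_score) tuples.
    Images with identical bytes are marked as duplicates of the first one seen.
    """
    first_seen = {}  # sha256 digest -> filename of first occurrence
    records = []
    for filename, data, score in images:
        digest = hashlib.sha256(data).hexdigest()
        records.append(
            {
                "file": filename,
                "sha256": digest,
                "quality_score": score,
                "duplicate_of": first_seen.get(digest),  # None if unique so far
            }
        )
        first_seen.setdefault(digest, filename)
    return records


if __name__ == "__main__":
    sample = [
        ("page1_img1.png", b"abc", 0.9),
        ("page2_img1.png", b"abc", 0.8),
        ("page3_img1.png", b"xyz", 0.5),
    ]
    print(json.dumps(build_metadata(sample), indent=2))
```

Hashing raw bytes only catches byte-identical copies. Detecting the same picture re-encoded or rescaled would require a perceptual hash or feature comparison instead.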
Version Info
Current versions:
- PyMuPDF: 1.26.4+
- Transformers: 4.55.0+
- Torch: 2.7.1+cu128
- Surya-OCR: Latest from GitHub
- Florence-2: Microsoft Research
License
Copyright (c) 2025 Eric Hiss. All rights reserved.
Dual-licensed:
- Non-Commercial Use: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- Commercial Use: Requires separate license - contact [email protected]
Important: This project uses third-party libraries with various licenses (GPL, AGPL, MIT, Apache). See CREDITS.md for complete dependency licensing.
Contributing
Contributions welcome! See CONTRIBUTING.md for:
- Code style guidelines
- Testing requirements
- Pull request process
- Development setup
Contact & Support
- Author: Eric Hiss
- GitHub: EricRollei
- Email: [email protected], [email protected]
- Issues: Open an issue on GitHub for bugs or feature requests
Acknowledgments
Special thanks to:
- ComfyUI community for the amazing extensible platform
- Microsoft Research for Florence-2 vision models
- Vikp for Surya OCR
- Meta AI for SAM2 segmentation models
- Hugging Face for model hosting and transformers library
- All open-source developers whose work makes this possible
See CREDITS.md for detailed acknowledgments.
Ready to process documents! Install dependencies, restart ComfyUI, and start extracting.