ComfyUI Extension: PDF Tools - Advanced PDF Processing & OCR

Authored by EricRollei

Advanced PDF processing, OCR, image and text parsing, and smart image-crop nodes for ComfyUI. Features include multilingual OCR (Surya, PaddleOCR VL), AI vision analysis (Florence-2) and document understanding (LayoutLMv3), advanced PDF extraction with quality assessment, spread detection for scanned books, and comprehensive layout analysis. Supports 90+ languages with multiple output formats.

    PDF Tools - ComfyUI Custom Node Package

    Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI.

    📢 Important Notice: Package Split

    The download functionality has moved to a separate package:

    • PDF Tools (this package): PDF extraction, OCR, AI vision processing
    • Download Tools (new package): gallery-dl and yt-dlp downloaders

    If you need media download nodes, install the download-tools package separately:

    cd ComfyUI/custom_nodes/download-tools
    .\install.ps1
    

    🎉 Quick Start

    Installation

    cd ComfyUI/custom_nodes/PDF_tools
    .\install.ps1
    

    Verify Installation

    .\check_install.ps1
    

    Start Using

    1. Restart ComfyUI
    2. Look for nodes under categories: PDF, OCR, Vision, Layout
    3. Start processing documents!

    📦 Available Nodes

    PDF Extraction

    • PDF Extractor v08/v09 - Advanced image extraction with quality assessment
      • Automatic spread detection for scanned books
      • Image quality scoring (sharpness, contrast, brightness)
      • Duplicate detection
      • Organize output by quality
      • JSON metadata export
    • Simple PDF Extractor - Basic extraction without advanced features
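
    As a rough illustration of what per-image quality scoring can involve, the sketch below computes three common metrics on a grayscale NumPy array: sharpness as the variance of a discrete Laplacian, contrast as the standard deviation, and brightness as the mean intensity. This is an assumption about the general technique, not the PDF Extractor's actual code; quality_metrics is a hypothetical name, and the node may weight or normalize these values differently.

```python
import numpy as np


def quality_metrics(gray: np.ndarray) -> dict:
    """Return simple quality metrics for a 2-D grayscale image (0-255).

    Hypothetical helper: sharpness = variance of a 4-neighbour Laplacian,
    contrast = standard deviation, brightness = mean intensity.
    """
    img = np.asarray(gray, dtype=np.float64)
    if img.ndim != 2 or min(img.shape) < 3:
        raise ValueError("expected a 2-D grayscale array of at least 3x3 pixels")
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * centre.
    lap = (
        img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
        - 4.0 * img[1:-1, 1:-1]
    )
    return {
        "sharpness": float(lap.var()),    # higher = more edge detail
        "contrast": float(img.std()),     # spread of intensities
        "brightness": float(img.mean()),  # average intensity
    }
```

    A blurry image yields a low Laplacian variance because its neighbouring pixels differ little. To score an extracted image file, convert it first, e.g. np.asarray(Image.open(path).convert("L")) with Pillow.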

    OCR (Optical Character Recognition)

    • Surya OCR Layout Node - State-of-the-art multilingual OCR
      • 90+ languages supported
      • Layout-aware text extraction
      • High accuracy on complex documents
      • GPU-accelerated inference
    • Surya Layout OCR Hybrid - Combined layout analysis + OCR
      • Single-step document processing
      • Preserves reading order
      • Handles multi-column layouts
    • PaddleOCR VL Remote - Specialized for Chinese/CJK documents
      • Strong on Chinese, Japanese, and Korean text
      • Sends images to a separately running PaddleOCR VL service
      • Requires a separate virtual environment (see PaddleOCR_VL_SETUP.md)
      • Runs as a standalone service to avoid CUDA version conflicts with ComfyUI's environment
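
    Because PaddleOCR VL runs as its own service, the ComfyUI side only has to serialize an image and exchange it with that process. The following is a minimal client sketch that assumes the service speaks JSON over HTTP; SERVICE_URL, the /ocr route, and the "image"/"lang" fields are hypothetical placeholders, so consult PaddleOCR_VL_SETUP.md for the real interface.

```python
import base64
import json
import urllib.request

# HYPOTHETICAL address: the real host, port, and route come from your service setup.
SERVICE_URL = "http://127.0.0.1:8000/ocr"


def build_payload(image_bytes: bytes, lang: str = "ch") -> bytes:
    """Encode an image as a JSON request body (field names are hypothetical)."""
    body = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "lang": lang,
    }
    return json.dumps(body).encode("utf-8")


def ocr_remote(image_bytes: bytes, url: str = SERVICE_URL, timeout: float = 120.0) -> dict:
    """POST the image to the separate OCR process and decode its JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

    The design point is isolation: only bytes and JSON cross the process boundary, so the service can load whatever CUDA and framework versions it needs without touching ComfyUI's PyTorch install.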

    Layout Analysis

    • Enhanced Layout Parser v06 - Advanced document understanding
      • Detects titles, paragraphs, tables, figures, lists
      • Hierarchical structure extraction
      • Reading order detection
      • Bounding box coordinates
    • LayoutLMv3 Node - Microsoft's document AI model
      • Multi-modal document understanding
      • Form and receipt processing
      • Table structure recognition
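
    Reading-order detection on multi-column pages is often approached by splitting the page into horizontal bands at full-width elements (titles, wide figures) and reading the left column before the right within each band. The sketch below implements that heuristic for two-column pages only; it illustrates the idea and is not the Enhanced Layout Parser's algorithm. reading_order is a hypothetical name, and boxes are assumed to be (x0, y0, x1, y1) with the origin at the top-left.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def reading_order(boxes: List[Box], page_width: float) -> List[int]:
    """Return indices of `boxes` in a naive two-column reading order."""
    mid = page_width / 2.0
    order: List[int] = []
    band: List[int] = []  # column boxes between two full-width elements

    def flush() -> None:
        # Left column (ends at or before the midline) top to bottom, then the right column.
        left = sorted((i for i in band if boxes[i][2] <= mid), key=lambda i: boxes[i][1])
        right = sorted((i for i in band if boxes[i][2] > mid), key=lambda i: boxes[i][1])
        order.extend(left + right)
        band.clear()

    # Walk boxes from top to bottom; a box crossing the midline closes the current band.
    for i in sorted(range(len(boxes)), key=lambda i: boxes[i][1]):
        x0, _, x1, _ = boxes[i]
        if x0 < mid < x1:
            flush()
            order.append(i)
        else:
            band.append(i)
    flush()
    return order
```

    Real layouts with three or more columns, sidebars, or skewed scans need a more general column-clustering step; this version only shows why simply sorting by y coordinate interleaves the two columns incorrectly.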

    AI Vision & Object Detection

    • Florence2 Rectangle Detector - Microsoft Florence-2 vision model
      • Object detection with bounding boxes
      • Image captioning (simple & detailed)
      • Visual question answering
      • OCR and text detection
      • Region-specific descriptions
    • Florence2 Cropper Node - Crop based on detections
      • Automatic image region extraction
      • Batch processing of detected objects

    🚀 Key Features

    ✅ Smart PDF Extraction - Quality scoring, spread detection, duplicate removal
    ✅ Multilingual OCR - 90+ languages with Surya, Chinese/Japanese with PaddleOCR
    ✅ Layout Understanding - Detect document structure (titles, paragraphs, tables)
    ✅ AI Vision Models - Florence-2 for object detection and image analysis
    ✅ Batch Processing - Process multiple documents efficiently
    ✅ GPU Acceleration - Fast inference with CUDA support
    ✅ Quality Assessment - Automatic image quality evaluation
    ✅ JSON Export - Structured metadata for all extractions
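
    Duplicate removal can be implemented in several ways. One common choice for images is a perceptual "average hash", which tolerates small brightness shifts and re-encoding that would defeat a byte-level checksum. The sketch below assumes grayscale NumPy arrays of at least 8x8 pixels and an illustrative 5-bit distance threshold; the helper names are hypothetical and the extractor may use a different method entirely.

```python
import numpy as np


def average_hash(gray: np.ndarray, size: int = 8) -> int:
    """Perceptual average hash of a 2-D grayscale array as a size*size-bit integer."""
    img = np.asarray(gray, dtype=np.float64)
    h, w = img.shape
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    # Cell boundaries for a size x size grid; each cell is at least one pixel.
    ys = np.linspace(0, h, size + 1).astype(int)
    xs = np.linspace(0, w, size + 1).astype(int)
    cells = np.array([
        [img[ys[r]:ys[r + 1], xs[c]:xs[c + 1]].mean() for c in range(size)]
        for r in range(size)
    ])
    # One bit per cell: 1 if the cell is brighter than the average cell.
    value = 0
    for bit in (cells > cells.mean()).flatten():
        value = (value << 1) | int(bit)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(hashes: list, max_distance: int = 5) -> list:
    """Return index pairs (i, j) whose hashes differ by at most max_distance bits."""
    return [
        (i, j)
        for i in range(len(hashes))
        for j in range(i + 1, len(hashes))
        if hamming(hashes[i], hashes[j]) <= max_distance
    ]
```

    The pairwise comparison is quadratic, which is fine for the images of a single PDF; very large collections would typically bucket hashes first.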

    💡 Usage Examples

    Extract High-Quality Images from PDF

    Node: PDF Extractor v08
    ├── Input PDF: "mybook.pdf"
    ├── Output Folder: "./extracted_images"
    ├── Options:
    │   ├── ✓ quality_assessment (score each image)
    │   ├── ✓ spread_detection (detect 2-page spreads)
    │   ├── ✓ organize_by_quality (high/medium/low folders)
    │   └── ✓ save_json_output (metadata file)
    └── Result: Images sorted by quality with detailed metrics
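
    Spread detection for scanned books usually starts from a simple observation: a two-page spread is much wider than a single page, and the binding leaves a gutter near the middle. The sketch below encodes that heuristic under two loudly hypothetical assumptions (a spread is wider than 1.2 times its height, and the gutter is the darkest column within 10% of the centre); it is not the extractor's documented rule set.

```python
import numpy as np


def detect_spread(gray: np.ndarray, min_aspect: float = 1.2, search_frac: float = 0.1):
    """Return the gutter column if the image looks like a two-page spread, else None.

    Hypothetical rules: the image must be at least `min_aspect` times wider than
    tall, and the gutter is the darkest column within `search_frac` of the centre.
    """
    h, w = gray.shape
    if w < min_aspect * h:
        return None  # portrait or square: treat as a single page
    centre = w // 2
    half_window = max(1, int(w * search_frac))
    lo, hi = max(0, centre - half_window), min(w, centre + half_window)
    # Average brightness of each column in the search window; the shadowed gutter is darkest.
    column_means = gray[:, lo:hi].mean(axis=0)
    return lo + int(np.argmin(column_means))


def split_spread(gray: np.ndarray, gutter: int):
    """Split a spread into left and right pages at the gutter column."""
    return gray[:, :gutter], gray[:, gutter:]
```

    Some scanners produce a bright rather than dark gutter, in which case argmax would be the appropriate choice; this is why the rule is presented as an assumption.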
    

    OCR a Scanned Document

    Node: Surya OCR Layout Node
    ├── Input: "scanned_page.png"
    ├── Languages: ["en"] or ["en", "es", "fr"]
    └── Output:
        ├── Extracted text with 95%+ accuracy
        ├── Bounding boxes for each word/line
        └── Layout information (columns, paragraphs)
    

    Detect Objects in Images

    Node: Florence2 Rectangle Detector
    ├── Input Image: "photo.jpg"
    ├── Task: <OD> (Object Detection)
    └── Output:
        ├── Bounding boxes for detected objects
        ├── Labels (e.g., "person", "car", "dog")
        └── Confidence scores
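
    Once detections are available, cropping them out is mostly bounding-box bookkeeping: round outward, pad, clamp to the image, and skip empty boxes. This sketch assumes the parsed result has the {"<OD>": {"bboxes": [...], "labels": [...]}} shape returned by the Florence-2 processor's post_process_generation for the <OD> task, with pixel coordinates, and that the image is an H x W x C NumPy array. crop_detections is a hypothetical helper, not the Florence2 Cropper Node's code.

```python
import math
from typing import List, Tuple

import numpy as np


def crop_detections(image: np.ndarray, parsed: dict, task: str = "<OD>",
                    pad: int = 0) -> List[Tuple[str, np.ndarray]]:
    """Cut each detected box out of an H x W x C image array.

    `parsed` is assumed to look like Florence-2's post-processed <OD> output:
    {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}} in pixels.
    """
    height, width = image.shape[:2]
    result = parsed[task]
    crops = []
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        # Round outward, add padding, and clamp to the image bounds.
        left = max(0, math.floor(x1) - pad)
        top = max(0, math.floor(y1) - pad)
        right = min(width, math.ceil(x2) + pad)
        bottom = min(height, math.ceil(y2) + pad)
        if right > left and bottom > top:  # skip boxes that fall entirely outside
            crops.append((label, image[top:bottom, left:right]))
    return crops
```

    ComfyUI passes IMAGE data as batched [B, H, W, C] tensors, so a single frame would be converted first, for example with images[0].cpu().numpy().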
    

    Analyze Document Layout

    Node: Enhanced Layout Parser v06
    ├── Input: PDF page or image
    └── Output:
        ├── Regions: title, text, table, figure, list
        ├── Bounding box coordinates
        ├── Hierarchical structure
        └── Reading order
    

    🔧 System Requirements

    • OS: Windows 10/11 (primary), Linux compatible
    • Python: 3.10+ (included with ComfyUI)
    • GPU: NVIDIA with CUDA recommended (CPU works but slower)
    • RAM: 8GB minimum, 16GB+ recommended for AI models
    • Storage: 5-10GB for packages + models

    🔧 Core Dependencies

    Auto-installed with install.ps1:

    • PyMuPDF (fitz) - PDF processing and rendering
    • Pillow - Image processing and manipulation
    • numpy - Array operations and numerical computing
    • opencv-python - Computer vision operations
    • transformers - Hugging Face AI models
    • torch - PyTorch for deep learning
    • surya-ocr - Advanced OCR engine
    • paddleocr - Chinese/multilingual OCR (basic version)
    • layoutparser - Document layout analysis

    Note: PaddleOCR VL requires a separate virtual environment due to CUDA version conflicts. See PaddleOCR_VL_SETUP.md for setup instructions.

    See requirements.txt for complete list.

    šŸ“ Project Structure

    PDF_tools/
    ├── nodes/              # ComfyUI node implementations
    │   ├── pdf_extractor_v08.py      # Advanced PDF extraction
    │   ├── surya_ocr_layout_node.py  # Surya OCR
    │   ├── eric-florence2-cropper-node.py  # Florence-2 vision
    │   └── enhanced_layout_parser_v06.py   # Layout analysis
    ├── florence2_scripts/  # Florence-2 AI vision models
    ├── sam2_scripts/       # SAM2 segmentation models
    ├── tools/              # Utility scripts
    ├── Docs/               # Comprehensive documentation
    └── __init__.py         # Node registration
    

    šŸ› Troubleshooting

    "Module not found" errors

    Run the check script: .\check_install.ps1

    "CUDA out of memory"

    • Close other GPU applications
    • Process fewer pages at once
    • Use CPU mode (slower but works)
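
    Processing fewer pages at once usually means feeding the model fixed-size batches instead of a whole document. A generic helper is sketched below; batched is a local, hypothetical function (Python 3.12 ships itertools.batched, which yields tuples, while this version also runs on 3.10+ and yields lists). Calling torch.cuda.empty_cache() between batches can additionally release cached blocks, though it does not free memory still referenced by live tensors.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch
```

    If a batch of 4 pages still runs out of memory, halving batch_size trades throughput for lower peak usage.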

    OCR accuracy issues

    • Ensure the image is high resolution (300+ DPI)
    • Check that the language settings match the document
    • Try different OCR nodes for comparison
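
    PDF page dimensions are measured in points (1/72 inch), so rendering at a target DPI means scaling by dpi / 72, and a scan's effective resolution is its pixel width divided by the page width in inches. The helpers below encode that arithmetic. render_page is a hypothetical convenience wrapper around PyMuPDF's Page.get_pixmap(dpi=...) and Pixmap.save, and effective_dpi assumes the scan spans the full page width.

```python
POINTS_PER_INCH = 72.0  # PDF user space: 1 point = 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """Scale factor that renders a PDF page at `dpi` pixels per inch."""
    return dpi / POINTS_PER_INCH


def effective_dpi(pixel_width: int, page_width_pt: float) -> float:
    """Resolution of a scan, assuming it spans the full page width."""
    return pixel_width / (page_width_pt / POINTS_PER_INCH)


def render_page(pdf_path: str, page_index: int, out_path: str, dpi: int = 300) -> None:
    """Hypothetical helper: rasterize one page with PyMuPDF at the given DPI."""
    import fitz  # PyMuPDF; imported lazily so the arithmetic above has no dependency

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)
        pix.save(out_path)
```

    For example, a US Letter page is 612 pt (8.5 in) wide, so a 2550-pixel-wide scan of it is 300 DPI. Rendering a born-digital page at a higher DPI produces a sharper image for OCR, but a raster scan stored below 300 DPI cannot be improved this way and may need rescanning.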

    PDF extraction produces no images

    • Verify that the PDF contains embedded raster images (not just text or vector graphics)
    • Check that the PDF isn't encrypted or password-protected
    • Try the Simple PDF Extractor to help isolate the problem

    See INSTALLATION_GUIDE.md for more troubleshooting.

    🎯 Best Practices

    1. High-Quality Inputs - Use 300+ DPI scans for best OCR results
    2. Enable Quality Assessment - Let the tool filter low-quality extractions
    3. Batch Process - Process multiple documents in one workflow
    4. Export Metadata - Save JSON outputs for downstream processing
    5. GPU Acceleration - Use CUDA for 10x faster inference with AI models
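
    Practices 2 and 4 combine naturally: assign each extracted image a quality tier and keep the result as JSON for later steps. In this sketch the 0-100 score scale, the 70/40 cut-offs, the record fields, and the metadata.json filename are all hypothetical, chosen to mirror the high/medium/low folders described for the extractor rather than its actual output format.

```python
import json
from pathlib import Path

# Hypothetical cut-offs on a hypothetical 0-100 score, highest tier first.
TIERS = ((70.0, "high"), (40.0, "medium"))


def tier_for(score: float) -> str:
    """Map a score to a tier name; anything below the last cut-off is "low"."""
    for threshold, name in TIERS:
        if score >= threshold:
            return name
    return "low"


def annotate(records: list, out_dir: str) -> list:
    """Copy each record, adding its tier and the folder it would be sorted into."""
    root = Path(out_dir)
    annotated = []
    for rec in records:
        tier = tier_for(rec["score"])
        annotated.append({**rec, "tier": tier, "folder": str(root / tier)})
    return annotated


def write_metadata(records: list, out_dir: str) -> Path:
    """Write the annotated records to <out_dir>/metadata.json and return the path."""
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)
    path = root / "metadata.json"
    path.write_text(json.dumps(annotate(records, out_dir), indent=2), encoding="utf-8")
    return path
```

    Keeping the tier in the JSON, rather than relying only on folder placement, lets a later workflow step filter images without walking the directory tree.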

    šŸ“ Version Info

    Current versions:

    • PyMuPDF: 1.26.4+
    • Transformers: 4.55.0+
    • Torch: 2.7.1+cu128
    • Surya-OCR: Latest from GitHub
    • Florence-2: Microsoft Research

    📄 License

    Copyright (c) 2025 Eric Hiss. All rights reserved.

    Dual-licensed:

    Important: This project uses third-party libraries with various licenses (GPL, AGPL, MIT, Apache). See CREDITS.md for complete dependency licensing.

    šŸ¤ Contributing

    Contributions welcome! See CONTRIBUTING.md for:

    • Code style guidelines
    • Testing requirements
    • Pull request process
    • Development setup

    šŸ™ Acknowledgments

    Special thanks to:

    • ComfyUI community for the amazing extensible platform
    • Microsoft Research for Florence-2 vision models
    • Vik Paruchuri (vikp) for Surya OCR
    • Meta AI for SAM2 segmentation models
    • Hugging Face for model hosting and transformers library
    • All open-source developers whose work makes this possible

    See CREDITS.md for detailed acknowledgments.


    Ready to process documents! Install dependencies, restart ComfyUI, and start extracting.