ComfyUI Extension: ComfyUI-DeepseekOCR
A custom node that wraps DeepSeek-OCR as a ComfyUI plugin, providing powerful OCR recognition and document parsing capabilities.
Features
Quick Start
cd ComfyUI/custom_nodes/
git clone https://github.com/Geo1230/ComfyUI-DeepseekOCR.git
cd ComfyUI-DeepseekOCR
Install Dependencies
pip install -r requirements.txt
Recommended: transformers 4.46.3. If you encounter compatibility issues with transformers 4.55+, downgrade:
pip install transformers==4.46.3 tokenizers==0.20.3
Download Model
Create directories and navigate:
# 1. Navigate to ComfyUI's models directory
cd ComfyUI\models
# 2. Create deepseek-ocr directory (if it doesn't exist)
mkdir deepseek-ocr
cd deepseek-ocr
# 3. Create model directory
mkdir deepseek-ai_DeepSeek-OCR
cd deepseek-ai_DeepSeek-OCR
Download model to current directory:
huggingface-cli download deepseek-ai/DeepSeek-OCR --local-dir . --repo-type model
Note: Model will be downloaded to ComfyUI\models\deepseek-ocr\deepseek-ai_DeepSeek-OCR\ directory
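The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed; `model_dir` and `download_model` are hypothetical helper names used for illustration and are not part of the plugin.

```python
from pathlib import Path

REPO_ID = "deepseek-ai/DeepSeek-OCR"


def model_dir(comfy_root):
    """Return the weights directory this README expects under a ComfyUI root."""
    return Path(comfy_root) / "models" / "deepseek-ocr" / "deepseek-ai_DeepSeek-OCR"


def download_model(comfy_root):
    """Download the model snapshot into the expected directory and return its path."""
    # Imported lazily so the path helper works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = model_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, repo_type="model", local_dir=str(target))
    return target
```

Calling `download_model("ComfyUI")` from the directory that contains your ComfyUI folder should place the files where the Load node looks for them.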
Or Use Automatic Download (Not recommended, less stable):
Model will automatically download on first run of the Load node. Download progress is shown in the console.
To disable automatic download, set environment variable:
# Windows PowerShell
$env:DPSK_AUTODOWNLOAD = "0"
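For reference, a flag like this is usually read as shown below. This is only a sketch: `autodownload_enabled` is a hypothetical helper, and the plugin's real parsing may differ. The only behavior documented here is that the value "0" disables automatic download.

```python
import os


def autodownload_enabled(environ=None):
    """Return False only when DPSK_AUTODOWNLOAD is set to "0"."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable, or any value other than "0", leaves auto-download on.
    return env.get("DPSK_AUTODOWNLOAD", "1").strip() != "0"
```

On Linux or macOS shells, the equivalent of the PowerShell line above is `export DPSK_AUTODOWNLOAD=0`, set before starting ComfyUI.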
Usage
Node 1: DeepSeek OCR: Load Model
Loads and caches the model, outputs a model handle for use by the Run node.
Parameters:
- dtype: Data precision
  - bf16 (Recommended, default): Balance of precision and performance
  - fp16: Use when VRAM is insufficient
  - fp32: Best compatibility but high VRAM usage
- device: Runtime device (default: cuda)
Node 2: DeepSeek OCR: Run
Performs OCR inference and outputs recognized text.
Parameters:
- model: Model handle (from Load node)
- image: Input image (ComfyUI IMAGE type)
- task: Task mode
  - Free OCR: General OCR recognition
  - Convert to Markdown: Document to Markdown conversion
  - Parse Figure: Parse charts and figures
  - Locate by Reference: Locate specified objects (requires reference_text)
- resolution: Resolution preset
  - Gundam (Recommended for long documents): 1024/640/crop/compress
  - Tiny: 512x512
  - Small: 640x640
  - Base: 1024x1024
  - Large: 1280x1280
- output_type: Output type (determines what is returned)
  - all (default): Output both text and visualization image
  - text: Text only, image output is original image
  - image: Visualization image only (suitable for Locate task)
- reference_text: (Optional) Only when task=Locate by Reference, description of object to locate
- box_color: (Optional) Detection box color, default red
  - Preset colors: red, green, blue, yellow, cyan, magenta, white, black
  - Custom RGB: e.g., "255,0,0" (red), "0,255,0" (green)
- box_width: (Optional) Detection box width, default 2px, range 1-10
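The resolution presets can be read as a small lookup table. The sketch below is illustrative only: the field names `base_size`, `image_size`, and `crop_mode` are borrowed from the upstream DeepSeek-OCR inference arguments as an assumption, `resolve_preset` is a hypothetical helper, and the "compress" part of the Gundam preset is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionPreset:
    base_size: int    # first number in the preset
    image_size: int   # second number (same as base_size for the square presets)
    crop_mode: bool   # True only for the preset that lists "crop"


# Assumption: "1024/640/crop" for Gundam means base_size=1024, image_size=640, crop_mode=True.
PRESETS = {
    "Gundam": ResolutionPreset(1024, 640, True),
    "Tiny": ResolutionPreset(512, 512, False),
    "Small": ResolutionPreset(640, 640, False),
    "Base": ResolutionPreset(1024, 1024, False),
    "Large": ResolutionPreset(1280, 1280, False),
}


def resolve_preset(name):
    """Look up a preset by its name, raising a clear error for unknown names."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown resolution preset: {name!r}") from None
```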
Outputs:
- text: Recognized text content (STRING)
  - Contains original markers (e.g., <|ref|>...<|/ref|><|det|>[[coordinates]]<|/det|>)
- visualization: Visualization image (IMAGE)
  - Locate by Reference task: Image with custom-styled bounding boxes
  - Other tasks: Returns original input image
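Because the text output keeps the raw markers, downstream nodes or scripts may want to pull the labels and coordinates out of it. The following is a minimal sketch, assuming the coordinates inside `<|det|>` are JSON-style nested lists; `parse_markers` is a hypothetical helper, not something the plugin ships.

```python
import json
import re

# Matches one <|ref|>label<|/ref|><|det|>[[...]]<|/det|> pair.
MARKER_RE = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_markers(text):
    """Return a list of (label, boxes) tuples found in the OCR text."""
    results = []
    for label, raw in MARKER_RE.findall(text):
        try:
            boxes = json.loads(raw)  # assumption: coordinates are JSON-style nested lists
        except json.JSONDecodeError:
            continue  # skip markers whose coordinates are not in the assumed format
        results.append((label, boxes))
    return results
```

For example, `parse_markers("<|ref|>price<|/ref|><|det|>[[10, 20, 110, 60]]<|/det|>")` yields one entry with the label `price` and one box; the numbers here are made up for illustration.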
Usage Guide
Output Type Selection
- all (default): Output both text and visualization image
- text: Text only (OCR/Markdown conversion)
- image: Visualization image only (Locate task)
Locate by Reference Task
Parameter Configuration:
- task: Select Locate by Reference
- reference_text: Enter the object to locate
  - Chinese examples: "价格" (price), "标题" (title), "二维码" (QR code)
  - English examples: "the teacher", "price", "table", "logo"
Custom Bounding Box Style
Supported Preset Colors (16 types):
| Color Name | RGB | Preview | Color Name | RGB | Preview |
|------------|-----|---------|------------|-----|---------|
| red | 255,0,0 | 🔴 Red (default) | orange | 255,165,0 | 🟠 Orange |
| green | 0,255,0 | 🟢 Green | purple | 128,0,128 | 🟣 Purple |
| blue | 0,0,255 | 🔵 Blue | pink | 255,192,203 | 🩷 Pink |
| yellow | 255,255,0 | 🟡 Yellow | lime | 0,255,0 | 🟢 Lime |
| cyan | 0,255,255 | 🔵 Cyan | navy | 0,0,128 | 🔵 Navy |
| magenta | 255,0,255 | 🟣 Magenta | teal | 0,128,128 | 🔵 Teal |
| white | 255,255,255 | ⚪ White | gold | 255,215,0 | 🟡 Gold |
| black | 0,0,0 | ⚫ Black | silver | 192,192,192 | ⚪ Silver |
Custom RGB Format:
- Input format: "R,G,B" (e.g., "255,128,0" for dark orange)
- Range: 0-255
Box Width:
box_width: 1-10 pixels (default 2px)
Example Configuration:
box_color = "red" → Red 2px border (default)
box_color = "orange" → Orange border
box_color = "255,105,180" → Hot pink border
box_width = 5 → 5px thick border
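The color and width rules above can be expressed as a small validator. This is a sketch under stated assumptions: `parse_box_color` and `clamp_box_width` are hypothetical helpers, the preset values are copied from the table in this section, and whether the plugin clamps or rejects out-of-range input is not documented here.

```python
# Preset names and RGB values copied from the table in this section.
PRESET_COLORS = {
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "cyan": (0, 255, 255), "magenta": (255, 0, 255),
    "white": (255, 255, 255), "black": (0, 0, 0), "orange": (255, 165, 0),
    "purple": (128, 0, 128), "pink": (255, 192, 203), "lime": (0, 255, 0),
    "navy": (0, 0, 128), "teal": (0, 128, 128), "gold": (255, 215, 0),
    "silver": (192, 192, 192),
}


def parse_box_color(value):
    """Turn a preset name or an "R,G,B" string into an (R, G, B) tuple."""
    key = value.strip().lower()
    if key in PRESET_COLORS:
        return PRESET_COLORS[key]
    parts = key.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected a preset name or 'R,G,B', got {value!r}")
    try:
        rgb = tuple(int(p) for p in parts)
    except ValueError:
        raise ValueError(f"non-numeric RGB component in {value!r}") from None
    if any(c < 0 or c > 255 for c in rgb):
        raise ValueError(f"RGB components must be in 0-255, got {value!r}")
    return rgb


def clamp_box_width(width):
    """Assumption: out-of-range widths are clamped to the documented 1-10 range."""
    return max(1, min(10, int(width)))
```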
Basic Workflow
LoadImage
↓
DeepSeek OCR: Load Model
↓
DeepSeek OCR: Run
├── text → Display Text / Save Text
└── visualization → Preview Image / Save Image
Typical Use Cases
1. Document to Markdown
task = "Convert to Markdown"
resolution = "Gundam"
→ Output formatted Markdown text
2. Figure Parsing
task = "Parse Figure"
resolution = "Base"
→ Extract structured data from tables and charts
3. Object Localization
task = "Locate by Reference"
reference_text = "哆啦A梦"
box_color = "red"
box_width = 2
→ Text contains coordinates, image shows red box annotations
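To make the role of `task` and `reference_text` concrete, here is a rough sketch of how a task might be turned into a model prompt. The prompt strings are an assumption based on the examples published with the upstream DeepSeek-OCR model; the plugin may use different wording, and `build_prompt` is a hypothetical helper.

```python
# Assumption: prompt strings follow the examples published with upstream DeepSeek-OCR.
TASK_PROMPTS = {
    "Free OCR": "<image>\nFree OCR.",
    "Convert to Markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "Parse Figure": "<image>\nParse the figure.",
    "Locate by Reference": "<image>\nLocate <|ref|>{reference_text}<|/ref|> in the image.",
}


def build_prompt(task, reference_text=""):
    """Return the prompt for a task; Locate by Reference needs reference_text."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    if task == "Locate by Reference":
        if not reference_text.strip():
            raise ValueError("reference_text is required for Locate by Reference")
        return TASK_PROMPTS[task].format(reference_text=reference_text.strip())
    return TASK_PROMPTS[task]
```

The check on `reference_text` mirrors the rule in the parameter list: it is optional in general but required when the task is Locate by Reference.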
Directory Structure
ComfyUI/
├─ models/
│  └─ deepseek-ocr/                    # Fixed weights directory
│     ├─ deepseek-ai_DeepSeek-OCR/     # Model weights
│     └─ hf_cache/                     # HuggingFace cache
├─ output/
│  └─ DeepseekOCR/                     # Output directory (visualization results)
│     └─ 2025-11-05_20-31-00/          # Timestamp directory
├─ log/
│  └─ deepseek_ocr.log                 # Plugin logs
└─ custom_nodes/
   └─ ComfyUI-DeepseekOCR/
      ├─ __init__.py
      ├─ config.py
      ├─ model_manager.py
      ├─ nodes.py
      ├─ resolver.py
      ├─ io_utils.py
      ├─ tool/
      │  └─ download_weights.py
      ├─ requirements.txt
      └─ README.md
Logging
Plugin logs are located at: ComfyUI/log/deepseek_ocr.log
Key log contents:
- Model weight download progress
- Model loading status (device/dtype/attn_impl)
- Cache hit information
- Fallback strategy trigger records
- Error details and suggestions
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- DeepSeek AI - For providing the powerful DeepSeek-OCR model
- ComfyUI - Excellent node-based UI framework
- All contributors and users