OCR PDF
Extract text from scanned or image-based PDFs with Tesseract.js, entirely in your browser.
Runs in your browser — no server upload
OCR is performed by Tesseract.js, a WebAssembly port of the Tesseract engine. On first use it downloads about 4 MB of English language data from a CDN. Your PDF itself is never sent anywhere. Processing is slow: expect 10–30 seconds per page.
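To give a feel for what 10–30 seconds per page means for a whole document, here is a minimal sketch that scales that range by page count. The helper name `estimateOcrTime` is hypothetical and not part of this tool. It assumes time grows linearly with the number of pages and ignores the one-time language data download.

```typescript
// Per-page bounds taken from the figure quoted above (10–30 seconds per page).
const SECONDS_PER_PAGE_MIN = 10;
const SECONDS_PER_PAGE_MAX = 30;

interface OcrTimeEstimate {
  minSeconds: number;
  maxSeconds: number;
}

// Hypothetical helper: scales the per-page range linearly by page count.
// Assumption: every page costs about the same, which real scans may not honor.
function estimateOcrTime(pageCount: number): OcrTimeEstimate {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    minSeconds: pageCount * SECONDS_PER_PAGE_MIN,
    maxSeconds: pageCount * SECONDS_PER_PAGE_MAX,
  };
}
```

By this estimate, a 12-page scan would take roughly 120 to 360 seconds, or about 2 to 6 minutes.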
Supported format: PDF
Your file never leaves your browser
Scanned PDF Support
Extracts text from image-based or scanned PDFs that have no embedded text layer.
Tesseract.js Engine
Powered by Tesseract, a proven open-source OCR engine compiled to WebAssembly for browser use.
100% Private
Your PDF is rendered locally and is never sent to any server. The extracted text stays on your device.
How to OCR a PDF
Upload Your PDF
Upload a scanned or image-based PDF that has no selectable text.
Start OCR
Click Start OCR. Each page is rendered, then processed with Tesseract.js in your browser.
Download the Text
Preview the extracted text and download it as a .txt file.
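The three steps above can be sketched as a simple page-by-page loop. This is an illustration under stated assumptions, not this tool's actual code. The names `ocrDocument`, `toTextFile`, `RenderPage`, and `RecognizeImage` are hypothetical. The render and recognize steps are passed in as functions so the sketch runs on its own. In a real page, the recognize step would presumably wrap a Tesseract.js worker and the render step a PDF renderer.

```typescript
// Hypothetical types: a rendered page image and the two injected steps.
type PageImage = unknown;
type RenderPage = (pageNumber: number) => Promise<PageImage>;
type RecognizeImage = (image: PageImage) => Promise<string>;

// Render each page, run OCR on it, and collect the text in page order.
// Pages are processed one at a time, matching the "rendered, then processed" flow.
async function ocrDocument(
  pageCount: number,
  renderPage: RenderPage,
  recognize: RecognizeImage,
  onProgress?: (done: number, total: number) => void,
): Promise<string> {
  const pageTexts: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const image = await renderPage(page);
    const text = await recognize(image);
    pageTexts.push(text.trim());
    onProgress?.(page, pageCount); // e.g. update a progress bar
  }
  // Assumption: pages are separated by a blank line in the output.
  return pageTexts.join("\n\n");
}

// Derive the .txt download name from the PDF name (scan.pdf -> scan.txt).
// In a browser, `content` could then be wrapped in a Blob and offered as a download.
function toTextFile(text: string, pdfName: string): { filename: string; content: string } {
  const filename = pdfName.replace(/\.pdf$/i, "") + ".txt";
  return { filename, content: text };
}
```

Processing pages sequentially keeps memory use low and makes progress reporting straightforward, at the cost of total time.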