OCR PDF

Extract text from scanned or image-based PDFs using Tesseract.js, which runs entirely in your browser.

Runs in your browser — no server upload

OCR is performed by Tesseract.js, a WebAssembly port of the Tesseract engine. On first use it downloads about 4 MB of English language data from a CDN. Your PDF itself is never uploaded. Processing is slow: expect 10–30 seconds per page.
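As a rough guide, the per-page figure above scales linearly with page count. The sketch below turns it into a time range. The name `estimateOcrSeconds` is illustrative and not part of this tool, and the calculation assumes pages are processed one at a time.

```typescript
// Per-page bounds taken from the text above: 10 to 30 seconds per page.
const MIN_SECONDS_PER_PAGE = 10;
const MAX_SECONDS_PER_PAGE = 30;

// Hypothetical helper: estimates total OCR time, assuming pages are
// processed sequentially and each takes a similar amount of time.
function estimateOcrSeconds(pageCount: number): { min: number; max: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    min: pageCount * MIN_SECONDS_PER_PAGE,
    max: pageCount * MAX_SECONDS_PER_PAGE,
  };
}
```

For a 12-page scan this gives 120 to 360 seconds, or roughly 2 to 6 minutes.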

Drag & drop your PDF file here

or click to browse

Supported format: PDF

Your file never leaves your browser

Scanned PDF Support

Extracts text from image-based or scanned PDFs that have no embedded text layer.

Tesseract.js Engine

Powered by Tesseract, a proven open-source OCR engine compiled to WebAssembly for browser use.

100% Private

Your PDF is rendered locally. No part of your file is sent to any server, and the extracted text stays on your device.

How to OCR a PDF

1. Select Your PDF

Drop or browse to a scanned or image-based PDF that has no selectable text. The file is opened locally, not uploaded to a server.

2. Start OCR

Click Start OCR. Each page is rendered to an image, then processed with Tesseract.js in your browser.

3. Download the Text

Preview the extracted text and download it as a .txt file.
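The steps above can be sketched as a small pipeline: render each page, recognize it, then assemble a text file. This is an outline under stated assumptions, not the tool's actual code. The `renderPage` and `recognize` callbacks are hypothetical stand-ins for the PDF renderer and the Tesseract.js call, and the helper names, page-separator format, and file-naming rule are illustrative.

```typescript
// Hypothetical stand-ins: a real page would plug in a PDF renderer and a
// Tesseract.js recognizer here. Both are passed in so the loop stays generic.
type RenderPage<Image> = (pageNumber: number) => Promise<Image>;
type Recognize<Image> = (image: Image) => Promise<string>;

// Processes pages sequentially (1-based) and returns one text string per page.
async function ocrDocument<Image>(
  pageCount: number,
  renderPage: RenderPage<Image>,
  recognize: Recognize<Image>,
  onProgress?: (done: number, total: number) => void,
): Promise<string[]> {
  const pageTexts: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const image = await renderPage(page); // render the page to an image
    pageTexts.push(await recognize(image)); // run OCR on that image
    if (onProgress) onProgress(page, pageCount); // e.g. update a progress bar
  }
  return pageTexts;
}

// Hypothetical helper: joins per-page results into one text document,
// labelling each page so text can be traced back to the scan.
function buildTextFile(pageTexts: string[]): string {
  return pageTexts
    .map((text, i) => `--- Page ${i + 1} ---\n${text.trim()}`)
    .join("\n\n");
}

// Hypothetical helper: derives the download name from the PDF's name,
// so "scan.pdf" becomes "scan.txt".
function textFileName(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".txt";
}
```

In a browser, the string from `buildTextFile` can be wrapped in a `Blob` of type `text/plain` and offered through a link with a `download` attribute, so the file is created without any server round trip.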