OCR PDF

Extract text from scanned or image-based PDFs using Tesseract.js. Everything runs in your browser.

Runs in your browser, no server upload

OCR is performed by Tesseract.js, a WebAssembly port of the Tesseract engine. On first use it downloads about 4 MB of English language data from a CDN. Your PDF is never sent anywhere. Processing is slow: expect 10–30 seconds per page.
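As a rough guide to how long a document may take, the 10–30 seconds per page figure can be turned into a quick estimate. This is a minimal sketch: the helper name `estimateOcrSeconds` is hypothetical, and real timings will vary with your device and the page content.

```typescript
// Per-page range quoted in the text: 10 to 30 seconds.
const MIN_SECONDS_PER_PAGE = 10;
const MAX_SECONDS_PER_PAGE = 30;

interface OcrTimeEstimate {
  minSeconds: number;
  maxSeconds: number;
}

// Hypothetical helper: scales the per-page range by the page count.
function estimateOcrSeconds(pageCount: number): OcrTimeEstimate {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    minSeconds: pageCount * MIN_SECONDS_PER_PAGE,
    maxSeconds: pageCount * MAX_SECONDS_PER_PAGE,
  };
}

const estimate = estimateOcrSeconds(12);
console.log(`${estimate.minSeconds}-${estimate.maxSeconds} seconds`);
```

For a 12-page scan this gives 120 to 360 seconds, so roughly 2 to 6 minutes.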

Supported format: PDF

Your file never leaves your browser

Scanned PDF Support

Extracts text from image-based or scanned PDFs that have no embedded text layer.

Tesseract.js Engine

Powered by Tesseract.js, a WebAssembly port of the Tesseract OCR engine.

100% Private

Your PDF is rendered and recognized locally. Neither the file nor the extracted text is sent to any server.

How to OCR a PDF

Upload Your PDF

Select a scanned or image-based PDF that has no selectable text. The file is opened in your browser, not uploaded to a server.

Start OCR

Click Start OCR. Each page is rendered to an image, then processed with Tesseract.js in your browser.
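The render-then-recognize step can be sketched as a simple sequential loop. This is an illustration under stated assumptions, not this page's actual code: `OcrDeps`, `ocrAllPages`, and `joinPages` are hypothetical names, and the rendering and recognition functions are passed in so the loop stays independent of any particular library. With Tesseract.js, `recognize` would typically wrap a worker's `recognize` call and return its `data.text`.

```typescript
// Hypothetical abstractions for illustration only.
// A rendered page image; in a browser this could be a canvas or a data URL.
type PageImage = unknown;

interface OcrDeps {
  pageCount: number;
  // Renders one page (1-based page number) to an image.
  renderPage: (pageNumber: number) => Promise<PageImage>;
  // Runs OCR on an image and resolves to the recognized text.
  recognize: (image: PageImage) => Promise<string>;
  // Optional progress callback, called after each page finishes.
  onProgress?: (done: number, total: number) => void;
}

// Handles pages strictly one after another: render, recognize, record.
async function ocrAllPages(deps: OcrDeps): Promise<string[]> {
  const texts: string[] = [];
  for (let page = 1; page <= deps.pageCount; page++) {
    const image = await deps.renderPage(page);
    const text = await deps.recognize(image);
    texts.push(text.trim());
    deps.onProgress?.(page, deps.pageCount);
  }
  return texts;
}

// Joins per-page text with a visible page marker between pages.
function joinPages(texts: string[]): string {
  return texts
    .map((text, i) => `--- Page ${i + 1} ---\n${text}`)
    .join("\n\n");
}
```

Processing pages sequentially keeps the progress reporting simple and avoids running several recognition jobs at once, which suits the slow per-page timings of in-browser OCR.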

Download the Text

Preview the extracted text and download it as a .txt file.
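One common way to offer text as a .txt download without a server is to wrap it in a Blob and create a temporary object URL. The sketch below assumes that approach; the helper names `textFileNameFor` and `createTextDownloadUrl`, and the fallback name `ocr-output`, are hypothetical and not taken from this tool.

```typescript
// Hypothetical helper: swaps a trailing ".pdf" (any case) for ".txt".
function textFileNameFor(pdfName: string): string {
  const base = pdfName.replace(/\.pdf$/i, "");
  return `${base || "ocr-output"}.txt`;
}

// Wraps the text in a Blob and returns a temporary blob: URL for it.
function createTextDownloadUrl(text: string): string {
  const blob = new Blob([text], { type: "text/plain;charset=utf-8" });
  return URL.createObjectURL(blob);
}

const fileName = textFileNameFor("scan.PDF");
const url = createTextDownloadUrl("Hello from page 1");
console.log(fileName, url.startsWith("blob:"));

// Release the URL once it is no longer needed.
URL.revokeObjectURL(url);
```

In a page, the returned URL would usually be set as the `href` of a link whose `download` attribute holds the file name, so the text is saved straight from browser memory.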

Related Tools

JPG to PDF

Convert images to PDF

PDF to JPG

Convert PDF pages to images

PDF to PNG

Convert PDF pages to PNG images

PNG to PDF

Convert PNG images to PDF