PDF OCR — Extract Text from Scanned PDFs Free

Run optical character recognition on scanned or image-based PDFs directly in your browser. Each page is rendered by pdf.js and passed to Tesseract OCR. Copy text page by page or download the full result as a .txt file. Supports 16 languages. Nothing uploaded, no account needed.

What is a scanned PDF and why can't I copy text from it?

A scanned PDF is produced by scanning a physical document with a printer/scanner or photographing it with a phone. The result is a PDF that contains images of each page — there is no underlying text layer. When you try to select text in Adobe Reader or your browser, nothing happens because the PDF reader sees pixels, not characters. OCR (Optical Character Recognition) solves this by analysing the image and identifying the characters, producing a text string you can copy, search, or process further.

How this tool works

When you drop a PDF, pdf.js (Mozilla's browser-based PDF renderer) renders each selected page to a canvas at 200 DPI, which is high enough for accurate OCR on most printed text. That canvas is then passed to Tesseract.js, a JavaScript library that runs a WebAssembly build of the Tesseract OCR engine. Tesseract was originally developed at HP, was later sponsored by Google, and is now maintained as an open-source project. Tesseract returns a text string for each page. The entire pipeline runs in your browser tab, and your PDF is never sent to a server. The first run for a given language downloads a Tesseract language pack (~5–15 MB, cached in your browser thereafter).
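To make the 200 DPI figure concrete: PDF page sizes are measured in points, with 72 points per inch, so a target DPI corresponds to a scale factor of DPI / 72. The sketch below is an illustration under that assumption, not this tool's actual source. The helper names dpiToScale and canvasSizeForPage are hypothetical. In pdf.js, a factor like this would typically be passed as the scale option of page.getViewport().

```typescript
// PDF user space is measured in points: 72 points per inch.
const PDF_POINTS_PER_INCH = 72;

// Convert a target rendering resolution (DPI) into a viewport scale factor.
// Hypothetical helper, not part of pdf.js.
function dpiToScale(dpi: number): number {
  if (!isFinite(dpi) || dpi <= 0) {
    throw new RangeError("dpi must be a positive, finite number");
  }
  return dpi / PDF_POINTS_PER_INCH;
}

// Canvas size in pixels needed to render a page of the given size (in points)
// at the given DPI. Multiplying before dividing keeps common sizes exact.
function canvasSizeForPage(
  widthPt: number,
  heightPt: number,
  dpi: number
): { width: number; height: number } {
  dpiToScale(dpi); // reuse the validation above
  return {
    width: Math.round((widthPt * dpi) / PDF_POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / PDF_POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letter = canvasSizeForPage(612, 792, 200);
console.log(letter); // { width: 1700, height: 2200 }
```

A Letter page at 200 DPI therefore needs a 1700 x 2200 pixel canvas, roughly 3.7 million pixels per page, which is why rendering and recognising many pages in one go can be slow in a browser tab.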

When will OCR quality be lower?

OCR accuracy depends on the quality of the original scan. Common causes of lower accuracy include: very low-resolution scans (below 150 DPI), skewed or rotated pages, handwritten text (Tesseract is trained on printed fonts), complex multi-column layouts, tables with thin borders, and heavily watermarked documents. For best results, use a clean scan of printed text at 200+ DPI and ensure the document language matches your language selection.
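If you know the pixel width of a scanned page image and the physical width of the paper, you can estimate the scan's effective resolution and compare it with the thresholds above. This is a rough sketch: the function names and the middle "usable" band are assumptions made for illustration, and only the 150 and 200 DPI cut-offs come from the text above.

```typescript
type ScanQuality = "low" | "usable" | "recommended";

// Effective resolution of a scan: pixels across divided by inches across.
function effectiveDpi(imageWidthPx: number, pageWidthInches: number): number {
  if (!(imageWidthPx > 0) || !(pageWidthInches > 0)) {
    throw new RangeError("width values must be positive");
  }
  return imageWidthPx / pageWidthInches;
}

// Classify against the thresholds mentioned above:
// below 150 DPI is likely to hurt accuracy, 200+ DPI is recommended.
// The "usable" label for the 150-199 range is an assumption.
function classifyScan(dpi: number): ScanQuality {
  if (dpi < 150) return "low";
  if (dpi < 200) return "usable";
  return "recommended";
}

// A Letter-width page (8.5 in) scanned to an image 850 pixels wide.
const dpi = effectiveDpi(850, 8.5);
console.log(dpi, classifyScan(dpi)); // 100 low
```

Resolution is only one factor. A 300 DPI scan that is skewed or handwritten can still produce poor text.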

My PDF already has selectable text — do I need this?

No. If you can already select and copy text from your PDF in a viewer, the PDF has a native text layer and OCR is unnecessary. This tool is designed for PDFs that contain only images of pages — scanned documents, photographed contracts, image-export PDFs from tools that don't embed a text layer. For text-layer PDFs, use the PDF to Excel tool if you need to extract tables.
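The same check can be done programmatically. In pdf.js, page.getTextContent() resolves to an object whose items array holds the text runs found on a page, each with a str field. An image-only page yields few or no such runs. The sketch below works on a minimal structural type so it runs without pdf.js. The function name hasTextLayer and the 20-character threshold are assumptions for illustration, not values used by this tool.

```typescript
// Minimal shape of the entries in a pdf.js text content `items` array.
// Only the `str` field is used here.
interface TextItemLike {
  str?: string;
}

// Returns true when a page carries enough non-whitespace text to suggest a
// native text layer. The minChars default of 20 is an arbitrary assumption
// meant to ignore stray characters such as a lone page number.
function hasTextLayer(
  items: ReadonlyArray<TextItemLike>,
  minChars: number = 20
): boolean {
  let count = 0;
  for (const item of items) {
    if (typeof item.str === "string") {
      // Count characters after dropping all whitespace.
      count += item.str.replace(/\s+/g, "").length;
    }
    if (count >= minChars) {
      return true;
    }
  }
  return false;
}

// A scanned page usually yields no text items at all.
console.log(hasTextLayer([])); // false

// A page with embedded text yields runs like these.
console.log(hasTextLayer([{ str: "Invoice number 12345" }, { str: "Total due: 980.00" }])); // true
```

A page that fails this check is a candidate for OCR. A page that passes can have its text read directly, which is faster and more accurate than recognising it from pixels.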

Are my PDFs uploaded to a server?

No. Both pdf.js and Tesseract.js run entirely in your browser. Your PDF is never sent to a server at any point. This matters for medical records, legal contracts, financial statements, government documents, and any other sensitive scanned document. When you close the tab, your PDF and the extracted text are gone from memory. Only the downloaded Tesseract language packs stay in your browser cache.
