PDF OCR: Extract Text from Scanned PDFs Free

Run optical character recognition on scanned or image-based PDFs directly in your browser. Each page is rendered by pdf.js and passed to Tesseract OCR. Copy text page by page or download the full result as a .txt file. Supports 16 languages. Nothing uploaded, no account needed.

100% Private, No Uploads
16 Languages
Scanned & Image PDFs

What is a scanned PDF and why can't I copy text from it?

A scanned PDF is produced by scanning a physical document with a printer/scanner or photographing it with a phone. The result is a PDF that contains images of each page; there is no underlying text layer. When you try to select text in Adobe Reader or your browser, nothing happens because the PDF reader sees pixels, not characters. OCR (Optical Character Recognition) solves this by analysing the image and identifying the characters, producing a text string you can copy, search, or process further.

How this tool works

When you drop a PDF, pdf.js (Mozilla's browser-based PDF renderer) renders each selected page to a canvas at 200 DPI, which is high enough for accurate OCR. That canvas is then passed to Tesseract.js, a WebAssembly build of the Tesseract OCR engine originally developed at HP and later sponsored by Google. Tesseract returns a text string for each page. The entire pipeline runs in your browser tab, and none of your document data is sent to a server. The first run for a given language downloads a Tesseract language pack (roughly 5 to 15 MB, cached in your browser thereafter).
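
The 200 DPI figure maps onto a render scale fairly directly, because PDF page sizes are measured in points at 72 per inch. The TypeScript sketch below shows only that arithmetic. The names `scaleForDpi` and `canvasSizeForPage` are hypothetical and are not taken from this tool's source; the resulting scale is the kind of value one would pass to pdf.js as `page.getViewport({ scale })`.

```typescript
// PDF user space defaults to 72 points per inch.
const POINTS_PER_INCH = 72;

/** Scale factor that renders a page at the requested DPI (hypothetical helper). */
function scaleForDpi(dpi: number): number {
  if (!(dpi > 0)) throw new RangeError("dpi must be a positive number");
  return dpi / POINTS_PER_INCH;
}

/** Canvas size in pixels for a page whose dimensions are given in points. */
function canvasSizeForPage(
  widthPt: number,
  heightPt: number,
  dpi: number = 200,
): { width: number; height: number } {
  if (!(dpi > 0)) throw new RangeError("dpi must be a positive number");
  // Multiply before dividing so common page sizes come out as exact integers.
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letterAt200 = canvasSizeForPage(612, 792);
console.log(letterAt200); // { width: 1700, height: 2200 }
```

A higher DPI gives Tesseract more pixels per character but increases canvas memory quadratically, which is one plausible reason to settle on a middle value such as 200.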

When will OCR quality be lower?

OCR accuracy depends on the quality of the original scan. Common causes of lower accuracy include: very low-resolution scans (below 150 DPI), skewed or rotated pages, handwritten text (Tesseract is trained on printed fonts), complex multi-column layouts, tables with thin borders, and heavily watermarked documents. For best results, use a clean scan of printed text at 200+ DPI and ensure the document language matches your language selection.
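
As a rough way to reason about the 150 DPI threshold mentioned above, the effective resolution of a scanned page can be estimated from the embedded image's pixel width and the page width in points. This is a minimal sketch, not part of the tool: `effectiveDpi` and `isLikelyLowResolution` are hypothetical names, and the calculation assumes the scan image spans the full page width.

```typescript
/**
 * Estimated scan resolution in DPI.
 * Assumption: the image covers the full page width; PDF points are 72 per inch.
 */
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  if (!(imageWidthPx > 0) || !(pageWidthPt > 0)) {
    throw new RangeError("image width and page width must be positive");
  }
  const pageWidthInches = pageWidthPt / 72;
  return imageWidthPx / pageWidthInches;
}

/** True when the estimated resolution falls below the given threshold. */
function isLikelyLowResolution(
  imageWidthPx: number,
  pageWidthPt: number,
  minDpi: number = 150,
): boolean {
  return effectiveDpi(imageWidthPx, pageWidthPt) < minDpi;
}

// An 850-pixel-wide scan on a US Letter page (612 pt wide) is about 100 DPI.
console.log(Math.round(effectiveDpi(850, 612)), isLikelyLowResolution(850, 612));
```

Rendering a low-resolution scan at a higher DPI does not add detail, so a check like this is mainly useful for warning that results may be poor.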

My PDF already has selectable text. Do I need this?

No. If you can already select and copy text from your PDF in a viewer, the PDF has a native text layer and OCR is unnecessary. This tool is designed for PDFs that contain only images of pages: scanned documents, photographed contracts, and image-export PDFs from tools that don't embed a text layer. For text-layer PDFs, use the PDF to Excel tool if you need to extract tables.
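
The same check can be made programmatically by looking at the text items a PDF parser reports for a page. In pdf.js, `page.getTextContent()` resolves to an object whose `items` may carry a `str` field; the sketch below works on a simplified version of that shape so it runs without the library. The `TextItemLike` type, the `hasTextLayer` name, and the 20-character threshold are assumptions made for illustration, not behaviour of this tool.

```typescript
/** Simplified stand-in for a pdf.js text item; marked-content items have no `str`. */
interface TextItemLike {
  str?: string;
}

/**
 * Returns true when a page reports enough non-whitespace characters to
 * suggest a native text layer. The threshold is an arbitrary illustration.
 */
function hasTextLayer(items: TextItemLike[], minChars: number = 20): boolean {
  let count = 0;
  for (const item of items) {
    // Ignore whitespace so a page of stray spaces does not count as text.
    count += (item.str ?? "").replace(/\s+/g, "").length;
    if (count >= minChars) return true;
  }
  return false;
}

// A page with no text items is a candidate for OCR.
console.log(hasTextLayer([])); // false
```

A threshold above zero helps with scans that contain only a few stray characters, such as a stamped page number, where OCR is still likely to be worthwhile.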

Are my PDFs uploaded to a server?

No. Both pdf.js and Tesseract.js run entirely in your browser. Your PDF is never sent to a server at any point. This matters for medical records, legal contracts, financial statements, government documents, and any other sensitive scanned document. When you close the tab, all data is gone from memory.