How to Extract Text from PDF — Copy Text from Any PDF (Free)

Updated April 7, 2026 · 5 min read

You need one paragraph from a PDF report. You try to select the text, but the cursor just drags a blue box around the page. Or you manage to select something, paste it into a text editor, and get a wall of garbled characters, broken line breaks, and symbols that were never in the original document. If the PDF is a scanned document, there is no text to select at all — just a flat image of words you can see but cannot touch.

Extracting text from a PDF should be simple, but the format was never designed for easy text extraction. This guide covers the actual methods that work, starting with the fastest approach and moving to alternatives for trickier situations like scanned documents.

Why Some PDFs Won't Let You Copy Text

Before jumping to solutions, it helps to understand why this problem exists in the first place. There are three main reasons a PDF might resist your attempts to copy text from it.

1. The PDF is a scanned image

When someone scans a paper document or saves a photo as a PDF, the result is an image wrapped inside a PDF container. The file looks like it contains text because you can read the words with your eyes, but to the computer it is just a grid of pixels. There is no text layer, no font information, and no character data to extract. You are essentially trying to copy text from a photograph.

2. Font encoding issues

Some PDFs use custom font encodings or embedded font subsets where the internal character codes do not correspond to standard Unicode. The PDF viewer draws the correct glyphs on screen because it uses the embedded font, but when you copy the text, the viewer has to translate those internal codes back into Unicode, and without a usable mapping table in the file that translation fails. This is why a ligature such as "fi" sometimes comes out as a single unknown character, or entire words become strings of question marks and boxes.
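
If the extracted text contains ligature characters or unmapped symbols, a small cleanup pass can help. The following is a minimal Python sketch, not part of any tool mentioned here; the function names are hypothetical. It assumes the damage is limited to compatibility characters such as the single-character "fi" ligature, which Unicode NFKC normalization expands back into plain letters. Note that NFKC also rewrites other compatibility characters (for example, superscript digits), so review the result.

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Expand compatibility characters such as the 'fi' ligature (U+FB01)."""
    return unicodedata.normalize("NFKC", text)


def garbled_ratio(text: str) -> float:
    """Share of visible characters that are replacement, private-use, or unassigned.

    A high value suggests the PDF's font mapping is broken and OCR may be
    the better fallback. What counts as "high" is a judgment call.
    """
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    bad = sum(
        1
        for c in visible
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cn")
    )
    return bad / len(visible)


if __name__ == "__main__":
    sample = "The \ufb01le is in the of\ufb01ce."
    print(normalize_ligatures(sample))
    print(garbled_ratio("ab\ufffd\ue000"))
```

Normalization only repairs characters that were mapped to a valid but unusual code point. Text that arrives as question marks or boxes has already lost its meaning and cannot be recovered this way.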

3. Copy protection and permissions

PDF supports permission flags that can restrict text selection and copying. A document creator can set these flags when generating the PDF, and most PDF readers will honor them by disabling the selection tool. The text is still present in the file, but the viewer refuses to let you select it. Some tools can bypass these restrictions because a PDF that opens without a password can also be decoded without one: the permission flags are a request to the viewer, not a lock on the text.
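
To make the permission flags concrete: in the PDF specification's standard security handler they are stored as one signed 32-bit integer (the /P entry of the encryption dictionary), where each bit switches one operation on or off. The Python sketch below decodes the four most common bits. The function name and the example value are illustrative assumptions, and reading /P out of a real file is left to whatever PDF library you use.

```python
# Bit masks for the /P entry of a PDF encryption dictionary
# (standard security handler). A set bit means the operation is allowed.
PERMISSION_BITS = {
    "print": 1 << 2,            # bit 3
    "modify": 1 << 3,           # bit 4
    "copy_or_extract": 1 << 4,  # bit 5: the flag that blocks text copying
    "annotate": 1 << 5,         # bit 6
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Return which operations a /P value allows (True = allowed)."""
    # /P is a signed 32-bit integer; mask it to get its unsigned bit pattern.
    bits = p_value & 0xFFFFFFFF
    return {name: bool(bits & mask) for name, mask in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # -3904 is an illustrative value, not taken from any particular file.
    print(decode_permissions(-3904))
```

A viewer that honors the flags checks the copy bit before enabling the selection tool; an extraction tool that ignores it simply never runs that check.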

Method 1: Extract Text with AllPDF.tools PDF to Text

The fastest way to extract text from a PDF is to use a tool that reads the PDF's internal text layer directly. AllPDF.tools has a dedicated PDF to Text tool built on PDF.js — the same rendering engine Firefox uses to display PDFs.

How it works, step by step

  1. Open the tool. Go to AllPDF.tools PDF to Text in any browser — desktop or mobile.
  2. Load your PDF. Click the upload area or drag and drop your PDF file. The file stays in your browser and is never uploaded to any server.
  3. Wait for extraction. The tool parses every page of the PDF and pulls out the text content. For a typical 20-page document, this takes a few seconds.
  4. Copy or download. The extracted text appears on screen. You can copy it directly to your clipboard or download it as a plain text file.

What makes this approach work well

Important note about scanned PDFs: This tool works with PDFs that have an embedded text layer — which covers the vast majority of PDFs created from Word, Excel, web pages, or any digital source. If your PDF is a scanned image with no text layer, you will need OCR (optical character recognition) to convert the image to text. See Method 3 below for a free OCR workaround.

Method 2: Select and Copy in a PDF Reader

The most obvious approach is the one you have probably already tried: open the PDF in a reader and select the text with your mouse. This works in some situations and is worth covering because when it does work, it is the simplest option.

In Adobe Acrobat Reader

Open the PDF, click the text selection tool (the cursor icon in the toolbar), then click and drag to highlight the text you want. Press Ctrl+C (or Cmd+C on Mac) to copy. If you need all the text, press Ctrl+A (or Cmd+A on Mac): in single-page view this selects everything on the current page, and in a scrolling view it selects the text of the whole document.

In your web browser

Chrome, Firefox, and Edge all have built-in PDF viewers. Open the PDF in a new tab, then select text as you would on any web page. Firefox's viewer is built on PDF.js, while Chrome and Edge use different engines, so if one browser gives you garbled text it is worth trying another. Browser viewers sometimes handle text extraction better than dedicated PDF apps for certain documents.

When this method fails

Manual selection breaks down in several common scenarios. Tables extract as a jumbled mess because the reader does not understand the column structure. Multi-column layouts often merge columns together. Headers and footers get mixed into the body text. And if you need text from a 50-page document, selecting page by page is painfully slow.

If selecting and copying gives you garbled output or misses content, switch to Method 1. A dedicated text extraction tool reads the PDF structure more accurately than the selection tool in most readers.

Method 3: Google Drive OCR Trick (for Scanned PDFs)

If your PDF is a scanned document — meaning it contains images of text rather than actual text data — you need OCR to convert those images into machine-readable text. Google Drive offers a surprisingly effective free OCR option that most people don't know about.

Step-by-step process

  1. Upload the PDF to Google Drive. Go to drive.google.com and drag your PDF file into any folder.
  2. Right-click the uploaded PDF and select "Open with" then "Google Docs."
  3. Wait for conversion. Google will process the PDF with its OCR engine and open it as an editable Google Doc. This can take a minute for longer documents.
  4. Copy the text. The Google Doc now contains the recognized text. Select all (Ctrl+A) and copy, or download as a .txt or .docx file.

Limitations to know about

Tip: If the Google Drive OCR output has errors, try improving the source first. If you still have the paper original, re-scan it at 300 DPI or higher (or re-photograph it with good, even lighting) and upload the new file.

Regular PDFs vs. Scanned PDFs: How to Tell the Difference

The method you need depends entirely on what type of PDF you have. Here is how to quickly determine which kind you are dealing with.

The quick test

Open the PDF in any viewer and try to click on a word. If you can place your cursor between letters and highlight individual words, the PDF has an embedded text layer. Use Method 1 or Method 2 to extract the text.

If clicking anywhere just selects the entire page as a single image block, or if the selection tool does nothing at all, you have a scanned/image-only PDF. You need OCR — use Method 3 or a dedicated OCR application.
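
The quick test above can also be automated once you have run any extractor over the file. This Python sketch assumes you already hold the extracted text of each page as a list of strings; the function names and the 20-character threshold are hypothetical choices, not values used by any tool mentioned in this guide.

```python
def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page 'text' or 'image-only' by how much text was extracted."""
    labels = []
    for text in page_texts:
        # Count visible characters only; whitespace does not indicate a text layer.
        visible = sum(1 for c in text if not c.isspace())
        labels.append("text" if visible >= min_chars else "image-only")
    return labels


def needs_ocr(page_texts: list[str], min_chars: int = 20) -> bool:
    """Guess that the document is a scan when most pages yield almost no text."""
    labels = classify_pages(page_texts, min_chars)
    return bool(labels) and labels.count("image-only") > len(labels) / 2


if __name__ == "__main__":
    pages = ["A full paragraph of real extracted text.", "", "  \n "]
    print(classify_pages(pages))
    print(needs_ocr(pages))
```

Labeling per page rather than per file is deliberate: some documents mix born-digital pages with scanned inserts, and only the image-only pages need OCR.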

The zoom test

Zoom in to 400% or higher. In a regular PDF with real text, the letters stay perfectly sharp at any zoom level because they are rendered from font data. In a scanned PDF, the text becomes pixelated and blurry at high zoom because you are looking at an image.

The gray area: PDFs with partial text layers

Some PDFs are a mix of both. A scanned document might have gone through OCR before you received it, adding an invisible text layer on top of the page images. These hybrid PDFs look like scans when you zoom in, but text selection works because the OCR layer is present. Method 1 works well with these since it reads the text layer directly, regardless of whether the visible content is an image.

What to Do with Extracted Text

Raw text extracted from a PDF usually needs some cleanup before it is actually usable. Here is what to expect and how to handle it.

Fixing line breaks

PDF text extraction often inserts a line break at the end of every visual line in the document, even in the middle of paragraphs. If you paste the extracted text into Word or Google Docs and see short lines that don't wrap properly, do a find-and-replace: search for a single line break (manual line break or \n depending on the editor) and replace it with a space. Then go through and re-add paragraph breaks where they belong.
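
For longer documents, the same cleanup can be scripted. The Python sketch below assumes the extractor left a blank line between paragraphs, which is common but not guaranteed; if it did not, you will still need to re-add paragraph breaks by hand. The function name is hypothetical.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs, keeping blank-line paragraph breaks."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    cleaned = []
    # Assumption: one or more blank lines mark a paragraph boundary.
    for para in re.split(r"\n\s*\n", text):
        # Rejoin words split by a hyphen at a line end ("extrac-\ntion").
        # Caveat: this also joins genuine compounds such as "well-\nknown".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Replace the remaining line breaks and runs of spaces with one space.
        para = " ".join(para.split())
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    raw = "PDF text extrac-\ntion often inserts\na break.\n\nSecond para-\ngraph here."
    print(unwrap_lines(raw))
```

Splitting on blank lines before replacing single breaks is what keeps the paragraph structure intact; a plain find-and-replace of every line break would flatten the whole document into one block.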

Cleaning up headers and footers

Page numbers, headers, and footers from the PDF will appear in the extracted text, usually interrupting the flow of the content. You will need to manually remove these. For long documents, a find-and-replace for the repeating header or footer text can speed this up considerably.
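
When the text is available page by page, repeated headers and footers can be detected automatically. The Python sketch below is a heuristic under stated assumptions: a line counts as a header or footer if the same text, ignoring digits, appears on at least 60% of pages. Both the function name and the threshold are hypothetical, and the approach can misfire on short documents or on body lines that genuinely repeat.

```python
import re
from collections import Counter


def _key(line: str) -> str:
    """Mask digits so that 'Page 3' and 'Page 12' compare as equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Remove lines that recur on most pages (likely headers, footers, page numbers)."""
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page.
        counts.update({_key(line) for line in page.splitlines() if line.strip()})
    # Require at least two occurrences so a one-page file is left untouched.
    threshold = max(2, min_share * len(pages))
    repeated = {key for key, n in counts.items() if n >= threshold}
    return [
        "\n".join(line for line in page.splitlines() if _key(line) not in repeated)
        for page in pages
    ]


if __name__ == "__main__":
    pages = [
        "Annual Report 2025\nFirst page body.\nPage 1",
        "Annual Report 2025\nSecond page body.\nPage 2",
        "Annual Report 2025\nThird page body.\nPage 3",
    ]
    print(strip_repeated_lines(pages))
```

Check the output against the original before discarding anything, since a heading that repeats on most pages would be removed along with the real headers.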

Importing into Word or Google Docs

Once the text is cleaned up, paste it into your target application. If you want to preserve some structure, paste into Google Docs first, then apply heading styles and formatting manually. For large documents, consider pasting section by section rather than dumping 50 pages of raw text at once.

Handling tables

Tables are the hardest content to extract cleanly from PDFs. The text extraction process loses the grid structure, so table data comes out as a flat sequence of cell values. For simple tables, you can often reconstruct them by pasting the text into a spreadsheet and adjusting columns. For complex tables, consider using a dedicated PDF-to-Excel tool or retyping the data manually if the table is small.
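
For a simple table, the reconstruction step can be sketched in a few lines of Python. This assumes the extractor produced exactly one value per cell, in reading order, with no empty cells dropped, and that you know the number of columns from looking at the original page. The function name is hypothetical.

```python
import csv
import io


def cells_to_csv(cells: list[str], columns: int) -> str:
    """Regroup a flat sequence of cell values into rows and return CSV text."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Slice the flat list into consecutive rows of `columns` values each.
    rows = [cells[i:i + columns] for i in range(0, len(cells), columns)]
    # Pad a short final row so every row has the same width.
    if rows and len(rows[-1]) < columns:
        rows[-1] = rows[-1] + [""] * (columns - len(rows[-1]))
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    flat = ["Region", "Q1", "Q2", "North", "10", "12", "South", "8"]
    print(cells_to_csv(flat, columns=3))
```

The resulting CSV opens directly in a spreadsheet. If the rows come out shifted, an empty cell was probably dropped during extraction, and that row needs a manual fix.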

Common Questions About PDF Text Extraction

Can I extract text from a password-protected PDF?

It depends on the type of protection. If the PDF has an owner password that restricts copying but does not require a password to open, most text extraction tools (including AllPDF.tools) can still read the text layer. If the PDF requires a password just to open it, you will need to enter the password first or remove the encryption before extracting text.

Why does my extracted text have weird characters?

This usually happens because the PDF uses a custom or subset font encoding that does not map cleanly to Unicode. Try a different extraction tool — PDF.js-based tools like AllPDF.tools PDF to Text often handle these encodings better than the basic copy-paste in a PDF reader. If the problem persists, the PDF may need OCR as a fallback even though it is not technically a scanned document.

Is there a way to extract text from a specific page only?

With AllPDF.tools, the text is extracted from all pages and displayed together. You can scroll to the section you need and copy just that portion. If you want to isolate specific pages first, use a PDF split tool to extract those pages into a separate file, then run the text extraction on the smaller file.

Does extracting text preserve formatting like bold, italic, and headings?

No. Text extraction produces plain text — it strips all formatting information. The output is just the raw text content without any bold, italic, font size, or heading hierarchy. If you need to preserve formatting, consider converting the PDF to a Word document (.docx) instead, which retains more structural information.

Can I extract text from a PDF on my phone?

Yes. AllPDF.tools works in any mobile browser — Chrome on Android, Safari on iPhone, or any other modern mobile browser. Open the PDF to Text tool, tap to select your file, and the extraction runs entirely on your device. No app installation needed.

What is the maximum file size for text extraction?

Since AllPDF.tools runs in your browser, the limit depends on your device's available memory rather than a server-imposed cap. Most devices handle PDFs up to 50-100 MB without issues. Very large files with hundreds of pages may take longer to process but will still work as long as your browser tab has enough memory.

Is my PDF uploaded to a server when I use AllPDF.tools?

No. Your file never leaves your device. The tool uses JavaScript running in your browser to parse the PDF and extract text locally. There is no server upload, no cloud processing, and no data stored anywhere. Once you close the tab, the file is gone from memory.