How to Extract Text from PDF — Copy Text from Any PDF (Free)
You need one paragraph from a PDF report. You try to select the text, but the cursor just drags a blue box around the page. Or you manage to select something, paste it into a text editor, and get a wall of garbled characters, broken line breaks, and symbols that were never in the original document. If the PDF is a scanned document, there is no text to select at all — just a flat image of words you can see but cannot touch.
Extracting text from a PDF should be simple, but the format was designed to reproduce a page's appearance, not to expose its text in reading order. This guide covers the methods that actually work, starting with the fastest approach and moving to alternatives for trickier situations like scanned documents.
Why Some PDFs Won't Let You Copy Text
Before jumping to solutions, it helps to understand why this problem exists in the first place. There are three main reasons a PDF might resist your attempts to copy text from it.
1. The PDF is a scanned image
When someone scans a paper document or saves a photo as a PDF, the result is an image wrapped inside a PDF container. The file looks like it contains text because you can read the words with your eyes, but to the computer it is just a grid of pixels. There is no text layer, no font information, and no character data to extract. You are essentially trying to copy text from a photograph.
2. Font encoding issues
Some PDFs use custom font encodings or embedded font subsets whose character codes do not correspond to standard Unicode values. The PDF viewer renders the correct glyphs on screen because it draws them straight from the embedded font, but copying requires mapping each glyph back to a Unicode character. When the font lacks that mapping, or carries a wrong one, the copied text comes out wrong. This is why a ligature such as "fi" sometimes pastes as a single unknown character, or entire words become strings of question marks and boxes.
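One mild form of this problem can be repaired after extraction. When a ligature arrives as a single Unicode ligature character (for example U+FB01 for "fi") rather than as garbage, Unicode compatibility normalization expands it back into ordinary letters. The sketch below uses only the Python standard library; the function name is illustrative. It does not help when glyphs map to question marks or boxes, because the original characters are already lost at that point.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand compatibility characters (ligatures such as U+FB01) into plain letters."""
    # NFKC also rewrites other compatibility forms, e.g. full-width letters
    # and superscript digits, so apply it only where that is acceptable.
    return unicodedata.normalize("NFKC", text)


sample = "The \ufb01rst of\ufb01ce \ufb02oor"
print(expand_ligatures(sample))  # The first office floor
```

Because NFKC changes more than ligatures, it is best applied to body text rather than to formulas or code listings.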
3. Copy protection and permissions
PDF supports permission flags that can restrict text selection and copying. A document creator can set these flags when generating the PDF, and most PDF readers honor them by disabling the selection tool. The text is still present in the file; the viewer simply refuses to expose it. Some tools can bypass these restrictions because the flag is a request to the viewer rather than a lock: a restricted PDF that opens without a password can also be decoded without one, so a tool that ignores the flag can still read the text.
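To make the flags concrete: in the PDF standard security handler, permissions are stored as a single signed integer (the /P entry), and each permission is one bit of that integer. The sketch below decodes such a value. The bit positions follow the PDF specification; the function and dictionary names are illustrative, and reading /P out of a real file is left to a PDF library.

```python
# 1-based bit positions of the /P permissions integer in the PDF
# standard security handler. A set bit means the action is allowed.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map a /P value (often negative, since high bits are set) to named flags."""
    # Python integers behave like infinite two's complement under &,
    # so negative /P values can be masked directly.
    return {name: bool(p & (1 << (bit - 1))) for name, bit in PERMISSION_BITS.items()}


print(decode_permissions(-3904)["copy"])  # False: copying is restricted
print(decode_permissions(-4)["copy"])     # True: everything is allowed
```

Nothing in the file enforces these bits. They describe what the author asked for, and each viewer or tool decides whether to comply.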
Method 1: Extract Text with AllPDF.tools PDF to Text
The fastest way to extract text from a PDF is to use a tool that reads the PDF's internal text layer directly. AllPDF.tools has a dedicated PDF to Text tool built on PDF.js — the same rendering engine Firefox uses to display PDFs.
How it works, step by step
- Open the tool. Go to AllPDF.tools PDF to Text in any browser — desktop or mobile.
- Load your PDF. Click the upload area or drag and drop your PDF file. The file stays in your browser and is never uploaded to any server.
- Wait for extraction. The tool parses every page of the PDF and pulls out the text content. For a typical 20-page document, this takes a few seconds.
- Copy or download. The extracted text appears on screen. You can copy it directly to your clipboard or download it as a plain text file.
What makes this approach work well
- Preserves paragraph structure. The tool reads text in the correct order and maintains line breaks between paragraphs, so you don't get a continuous wall of text.
- Handles multi-page PDFs. It processes every page in the document, not just the first one. You get the full text content in one pass.
- Works around permission flags. Since it reads the raw PDF structure directly, it can extract text even from PDFs that have copy-protection flags set.
- Completely private. Your file never leaves your device. All processing happens in JavaScript running in your browser tab.
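For readers who prefer a script, the same idea (read the text layer page by page, entirely on your own machine) can be sketched in Python. This is not how AllPDF.tools is implemented; it is an analogue that assumes the third-party pypdf library is installed, and the function names and page separator format are illustrative choices.

```python
def join_pages(page_texts: list[str]) -> str:
    """Join per-page text, labelling each page so its origin stays traceable."""
    parts = [
        f"--- Page {number} ---\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
    ]
    return "\n\n".join(parts)


def extract_pdf_text(path: str) -> str:
    """Read the text layer of every page in a local PDF; nothing is uploaded."""
    # Third-party dependency, assumed installed with `pip install pypdf`.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() returns an empty string for pages with no text layer.
    return join_pages([page.extract_text() or "" for page in reader.pages])
```

Calling `extract_pdf_text("report.pdf")` on a hypothetical file returns one string with every page labelled. Pages that come back empty are a hint that the PDF is scanned and needs OCR instead.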
Method 2: Select and Copy in a PDF Reader
The most obvious approach is the one you have probably already tried: open the PDF in a reader and select the text with your mouse. This works in some situations and is worth covering because when it does work, it is the simplest option.
In Adobe Acrobat Reader
Open the PDF, click the text selection tool (the cursor icon in the toolbar), then click and drag to highlight the text you want. Press Ctrl+C (or Cmd+C on Mac) to copy. If you need all the text, press Ctrl+A (Cmd+A on Mac). Depending on the page display mode, this selects either the current page or the text of the whole document.
In your web browser
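Chrome, Firefox, and Edge all have built-in PDF viewers. Open the PDF in a new tab, then select text as you would on any web page. The engines differ: Firefox's viewer is PDF.js, Chrome uses PDFium, and Edge has its own viewer. Because each engine parses text a little differently, a browser sometimes copies text more cleanly than a dedicated PDF app for certain documents, so it is worth trying a second browser if the first gives poor results.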
When this method fails
Manual selection breaks down in several common scenarios. Tables extract as a jumbled mess because the reader does not understand the column structure. Multi-column layouts often merge columns together. Headers and footers get mixed into the body text. And if you need text from a 50-page document, selecting page by page is painfully slow.
If selecting and copying gives you garbled output or misses content, switch to Method 1. A dedicated text extraction tool reads the PDF structure more accurately than the selection tool in most readers.
Method 3: Google Drive OCR Trick (for Scanned PDFs)
If your PDF is a scanned document — meaning it contains images of text rather than actual text data — you need OCR to convert those images into machine-readable text. Google Drive offers a surprisingly effective free OCR option that most people don't know about.
Step-by-step process
- Upload the PDF to Google Drive. Go to drive.google.com and drag your PDF file into any folder.
- Right-click the uploaded PDF and select "Open with" then "Google Docs."
- Wait for conversion. Google will process the PDF with its OCR engine and open it as an editable Google Doc. This can take a minute for longer documents.
- Copy the text. The Google Doc now contains the recognized text. Select all (Ctrl+A) and copy, or download as a .txt or .docx file.
Limitations to know about
- Formatting is often lost or mangled — tables, columns, and complex layouts rarely survive the conversion intact.
- OCR accuracy depends on the scan quality. Blurry scans, low-resolution images, or unusual fonts will produce errors in the extracted text.
- There is a file size limit of around 2 MB for OCR processing through Google Docs. Larger files may not convert properly.
- Your PDF is uploaded to Google's servers, so this is not suitable for confidential or sensitive documents.
Regular PDFs vs. Scanned PDFs: How to Tell the Difference
The method you need depends entirely on what type of PDF you have. Here is how to quickly determine which kind you are dealing with.
The quick test
Open the PDF in any viewer and try to click on a word. If you can place your cursor between letters and highlight individual words, the PDF has an embedded text layer. Use Method 1 or Method 2 to extract the text.
If clicking anywhere just selects the entire page as a single image block, or if the selection tool does nothing at all, you have a scanned/image-only PDF. You need OCR — use Method 3 or a dedicated OCR application.
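The quick test can also be approximated in code once you have per-page text from any extractor: pages that yield almost no characters are probably images. The sketch below is a heuristic, not a definitive check. The function name and the 20-character threshold are assumptions chosen for illustration.

```python
def classify_pdf(page_texts: list[str], min_chars: int = 20) -> str:
    """Return 'text', 'scanned', or 'mixed' from per-page extracted text."""
    if not page_texts:
        raise ValueError("page_texts is empty; nothing to classify")
    # A page counts as having a text layer if it yields enough characters.
    pages_with_text = sum(1 for text in page_texts if len(text.strip()) >= min_chars)
    if pages_with_text == len(page_texts):
        return "text"      # every page has text: Method 1 or 2 applies
    if pages_with_text == 0:
        return "scanned"   # no page has text: OCR is needed
    return "mixed"         # some pages need OCR, others do not


print(classify_pdf(["A full paragraph of real text.", ""]))  # mixed
```

A scanned PDF that already went through OCR will classify as "text" here, which is the right outcome: its text layer can be extracted directly.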
The zoom test
Zoom in to 400% or higher. In a regular PDF with real text, the letters stay perfectly sharp at any zoom level because they are rendered from font data. In a scanned PDF, the text becomes pixelated and blurry at high zoom because you are looking at an image.
The gray area: PDFs with partial text layers
Some PDFs are a mix of both. A scanned document might have gone through OCR before you received it, adding an invisible text layer on top of the page images. These hybrid PDFs look like scans when you zoom in, but text selection works because the OCR layer is present. Method 1 works well with these since it reads the text layer directly, regardless of whether the visible content is an image.
What to Do with Extracted Text
Raw text extracted from a PDF usually needs some cleanup before it is actually usable. Here is what to expect and how to handle it.
Fixing line breaks
PDF text extraction often inserts a line break at the end of every visual line in the document, even in the middle of paragraphs. If you paste the extracted text into Word or Google Docs and see short lines that don't wrap properly, do a find-and-replace: search for a single line break (^p or ^l in Word's Find and Replace, \n in editors that support regular expressions) and replace it with a space. Then go through and re-add paragraph breaks where they belong.
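For longer documents the same cleanup can be scripted. The sketch below assumes the extractor left a blank line between paragraphs, which is common but not guaranteed; without blank lines the whole text collapses into one paragraph, and paragraph breaks must be restored by hand as described above. The function name is illustrative.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs, keeping blank-line paragraph breaks."""
    # Normalise Windows and old Mac line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split across lines: "extrac-\ntion" -> "extraction".
    # Caveat: this also merges genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines mark paragraph boundaries; everything else becomes a space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)


raw = "PDF text extrac-\ntion often inserts\na line break.\n\nSecond para-\ngraph here."
print(unwrap_lines(raw))
```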
Cleaning up headers and footers
Page numbers, headers, and footers from the PDF will appear in the extracted text, usually interrupting the flow of the content. You will need to manually remove these. For long documents, a find-and-replace for the repeating header or footer text can speed this up considerably.
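When the text is available page by page, repeated headers and footers can be detected rather than hunted down manually: a line that recurs on most pages is almost certainly not body text. The sketch below drops such lines along with bare page numbers. The function name, the 60% threshold, and the page-number pattern are assumptions to adjust per document.

```python
import re
from collections import Counter

# Matches bare page markers such as "7", "Page 7", or "Page 7 of 12".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Remove page numbers and lines that recur on at least min_share of pages."""
    page_lines = [[line.strip() for line in page.splitlines()] for page in pages]
    counts = Counter()
    for lines in page_lines:
        counts.update({line for line in lines if line})  # count once per page
    # Require at least two occurrences so single-page input is left alone.
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, seen in counts.items() if seen >= threshold}
    return [
        "\n".join(
            line for line in lines
            if line not in repeated and not PAGE_NUMBER.match(line)
        )
        for lines in page_lines
    ]
```

The page-number pattern also removes body lines that consist of nothing but a number, so review the output when the document contains numeric lists or tables.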
Importing into Word or Google Docs
Once the text is cleaned up, paste it into your target application. If you want to preserve some structure, paste into Google Docs first, then apply heading styles and formatting manually. For large documents, consider pasting section by section rather than dumping 50 pages of raw text at once.
Handling tables
Tables are the hardest content to extract cleanly from PDFs. The text extraction process loses the grid structure, so table data comes out as a flat sequence of cell values. For simple tables, you can often reconstruct them by pasting the text into a spreadsheet and adjusting columns. For complex tables, consider using a dedicated PDF-to-Excel tool or retyping the data manually if the table is small.
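For a simple table whose column count you know, the flat sequence of cell values can be regrouped into rows and saved as CSV, which any spreadsheet opens directly. This sketch assumes every cell was extracted, in row order, with no merged or empty cells; real tables often violate that, in which case the function raises an error instead of guessing. The function name is illustrative.

```python
import csv
import io


def cells_to_csv(cells: list[str], columns: int) -> str:
    """Regroup a flat list of cell values into rows of `columns` and return CSV text."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    if len(cells) % columns != 0:
        raise ValueError(
            f"{len(cells)} cells do not divide evenly into {columns} columns"
        )
    rows = [cells[start:start + columns] for start in range(0, len(cells), columns)]
    buffer = io.StringIO()
    # The csv module quotes any cell that itself contains a comma.
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


flat = ["Region", "Q1", "Q2", "North", "10", "12", "South", "8", "9"]
print(cells_to_csv(flat, columns=3))
```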
Common Questions About PDF Text Extraction
Can I extract text from a password-protected PDF?
It depends on the type of protection. If the PDF has an owner password that restricts copying but does not require a password to open, most text extraction tools (including AllPDF.tools) can still read the text layer. If the PDF requires a password just to open it, you will need to enter the password first or remove the encryption before extracting text.
Why does my extracted text have weird characters?
This usually happens because the PDF uses a custom or subset font encoding that does not map cleanly to Unicode. Try a different extraction tool — PDF.js-based tools like AllPDF.tools PDF to Text often handle these encodings better than the basic copy-paste in a PDF reader. If the problem persists, the PDF may need OCR as a fallback even though it is not technically a scanned document.
Is there a way to extract text from a specific page only?
With AllPDF.tools, the text is extracted from all pages and displayed together. You can scroll to the section you need and copy just that portion. If you want to isolate specific pages first, use a PDF split tool to extract those pages into a separate file, then run the text extraction on the smaller file.
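If you already have the text split per page (for example from a script), picking out specific pages is a small parsing job. The sketch below turns a page specification such as "2-4, 7" into page numbers and selects the matching entries. The specification syntax and function names are assumptions for illustration, not a feature of any particular tool.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a spec like '2-4, 7' into 1-based page numbers, validated against page_count."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, dash, end_text = part.partition("-")
        if dash and not end_text.strip():
            raise ValueError(f"incomplete page range: {part!r}")
        start = int(start_text)
        end = int(end_text) if dash else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range out of bounds: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


def select_pages(page_texts: list[str], spec: str) -> list[str]:
    """Return the text of only the pages named in spec."""
    return [page_texts[number - 1] for number in parse_page_spec(spec, len(page_texts))]


print(parse_page_spec("2-4, 7", page_count=10))  # [2, 3, 4, 7]
```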
Does extracting text preserve formatting like bold, italic, and headings?
No. Text extraction produces plain text — it strips all formatting information. The output is just the raw text content without any bold, italic, font size, or heading hierarchy. If you need to preserve formatting, consider converting the PDF to a Word document (.docx) instead, which retains more structural information.
Can I extract text from a PDF on my phone?
Yes. AllPDF.tools works in any mobile browser — Chrome on Android, Safari on iPhone, or any other modern mobile browser. Open the PDF to Text tool, tap to select your file, and the extraction runs entirely on your device. No app installation needed.
What is the maximum file size for text extraction?
Since AllPDF.tools runs in your browser, the limit depends on your device's available memory rather than a server-imposed cap. Most devices handle PDFs up to 50-100 MB without issues. Very large files with hundreds of pages may take longer to process but will still work as long as your browser tab has enough memory.
Is my PDF uploaded to a server when I use AllPDF.tools?
No. Your file never leaves your device. The tool uses JavaScript running in your browser to parse the PDF and extract text locally. There is no server upload, no cloud processing, and no data stored anywhere. Once you close the tab, the file is gone from memory.