How to Convert PDF to Excel (XLSX) — Extract Tables Free
You have a PDF with a table full of numbers — a bank statement, an invoice, a quarterly earnings report, a government budget document — and you need those numbers in Excel so you can actually work with them. Sort them. Filter them. Build formulas. Run a pivot table. But the moment you try to copy that table out of a PDF, everything falls apart. Columns misalign, rows merge into single lines, and numbers that looked perfectly formatted in the PDF become an unstructured mess in your spreadsheet.
This is one of the most common frustrations in everyday office work. PDFs are designed to preserve visual layout, not data structure. They are excellent for sharing documents that look the same on every screen, but they are terrible for data extraction. If you have ever spent an hour manually retyping numbers from a PDF into Excel, you know the pain. This guide covers exactly why PDF-to-Excel conversion is hard, and four practical methods to get it done — starting with the fastest free option.
Why PDF Tables Are So Hard to Extract
Before jumping into solutions, it helps to understand why this problem exists in the first place. It is not a limitation of your tools — it is a fundamental design problem with the PDF format itself.
PDFs do not store table structure. When you look at a table in a PDF, you see rows, columns, headers, and cell borders. But internally, a typical PDF file contains none of that. (Tagged PDFs can carry a logical structure that marks table rows and cells, but many PDFs are untagged, so extraction tools generally cannot rely on it.) What the file actually stores is a collection of text fragments, each placed at specific X,Y coordinates on the page. The word "Revenue" might be at position (72, 340), the number "1,245,000" at (310, 340), and so on. There is no metadata saying "these items belong to the same row" or "this is column 3." The visual appearance of a table emerges only when your PDF viewer draws those fragments; the file itself has no concept of rows or columns.
Grid lines are decorative, not structural. Those neat lines separating rows and columns in a PDF table? They are just drawn lines — vector graphics placed on the page for visual clarity. They carry zero semantic meaning. A PDF renderer draws them the same way it draws any other line or shape. Removing the grid lines would not change the text positions at all. This means extraction tools cannot simply "read the grid" — they have to infer the table structure from text positions alone.
Multi-page tables break unpredictably. When a table spans two or more pages, the PDF treats each page as an independent canvas. Headers may or may not repeat. Row heights can shift. The continuation of a table on page two is, from the PDF's perspective, an entirely new set of text fragments with no connection to the table on page one. Stitching these fragments back together into a coherent spreadsheet requires intelligent analysis of text alignment patterns across pages.
Fonts and encoding add complexity. Some PDFs use custom font encodings where the character "1" might be stored as a completely different code point internally. Others embed fonts that map characters to glyph indices rather than Unicode values. This can cause extraction tools to pull out gibberish instead of recognizable text — especially with older PDFs or those generated by specialized accounting software.
Method 1: AllPDF.tools PDF to Excel Converter (Free, No Upload)
The fastest way to extract tables from a PDF into an Excel spreadsheet is to use the AllPDF.tools PDF to Excel converter. It runs entirely in your browser — your file never leaves your computer — and handles most common table layouts automatically.
Step-by-Step Instructions
- Open the tool. Go to the PDF to Excel page on AllPDF.tools. No account or installation needed.
- Upload your PDF. Click the upload area or drag and drop your file. The tool reads the PDF entirely in your browser using JavaScript — nothing is sent to any server.
- Automatic table detection. The tool scans every page and extracts text items with their X,Y positions. It then groups text into rows based on Y-position proximity (items at similar vertical positions belong to the same row) and detects columns by clustering X positions (items that share similar horizontal positions form a column). This mimics how a human reads a table — left to right, top to bottom.
- Preview the extracted table. Before downloading, you can see the extracted data in a table preview. Check that columns aligned correctly and rows did not merge unexpectedly.
- Download as .xlsx. Click the download button to get a proper Excel file. Each page of the PDF becomes a separate sheet in the workbook, so multi-page documents stay organized. Numeric values are automatically converted from text strings to actual numbers in Excel, so formulas and sorting work immediately.
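The row and column grouping described in the steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's actual code (which, per the description above, runs as JavaScript in the browser). The names `TextItem`, `group_rows`, `cluster_columns`, and `build_table`, and the tolerance values, are assumptions chosen for the example.

```python
from typing import NamedTuple


class TextItem(NamedTuple):
    text: str
    x: float  # left edge of the text fragment
    y: float  # vertical position of the fragment on the page


def group_rows(items, y_tol=3.0):
    """Group items whose y lies within y_tol of the first item in the current row."""
    rows = []
    # PDF y coordinates usually grow upward, so sort descending to read top to bottom.
    for item in sorted(items, key=lambda it: -it.y):
        if rows and abs(rows[-1][0].y - item.y) <= y_tol:
            rows[-1].append(item)
        else:
            rows.append([item])
    return rows


def cluster_columns(items, x_tol=10.0):
    """Return column anchor positions: x values more than x_tol apart start a new column."""
    anchors = []
    for x in sorted(it.x for it in items):
        if not anchors or x - anchors[-1] > x_tol:
            anchors.append(x)
    return anchors


def build_table(items, y_tol=3.0, x_tol=10.0):
    """Place every fragment into the cell of its row and nearest column anchor."""
    anchors = cluster_columns(items, x_tol)
    table = []
    for row in group_rows(items, y_tol):
        cells = [""] * len(anchors)
        for item in sorted(row, key=lambda it: it.x):
            col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - item.x))
            cells[col] = (cells[col] + " " + item.text).strip()
        table.append(cells)
    return table


# Hypothetical fragments, in no particular order, as an extractor might return them.
items = [
    TextItem("Revenue", 72, 340), TextItem("1,245,000", 310, 340),
    TextItem("Item", 72, 360), TextItem("Amount", 310, 361),
    TextItem("Costs", 72, 320), TextItem("830,000", 318, 319.5),
]
table = build_table(items)
for row in table:
    print(row)
```

Fixed tolerances are the weak point of this approach: right-aligned numbers have different left edges, and tightly packed rows can fall within the same vertical tolerance, which is why a preview step before download is worth using.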
Method 2: Copy and Paste into Excel
For small, simple tables — say, under 20 rows with cleanly separated columns — the brute-force approach sometimes works well enough.
- Open the PDF in any reader (Adobe Reader, Chrome's built-in viewer, Preview on Mac).
- Select the table area by clicking and dragging. Try to select only the table, not surrounding text.
- Copy the selection (Ctrl+C or Cmd+C).
- Open Excel and paste into cell A1 (Ctrl+V or Cmd+V).
- Use Excel's Text to Columns feature (Data tab) if everything landed in a single column. Split by spaces or fixed width, depending on the data.
When this works: Simple tables with consistent spacing, no merged cells, and short text values. Government data tables, basic price lists, and simple schedules often paste reasonably well.
When this fails: Most of the time, honestly. Multi-line cell content gets split across rows. Columns misalign because PDF viewers select text in reading order, not tabular order. Numbers with spaces (like "1 245 000") get split into separate cells. If you have more than a handful of rows, the cleanup time quickly exceeds the time it would take to use a proper conversion tool.
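If you would rather script the cleanup than use Text to Columns, one option is to split each pasted line on runs of two or more spaces, so that single spaces inside a value such as "1 245 000" survive. The sketch below assumes the PDF viewer kept multiple spaces between columns when copying, which is not guaranteed; `split_pasted_line` and the sample text are hypothetical.

```python
import re


def split_pasted_line(line):
    """Split on a tab or on runs of 2+ spaces/tabs; single spaces stay inside a cell."""
    return re.split(r"[\t ]{2,}|\t", line.strip())


# Hypothetical text as it might arrive from the clipboard.
pasted = (
    "Region           Units    Revenue\n"
    "North America    1 200    1 245 000\n"
    "Europe           950      830 500"
)

rows = [split_pasted_line(line) for line in pasted.splitlines()]
for row in rows:
    print(row)
```

If the viewer collapses the gaps to single spaces, this heuristic has nothing to work with and a dedicated extraction tool is the better route.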
Method 3: Adobe Acrobat Pro Export
Adobe Acrobat Pro (the paid version, not the free Reader) includes an export feature that converts PDFs to Excel format. Since Adobe created the PDF format, their parser understands PDF internals better than almost any other tool.
- Open the PDF in Adobe Acrobat Pro.
- Go to File > Export a PDF (or use the "Export PDF" tool in the right panel).
- Select Spreadsheet > Microsoft Excel Workbook (.xlsx).
- Click Export and choose where to save the file.
Advantages: Acrobat Pro handles complex table layouts better than most tools. It recognizes merged cells, preserves some formatting (bold, colors), and handles multi-page tables more reliably. For PDFs with irregular layouts — nested tables, mixed text-and-table pages, or tables with heavy formatting — Acrobat Pro generally produces the cleanest output.
Disadvantages: It costs $19.99/month (or more with a full Creative Cloud subscription). That is a steep price if you only need to convert a few PDFs. The desktop application converts files on your own machine, but Adobe's web-based export tools upload your file to Adobe's servers for processing, which may be a concern for sensitive financial documents. And even Acrobat Pro is not perfect: you will still need to review and fix the output for complex documents.
Method 4: Tabula (Free, Open-Source Desktop App)
Tabula is a free, open-source desktop application built specifically for extracting tables from PDFs. It was created by journalists who were tired of manually extracting data from government PDFs, and it remains one of the most respected tools in the data journalism community.
- Download Tabula from tabula.technology and install it. It requires Java to be installed on your system.
- Launch Tabula — it opens in your web browser as a local web application.
- Upload your PDF. Tabula processes it locally (nothing is sent to the internet).
- Draw a selection box around the table you want to extract. You can select multiple tables on the same page.
- Choose your extraction method: Stream (for tables without grid lines, uses text positioning) or Lattice (for tables with clear grid lines, uses line detection).
- Preview the results and export as CSV or TSV. Open the CSV in Excel.
Advantages: Completely free and open-source. Runs locally — excellent for sensitive documents. The manual selection approach gives you precise control over which parts of the page to extract. The Lattice mode is particularly good at tables with clear borders. Widely used in data journalism, academia, and government transparency work.
Disadvantages: Requires Java, which many users do not have installed and which can be cumbersome to set up. Exports delimited text (CSV or TSV), not native XLSX, so you cannot get multiple sheets in one file and need an extra step to open the result in Excel. Does not handle scanned PDFs (no OCR). The interface, while functional, has not been updated significantly in years. Batch processing multiple PDFs requires the command-line version (tabula-java), which is less accessible for non-technical users.
Tips for Getting the Best Results
Regardless of which method you use, these practical tips will save you time and frustration:
- PDFs with clean grid lines produce the best results. Tables that have visible borders around every cell give extraction algorithms the most to work with. If you are generating PDFs yourself (from Excel, Google Sheets, or a database), always include table borders in the export.
- Avoid scanned PDFs unless you run OCR first. A scanned PDF is essentially a picture of a page. No text extraction tool can read it directly because there is no text data to extract — only pixels. You need to run OCR (Optical Character Recognition) first to convert the image into machine-readable text. Adobe Acrobat Pro, Google Drive, and several free tools can perform OCR on scanned PDFs.
- Check column alignment after every conversion. The most common extraction error is column misalignment — where data from one column shifts into an adjacent column. This happens when the original PDF has inconsistent spacing or when a cell contains text that is wider than its column. Always scan the first few rows and the last few rows to verify alignment.
- Merge split cells manually. Multi-line content within a single cell often gets split into two separate rows during extraction. Look for rows that seem incomplete or that contain text fragments that belong to the row above. This is especially common with address fields, product descriptions, and any cell with line breaks.
- Convert numbers after extraction. Some tools extract numbers as text strings. In Excel, you will notice these "text numbers" left-align instead of right-aligning. Select the column, go to Data > Text to Columns, click Finish, and Excel will convert them to proper numbers. Or use the VALUE() function.
- Handle currency symbols and formatting separately. Dollar signs, euro signs, comma separators, and percentage symbols often interfere with numeric conversion. Use Find and Replace to strip these characters, convert to numbers, then reapply formatting in Excel. For percentages, divide the stripped value by 100 before applying percent formatting; otherwise 45% turns into 4500%.
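For larger files, the number cleanup from the last two tips can be scripted before the data reaches Excel. The following is a minimal sketch, assuming US-style formatting (comma as thousands separator, period as decimal point); a European-style value such as "1.245,00" would be misread, so check your source first. The function name `parse_number` is hypothetical.

```python
import re


def parse_number(text):
    """Convert strings like "$1,245,000", "(830.50)" or "45%" to a float.

    Returns None when the text is not a number. Assumes a comma is a
    thousands separator and a period is the decimal point.
    """
    s = text.strip()
    # Accounting notation: a value wrapped in parentheses is negative.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    is_percent = s.endswith("%")
    # Strip currency symbols, separators, percent signs, and spaces.
    s = re.sub(r"[$€£,%\s]", "", s)
    if not re.fullmatch(r"[-+]?\d+(\.\d+)?", s):
        return None
    value = float(s)
    if is_percent:
        value /= 100
    return -value if negative else value


for raw in ["$1,245,000", "(830.50)", "45%", "1 245 000", "Revenue"]:
    print(repr(raw), "->", parse_number(raw))
```

Returning `None` for non-numeric text, rather than raising an error, lets you apply the function to a whole column and leave labels and headers untouched.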
Frequently Asked Questions
Does the conversion preserve Excel formulas?
No. PDFs do not store formulas — they only store the final calculated values. When a spreadsheet is saved as PDF, all formulas are replaced by their results. No conversion tool can recover the original formulas. You will get the raw numbers and will need to recreate any formulas you need in Excel.
What happens with merged cells?
Merged cells in the original table are a common source of extraction errors. Most tools will either place the merged cell's content in the first column and leave the remaining columns empty, or duplicate the content across all columns that the merge spanned. You will likely need to manually adjust merged cell areas after conversion.
Can I convert scanned PDFs to Excel?
Not directly. Scanned PDFs contain images, not text data. You need to run OCR first to convert the scanned pages into searchable text, and then extract the tables. Adobe Acrobat Pro can do both steps in one go. For free OCR, upload the scanned PDF to Google Drive and open it with Google Docs, which performs OCR automatically. Then copy or download the recognized text from there, and expect to rebuild the table layout yourself, since OCR in Google Docs often returns plain text rather than a clean table.
How does the tool handle multi-page tables?
In AllPDF.tools, each page is extracted as a separate sheet in the Excel workbook. If your table spans multiple pages, you will get one sheet per page and can combine them in Excel by copying rows from subsequent sheets into the first sheet. Some tools like Adobe Acrobat Pro attempt to merge multi-page tables automatically, though results vary depending on whether headers repeat and how consistent the column widths are across pages.
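If you have many pages, combining them by hand gets tedious. As a rough sketch, the stitching can be automated once each sheet has been read into a list of rows. It assumes that the first row of the first page is the header and that a repeated header matches it apart from case and surrounding spaces; `combine_pages` and the sample data are hypothetical.

```python
def combine_pages(pages):
    """Concatenate per-page tables, keeping the header row only once."""

    def normalize(row):
        return [cell.strip().lower() for cell in row]

    combined = []
    header = None
    for rows in pages:
        if not rows:
            continue  # skip pages where nothing was extracted
        if header is None:
            header = rows[0]
            combined.append(header)
            body = rows[1:]
        elif normalize(rows[0]) == normalize(header):
            body = rows[1:]  # header repeated on this page: drop it
        else:
            body = rows  # table continues without a repeated header
        combined.extend(body)
    return combined


# Hypothetical three-page statement: page 2 repeats the header, page 3 does not.
pages = [
    [["Date", "Amount"], ["2024-01-02", "100.00"]],
    [["Date ", "amount"], ["2024-01-03", "250.00"]],
    [["2024-01-04", "75.50"]],
]
combined = combine_pages(pages)
for row in combined:
    print(row)
```

This does not check that column counts match across pages, so it is worth verifying alignment on the combined result.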
How accurate is the conversion?
Accuracy depends almost entirely on the source PDF. Clean, digitally created PDFs with well-structured tables and visible grid lines usually convert with few or no errors. PDFs with complex layouts, merged cells, or irregular spacing will have lower accuracy and require manual cleanup. Scanned PDFs without OCR will produce no usable output at all. Always review the converted spreadsheet before using the data for anything important, especially financial calculations or reporting.
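Part of that review can be automated. One simple heuristic, sketched below, is to flag rows whose number of filled cells differs from the most common count, since shifted columns and split multi-line cells both tend to show up that way. It is only a hint for where to look, not a correctness check, and `find_suspect_rows` and the sample table are hypothetical.

```python
from collections import Counter


def find_suspect_rows(table):
    """Return (row_index, filled_cells) for rows that deviate from the usual count."""
    if not table:
        return []
    counts = [sum(1 for cell in row if cell.strip()) for row in table]
    expected = Counter(counts).most_common(1)[0][0]
    return [(i, n) for i, n in enumerate(counts) if n != expected]


# Hypothetical extraction result: row 2 is the first half of a split
# multi-line cell, and row 4 has two values fused into one cell.
table = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.99"],
    ["Deluxe gadget with", "", ""],
    ["extended warranty", "1", "49.00"],
    ["Bolt", "10 0.25", ""],
]
suspects = find_suspect_rows(table)
print(suspects)
```

Tables that legitimately contain blank cells will produce false positives, so treat the flagged rows as a reading list rather than a list of errors.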
Is it really free?
The AllPDF.tools converter is completely free with no limits on file size or number of conversions. There is no account required, no trial period, and no watermarks on the output. The tool runs entirely in your browser using JavaScript, so there are no server costs to pass on to users.
Is my data private?
Yes. AllPDF.tools processes everything locally in your browser. Your PDF file is never uploaded to any server. The conversion happens using JavaScript libraries (PDF.js for reading the PDF, and a spreadsheet library for writing the XLSX file) that run entirely on your device. Once you close the browser tab, the data is gone. This makes it safe to use with confidential financial documents, tax returns, bank statements, and other sensitive files.