PDF files are everywhere: in contracts, invoices, reports, scanned documents, and archives. But PDF corruption is far more common than most people realize. A corrupted PDF can look perfectly fine in your file manager while being completely unreadable, partially damaged, or structurally broken inside.
This guide covers everything you need to know: what causes PDF corruption, how to detect it, which tools actually work, and how to recover your files. It has been updated for 2026.
PDF corruption occurs when the internal structure of a PDF file is damaged, incomplete, or inconsistent. The Portable Document Format (PDF) is a complex binary format created by Adobe. It contains many interdependent components:
Header: the %PDF- signature at the start of the file. Without it, most PDF readers will refuse to open the file.
Cross-reference (xref) table: an index of where every object sits in the file. If it breaks, the reader can't find content unless it rebuilds the index by scanning the whole file.
Page tree: a hierarchical structure that maps how pages connect. If it is broken, pages become inaccessible.
Content streams: the actual text, images, and graphics for each page. Damaged streams cause visual corruption.
Encryption dictionary: if the encryption metadata is corrupted, even the correct password won't open the file.
End-of-file marker: the %%EOF at the end signals where the file ends. Truncated files usually lack this marker.
When any of these components is damaged, missing, or internally inconsistent, the result is a corrupted PDF. The damage can range from "slightly broken" (still opens but with missing content) to "completely unreadable" (crashes every reader you try).
A corrupted PDF is NOT the same as an encrypted PDF, a password-locked PDF, or a PDF with usage restrictions. Encryption is a feature; corruption is damage. Many tools wrongly conflate the two.
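To make that layout concrete, the sketch below (Python, standard library only) checks three of these components directly in the raw bytes: the `%PDF-` header, the `startxref` pointer to the cross-reference data, and the `%%EOF` marker. It assumes the trailer sits in the last 1,024 bytes, as it usually does, and it does not parse the page tree, content streams, or encryption. `inspect_skeleton` is a hypothetical helper, not part of any library.

```python
import re

def inspect_skeleton(data: bytes) -> dict:
    """Heuristic check of a PDF's header, startxref pointer and %%EOF marker.

    Assumption: the trailer sits in the last 1,024 bytes. Objects, the page
    tree and content streams are not parsed.
    """
    tail = data[-1024:]
    # Incremental updates can leave several startxref entries; the last one wins.
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    xref_offset = int(offsets[-1]) if offsets else None

    xref_ok = False
    if xref_offset is not None and xref_offset < len(data):
        target = data[xref_offset:xref_offset + 32]
        # A classic table starts with "xref"; PDF 1.5+ files may point at an
        # xref stream object instead, which starts with "<num> <gen> obj".
        xref_ok = target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None

    return {
        "has_header": data.startswith(b"%PDF-"),
        "has_eof": b"%%EOF" in tail,
        "xref_offset": xref_offset,
        "xref_ok": xref_ok,
    }
```

Feed it the bytes of a file, for example `inspect_skeleton(open("myfile.pdf", "rb").read())`. An `xref_ok` of `False` means the pointer no longer lands on cross-reference data, which many readers quietly work around by rebuilding the index.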
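One way to keep the two apart in a quick triage script is to check for damage first and only then look for an `/Encrypt` entry, which an encrypted PDF carries in its trailer dictionary. The Python sketch below is a byte-level heuristic, not a parser: a stray `/Encrypt` string elsewhere in an uncompressed file would fool it, and `triage` is a hypothetical name.

```python
def triage(data: bytes) -> str:
    """Rough classification that keeps 'encrypted' separate from 'damaged'.

    Heuristic: searches the raw bytes for the /Encrypt key instead of
    parsing the trailer dictionary.
    """
    if not data.startswith(b"%PDF-"):
        return "damaged or not a PDF"
    if b"%%EOF" not in data[-1024:]:
        return "damaged (possibly truncated)"
    if b"/Encrypt" in data:
        # Encryption is a feature, not damage: the file may be perfectly intact.
        return "encrypted"
    return "no obvious damage"
```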
Understanding what causes corruption helps you prevent it. These are the most common culprits:
PDFs can be corrupted in ways that are immediately obvious, or completely invisible to the naked eye. Here are the warning signs to watch for:
Do not assume a PDF is healthy just because it opens. Sophisticated PDF readers like Adobe Acrobat silently repair many issues when opening a file. The corruption still exists on disk; it's just being masked at read time.
Open the same PDF in different readers: Adobe Acrobat, Foxit, Google Chrome (built-in), and Preview (macOS). If any reader fails where another succeeds, the file has compatibility issues at minimum. If all readers fail, it's likely corrupted.
Every valid PDF starts with %PDF- in the first 5 bytes. You can verify this in any hex editor or by opening the file in a text editor and checking the first line. If the file starts with PK (ZIP), %!PS (PostScript), or random bytes, it is not a PDF.
```console
$ head -c 8 myfile.pdf
%PDF-1.7
```
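The same check is easy to script. This Python sketch reads only the first few bytes and reports which of the signatures mentioned above it finds. The `SIGNATURES` table and the function names are illustrative; a real file-type sniffer would recognize many more formats.

```python
from pathlib import Path

# Leading bytes of formats that sometimes end up with a .pdf extension
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK", "ZIP archive"),
    (b"%!PS", "PostScript"),
]

def sniff_bytes(head: bytes) -> str:
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return "unknown (not a PDF)"

def sniff_file(path: Path) -> str:
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))  # 8 bytes is enough for these signatures
```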
The free, open-source qpdf tool provides excellent PDF diagnostics:
```console
$ qpdf --check myfile.pdf
myfile.pdf: is linearized
myfile.pdf: is not encrypted
myfile.pdf: file is damaged
myfile.pdf: ERROR: page 3: invalid object 15 0 (bad type)
```
If the output contains ERROR or WARNING lines, the file has structural problems that need attention.
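For scripts, qpdf's exit status is easier to act on than its wording, which can vary between versions. qpdf documents exit status 0 for no problems, 2 for errors, and 3 for warnings it was able to recover from. A minimal Python wrapper might look like this; it assumes `qpdf` is installed and on `PATH`, and `check_pdf` and `interpret` are hypothetical names.

```python
import subprocess
import sys

# Exit statuses documented by qpdf
QPDF_STATUS = {0: "ok", 2: "errors", 3: "warnings"}

def interpret(returncode: int) -> str:
    return QPDF_STATUS.get(returncode, f"unexpected exit status {returncode}")

def check_pdf(path: str) -> tuple[str, str]:
    """Run `qpdf --check` on one file and return (verdict, combined output)."""
    proc = subprocess.run(["qpdf", "--check", path], capture_output=True, text=True)
    return interpret(proc.returncode), proc.stdout + proc.stderr

if __name__ == "__main__":
    # Usage: python check.py a.pdf b.pdf ...
    for name in sys.argv[1:]:
        verdict, _ = check_pdf(name)
        print(f"{verdict:8} {name}")
```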
For checking large numbers of PDFs at once, a dedicated batch scanner is far more efficient than opening each file manually. PDF Folder Health Scanner PRO can analyze hundreds of PDFs simultaneously and generate detailed forensic reports.
PDF Folder Health Scanner PRO runs a forensic analysis pipeline on every PDF you upload. It is completely free, and the only size limit is the per-file maximum; there is no cap on the total size of a batch.
The tool runs up to 7 analysis layers depending on the scan mode you choose:
| Layer | What It Checks | Safe | Quick | Deep |
|---|---|---|---|---|
| A โ Signature | %PDF- header, %%EOF marker, file size | โ | โ | โ |
| B โ Structure | Catalog, page tree, cross-reference table | โ | โ | โ |
| C โ Pages | Page count, page accessibility | โ | โ | โ |
| D โ Render | First page visual render test | โ | โ (1 page) | โ (all) |
| E โ Extraction | Text, metadata, fonts, bookmarks | โ | โ | โ |
| F โ Encryption | Encryption type, permissions, cipher | โ | โ | โ |
Unlike simpler tools, our scanner distinguishes between scan results and error sources. If a file fails due to a network upload error, that's clearly marked as a transport error, not silently labeled as "Corrupted PDF".
The scanner correctly separates Password Locked and Encrypted PDFs from corrupted ones. A locked PDF is not damaged; it just needs a password. Many older tools incorrectly flag these as corrupted.
Each scanned PDF receives a Health Score from 0 to 100 and a Repair Probability percentage. Here's how to interpret them:
This score estimates the chance that a repair tool can recover the file. It is calculated only from checks that actually ran, not guessed. The confidence level tells you how much data backed up the estimate:
A Safe or Quick mode scan has fewer checks than a Deep scan. The health score reflects only what was actually tested, not the full picture. Use Deep scan for files you need to archive or certify as intact.
Visit PDF Folder Health Scanner PRO. The tool runs in your browser and communicates with a private server; no account is needed.
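The scanner's exact formula is not given here, so the following Python sketch only illustrates the principle: score what ran, ignore what was skipped, and report how much of the full check list the score covers. Equal weighting, the check names, and `health_score` itself are assumptions for illustration.

```python
from typing import Optional

def health_score(results: dict[str, Optional[bool]]) -> tuple[Optional[int], float]:
    """Return (score 0-100, coverage 0.0-1.0).

    `results` maps a check name to True (passed), False (failed) or None
    (not run in this scan mode). Skipped checks do not count for or against
    the file; they only lower coverage. Equal weights are a simplification.
    """
    ran = [passed for passed in results.values() if passed is not None]
    coverage = len(ran) / len(results) if results else 0.0
    if not ran:
        return None, coverage  # nothing ran, so there is nothing to score
    return round(100 * sum(ran) / len(ran)), coverage

# A Quick-style scan: three checks ran, the render check was skipped
quick = {"signature": True, "structure": True, "pages": False, "render": None}
print(health_score(quick))  # (67, 0.75)
```

Keeping coverage next to the score is what stops a shallow scan from looking as trustworthy as a deep one.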
Look at the status indicator in the top bar. It should show a green dot and "Online". If offline, the scan cannot run.
Select Safe (quick triage), Quick (balanced), or Deep (full forensic) depending on how thorough you need to be.
Drag your folder into the drop zone, or click "Select Folder". You can also pick individual PDFs. Files with the same name in different subfolders are correctly tracked as separate items.
Files are uploaded to the server in batches. A progress bar shows you how many files have been analyzed and which file is currently being processed.
Results appear in real time as each batch completes. Click any file in the left list or table to see full forensic details in the right diagnostics panel.
Download a CSV for use in Excel, a JSON file for developers, or a full PDF forensic report for documentation purposes.
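Outside the browser, the same folder workflow can be approximated locally. This Python sketch walks a folder recursively, keys every result by its relative path (so `a/report.pdf` and `b/report.pdf` stay separate items), runs one placeholder check, and exports CSV and JSON. It is not how PDF Folder Health Scanner PRO works internally; the header test is a stand-in for a fuller pipeline such as `qpdf --check`, and all function names are hypothetical.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

def find_pdfs(root: Path) -> list[str]:
    """Relative paths of all .pdf files under root, in a stable order."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

def header_ok(path: Path) -> bool:
    # Placeholder check; swap in real analysis here.
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

def scan(root: Path) -> list[dict]:
    return [{"file": rel, "header_ok": header_ok(root / rel)} for rel in find_pdfs(root)]

def to_csv(rows: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["file", "header_ok"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows: list[dict]) -> str:
    return json.dumps(rows, indent=2)

if __name__ == "__main__":
    # Demo: two files named report.pdf in different subfolders, one of them not a PDF
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub, content in (("a", b"%PDF-1.7\n%%EOF\n"), ("b", b"PK\x03\x04")):
            (root / sub).mkdir()
            (root / sub / "report.pdf").write_bytes(content)
        print(to_csv(scan(root)))
```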
Once you've identified corrupted files, you have several repair options depending on severity:
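qpdf (free, command line). Best for: cross-reference table errors, structure problems, and linearization issues.
```bash
# Check what's wrong first
qpdf --check input.pdf
# Rewrite as a new file; qpdf tries to reconstruct a damaged xref table while reading (exit status 3 = recovered with warnings)
qpdf input.pdf repaired.pdf
# Same rewrite, but also linearized ("fast web view")
qpdf --linearize input.pdf repaired.pdf
# QDF mode: an uncompressed, human-readable copy for manual inspection (not a repair mode)
qpdf --qdf input.pdf inspect.pdf
```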
pdftk. Best for: splitting a partially damaged file and recovering individual pages.
```bash
# Rewrite the file; pdftk tries to repair a corrupted xref table and stream lengths
pdftk input.pdf output repaired.pdf
# Split every page into its own file so the readable pages can be kept
pdftk input.pdf burst output page_%04d.pdf
```
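When many files need repair, the two command-line tools can be chained: try qpdf first, fall back to pdftk, and record which one produced output. The Python sketch below assumes both tools are on `PATH`. Treating qpdf's exit status 3 (recovered with warnings) as success is a judgment call, and any repaired file should be re-checked with `qpdf --check` before you trust it. The `runner` parameter is a hypothetical hook that exists only so the logic can be tested without the tools installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]

def run(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd), capture_output=True).returncode

def repair(src: str, dst: str, runner: Runner = run) -> Optional[str]:
    """Try each tool in order; return the name of the first that reports success."""
    attempts = [
        (["qpdf", src, dst], {0, 3}),          # 3 = qpdf recovered from problems
        (["pdftk", src, "output", dst], {0}),
    ]
    for cmd, ok_codes in attempts:
        try:
            code = runner(cmd)
        except FileNotFoundError:  # tool not installed
            continue
        if code in ok_codes:
            return cmd[0]
    return None
```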
Adobe Acrobat Pro. Best for: complex corruption, image recovery, and security permission repairs. Use File > Save As to write a new copy; this forces a full rewrite, which often fixes minor corruption. Acrobat's Preflight tool provides even deeper analysis.
If repair tools fail, restore from your most recent backup. This is always the most reliable solution. This is also why regular backups are not optional for important document archives.
Several online services specialize in PDF repair, including PDF2Go, IlovePDF, and Sejda. Be cautious about uploading confidential documents to online services, and read their privacy policies carefully.
| Tool | Batch Scan | Health Score | Encryption Detection | Free | Browser-Based |
|---|---|---|---|---|---|
| PDF Folder Health Scanner PRO | ✓ 100+ | ✓ 0–100 | ✓ Accurate | ✓ | ✓ |
| qpdf (CLI) | Manual loop | No | Basic | ✓ | No |
| Adobe Acrobat Pro | Preflight | Partial | ✓ | Paid | No |
| PDF24 Tools | No | No | No | ✓ | ✓ |
| pdfinfo (poppler) | Manual loop | No | Basic | ✓ | No |
| IlovePDF | Limited | No | Partial | Freemium | ✓ |
Use the 3-2-1 rule: 3 copies, 2 different media types, 1 offsite. Test restores regularly.
Run a quick integrity check on any PDF you download, especially from external sources.
An uninterruptible power supply (UPS) protects against corruption caused by sudden power failures during file writes.
Run a batch scan of your PDF archives once a year. Silent corruption (bit rot) can accumulate unnoticed over time.
For long-term preservation, use the PDF/A format, which is ISO-standardized (ISO 19005) and self-contained.
Avoid editing a PDF on multiple devices simultaneously. Sync conflicts can silently corrupt files.
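A common way to make silent corruption visible between yearly scans is a checksum manifest: record a SHA-256 hash of every PDF once, then compare later. A file whose hash changed without an intentional edit deserves a closer look. Below is a minimal Python sketch; the manifest format (a JSON object mapping relative path to hex digest) and the function names are assumptions.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() == ".pdf"
    }

def save_manifest(manifest: dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))

def load_manifest(path: Path) -> dict[str, str]:
    return json.loads(path.read_text())

def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Files whose content changed, and files that disappeared, since `old`."""
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "missing": sorted(old.keys() - new.keys()),
    }
```

Store the manifest with your backups rather than only next to the files it describes, so a single failing disk cannot damage both.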
PDF Folder Health Scanner PRO is completely free. No account required. Your files are deleted from the server immediately after scanning.