Run a tagged PDF through veraPDF for the first time and one of two things happens. The report comes back green, you exhale, and you ship the file to the government portal. Or the report comes back with a wall of failures — 7.21.3.2-1, 7.1-2, 6.1.6-1 — and the whole exercise feels like reading air-traffic chatter. Both reactions are mistakes. The first because veraPDF passing is not the same as a PDF being accessible. The second because the rule numbers, once you understand the system, are the most useful diagnostic information in the entire PDF/UA ecosystem.
This piece is the guide to veraPDF we wish someone had handed us the first time we opened it. What it is, where it came from, what its rule numbers mean, what its outputs cover, and the gap it leaves: the gap a human auditor exists to fill.
01 · What veraPDF is, and why it has standing
veraPDF is an open-source validator that decides whether a PDF conforms to one of the formal ISO standards in the PDF family: PDF/A, the archival standard, in flavours 1, 2, 3 and 4; and PDF/UA-1, the universal accessibility standard, ISO 14289-1:2014. Support for PDF/UA-2 is still maturing. The project was launched in 2014 with funding from the EU's PREFORMA programme and is now stewarded by the PDF Association and the Open Preservation Foundation. It is dual-licensed under GPL v3 and MPL v2, so it costs nothing to use, including commercially, subject to the terms of whichever licence you choose.
The reason veraPDF matters more than the dozen other PDF checkers in circulation is institutional. It is the reference implementation that EU national libraries use to verify archival deposits. It is what the German Federal Government's accessibility programmes rely on. The PDF Association's own conformance evaluation is built on top of it. When a government portal in any jurisdiction with PDF/UA requirements — and that increasingly includes India — asks how a PDF was tested, "veraPDF" is the single answer that requires no further explanation.
It is not, however, an accessibility judge. It is a conformance checker. The distinction is the entire point of this piece.
02 · Reading the rule numbers
veraPDF rule identifiers look intimidating because they are dense, but the format is consistent. Every rule ID for PDF/UA-1 follows the shape:
    [ISO 14289-1 clause]-[rule index within clause]

    Examples:
    7.1-1        first rule under clause 7.1 (real content / artifacts)
    7.21.3.2-1   first rule under clause 7.21.3.2 (CIDFonts, inside the fonts clause 7.21)
    6.1.6-1      first rule under clause 6.1.6 (PDF major version)
The number before the dash maps to a specific clause in the ISO 14289-1 standard. If you have a copy of the standard open beside the report — and serious PDF remediation teams do — every rule failure points directly to the prose that defines it. The number after the dash is the rule index within that clause, allowing veraPDF to test several distinct conditions under one standard clause.
Read this way, the report stops being noise. 7.1-2 failing on every page of a document means real content is appearing inside an Artifact wrapper, which means the structure tree is broken, which usually means the file was tagged automatically and never reviewed. 7.3-1 failing forty-three times means forty-three figures lack alternate text. The rule number is the diagnosis.
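To make the clause-and-index reading concrete, here is a minimal sketch that splits rule IDs and tallies failures per clause. The function names and the sample list are ours, invented for illustration; it assumes you have already pulled the failed rule IDs out of a report by whatever means you prefer.

```python
import re
from collections import Counter

# "clause-index": dotted clause number, a dash, then the rule index.
RULE_ID = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)$")


def parse_rule_id(rule_id):
    """Split a rule ID such as '7.21.3.2-1' into ('7.21.3.2', 1)."""
    match = RULE_ID.match(rule_id.strip())
    if match is None:
        raise ValueError(f"not a clause-index rule ID: {rule_id!r}")
    return match.group(1), int(match.group(2))


def tally_by_clause(failed_rule_ids):
    """Count failures per ISO clause from an iterable of rule IDs."""
    return Counter(parse_rule_id(rule_id)[0] for rule_id in failed_rule_ids)


# Hypothetical sample: forty-three figures without alt text, plus a few others.
failures = ["7.3-1"] * 43 + ["7.1-2", "7.1-2", "7.21.3.2-1"]
for clause, count in tally_by_clause(failures).most_common():
    print(clause, count)
```

Sorting by count puts the dominant clause at the top, which is usually the first thing worth knowing about a failing file.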
03 · The rule categories, walked
ISO 14289-1 organises PDF/UA-1 requirements into a small number of broad clauses. veraPDF's rules cluster the same way. Knowing the categories is enough to read most reports without reaching for the standard.
| Clause | What it governs | What veraPDF checks |
|---|---|---|
| 5 | Version identification | The XMP metadata declares PDF/UA conformance through the PDF/UA identification schema. |
| 6 | Conformance requirements | PDF version compatibility, and the document is marked as a tagged PDF. Font embedding (7.21), encryption that must allow accessibility (7.16) and disallowed XFA forms (7.15) are tested under their own clauses. |
| 7.1 | General: real content vs Artifacts | The fundamental dichotomy. Every page object is either real content (must be tagged in the structure tree) or an artifact (must be marked as such). Nothing in between. The same clause requires custom tag names to be mapped to standard types via the RoleMap, which is where automated taggers produce the largest volume of failures, and requires DisplayDocTitle to be set so the document title (not the filename) shows in the window title bar. |
| 7.2 | Text | Text can be mapped to Unicode. Soft hyphens use the right character. No PUA (Private Use Area) characters without ActualText. Stretched glyphs handled. |
| 7.3 | Graphics | Figures have a tag of type Figure with Alt or ActualText. Decorative graphics marked as Artifact. Vector and raster handled the same way. |
| 7.4 | Headings | Heading tags follow a sensible hierarchy. The standard accepts either a strict H1–H6 model or a single H model; veraPDF tests for consistency within whichever was chosen. |
| 7.5 | Tables | Table tags contain proper TR rows, TH headers, TD cells. Header associations either via Scope or Headers/IDs. Layout tables masquerading as data tables fail here. |
| 7.6 | Lists | List tags contain LI items with Lbl labels and LBody bodies. Bulleted text rendered as paragraphs fails. |
| 7.7 | Mathematical expressions | Math content uses the Formula tag, with Alt providing the readable equivalent. |
| 7.8 | Headers and footers | Pagination, running headers, and repeated footers marked as Artifacts so screen readers don't read them on every page. |
| 7.9 | Notes and references | Footnotes and endnotes structured with the Note tag and proper linking. |
| 7.17 | Navigation | Documents of substantial length must have a document outline (bookmarks). |
| 7.18 | Annotations | Form fields and link annotations have a Contents entry. Tab order through annotations matches structural order. |
| 7.21 | Fonts | Font programs are embedded, composite fonts (CIDFonts and CMaps) are consistent, and every character can be mapped to Unicode. |
| 8 | Conforming reader requirements | Requirements on viewing software rather than on files, so a file validator has nothing to test here. The file's own conformance declaration belongs to clause 5. |
If you commit only one of these to memory, make it 7.1. The artifact/real-content split is the single concept underneath most veraPDF failures. Get it wrong at the tagging stage and every page produces noise. Get it right and most other rules tend to follow.
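Matching a rule ID to its row in the table has one trap worth showing: 7.18.1-2 belongs to 7.18, not to 7.1, so a plain string prefix test gives the wrong answer. The sketch below compares dotted components instead. The clause list is copied from the table's first column; the function name is our own.

```python
# Clause numbers from the first column of the table above.
TABLE_CLAUSES = {
    "5", "6", "7.1", "7.2", "7.3", "7.4", "7.5", "7.6", "7.7",
    "7.8", "7.9", "7.17", "7.18", "7.21", "8",
}


def table_row_for(rule_id):
    """Return the table clause that covers a rule ID, or None if no row does."""
    clause = rule_id.rsplit("-", 1)[0]
    parts = clause.split(".")
    # Try the longest dotted prefix first: 7.21.3.2 -> 7.21.3 -> 7.21 -> 7
    for end in range(len(parts), 0, -1):
        candidate = ".".join(parts[:end])
        if candidate in TABLE_CLAUSES:
            return candidate
    return None


print(table_row_for("7.18.1-2"))    # 7.18
print(table_row_for("7.1-2"))       # 7.1
print(table_row_for("7.21.3.2-1"))  # 7.21
```

A rule from a clause the table does not list comes back as `None`, which is the cue to open the standard itself.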
04 · The ten failures we see most often
Across the documents that pass through our PDF remediation pipeline, the same ten patterns account for roughly four out of every five veraPDF failures. They are listed in descending order of frequency.
Non-standard structure type without role mapping
The document uses a custom tag — Heading, Body, Caption2 — that is not part of the standard set, and the document's RoleMap does not map it to a recognised type. Every authoring tool with a quirky template produces this. Fix is mechanical: extend the RoleMap, or rename the tags. Common in PDFs exported from older versions of Microsoft Word and from custom InDesign templates.
Image without alternate text
A Figure tag exists, but its Alt entry is missing or empty. This is the most-cited PDF/UA failure in the industry. veraPDF flags the absence; what it cannot flag is the inverse — an Alt entry that says "image" or "photo1.jpg". For that you need a human. We see "graphic", "logo", "image" used as alt text in over a third of the PDFs we audit. All pass veraPDF. None are accessible.
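The crudest placeholder values can at least be triaged before a human looks at the file. What follows is a rough heuristic sketch, not a substitute for review: it assumes you have already extracted the Alt strings by some other means, and the word list extends the examples above with a few guesses of our own ("photo", "picture", "figure").

```python
import re

# Words that describe the object type rather than the image content.
# "image", "graphic" and "logo" come from our audits; the rest are assumptions.
GENERIC_WORDS = {"image", "graphic", "logo", "photo", "picture", "figure"}

# A single token ending in a common image extension, e.g. "photo1.jpg".
FILENAME = re.compile(r"^\S+\.(jpe?g|png|gif|tiff?|bmp|svg)$", re.IGNORECASE)


def looks_like_placeholder(alt_text):
    """Return True when alt text is empty, a bare filename, or a generic word."""
    text = alt_text.strip()
    if not text:
        return True
    if FILENAME.match(text):
        return True
    return text.lower().rstrip(".") in GENERIC_WORDS


for sample in ["image", "photo1.jpg", "Photo of three children in a classroom"]:
    print(repr(sample), looks_like_placeholder(sample))
```

Anything this flags is almost certainly wrong. Anything it passes still needs a person to confirm that the text describes the picture.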
Artifact present in real content
Decorative content — a background pattern, a watermark, a page-number ornament — is mixed inside a paragraph or heading tag instead of being marked as an Artifact. Screen readers will read it aloud. The user hears their document interrupted by "Star, star, star." Fix is to re-tag the offending content as Artifact and re-export.
Real content present in artifact
The mirror image of the previous failure. Meaningful text — a footnote, a sidebar — has been treated as an artifact and will be silenced for screen reader users. Particularly common in two-column government circulars where the sidebar gets lost during automated tagging.
Document does not display the document title
The PDF metadata has a title, but ViewerPreferences/DisplayDocTitle is not set to true, so most viewers show the filename instead. A one-line fix; an almost universal failure on government PDFs exported from Word without a post-processing step.
Metadata stream missing or malformed
PDF/UA-1 requires an embedded XMP metadata stream that declares conformance. veraPDF rejects documents where the metadata is missing, where the XMP is structurally invalid, or where the PDF/UA conformance namespace is absent. This is what happens when a PDF is post-processed by a tool that strips metadata.
Annotation without Contents key
Link annotations, comment annotations, and form field annotations all require a Contents entry — the text a screen reader will announce. Annotations created interactively in Acrobat often have one; those generated by export pipelines often do not.
Table without proper header association
Data tables must declare which cells are headers — using Scope on TH elements, or Headers/IDs for complex layouts. veraPDF will flag tables that lack any header association. It will not, however, flag layout tables that have been mis-tagged as data tables, or data tables where the header association is technically present but logically wrong. Those remain human-judgment items.
List structure incomplete
A list exists, but its items lack Lbl (label/bullet) and LBody (body text) child elements. Common pattern: text exports with bullet characters as part of the paragraph text, never tagged as a list at all. veraPDF cannot detect a list that doesn't claim to be one.
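Because the validator only sees lists that are tagged as lists, untagged ones have to be hunted for separately. A minimal sketch of one way to do that, assuming you can obtain the text of each paragraph-tagged element in reading order (the extraction step is not shown and depends on your tooling):

```python
import re

# A paragraph that opens with a bullet glyph or a "1." / "1)" style number.
BULLET_START = re.compile(r"^\s*(?:[•◦▪‣·*\-]|\d+[.)])\s+")


def suspected_list_runs(paragraphs, min_run=2):
    """Return (start, end) index pairs for runs of bullet-like paragraphs."""
    runs = []
    start = None
    for index, text in enumerate(paragraphs):
        if BULLET_START.match(text):
            if start is None:
                start = index
            continue
        # The run, if any, ended on the previous paragraph.
        if start is not None and index - start >= min_run:
            runs.append((start, index - 1))
        start = None
    if start is not None and len(paragraphs) - start >= min_run:
        runs.append((start, len(paragraphs) - 1))
    return runs


sample = ["Intro", "• one", "• two", "• three", "Outro", "1. only one", "End"]
print(suspected_list_runs(sample))  # [(1, 3)]
```

Requiring at least two consecutive matches keeps a lone "1. Introduction" heading from being reported. The result is a list of places for a reviewer to look, nothing more.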
Font not embedded
Text is set in a font whose font program is not embedded in the document. This fails accessibility because text replacement for assistive technology requires the font to be present. Trivial to fix at the authoring stage; expensive to fix afterwards because re-embedding may alter pagination.
05 · The Matterhorn Protocol bridge
The PDF Association publishes a document called the Matterhorn Protocol. It is the bridge between the abstract prose of ISO 14289-1 and a tester's working list. The Protocol decomposes PDF/UA-1 into 31 checkpoints and 136 failure conditions. Of those 136 conditions, the Protocol explicitly identifies how many can be tested by software and how many require human judgment.
The current ratio is roughly 87 machine-checkable, 49 human-only. veraPDF implements essentially all of the 87. The remaining 49 are why audit labs still exist.
The human-only items are not edge cases. They are the most consequential parts of PDF accessibility:
- Is the alternate text meaningful? "Photo of three children in a classroom" is accessible. "image.jpg" is not. veraPDF cannot tell.
- Does the reading order match the visual order? A two-column document whose structure tree is linearised column by column reads correctly; one linearised straight across the page, jumping from the left column to the right on every line, reads as gibberish. The structure can be valid in both cases.
- Are headings used to express hierarchy, or just for visual styling? A document with three H1s in a row is technically valid; logically it is not navigable.
- Are the Lbl values for list items semantically correct? "1.", "2.", "3." is meaningful; "a.", "b.", "c." in a numbered procedure is misleading.
- Are tables genuinely data tables, or layout tables mis-tagged? If a layout grid is tagged as a Table, screen reader users hear "Row 1, column 1" announcements on every paragraph.
- Does the Lang attribute reflect actual language switches? In a bilingual government circular with English structure but Hindi body paragraphs marked as lang="en", a screen reader pronounces each Hindi word in a heavy American accent.
This is why every veraPDF report that comes out of our lab is paired with a human review. Passing the machine layer earns the file a clean conformance claim. Passing the human layer earns it accessibility.
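The Lang item in the list above is one of the few human-only checks where a crude script test can narrow the search. The sketch below flags text that is mostly Devanagari but declared as English. It assumes you already have each paragraph's text and declared language; the function names and the 0.5 threshold are our own choices, and it says nothing about languages that share a script.

```python
def devanagari_share(text):
    """Fraction of alphabetic characters that fall in the Devanagari block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if "\u0900" <= ch <= "\u097F")
    return hits / len(letters)


def lang_mismatch(text, declared_lang, threshold=0.5):
    """True when text is mostly Devanagari but its declared language is English."""
    declared_english = declared_lang.lower().startswith("en")
    return declared_english and devanagari_share(text) >= threshold


print(lang_mismatch("नमस्ते दुनिया", "en"))   # True
print(lang_mismatch("Hello world", "en-US"))  # False
print(lang_mismatch("नमस्ते दुनिया", "hi"))   # False
```

A hit here is a strong hint that a paragraph inherited the document language by default. A reviewer still has to decide what the correct tag is.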
The rule we apply internally
If a PDF passes veraPDF cleanly on first run, treat it with suspicion. It usually means the file was tagged programmatically by a tool that knows how to satisfy the machine layer without understanding the document. Real documents — written by humans, for humans — almost always need at least one structural correction before they are both conformant and accessible.
06 · Running veraPDF in practice
Three ways to run it, depending on where you sit in the pipeline.
The GUI for ad-hoc checks
Download the installer for Windows, macOS or Linux from verapdf.org. Open a PDF, choose the validation profile (PDF/UA-1 for accessibility), click Validate. The report opens in a tree view; expand a failure to see the page number and offending object. Adequate for spot checks and for understanding what a failure looks like. Inadequate for any pipeline.
The CLI for batch and CI
The command-line tool ships in the same package. Most production pipelines look like this:
    # Validate a directory of PDFs against PDF/UA-1
    verapdf -f ua1 --format json --recurse ./pdfs/ > report.json
    # Exit code 0 = all pass; 1 = at least one failure
This is what production accessibility teams wire into their CI. A PDF that fails veraPDF cannot reach the deployment branch. The JSON output is straightforward to feed into dashboards or compliance trackers.
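For teams whose CI scripts are already in Python, the same gate can be wrapped in a few lines. This is a sketch under stated assumptions: `verapdf` is on the PATH, the flags are the ones from the command above, and exit code 0 is the only passing result. The function names and the injectable `runner` parameter are ours, added so the gate can be exercised without veraPDF installed.

```python
import subprocess


def build_command(pdf_dir):
    """Return the veraPDF command line used in this section."""
    return ["verapdf", "-f", "ua1", "--format", "json", "--recurse", pdf_dir]


def run_gate(pdf_dir, report_path, runner=subprocess.run):
    """Run veraPDF, save its report, and return True only on exit code 0."""
    completed = runner(build_command(pdf_dir), capture_output=True, text=True)
    # Keep the report even when the gate fails; that is when it is needed most.
    with open(report_path, "w", encoding="utf-8") as handle:
        handle.write(completed.stdout)
    return completed.returncode == 0
```

In a pipeline step, something like `sys.exit(0 if run_gate("./pdfs/", "report.json") else 1)` blocks the merge. Treating every non-zero code as a failure is deliberate: a run that could not complete should not count as a pass. If the executable is missing altogether, `subprocess.run` raises `FileNotFoundError`, which also stops the build.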
The Java API for integration
For tools that need to call validation from inside their own code — PDF remediation engines, content management pipelines, our own PDF Engine at pdf.accesssure.in — veraPDF ships as a Java library. Call it, get the structured validation result, branch on it. We use it as the first quality gate after our auto-tagging stage and as the final gate before delivery.
07 · The limits, stated plainly
The shortest summary of what veraPDF cannot do:
- It cannot judge whether content makes sense to a human.
- It cannot detect reading-order failures that produce a structurally valid but semantically broken document.
- It cannot tell layout tables from data tables.
- It cannot verify that Alt text describes the image.
- It cannot test the accessibility of dynamic features that require execution — form behaviour, scripted actions, multimedia interactivity.
- It cannot certify a PDF for any specific jurisdiction's audit; that is a function of the auditor, not the validator.
None of this is a criticism. veraPDF does exactly what an automated validator can do, well, and stops where automation has to stop. The trouble is not the tool. It is the way teams treat the output. A green veraPDF report is a precondition for accessibility. It is not the destination.
08 · Questions our clients ask
Does passing veraPDF mean my PDF is accessible?
What is the Matterhorn Protocol?
What does a rule ID like 7.21.3.2-1 mean?
Is veraPDF approved for STQC or GIGW 3.0 audits?
What is the difference between PDF/A and PDF/UA?
Can I remediate a failing PDF inside veraPDF?
Is veraPDF free?
A validator can tell you whether your PDF claims to be accessible. A human can tell you whether it actually is. Both are necessary. Neither is sufficient.