ITQCR · STQC SAB SETL-1 Empanelled Lab
PDF accessibility · Tooling

veraPDF, decoded — what the validator actually tells you.

It is the closest thing the PDF world has to a referee. It is also routinely misread. This is what veraPDF can decide, what it cannot, and how to use a report without being lulled by a green checkmark.

Published 19 May 2026 · Reviewed by ITQCR PDF remediation team · Read time 13 min · Standard: PDF/UA-1 (ISO 14289-1:2014)

Run a tagged PDF through veraPDF for the first time and one of two things happens. The report comes back green, you exhale, and you ship the file to the government portal. Or the report comes back with a wall of failures (7.21.3.2-1, 7.1-2, 7.3-1) and the whole exercise feels like reading air-traffic chatter. Both reactions are mistakes. The first is a mistake because a veraPDF pass is not the same as a PDF being accessible. The second, because the rule numbers, once you understand the system, are the most useful diagnostic information in the entire PDF/UA ecosystem.

This piece is the version of veraPDF we wish someone had handed us the first time we opened it. What it is, where it came from, what its rule numbers mean, what its outputs cover, and the gap it leaves — the gap a human auditor exists to fill.

01 · What veraPDF is, and why it has standing

veraPDF is an open-source validator that decides whether a PDF conforms to one of the formal ISO standards in the PDF family: PDF/A, the archival standard, in parts 1, 2, 3 and 4; and PDF/UA-1, the universal accessibility standard, ISO 14289-1:2014. Support for PDF/UA-2 is still maturing. The project was launched in 2014 with funding from the EU's PREFORMA programme and is now stewarded by the PDF Association and the Open Preservation Foundation. It is dual-licensed under GPL v3 and MPL v2, so it is free to use, including commercially.

The reason veraPDF matters more than the dozen other PDF checkers in circulation is institutional. It is the reference implementation that EU national libraries use to verify archival deposits. It is what the German Federal Government's accessibility programmes rely on. The PDF Association's own conformance evaluation is built on top of it. When a government portal in any jurisdiction with PDF/UA requirements — and that increasingly includes India — asks how a PDF was tested, "veraPDF" is the single answer that requires no further explanation.

It is not, however, an accessibility judge. It is a conformance checker. The distinction is the entire point of this piece.

02 · Reading the rule numbers

veraPDF rule identifiers look intimidating because they are dense, but the format is consistent. Every rule ID for PDF/UA-1 follows the shape:

[ISO 14289-1 clause]-[rule index within clause]

Examples:
7.1-1:      first rule under clause 7.1 (real content / artifacts)
7.21.3.2-1: first rule under clause 7.21.3.2 (CIDFonts, part of the font requirements in 7.21)
7.3-1:      first rule under clause 7.3 (graphics: alternate text for figures)

The number before the dash maps to a specific clause in the ISO 14289-1 standard. If you have a copy of the standard open beside the report — and serious PDF remediation teams do — every rule failure points directly to the prose that defines it. The number after the dash is the rule index within that clause, allowing veraPDF to test several distinct conditions under one standard clause.
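The split described above is mechanical enough to script. The following is a minimal sketch in bash; `split_rule_id` is a hypothetical helper name used for illustration, not something veraPDF ships.

```shell
#!/usr/bin/env bash
# Split a rule ID of the form "<clause>-<index>" into its two parts.
# split_rule_id is a hypothetical helper, not part of veraPDF.
split_rule_id() {
  local id="$1"
  local clause="${id%-*}"    # everything before the last dash
  local index="${id##*-}"    # everything after the last dash
  printf '%s %s\n' "$clause" "$index"
}

split_rule_id "7.21.3.2-1"   # prints: 7.21.3.2 1
split_rule_id "7.1-2"        # prints: 7.1 2
```

The clause part is what you look up in the standard; the index only distinguishes the tests veraPDF runs under that clause.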

Read this way, the report stops being noise. 7.1-2 failing on every page of a document means real content is appearing inside an Artifact wrapper, which means the structure tree is broken, which usually means the file was tagged automatically and never reviewed. 7.3-1 failing forty-three times means forty-three figures lack alternate text. The rule number is the diagnosis.
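Counting failures per rule ID is usually the first thing worth doing with a long report. The sketch below assumes the failing rule IDs have already been pulled out of the report into plain text, one ID per line; that input format and the `tally_failures` name are assumptions for illustration, not veraPDF's own output.

```shell
#!/usr/bin/env bash
# Count how often each rule ID occurs (one ID per line on stdin),
# most frequent first. Input format is assumed, not veraPDF's native report.
tally_failures() {
  sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2, $1 }'
}

# Example with a made-up list of failures:
printf '%s\n' 7.3-1 7.1-2 7.3-1 7.3-1 7.1-2 7.1-1 | tally_failures
```

A rule that fails once per page points at a template or tagging problem; a rule that fails once per figure points at missing content work.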

03 · The rule categories, walked

ISO 14289-1 organises PDF/UA-1 requirements into a small number of broad clauses. veraPDF's rules cluster the same way. Knowing the categories is enough to read most reports without reaching for the standard.

Clause | What it governs | What veraPDF checks
5 | Version identification | The XMP metadata declares PDF/UA conformance through the PDF/UA identifier (pdfuaid:part set to 1).
6 | Conformance requirements | PDF version and file header. The document is marked as a tagged PDF.
7.1 | General: real content vs Artifacts | The fundamental dichotomy. Every page object is either real content (must be tagged in the structure tree) or an artifact (must be marked as such). Nothing in between. The same clause covers custom tag names mapped to standard types via the RoleMap (where automated taggers produce the largest volume of failures), the XMP metadata stream and its title, and DisplayDocTitle, set so the document title (not the filename) shows in the window title bar.
7.2 | Text | Text can be mapped to Unicode. Soft hyphens use the right character. No PUA (Private Use Area) characters without ActualText. Stretched glyphs handled.
7.3 | Graphics | Figures have a tag of type Figure with Alt or ActualText. Decorative graphics marked as Artifact. Vector and raster handled the same way.
7.4 | Headings | Heading tags follow a sensible hierarchy. The standard accepts either a strict H1–H6 model or a single H model; veraPDF tests for consistency within whichever was chosen.
7.5 | Tables | Table tags contain proper TR rows, TH headers, TD cells. Header associations either via Scope or Headers/IDs. Layout tables masquerading as data tables fail here.
7.6 | Lists | List tags contain LI items with Lbl labels and LBody bodies. Bulleted text rendered as paragraphs fails.
7.7 | Mathematical expressions | Math content uses the Formula tag, with Alt providing the readable equivalent.
7.8 | Headers and footers | Pagination, running headers, and repeated footers marked as Artifacts so screen readers don't read them on every page.
7.9 | Notes and references | Footnotes and endnotes structured with the Note tag and proper linking.
7.15 | XFA | Absence of disallowed features like XFA forms.
7.16 | Security | Encryption mode must allow access by assistive technology.
7.17 | Navigation | Documents of substantial length should have a document outline (bookmarks).
7.18 | Annotations | Form fields and link annotations have a Contents entry. Tab order through annotations matches structural order.
7.21 | Fonts | Fonts are embedded, with consistent metrics and usable character encodings.

If you commit only one of these to memory, make it 7.1. The artifact/real-content split is the single concept underneath most veraPDF failures. Get it wrong at the tagging stage and every page produces noise. Get it right and most other rules tend to follow.
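For quick triage, the clause part of a rule ID can be mapped to the category names used in the table. This is a deliberately partial sketch covering only a handful of clauses; `category_for_rule` and its labels are illustrative assumptions, not veraPDF output.

```shell
#!/usr/bin/env bash
# Map a rule ID to a broad category by its clause prefix.
# Partial and illustrative: anything unlisted falls through to the default.
category_for_rule() {
  local clause="${1%-*}"     # strip the "-<index>" suffix
  case "$clause" in
    7.1|7.1.*)   echo "real content vs artifacts" ;;
    7.2|7.2.*)   echo "text" ;;
    7.3|7.3.*)   echo "graphics" ;;
    7.4|7.4.*)   echo "headings" ;;
    7.5|7.5.*)   echo "tables" ;;
    7.6|7.6.*)   echo "lists" ;;
    7.18|7.18.*) echo "annotations" ;;
    *)           echo "other (look up clause $clause in ISO 14289-1)" ;;
  esac
}

category_for_rule "7.1-2"
category_for_rule "7.18.1-1"
```

Matching on `7.1` and `7.1.*` separately keeps clause 7.18 from being swallowed by the 7.1 branch.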

04 · The ten failures we see most often

Across the documents that pass through our PDF remediation pipeline, the same ten patterns account for roughly four out of every five veraPDF failures. They are listed in descending order of frequency.

7.1-5

Non-standard structure type without role mapping

The document uses a custom tag — Heading, Body, Caption2 — that is not part of the standard set, and the document's RoleMap does not map it to a recognised type. Every authoring tool with a quirky template produces this. Fix is mechanical: extend the RoleMap, or rename the tags. Common in PDFs exported from older versions of Microsoft Word and from custom InDesign templates.

7.3-1

Image without alternate text

A Figure tag exists, but its Alt entry is missing or empty. This is the most-cited PDF/UA failure in the industry. veraPDF flags the absence; what it cannot flag is the inverse — an Alt entry that says "image" or "photo1.jpg". For that you need a human. We see "graphic", "logo", "image" used as alt text in over a third of the PDFs we audit. All pass veraPDF. None are accessible.
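A crude pre-filter can at least surface the placeholder strings mentioned above before a human looks at the file. The sketch below is a heuristic under stated assumptions: the alt text has already been extracted by some other tool, and the word list is ours. It does not judge whether a description is meaningful.

```shell
#!/usr/bin/env bash
# Return success (0) when an alt text looks like a placeholder.
# Heuristic only; the word list and the helper name are assumptions.
is_placeholder_alt() {
  local alt
  # Lower-case and trim surrounding whitespace before matching.
  alt="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  case "$alt" in
    ""|image|graphic|logo|photo|picture) return 0 ;;   # generic words
    *.jpg|*.jpeg|*.png|*.gif)            return 0 ;;   # leftover filenames
    *)                                   return 1 ;;
  esac
}

for alt in "logo" "photo1.jpg" "Ministry emblem above the circular title"; do
  if is_placeholder_alt "$alt"; then echo "REVIEW: $alt"; else echo "ok: $alt"; fi
done
```

Anything flagged still needs a person to write the replacement, and anything not flagged still needs a person to confirm it describes the image.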

7.1-1

Artifact present in real content

Decorative content (a background pattern, a watermark, a page-number ornament) is mixed inside a paragraph or heading tag instead of being marked as an Artifact. Screen readers will read it aloud. The user hears their document interrupted by "Star, star, star." Fix is to re-tag the offending content as Artifact and re-export.

7.1-2

Real content present in artifact

The mirror image of the previous failure. Meaningful text (a footnote, a sidebar) has been treated as an artifact and will be silenced for screen reader users. Particularly common in two-column government circulars where the sidebar gets lost during automated tagging.

7.1-10

Document does not display the document title

The PDF metadata has a title, but ViewerPreferences/DisplayDocTitle is not set to true, so most viewers show the filename instead. A one-line fix; an almost universal failure on government PDFs exported from Word without a post-processing step.

7.1-8 / 5-1

Metadata stream missing or malformed

PDF/UA-1 requires an embedded XMP metadata stream that declares conformance. veraPDF rejects documents where the metadata is missing, where the XMP is structurally invalid, or where the PDF/UA conformance namespace is absent. This is what happens when a PDF is post-processed by a tool that strips metadata.

7.18.1-2

Annotation without Contents key

Link annotations, comment annotations, and form field annotations all require a Contents entry — the text a screen reader will announce. Annotations created interactively in Acrobat often have one; those generated by export pipelines often do not.

7.5-1

Table without proper header association

Data tables must declare which cells are headers — using Scope on TH elements, or Headers/IDs for complex layouts. veraPDF will flag tables that lack any header association. It will not, however, flag layout tables that have been mis-tagged as data tables, or data tables where the header association is technically present but logically wrong. Those remain human-judgment items.

7.6

List structure incomplete

A list exists, but its items lack Lbl (label/bullet) and LBody (body text) child elements. Common pattern: text exports with bullet characters as part of the paragraph text, never tagged as a list at all. veraPDF cannot detect a list that doesn't claim to be one.
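Because the validator cannot see an untagged list, a rough text-level check can help a reviewer decide where to look. The sketch assumes plain text has already been extracted from the PDF by a separate tool and counts lines that start with a bullet character or a number followed by a dot or bracket. The pattern and the `count_bullet_lines` name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Count lines on stdin that look like list items typed as plain paragraphs.
# Heuristic: leading bullet (•, -, *) or "1." / "1)" followed by whitespace.
count_bullet_lines() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -cE '^[[:space:]]*(•|-|\*|[0-9]+[.)])[[:space:]]+' || true
}

printf '%s\n' \
  '• Submit the form' \
  '• Attach the certificate' \
  'This paragraph is ordinary prose.' \
  '1. Numbered step' | count_bullet_lines
```

A non-zero count alongside a clean report for list rules is a hint that the document contains lists that were never tagged as lists.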

7.21.4.1-1

Font not embedded

A glyph references a font that is not embedded in the document. Fails accessibility because text replacement for assistive technology requires the font to be present. Trivial to fix at the authoring stage; expensive to fix afterwards because re-embedding may alter pagination.

05 · The Matterhorn Protocol bridge

The PDF Association publishes a document called the Matterhorn Protocol. It is the bridge between the abstract prose of ISO 14289-1 and a tester's working list. The Protocol decomposes PDF/UA-1 into 31 checkpoints and 136 failure conditions. Of those 136 conditions, the Protocol explicitly identifies how many can be tested by software and how many require human judgment.

The Protocol's own breakdown is 87 machine-checkable conditions, 47 that usually require human judgment, and 2 with no specific test. veraPDF implements essentially all of the 87. The remaining 49 are why audit labs still exist.
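Put as arithmetic, using only the counts quoted above (136 failure conditions, 87 of them machine-checkable), the machine layer covers a little under two thirds of the Protocol. A small sketch of the calculation:

```shell
#!/usr/bin/env bash
# Share of Matterhorn failure conditions that software can decide,
# using the counts quoted in the text.
total=136
machine=87
rest=$(( total - machine ))                          # not machine-checkable
share=$(( (machine * 100 + total / 2) / total ))     # rounded percentage

echo "not machine-checkable: $rest"
echo "machine-checkable share: ${share}%"
```

The percentage counts conditions, not impact: the conditions outside the machine layer include the ones that decide whether a screen reader user can actually follow the document.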

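The human-only items are not edge cases. They are the most consequential parts of PDF accessibility: whether alt text is meaningful, whether the reading order is correct, whether the heading hierarchy is sensible, whether language switches are marked, and whether table semantics hold.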

This is why every veraPDF report that comes out of our lab is paired with a human review. Passing the machine layer earns the file a clean conformance claim. Passing the human layer earns it accessibility.

The rule we apply internally

If a PDF passes veraPDF cleanly on first run, treat it with suspicion. It usually means the file was tagged programmatically by a tool that knows how to satisfy the machine layer without understanding the document. Real documents — written by humans, for humans — almost always need at least one structural correction before they are both conformant and accessible.

06 · Running veraPDF in practice

Three ways to run it, depending on where you sit in the pipeline.

The GUI for ad-hoc checks

Download the installer for Windows, macOS or Linux from verapdf.org. Open a PDF, choose the validation profile (PDF/UA-1 for accessibility), click Validate. The report opens in a tree view; expand a failure to see the page number and offending object. Adequate for spot checks and for understanding what a failure looks like. Inadequate for any pipeline.

The CLI for batch and CI

The command-line tool ships in the same package. Most production pipelines look like this:

# Validate a directory of PDFs against PDF/UA-1
$ verapdf -f ua1 --format json --recurse ./pdfs/ > report.json

# Exit code 0 = all pass; 1 = at least one failure

This is what production accessibility teams wire into their CI. A PDF that fails veraPDF cannot reach the deployment branch. The JSON output is straightforward to feed into dashboards or compliance trackers.
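A gate of this kind can be sketched as a small wrapper around the command shown above. This is an assumption-laden illustration rather than a recommended pipeline: `pdfua_gate` and the `VALIDATOR` override are hypothetical names, and only the flags already used in the command above are passed through.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: fail the step when the validator reports a failure.
# VALIDATOR defaults to verapdf and can be overridden, which lets the
# branching logic be exercised on a machine without veraPDF installed.
VALIDATOR="${VALIDATOR:-verapdf}"

pdfua_gate() {
  local dir="$1" report="$2" status=0
  # Same invocation as in the text; the report is kept for later inspection.
  "$VALIDATOR" -f ua1 --format json --recurse "$dir" > "$report" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "PASS: every PDF in $dir passed PDF/UA-1 validation"
  else
    echo "FAIL: validator exited with status $status, see $report" >&2
  fi
  return "$status"
}

# Typical use in a pipeline step:
#   pdfua_gate ./pdfs report.json || exit 1
```

Returning the validator's own exit status, instead of collapsing it to 0 or 1, leaves room for the pipeline to distinguish a failed validation from a run that could not complete.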

The Java API for integration

For tools that need to call validation from inside their own code — PDF remediation engines, content management pipelines, our own PDF Engine at pdf.accesssure.in — veraPDF ships as a Java library. Call it, get the structured validation result, branch on it. We use it as the first quality gate after our auto-tagging stage and as the final gate before delivery.

07 · The limits, stated plainly

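The shortest summary of what veraPDF cannot do: it cannot judge whether alt text is meaningful, whether a table's header associations are logically right, whether a run of bulleted paragraphs should have been tagged as a list, or whether the reading order makes sense.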

None of this is a criticism. veraPDF does exactly what an automated validator can do, well, and stops where automation has to stop. The trouble is not the tool. It is the way teams treat the output. A green veraPDF report is a precondition for accessibility. It is not the destination.

08 · Questions our clients ask

Does passing veraPDF mean my PDF is accessible?
No. veraPDF tests the 87 machine-checkable conditions of the Matterhorn Protocol. The remaining 49 — meaningful alt text, correct reading order, sensible heading hierarchy, language switches, table semantics — require a human reviewer. A PDF can pass veraPDF cleanly and still be unusable with a screen reader.
What is the Matterhorn Protocol?
A PDF Association document that translates the abstract ISO 14289-1 requirements into 31 checkpoints and 136 specific failure conditions a tester can verify. It is the standard reference for splitting machine-testable items from human-testable ones.
What does a rule ID like 7.21.3.2-1 mean?
The number before the dash maps to a clause in ISO 14289-1: here, 7.21.3.2, a subclause of the font requirements in clause 7.21. The number after the dash is the rule index within that clause. The format makes every failure traceable to the prose of the standard.
Is veraPDF approved for STQC or GIGW 3.0 audits?
veraPDF is widely accepted as evidence of PDF/UA-1 conformance, including in STQC review contexts. But conformance evidence is one component of a GIGW 3.0 audit; STQC reviewers also expect manual verification of accessibility, particularly for bilingual content and complex documents. ITQCR uses veraPDF as the first gate, then applies manual review on top.
What is the difference between PDF/A and PDF/UA?
PDF/A is the archival standard — it ensures the document will render the same way in fifty years. PDF/UA is the accessibility standard — it ensures the document is usable by assistive technology. The two can coexist: a PDF can claim both PDF/A-3a and PDF/UA-1 conformance, which is the target for most government archival systems.
Can I remediate a failing PDF inside veraPDF?
No. veraPDF only validates; it does not fix. For remediation you need Adobe Acrobat Pro, a dedicated remediation tool, or an automated engine. Our PDF Engine at pdf.accesssure.in reads veraPDF failures directly and applies fixes, then re-validates.
Is veraPDF free?
Yes. It is dual-licensed under GPL v3 and MPL v2 — free for commercial use, modification, and redistribution. Funded by EU programmes and stewarded by the PDF Association and the Open Preservation Foundation.

A validator can tell you whether your PDF claims to be accessible. A human can tell you whether it actually is. Both are necessary. Neither is sufficient.