Over the past year I've been building automated PDF modification detection for invoice and document fraud use cases. The web tool is free and unlimited; I wanted to share the methodology and get feedback from people who actually do this professionally.
What it checks:

- Metadata layer consistency: Info dictionary vs XMP; mismatches are a common artifact of partial edits
- Incremental update structure: xref table count, update chain length
- Creator/Producer fingerprinting: 50+ known tools flagged by name (iLovePDF, Smallpdf, Adobe Acrobat, Microsoft Word, etc.)
- Digital signature integrity: specifically whether a signature was present and was removed post-signing
- Font and structure anomalies: soft masks, vector outlines over image-heavy pages, isolated text layers over scanned backgrounds
Verdicts and confidence:

- Three states: intact / modified / inconclusive
- Confidence levels:
  - certain: cryptographic or structural evidence; no false positives by design (e.g. signature removed, post-signature modification)
  - high: strong forensic evidence; rare false positives in linearized or batch-processed PDFs
Known limitations:

- Cannot detect content-level forgeries that leave no structural trace (a clean export built from scratch)
- PDFs processed through online editors (Smallpdf, iLovePDF, etc.) have their original metadata stripped, so the tool returns inconclusive / online_editor_origin
- Consumer software origin (Word, LibreOffice, Google Docs) gets the same inconclusive verdict; the integrity check doesn't apply
- Does not validate digital signature cryptographic chains; only detects signature presence/removal
- Encrypted PDFs are not supported
Tool: https://htpbe.tech (free web interface, no login required)
Curious whether the inconclusive classification for online-editor-processed documents matches what you see in practice, and what other structural signals you'd prioritize.