Over the past year I've been building automated PDF modification detection for invoice and document fraud use cases. The web tool is free and unlimited; I wanted to share the methodology and get feedback from people who actually do this professionally.
What it checks:

- Metadata layer consistency: Info dictionary vs XMP; mismatches are a common artifact of partial edits
- Incremental update structure: xref table count, update chain length
- Creator/Producer fingerprinting: 50+ known tools flagged by name (iLovePDF, Smallpdf, Adobe Acrobat, Microsoft Word, etc.)
- Digital signature integrity: specifically whether a signature was present and was removed post-signing
- Font and structure anomalies: soft masks, vector outlines over image-heavy pages, isolated text layers over scanned backgrounds
Verdicts and confidence:

- Three states: intact / modified / inconclusive
- Confidence levels:
  - certain: cryptographic or structural evidence; no false positives by design (e.g. signature removed, post-signature modification)
  - high: strong forensic evidence; rare false positives in linearized or batch-processed PDFs
Known limitations:

- Cannot detect content-level forgeries that leave no structural trace (a clean export built from scratch)
- PDFs processed through online editors (Smallpdf, iLovePDF, etc.) have their original metadata stripped, so the tool returns inconclusive / online_editor_origin
- Consumer software origin (Word, LibreOffice, Google Docs) gets the same inconclusive verdict; the integrity check doesn't apply
- Does not validate digital signature cryptographic chains; only detects signature presence/removal
- Encrypted PDFs are not supported
Tool: https://htpbe.tech (free web interface, no login required)
Curious whether the inconclusive classification for online-editor-processed documents matches what you see in practice, and what other structural signals you'd prioritize.