HTPBE.TECH: a free PDF metadata forensics tool — here's what it detects and where it fails
2026-3-19 15:29:15 Author: www.reddit.com

Over the past year I've been working on automated PDF modification detection for invoice and document fraud use cases. The web tool is free and unlimited. I wanted to share the methodology and get feedback from people who actually do this professionally.

What it checks:

  • Metadata layer consistency: Info dictionary vs XMP; mismatches are a common artifact of partial edits

  • Incremental update structure: number of xref tables and length of the incremental-update chain

  • Creator/Producer fingerprinting: 50+ known tools flagged by name (iLovePDF, Smallpdf, Adobe Acrobat, Microsoft Word, etc.)

  • Digital signature integrity: specifically, whether a signature was present and later removed, or the document was modified after signing

  • Font structure anomalies: soft masks, vector outlines over image-heavy pages, isolated text layers over scanned backgrounds
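
A rough sketch of an Info-vs-XMP comparison, limited to the Producer field. This is not how htpbe.tech implements the check, because the post doesn't say. The sketch assumes the Info dictionary's /Producer is a plain literal string and that the XMP packet is stored uncompressed. Both are common but not guaranteed: object streams, hex or UTF-16 strings, and a compressed metadata stream all defeat a byte-level regex. The helper names are hypothetical. It takes the last match of each field because an incremental update appends its new objects after the old ones, so the last value approximates the newest revision.

```python
import re
from typing import Optional

# Literal-string /Producer entries in Info dictionaries (hex strings not handled).
_INFO_PRODUCER = re.compile(rb"/Producer\s*\(([^)]*)\)")
# XMP stores pdf:Producer either as an element or as an attribute.
_XMP_PRODUCER = re.compile(
    rb'<pdf:Producer>([^<]*)</pdf:Producer>|pdf:Producer="([^"]*)"'
)


def info_producer(pdf: bytes) -> Optional[str]:
    """Last /Producer literal in the file, i.e. roughly the newest revision's value."""
    matches = _INFO_PRODUCER.findall(pdf)
    return matches[-1].decode("latin-1") if matches else None


def xmp_producer(pdf: bytes) -> Optional[str]:
    """Last pdf:Producer value found in an uncompressed XMP packet."""
    matches = list(_XMP_PRODUCER.finditer(pdf))
    if not matches:
        return None
    last = matches[-1]
    # Exactly one of the two alternation groups participates in a match.
    value = last.group(1) if last.group(1) is not None else last.group(2)
    return value.decode("utf-8", errors="replace")


def producer_mismatch(pdf: bytes) -> bool:
    """True when both metadata layers name a producer and the names disagree."""
    info, xmp = info_producer(pdf), xmp_producer(pdf)
    return info is not None and xmp is not None and info.strip() != xmp.strip()
```

The same comparison applies to CreationDate/ModDate vs xmp:CreateDate/xmp:ModifyDate. The two layers use different date syntaxes (`D:YYYYMMDDHHmmSS...` vs ISO 8601), so both need normalizing before comparison.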
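
The update-chain signal can be approximated without a PDF library by counting trailers in the raw bytes. Each incremental save appends a new cross-reference section, either a classic `xref` table or, in PDF 1.5+, an `/XRef` stream, followed by `startxref` and `%%EOF`. The helper below is hypothetical, not the tool's code. It assumes one `%%EOF` per save. It also treats as expected the extra marker that linearized ("fast web view") files usually carry after their first-page section, which is plausibly one reason linearized files attract false positives.

```python
import re

_XREF_TABLE = re.compile(rb"(?m)^xref\s*$")    # classic cross-reference table keyword
_XREF_STREAM = re.compile(rb"/Type\s*/XRef\b")  # PDF 1.5+ cross-reference stream


def update_chain(pdf: bytes) -> dict:
    """Rough structural counts from naive byte scanning (stream contents can confuse it)."""
    eof_markers = pdf.count(b"%%EOF")
    linearized = re.search(rb"/Linearized\b", pdf) is not None
    # Linearized files normally carry one extra %%EOF after the first-page section.
    expected = 2 if linearized else 1
    return {
        "xref_sections": len(_XREF_TABLE.findall(pdf)) + len(_XREF_STREAM.findall(pdf)),
        "eof_markers": eof_markers,
        "linearized": linearized,
        "incremental_updates": max(0, eof_markers - expected),
    }
```

A sturdier approach walks the `/Prev` offsets in each trailer backward from the final `startxref`, which is how PDF readers traverse the revision chain.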
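
For the signature bullet, a common structural check reads the signature dictionary's `/ByteRange [offset1 length1 offset2 length2]`. The signed bytes are `[offset1, offset1 + length1)` and `[offset2, offset2 + length2)`, with the gap holding the signature value. Any bytes past `offset2 + length2` were therefore appended after that signature was applied. The sketch below is hypothetical: the function names and the 2-byte allowance for a trailing end-of-line are assumptions, and it assumes the dictionary is readable as plain bytes. Its limits:

- It verifies nothing cryptographically.
- It will also flag legitimate appends, such as a second signature or long-term-validation data.
- Detecting a signature that was stripped out entirely would need a revision-by-revision comparison, which it doesn't attempt.

```python
import re
from typing import List

# /ByteRange [offset1 length1 offset2 length2] inside a signature dictionary.
_BYTERANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def unsigned_tail_lengths(pdf: bytes) -> List[int]:
    """For each /ByteRange found, count the file bytes lying past the signed range."""
    tails = []
    for match in _BYTERANGE.finditer(pdf):
        _off1, _len1, off2, len2 = (int(g) for g in match.groups())
        tails.append(max(0, len(pdf) - (off2 + len2)))
    return tails


def modified_after_signing(pdf: bytes, tolerance: int = 2) -> bool:
    """True if any signature's covered range stops short of end-of-file."""
    return any(tail > tolerance for tail in unsigned_tail_lengths(pdf))
```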

Three states: intact / modified / inconclusive

Confidence levels:

  • certain: cryptographic or structural evidence (signature removed, post-signature modification); no false positives by design

  • high: strong forensic evidence; rare false positives in linearized or batch-processed PDFs
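
One way to model these verdicts is a small decision function. This is a hypothetical sketch, not the tool's logic. The following are all assumptions:

- the finding names;
- the rule that the strongest evidence present sets the confidence;
- leaving confidence empty for intact and inconclusive results;
- letting a known editor or consumer-software origin short-circuit to inconclusive.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    INTACT = "intact"
    MODIFIED = "modified"
    INCONCLUSIVE = "inconclusive"


class Confidence(Enum):
    CERTAIN = "certain"  # cryptographic or structural proof
    HIGH = "high"        # strong forensic evidence, rare false positives


@dataclass
class Finding:
    name: str  # hypothetical identifier, e.g. "signature_removed"
    confidence: Confidence


@dataclass
class Verdict:
    state: State
    confidence: Optional[Confidence] = None
    reasons: List[str] = field(default_factory=list)


def decide(findings: List[Finding], origin_reason: Optional[str] = None) -> Verdict:
    # Assumption: a known editor/consumer origin short-circuits to inconclusive.
    if origin_reason is not None:
        return Verdict(State.INCONCLUSIVE, None, [origin_reason])
    # The strongest level present decides the confidence of a "modified" verdict.
    for level in (Confidence.CERTAIN, Confidence.HIGH):
        names = [f.name for f in findings if f.confidence is level]
        if names:
            return Verdict(State.MODIFIED, level, names)
    return Verdict(State.INTACT)
```

The verdict keeps the individual findings as named reasons rather than collapsing them into one score, so a flagged file can be explained finding by finding.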

Limitations:

  • Content-level forgeries that leave no structural trace (e.g. a clean export created from scratch)

  • PDFs processed through online editors (Smallpdf, iLovePDF, etc.): the original metadata is stripped → returns inconclusive / online_editor_origin

  • Consumer software origin (Word, LibreOffice, Google Docs) → same inconclusive verdict; the integrity check doesn't apply

  • Does not validate the cryptographic chain of digital signatures; only detects their presence or removal

  • Encrypted PDFs are not supported
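
The origin-based inconclusive results can be sketched as a lookup on the Producer/Creator strings. The tool reportedly matches 50+ tools, so the two short lists here are only illustrative. `consumer_software_origin` is a hypothetical reason code; only `online_editor_origin` appears in the post. The sketch strips punctuation and symbols before matching, so a producer string such as `Microsoft® Word for Microsoft 365` still matches.

```python
import re
from typing import Optional

# Illustrative subsets only; names are matched after normalization.
ONLINE_EDITORS = ("smallpdf", "ilovepdf")
CONSUMER_SOFTWARE = ("microsoft word", "libreoffice", "google docs")


def _normalize(value: str) -> str:
    """Lowercase and drop everything except ASCII letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower())


def origin_reason(producer: Optional[str], creator: Optional[str] = None) -> Optional[str]:
    text = _normalize(" ".join(s for s in (producer, creator) if s))
    if any(name in text for name in ONLINE_EDITORS):
        return "online_editor_origin"
    if any(name in text for name in CONSUMER_SOFTWARE):
        return "consumer_software_origin"  # hypothetical reason code
    return None  # no known origin matched
```

Substring matching on free-text fields is brittle. For example, a bare Creator value of `Word` would not match `microsoft word` here, so the lists need the exact variants seen in real files.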

Tool: https://htpbe.tech (free web interface, no login required)

Curious whether the inconclusive classification for online-editor-processed documents matches what you see in practice, and what other structural signals you'd prioritize.


Source: https://www.reddit.com/r/computerforensics/comments/1ry3j6d/htpbetech_a_free_pdf_metadata_forensics_tool/