Extract hashes, PE/ELF/Mach-O metadata, strings, YARA hits, and deep static analysis — without ever running the file.
Introduction
Whether you’re triaging a suspicious attachment, building a file-intel pipeline, or comparing your analysis to threat feeds, you need one place to get hashes, format-specific metadata, and optional deep static analysis — without decompiling or executing code.
Basic File Information Gathering Script is a Python CLI that does exactly that. It’s built for malware analysts, digital forensics practitioners, and SOC engineers who want fast, scriptable file intelligence in table, JSON, or CSV form.
Table of contents
- Why another file-info tool?
- Two interfaces, one codebase
- Installation
- Quick start
- Hashes and fuzzy hashing
- Strings and YARA
- Full static analysis (--full)
- Tuning format-specific analysis
- Real malware: MalwareBazaar integration
- Summary
Why another file-info tool?
file and md5sum tell you type and one hash. Full-blown sandboxes and disassemblers are heavy and often overkill for “what is this file?” and “how does it compare to VirusTotal/MalwareBazaar?”. This tool sits in the middle:
- Single pass for MD5, SHA-1, SHA-256, SHA-384, SHA-512 (and optional ssdeep/tlsh).
- Format-aware: PE (Windows), ELF (Linux), Mach-O (macOS) with meaningful fields — timestamps, imphash, entry point, packing heuristics, digital signatures, Rich header, overlay.
- 60+ magic numbers so you get a real file type, not just “data”.
- Strings (ASCII + UTF-16 LE), optional YARA scanning, and a full static analysis mode that gives you byte stats, entropy maps, head/tail hex, and pattern extraction (URLs, IPs, paths, registry keys) — no decompilation.
You can run it on one file, a list of files, or recursively over a directory, and get human-readable tables, JSON, or CSV for automation.
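To make the magic-number idea concrete, here is a minimal sketch of signature-based type detection. The table below holds only a handful of well-known signatures, and the names `MAGIC_SIGNATURES` and `identify_magic` are illustrative assumptions; the tool's own table is larger and may be organized differently.

```python
# Illustrative subset of well-known file signatures (not the tool's real table).
MAGIC_SIGNATURES = [
    (b"MZ", "Windows executable (MZ)"),
    (b"\x7fELF", "ELF"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xca\xfe\xba\xbe", "Mach-O universal binary or Java class"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound document"),
]


def identify_magic(header: bytes) -> str:
    """Return a label for the first signature that prefixes the header."""
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"


# The first bytes of a PE file, as in the sample report's magic_number field.
print(identify_magic(bytes.fromhex("4D5A9000")))
```

Matching on leading bytes rather than the file extension is what lets a renamed executable still be reported as an executable.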
Two interfaces, one codebase
The rest of this article focuses on fileinfo.py.
fileinfo.py vs Basic_inf_gathering.py
When to use which
- Basic_inf_gathering.py — Quick, single-file PE report to the terminal; minimal dependencies (LIEF, optional cryptography).
- fileinfo.py — Default choice for batch, automation, JSON/CSV, YARA, strings, full static analysis, and non-PE (ELF/Mach-O).
Installation
git clone https://github.com/anpa1200/Basic-File-Information-Gathering-Script.git
cd Basic-File-Information-Gathering-Script
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
Optional but recommended for malware work:
pip install ssdeep py-tlsh yara-python
# For PE certificate details:
pip install cryptography
# For OLE/compound doc listing in --full:
pip install olefile
Quick start
Single file (human-readable table):
python3 fileinfo.py /path/to/sample.exe
Recursive directory (e.g. a drop folder):
python3 fileinfo.py -r /path/to/samples/
JSON for automation or SIEM:
python3 fileinfo.py --json /path/to/file.exe -o report.json
Example report.json:
{
"file_name": "malware.exe",
"file_path": "/home/andrey/git_project/Basic-File-Information-Gathering-Script/malware_samples/malware.exe",
"file_size": 563311,
"file_size_human": "563311 bytes (0.54 MB)",
"magic_number": "4D5A9000",
"file_type": "Windows Executable (Extended MZ)",
"entropy": 7.2249,
"entropy_note": "Normal",
"permissions": "-rw-rw-r--",
"hashes": {
"md5": "0b375e6b7e44d7c8488c4227e9344197",
"sha1": "dd8753066efc055dea693f44627fd69c988dfc65",
"sha256": "9fdea40a9872a77335ae3b733a50f4d1e9f8eff193ae84e36fb7e5802c481f72"
},
"pe": {
"timestamp": "2019-10-28 09:44:53 UTC (OK)",
"compiler": "Unknown",
"imphash": "3313409012dcc6b8a34048226776435e",
"header_offset": "264 (0x108)",
"entry_point": "RVA 0xEAF2, VA 0x40EAF2",
"rich_header": "Present (parse error)",
"resources": "147 resource nodes",
"overlay": "137327 bytes (0x2186F)",
"signature": "Not signed",
"packing": "Unpacked"
}
}
CSV for spreadsheets or bulk comparison:
python3 fileinfo.py --csv -r ./malware_samples/ -o summary.csv
You immediately get file name/path, size, magic-based file type, entropy, and permissions. For PE/ELF/Mach-O files you also get timestamp, compiler/language hints, imphash (PE), entry point, Rich header (PE), resources, overlay, digital signature, and a packing heuristic.
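As a rough sketch of how the JSON report might be consumed downstream, the snippet below pulls a few triage fields out of a report. The key names are taken from the example report above; the helper name `summarize_report` and the check against the literal string "Not signed" are assumptions for illustration, not part of the tool.

```python
import json

# Trimmed copy of the example report shown above; in practice you would
# load the file written with -o, e.g. json.load(open("report.json")).
SAMPLE_REPORT = """
{
  "file_name": "malware.exe",
  "entropy": 7.2249,
  "hashes": {
    "sha256": "9fdea40a9872a77335ae3b733a50f4d1e9f8eff193ae84e36fb7e5802c481f72"
  },
  "pe": {
    "imphash": "3313409012dcc6b8a34048226776435e",
    "signature": "Not signed",
    "packing": "Unpacked"
  }
}
"""


def summarize_report(report: dict) -> dict:
    """Collect a few fields that are commonly used for quick triage."""
    pe = report.get("pe") or {}
    return {
        "name": report.get("file_name"),
        "sha256": (report.get("hashes") or {}).get("sha256"),
        "entropy": report.get("entropy"),
        "imphash": pe.get("imphash"),
        # Assumption: unsigned files carry the literal value "Not signed".
        "signed": pe.get("signature", "Not signed") != "Not signed",
    }


summary = summarize_report(json.loads(SAMPLE_REPORT))
print(summary)
```

Using `.get()` throughout keeps the helper usable on non-PE reports, where the `pe` object may be absent.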
Hashes and fuzzy hashing
Default hashes are MD5, SHA-1, and SHA-256 (single read pass). You can add SHA-384/SHA-512 and control which hashes are computed:
python3 fileinfo.py --hashes md5,sha1,sha256,sha512 /path/to/file
With ssdeep and py-tlsh installed, you also get ssdeep and tlsh hashes unless you pass --no-fuzzy. These are invaluable for clustering and “similar file” lookups (e.g. MalwareBazaar, VirusTotal).
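The single-read-pass idea can be sketched with the standard library alone: read the file once in chunks and feed every chunk to each hash object. This is an illustration of the technique, not the tool's actual code; the function name `multi_hash` and the 1 MiB chunk size are assumptions.

```python
import hashlib


def multi_hash(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Compute several digests while reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `multi_hash("/path/to/file", ("md5", "sha1", "sha256", "sha512"))` returns a dict keyed by algorithm name. Reading in chunks keeps memory use flat on large samples, and adding an algorithm costs extra CPU but no extra disk I/O.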
Strings and YARA
Strings (ASCII and UTF-16 LE) with configurable minimum length:
python3 fileinfo.py --strings --min-str-len 8 sample.exe
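For readers who want to see what ASCII plus UTF-16 LE extraction involves, here is a minimal regex-based sketch. It assumes "printable" means bytes 0x20 through 0x7E and treats UTF-16 LE text as printable bytes each followed by a null byte; the function name `extract_strings` is hypothetical and the tool's own rules may differ.

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> list:
    """Return printable ASCII and UTF-16 LE runs of at least min_len characters."""
    # Runs of printable ASCII bytes.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII characters encoded as UTF-16 LE (char + 0x00).
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found


blob = b"\x00\x01hello world!\x00\x02" + "C:\\Windows\\System32".encode("utf-16-le")
print(extract_strings(blob, min_len=8))
```

Raising the minimum length cuts noise from random byte runs, which is why a value such as 8 is a common choice for packed or high-entropy samples.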
YARA (when yara-python and a rules file are available):
python3 fileinfo.py --yara /path/to/rules.yar sample.exe
Matches appear in the report so you can quickly see which rules fired.
Full static analysis (--full): maximum metadata, no decompilation
The --full flag runs an extra layer of static analysis: no execution, no decompilation. It adds:
- Byte-level stats: null ratio, printable ratio, byte frequency, longest null run.
- Entropy map: per-block entropy so you can spot packed or encrypted regions.
- Head/tail hex dump: first and last bytes for structure inspection.
- String patterns: URLs, IPv4, emails, Windows/Unix paths, registry keys (from raw bytes, including UTF-16 LE).
- PE deep: machine type, subsystem, DLL characteristics (ASLR, DEP, etc.), section table (name, size, entropy), full import/export lists, exphash, relocations, TLS callbacks, delay imports, Rich header, resource types, version info (FileVersion, CompanyName, etc.).
- ELF deep: class, machine, sections/segments, dynamic (NEEDED, RPATH, RUNPATH), exported/imported symbols, notes.
- Mach-O deep: CPU type, file type, dylibs, segments, UUID.
- Containers: ZIP file listing (names, sizes); OLE stream listing (if olefile is installed).
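The entropy map in the list above can be approximated in a few lines: compute Shannon entropy per fixed-size block and flag the blocks above a threshold. This is a sketch of the general technique, not the tool's implementation. The 65536-byte block size matches the example report further down; the 7.5 threshold and the function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_map(data: bytes, block_size: int = 65536, threshold: float = 7.5):
    """Return (all_blocks, high_entropy_blocks) for fixed-size blocks of data."""
    blocks = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        value = round(shannon_entropy(data[offset:offset + block_size]), 3)
        blocks.append({"block": index, "offset": offset, "entropy": value})
    high = [b for b in blocks if b["entropy"] >= threshold]
    return blocks, high
```

Values close to 8.0 suggest compressed or encrypted content, while a single whole-file figure can hide a small packed region inside an otherwise ordinary binary. That is the case a per-block map is meant to catch.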
Example:
python3 fileinfo.py --full sample.exe
python3 fileinfo.py --full --json sample.exe -o full_report.json
This is the mode you want when building a reproducible static report to compare with MalwareBazaar/VirusTotal or to feed into your own pipelines.
{
"file_name": "malware.exe",
"file_path": "/home/andrey/git_project/Basic-File-Information-Gathering-Script/malware_samples/malware.exe",
"file_size": 563311,
"file_size_human": "563311 bytes (0.54 MB)",
"magic_number": "4D5A9000",
"file_type": "Windows Executable (Extended MZ)",
"entropy": 7.2249,
"entropy_note": "Normal",
"permissions": "-rw-rw-r--",
"hashes": {
"md5": "0b375e6b7e44d7c8488c4227e9344197",
"sha1": "dd8753066efc055dea693f44627fd69c988dfc65",
"sha256": "9fdea40a9872a77335ae3b733a50f4d1e9f8eff193ae84e36fb7e5802c481f72"
},
"pe": {
"timestamp": "2019-10-28 09:44:53 UTC (OK)",
"compiler": "Unknown",
"imphash": "3313409012dcc6b8a34048226776435e",
"header_offset": "264 (0x108)",
"entry_point": "RVA 0xEAF2, VA 0x40EAF2",
"rich_header": "Present (parse error)",
"resources": "147 resource nodes",
"overlay": "137327 bytes (0x2186F)",
"signature": "Not signed",
"packing": "Unpacked"
},
"static_analysis": {
"byte_stats": {
"size_analyzed": 563311,
"null_ratio": 0.1027,
"printable_ratio": 0.4999,
"longest_null_run": 3424,
"top_byte_frequencies": [
"0xFF(16177)",
"0x8B(9498)",
"0x74(8902)",
"0x75(8338)",
"0x65(6918)",
"0x33(6268)",
"0x6A(6248)",
"0x6E(5795)",
"0x64(5711)",
"0x73(5700)"
]
},
"entropy_blocks": {
"block_size": 65536,
"num_blocks": 9,
"entropy_per_block": [
6.369,
6.62,
5.982,
7.53,
7.997,
7.993,
5.43,
5.044,
5.044
],
"high_entropy_blocks": [
{
"block": 3,
"offset": 196608,
"entropy": 7.53
},
{
"block": 4,
"offset": 262144,
"entropy": 7.997
},
{
"block": 5,
"offset": 327680,
"entropy": 7.993
}
],
"overall_avg_entropy": 6.445
},
"head_tail": {
"head_hex": "0000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 MZ..............\n0010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......\n0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................\n0030 00 00 00 00 00 00 00 00 00 00 00 00 08 01 00 00 ................\n0040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ........!..L.!Th\n0050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F is program canno\n0060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 t be run in DOS \n0070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 mode....$.......\n0080 46 CA 45 A4 02 AB 2B F7 02 AB 2B F7 02 AB 2B F7 F.E...+...+...+.\n0090 81 A3 74 F7 08 AB 2B F7 F8 88 32 F7 04 AB 2B F7 ..t...+...2...+.\n00A0 11 A3 76 F7 00 AB 2B F7 81 A3 76 F7 13 AB 2B F7 ..v...+...v...+.\n00B0 02 AB 2A F7 31 A9 2B F7 07 A7 24 F7 19 AB 2B F7 ..*.1.+...$...+.\n00C0 07 A7 74 F7 8A AB 2B F7 29 8A 0C F7 0B AB 2B F7 ..t...+.).....+.\n00D0 07 A7 4B F7 75 AB 2B F7 07 A7 77 F7 03 AB 2B F7 ..K.u.+...w...+.\n00E0 EE A0 75 F7 03 AB 2B F7 07 A7 71 F7 03 AB 2B F7 ..u...+...q...+.\n00F0 52 69 63 68 02 AB 2B F7 00 00 00 00 00 00 00 00 Rich..+.........",
"tail_hex": "0000 79 61 76 65 6A 79 73 76 62 75 68 79 7A 69 37 74 yavejysvbuhyzi7t\n0010 6B 68 63 78 68 6F 61 72 6F 6E 38 62 6A 7A 66 33 khcxhoaron8bjzf3\n0020 61 6E 6F 6F 38 69 34 78 61 71 78 73 70 35 63 78 anoo8i4xaqxsp5cx\n0030 64 6B 72 71 71 64 37 61 30 62 64 6B 68 6A 66 77 dkrqqd7a0bdkhjfw\n0040 62 66 6B 68 63 75 77 6D 32 76 32 62 71 65 35 37 bfkhcuwm2v2bqe57\n0050 6D 34 36 72 78 6B 66 65 6B 71 32 74 7A 63 6F 32 m46rxkfekq2tzco2\n0060 69 30 78 30 64 33 63 65 61 7A 30 38 70 64 66 63 i0x0d3ceaz08pdfc\n0070 34 66 65 32 6D 33 6E 69 7A 68 7A 66 70 73 34 27 4fe2m3nizhzfps4'"
},
"string_patterns": {
"urls": [],
"ipv4": [],
"emails": [],
"win_paths": [],
"unix_paths": [
"/atexit",
"/0123456789",
"/dd/yy",
"//rZy",
"/YAZj",
"/1VVg",
"/IKQr"
],
"registry": []
},
"pe_deep": {
"machine": "i386",
"number_of_sections": 4,
"timestamp": 1572255893,
"timestamp_utc": "2019-10-28 09:44:53+00:00",
"subsystem": "SUBSYSTEM.WINDOWS_GUI",
"dll_characteristics": 0,
"dll_characteristics_list": [],
"imagebase": "0x400000",
"entry_point_rva": "0xeaf2",
"section_alignment": 4096,
"file_alignment": 4096,
"size_of_image": 442368,
"checksum": 487412,
"data_directories_used": [],
"sections": [
{
"name": ".text",
"virtual_size": 161054,
"size": 163840,
"offset": 4096,
"entropy": 6.609,
"characteristics": "0x60000020"
},
{
"name": ".rdata",
"virtual_size": 43265,
"size": 45056,
"offset": 167936,
"entropy": 5.047,
"characteristics": "0x40000040"
},
{
"name": ".data",
"virtual_size": 201908,
"size": 188416,
"offset": 212992,
"entropy": 7.949,
"characteristics": "0xc0000040"
},
{
"name": ".rsrc",
"virtual_size": 23952,
"size": 24576,
"offset": 401408,
"entropy": 4.197,
"characteristics": "0x40000040"
}
],
"imports": [
{
"dll": "KERNEL32.dll",
"apis": [
"ExitProcess",
"TerminateProcess",
"HeapReAlloc",
"HeapSize",
"HeapDestroy",
"HeapCreate",
"VirtualFree",
"IsBadWritePtr",
"GetStdHandle",
"UnhandledExceptionFilter",
"FreeEnvironmentStringsA",
"GetEnvironmentStrings",
"FreeEnvironmentStringsW",
"GetEnvironmentStringsW",
"SetHandleCount",
"GetFileType",
"QueryPerformanceCounter",
"GetCommandLineA",
"GetSystemTimeAsFileTime",
"SetUnhandledExceptionFilter",
"LCMapStringA",
"LCMapStringW",
"GetStringTypeA",
"GetStringTypeW",
"GetTimeZoneInformation",
"IsBadReadPtr",
"IsBadCodePtr",
"SetStdHandle",
"SetEnvironmentVariableA",
"InterlockedExchange",
"GetStartupInfoA",
"VirtualQuery",
"GetSystemInfo",
"VirtualAlloc",
"VirtualProtect",
"HeapFree",
"HeapAlloc",
"RtlUnwind",
"GetFileTime",
"GetFileAttributesA",
"FileTimeToLocalFileTime",
"SetErrorMode",
"FileTimeToSystemTime",
"GetOEMCP",
"GetCPInfo",
"TlsFree",
"LocalReAlloc",
"TlsSetValue",
"TlsAlloc",
"TlsGetValue"
],
"api_count": 124
},
{
"dll": "USER32.dll",
"apis": [
"PostThreadMessageA",
"MessageBeep",
"GetNextDlgGroupItem",
"InvalidateRgn",
"CopyAcceleratorTableA",
"SetRect",
"IsRectEmpty",
"CharNextA",
"GetSysColorBrush",
"ReleaseCapture",
"LoadCursorA",
"SetCapture",
"wsprintfA",
"DestroyMenu",
"ShowWindow",
"MoveWindow",
"SetWindowTextA",
"IsDialogMessageA",
"SetDlgItemTextA",
"RegisterWindowMessageA",
"WinHelpA",
"GetCapture",
"CreateWindowExA",
"GetClassInfoExA",
"GetClassNameA",
"SetPropA",
"GetPropA",
"RemovePropA",
"SendDlgItemMessageA",
"SetFocus",
"IsChild",
"GetWindowTextA",
"GetForegroundWindow",
"GetTopWindow",
"UnhookWindowsHookEx",
"GetMessageTime",
"GetMessagePos",
"MapWindowPoints",
"SetForegroundWindow",
"UpdateWindow",
"GetMenu",
"AdjustWindowRectEx",
"EqualRect",
"GetClassInfoA",
"RegisterClassA",
"UnregisterClassA",
"GetDlgCtrlID",
"DefWindowProcA",
"CallWindowProcA",
"SetWindowLongA"
],
"api_count": 124
},
{
"dll": "GDI32.dll",
"apis": [
"CreateRectRgnIndirect",
"GetMapMode",
"GetBkColor",
"GetTextColor",
"GetRgnBox",
"CreatePen",
"GetDeviceCaps",
"GetStockObject",
"DeleteDC",
"ExtSelectClipRgn",
"ScaleWindowExtEx",
"SetWindowExtEx",
"ScaleViewportExtEx",
"SetViewportExtEx",
"OffsetViewportOrgEx",
"SetViewportOrgEx",
"SelectObject",
"Escape",
"CreatePatternBrush",
"TextOutA",
"RectVisible",
"PtVisible",
"GetWindowExtEx",
"GetViewportExtEx",
"GetObjectA",
"DeleteObject",
"MoveToEx",
"LineTo",
"GetClipBox",
"SetMapMode",
"SetTextColor",
"SetBkColor",
"RestoreDC",
"SaveDC",
"CreateBitmap",
"BitBlt",
"Ellipse",
"CreateCompatibleDC",
"CreateCompatibleBitmap",
"ExtTextOutA"
],
"api_count": 40
},
{
"dll": "comdlg32.dll",
"apis": [
"GetFileTitleA"
],
"api_count": 1
},
{
"dll": "WINSPOOL.DRV",
"apis": [
"OpenPrinterA",
"DocumentPropertiesA",
"ClosePrinter"
],
"api_count": 3
},
{
"dll": "ADVAPI32.dll",
"apis": [
"RegCloseKey",
"RegQueryValueExA",
"RegOpenKeyExA",
"RegDeleteKeyA",
"RegEnumKeyA",
"RegOpenKeyA",
"RegQueryValueA",
"RegCreateKeyExA",
"RegSetValueExA",
"SetFileSecurityW"
],
"api_count": 10
},
{
"dll": "SHELL32.dll",
"apis": [
"CommandLineToArgvW"
],
"api_count": 1
},
{
"dll": "COMCTL32.dll",
"apis": [
"ord_17"
],
"api_count": 1
},
{
"dll": "SHLWAPI.dll",
"apis": [
"PathFindFileNameA",
"PathStripToRootA",
"PathFindExtensionA",
"PathIsUNCA"
],
"api_count": 4
},
{
"dll": "oledlg.dll",
"apis": [
"ord_8"
],
"api_count": 1
},
{
"dll": "ole32.dll",
"apis": [
"CreateILockBytesOnHGlobal",
"StgCreateDocfileOnILockBytes",
"StgOpenStorageOnILockBytes",
"CoGetClassObject",
"CoTaskMemAlloc",
"CoTaskMemFree",
"CLSIDFromString",
"CLSIDFromProgID",
"OleUninitialize",
"CoFreeUnusedLibraries",
"CoRegisterMessageFilter",
"OleFlushClipboard",
"OleIsCurrentClipboard",
"CoRevokeClassObject",
"OleInitialize"
],
"api_count": 15
},
{
"dll": "OLEAUT32.dll",
"apis": [
"ord_6",
"ord_4",
"ord_9",
"ord_12",
"ord_8",
"ord_7",
"ord_150",
"ord_420",
"ord_184",
"ord_16",
"ord_2",
"ord_10"
],
"api_count": 12
}
],
"exports": [
"LayvXBcOppdgzCgnncA"
],
"export_count": 1,
"exphash": "f66bacc99dfdc9927b5678a2134e87c4",
"relocation_count": 0,
"relocation_blocks": [],
"tls_callbacks": [],
"delay_imports": [
"OLEACC.dll"
],
"rich_header_entries": [],
"resource_types": [],
"version_info": {}
}
}
}
Tuning format-specific analysis
If you only care about PE (e.g. Windows-only lab):
python3 fileinfo.py --no-elf --no-macho -r ./pe_samples/
Same idea for ELF-only or Mach-O-only environments.
Real malware: MalwareBazaar integration
The repo includes download_malware_sample.py to pull real Windows PE samples from MalwareBazaar (abuse.ch), run full static analysis, and save MalwareBazaar metadata for comparison.
- Get a free API key from abuse.ch Authentication.
- Install deps:
pip install requests pyzipper
- Set your key:
export ABUSE_CH_AUTH_KEY='your-key'
Then:
# Download one recent sample and run --full analysis
python3 download_malware_sample.py
# By known SHA256 (e.g. from a report)
python3 download_malware_sample.py 9FDEA40A9872A77335AE3B733A50F4D1E9F8EFF193AE84E36FB7E5802C481F72
# By tag (e.g. Emotet, TrickBot)
python3 download_malware_sample.py --tag Emotet --limit 1
Per sample, you get a directory under malware_samples/<sha256>/ with the binary, our_analysis.json (from fileinfo.py --full --json), and bazaar_info.json (MalwareBazaar metadata). You can diff hashes, imphash, file type, and PE/string findings against the feed and public reports.
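A comparison along those lines might look like the sketch below. It rests on two assumptions that should be checked against real files: that our_analysis.json follows the report layout shown earlier (`hashes.md5`, `pe.imphash`, and so on), and that bazaar_info.json holds a single flat sample entry using MalwareBazaar-style field names such as `sha256_hash` and `imphash`. The helper `compare_with_bazaar` is hypothetical and not part of the repo.

```python
def compare_with_bazaar(ours: dict, bazaar: dict) -> dict:
    """Compare locally computed values with feed metadata, field by field."""
    hashes = ours.get("hashes") or {}
    pe = ours.get("pe") or {}
    # Assumed field names on the MalwareBazaar side; adjust to the real file.
    pairs = {
        "md5": (hashes.get("md5"), bazaar.get("md5_hash")),
        "sha1": (hashes.get("sha1"), bazaar.get("sha1_hash")),
        "sha256": (hashes.get("sha256"), bazaar.get("sha256_hash")),
        "imphash": (pe.get("imphash"), bazaar.get("imphash")),
    }
    result = {}
    for field, (mine, theirs) in pairs.items():
        if not mine or not theirs:
            result[field] = "missing"
        elif mine.lower() == theirs.lower():
            result[field] = "match"
        else:
            result[field] = "MISMATCH"
    return result


# Inline stand-ins for json.load() on the two files in a sample directory.
ours = {
    "hashes": {"md5": "0b375e6b7e44d7c8488c4227e9344197"},
    "pe": {"imphash": "3313409012dcc6b8a34048226776435e"},
}
feed = {
    "md5_hash": "0B375E6B7E44D7C8488C4227E9344197",
    "imphash": "3313409012dcc6b8a34048226776435e",
}
print(compare_with_bazaar(ours, feed))
```

Comparing case-insensitively avoids false mismatches when one side stores hex digests in upper case, as the SHA256 in the command above does.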
Who is this for?
- Malware analysts: Quick triage (hashes, type, entropy, packing, imphash) and deep static reports for comparison with threat intel.
- Digital forensics: Consistent metadata (including timestamps and signatures) across many files; CSV/JSON for timelines and tooling.
- SOC engineers: Scriptable file intelligence (JSON/CSV), optional YARA, and hashes that plug into VirusTotal/MalwareBazaar/EDR.
Summary
Basic File Information Gathering Script gives you:
- One CLI (fileinfo.py) for batch file metadata and optional deep static analysis.
- Single-pass multi-hash (MD5 through SHA-512) plus optional ssdeep/tlsh.
- PE/ELF/Mach-O–aware fields: timestamps, imphash, entry point, packing, signatures, and with --full: sections, imports/exports, version info, entropy map, and string patterns (URLs, IPs, paths, registry).
- Output as table, JSON, or CSV for automation and integration.
- Optional YARA and a MalwareBazaar downloader script for real-sample workflow.
All of this is static only — no execution, no decompilation — so you can run it safely in automation and air-gapped labs. If you’re building or tightening a file-intel or malware-triage pipeline, this tool is worth a slot in your toolkit.