This series is quite old, and I kinda abandoned it at some stage, but today I am reviving it to talk about … static analysis…
Let’s be honest – the last two decades have changed the way we do malware analysis, and for many reasons:
In 2010, malware analysts’ skills were measured by their knowledge of debuggers, disassemblers, file formats, packers, etc. Now… we are in 2025 and, let’s be honest… the malware analysis process of today usually starts with the submission of a sample to a sandbox / sample analysis portal. And, sadly, it very often ends there!
This is where this post begins.
I am quite surprised that many automated malware analysis solutions do not process samples statically very well. They do not do in-depth file format analysis, they do not recognize corrupted files well, and they often provide a false sense of security/value by offering a CLEAN verdict for files that simply need more… reversing love.
See the example below.
I took Notepad.exe from Win10, truncated it with a hex editor, and then submitted it to a few online file analysis services. I am happy that some of them immediately marked the file as corrupted, but that didn’t stop them from running a full-blown dynamic analysis session on the file I submitted. And in terms of static analysis, some solutions went as far as to report lots of findings related to anti-reversing techniques, cryptography, and lots of far-fetched conclusions that are nonsensical in the context of a) a corrupted file, b) the Notepad program (clearly non-malicious), and that are simply not a true reflection of reality.
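(For reference, you don’t even need a hex editor to reproduce this kind of test file. A minimal Python sketch is shown below – the paths and the number of bytes trimmed are just illustrative, not what I actually used:)

import shutil

# Copy a known-good binary so we never touch the original.
src = r"C:\Windows\System32\notepad.exe"
dst = r"notepad_truncated.exe"
shutil.copyfile(src, dst)

# Chop off the tail of the copy so that the later sections declared in the
# PE header no longer fit inside the file (8 KB is an arbitrary amount).
with open(dst, "r+b") as f:
    f.seek(0, 2)                 # jump to the end of the file
    size = f.tell()
    f.truncate(size - 0x2000)    # headers stay intact, section data is cut short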
I kid you not, but a truncated notepad sample that will never execute was marked as
Let’s be clear – mapping the presence of APIs in a sample’s import table, or of strings referencing API names found in a sample’s body, to actual ‘threats’ or TTPs is an absurdity that is omnipresent in sandbox reports today and should be corrected ASAP. This could have worked in 2010, but today these sorts of ‘determinations’ must be seen as poor indicators.
And as an analyst, I’d actually like to see why the sample was marked as corrupted. I’d also like to see the context of the far-fetched API-matching claims. You can’t list many Windows APIs in a negative context (like, f.ex., CreateDC, which Notepad uses for… printing) unless you can really prove that the API is indeed present in the code to deliver some malicious functionality… It strikes me as an over-simplistic approach that is focused more on the quantity of the findings than on the overall quality of the report.
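To show how little this kind of ‘analysis’ takes, here is a minimal sketch of naive import matching using the pefile library; the ‘suspicious API’ list and its scary labels below are made up for demonstration and are not taken from any vendor’s report:

import pefile

# A toy lookup table of the kind these reports appear to be built on:
# API name -> alarming-sounding 'finding'. Purely illustrative.
SUSPICIOUS = {
    "CreateDCW":    "screen capture / anti-analysis (?)",
    "GetTickCount": "timing-based evasion (?)",
    "LoadLibraryW": "dynamic code loading (?)",
}

pe = pefile.PE(r"C:\Windows\System32\notepad.exe")
for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
    for imp in entry.imports:
        name = imp.name.decode() if imp.name else ""
        if name in SUSPICIOUS:
            print(f"{entry.dll.decode()}!{name} -> {SUSPICIOUS[name]}")

A matcher like this will happily emit alarming-looking ‘findings’ for perfectly benign binaries – which is exactly why import- and string-matching alone should be treated as a weak signal, not a verdict.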
This is where old-school reversing comes in.
A long time ago I wrote my own PE file parser that I always run first on all PE samples that I analyze. Because I wrote it, I fully control what it tells me, and since I have used this tool to analyze many files over the years, I have corrected it on many occasions, learned a lot about PE file format intricacies along the way, and incorporated a lot of PE file format checks into it.
Running it on my truncated Notepad sample I immediately get many red flags:
(Raw Offset + Raw size of '.data '=0002EC00>filesize=0002DE00
(Offset to Raw size of '.pdata '=0002EC00>filesize=0002DE00
(Offset to Raw size of '.didat '=0002FE00>filesize=0002DE00
(Offset to Raw size of '.rsrc '=00030000>filesize=0002DE00
(Offset to Raw size of '.reloc '=00030C00>filesize=0002DE00
(wrong appdata ofs/size=0002EC00,00000000)
(.rsrc File Offset 00030000 <> DataDirectoryResourceOffset = 00000000
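The checks behind these flags are not rocket science. A minimal sketch of the section-coverage part – written here with the pefile library rather than my homegrown parser, and with an illustrative file name – could look like this:

import os
import pefile

def check_section_coverage(path):
    # Flag sections whose declared raw data does not fit inside the file.
    file_size = os.path.getsize(path)
    pe = pefile.PE(path, fast_load=True)   # headers only, no full parse needed
    for section in pe.sections:
        name = section.Name.rstrip(b"\x00").decode(errors="replace")
        end = section.PointerToRawData + section.SizeOfRawData
        if end > file_size:
            print(f"Raw Offset + Raw size of '{name}'={end:08X}"
                  f" > filesize={file_size:08X}")

check_section_coverage("notepad_truncated.exe")

Any hit here means at least one section can never be mapped in full – reason enough to down-rank the value of whatever dynamic analysis comes next.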
Seeing this kind of result immediately alters the way I do my sample analysis.
My point is… if we want to sandbox/automate sample analysis, let’s do it in a smarter way. File format parsing is an extremely complex topic. If you look at the Detect It Easy program’s database, you will find a huuuuge number of file-typing routines that try to analyze various file types and return the best verdict possible.
So what can we do today?
Ask sandbox vendors to do a more thorough static analysis that checks a file’s basic properties and, at the most basic level, verifies whether we have enough data in the submitted file to cover all the sections listed in the PE header…