December 3, 2022 in Malware Analysis
If you have ever used the shellcode_hashes IDA plugin from Mandiant, you have probably also used make_sc_hash_db.py. But if you haven’t, this post is for you.
The focus of this article is the make_sc_hash_db.py script. It generates a SQLite database, sc_hashes.db, which is in turn used by shellcode_hashes_search_plugin.py (run from the IDA GUI) to identify immediate values that could be hashes of known APIs inside the binary under analysis. It’s fast and super handy for position-independent code analysis, including inline and implanted PE file loaders that rely on such API hashing (multiple API hashing algorithms are supported).
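If API hashing is new to you, here is a minimal sketch of one classic scheme, the rotate-right-by-13-and-add hash often seen in shellcode. The function names are mine, and this is only an illustration of the idea; the plugin's own implementations of this and the other supported algorithms may differ in details.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32


def ror13_add_hash32(name):
    """Hash an API name: rotate the accumulator right by 13, then add each byte.

    Assumption: the name is hashed as plain ASCII without a trailing NUL.
    Real-world variants differ (NUL included, upper-casing, DLL name mixed in).
    """
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & MASK32
    return h


if __name__ == "__main__":
    for api in ("LoadLibraryA", "GetProcAddress"):
        print(f"{api}: 0x{ror13_add_hash32(api):08X}")
```

A loader built this way carries only the 32-bit constants and walks the export table until a computed hash matches, which is why the immediates in the code are the thing worth searching for.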
As per the readme.md, make_sc_hash_db.py can be called with the following arguments:
python make_sc_hash_db.py <database name> <dll directory>
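Since the script takes a directory, a small helper can stage just the DLLs you care about into a working directory and assemble the command line above. This is a rough sketch: stage_dlls and build_command are hypothetical helpers of mine, and the `python` interpreter name is simply taken from the readme line, so adjust it to whatever your setup needs.

```python
import shutil
from pathlib import Path


def stage_dlls(source_dir, work_dir, dll_names):
    """Copy the named DLLs from source_dir into work_dir.

    Names that do not exist in source_dir are skipped silently.
    Returns the list of destination paths that were actually copied.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in dll_names:
        src = Path(source_dir) / name
        if src.is_file():
            copied.append(Path(shutil.copy2(src, work / name)))
    return copied


def build_command(db_name, dll_dir, script="make_sc_hash_db.py"):
    """Build the argument list: python make_sc_hash_db.py <database name> <dll directory>."""
    return ["python", script, str(db_name), str(dll_dir)]


# Example (not executed here; paths and DLL names are placeholders):
#   import subprocess
#   stage_dlls(r"C:\windows\system32", "work_dlls", ["kernel32.dll"])
#   subprocess.run(build_command("my_hashes.db", "work_dlls"), check=True)
```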
The best approach is, of course, to run it on a subset of the c:\windows\system32 directory, focusing on the most common libraries. The default sc_hashes.db reflects exactly that, including API hashes only for the following libraries:
BUT
it’s also handy to have a larger data set available.
When I played with it a few years ago, I generated all hashes from the whole C:\windows\system32 directory.
Why?
Because you never know when you will stumble upon a hash value that is not represented inside the sc_hashes.db.
Now, you may think that replacing the default sc_hashes.db with your full_blown_system32_dataset.db is the best idea ever, but it’s not. The sc_hashes.db is a 50MB file, and the full_blown one is ~600MB. SQLite is fast, but IDA+Python+SQLite, not so much. So, you have been warned.
The bottom line:
Use the default sc_hashes.db for all your cases first, and only if you find hashes outside of this set, look for the hash inside the full_blown one (either via the SQLite interface, or via grep/rg on a text export). Finally, once you discover which DLL the API hash belongs to, you can always generate a new SQLite DB based on that single DLL (it just needs to be copied to a working directory for the make_sc_hash_db.py script to process it).
And if you don’t understand any of it, just download this full_blown_limited_output.zip file (45MB warning). It includes many hashes and many APIs. You can simply grep it for an unknown API hash. Who knows, maybe you will get lucky…
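For the SQLite route, a lookup outside IDA can be sketched like this. I am deliberately not assuming any table or column names here: find_hash is a hypothetical helper that discovers the schema at runtime and checks every column for the value. That makes it slow on a ~600MB file and it can also match unrelated integer columns (row IDs, for example), so treat the hits as candidates.

```python
import sqlite3


def find_hash(db_path, hash_value):
    """Return (table_name, row_dict) pairs for rows where any column equals hash_value.

    The schema is discovered at runtime, so nothing about the database layout
    is assumed. A row is reported once per matching column.
    """
    hits = []
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        tables = [row["name"] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            columns = [row["name"] for row in conn.execute(
                f'PRAGMA table_info("{table}")')]
            for column in columns:
                query = f'SELECT * FROM "{table}" WHERE "{column}" = ?'
                for row in conn.execute(query, (hash_value,)):
                    hits.append((table, dict(row)))
    finally:
        conn.close()
    return hits


# Example (file name from this post; the hash value is a placeholder):
#   for table, row in find_hash("full_blown_system32_dataset.db", 0x12345678):
#       print(table, row)
```

Once you know which table and column actually hold the hash values, a direct query against that column will be much faster than this brute-force scan.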