In my last post I mentioned the outdated PAD files. Let’s have a closer look at them.
Before we do so, a short comment first: in the era of omnipresent GenAI buzz, it is sometimes really hard to convince yourself to do any research, let alone share the results. Everything feels ‘old’, GenAI obviously knows all the answers, and no one can compete with the vast amount of information that can be extracted from these AI models so effortlessly, even if their advice sometimes feels a bit hallucinogenic…
What keeps me going is the good ol’ adage: luck favors a prepared mind. I believe that you can’t use GenAI properly if you don’t know the fundamentals, if you don’t do the legwork, if you don’t do your own research. Ironically, to use GenAI efficiently and effectively, one has to know far more than before, because GenAI is a very strong, confident assistant that often… turns into an opponent. It may assist us in the best possible sense of the word, or it may ruin us if we blindly trust its outputs…
The surprising twist is that we can only get better at using GenAI by first getting better at the ‘non-AI’ stuff, aka ‘the old’. And this post is dedicated to ‘the old’.
Yes, no one cares about PAD files anymore, so why bring them up? Well, I hope I can convince you that there is still value in them…
So, the PAD files…
It’s hard to download them today, but in the past one could download PAD files from at least these two, now defunct, websites:
Luckily, old copies of the QArchive repository still reside on https://web.archive.org, f.ex. here (28K+ PAD URLs), and I will share 14K+ PAD files from http://repository.appvisor.com below.
When you attempt to download all the PAD files you can, and/or the URLs they point to, you will quickly realize that many links no longer work. Not a surprise: after all, it is a legacy format, and lots of killer-app-wannabe software products never really made it; in the end, their online presence was barely noticeable. For the purpose of our discussion, though, it’s worth mentioning that one can still use web.archive.org to download copies of these PAD files from the time right before the websites hosting them closed… so yes, there is a way to collect many of them, even if they are officially long gone.
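Collecting those archived copies can be scripted against the Wayback Machine’s CDX API. This is a minimal sketch, assuming the public CDX endpoint (https://web.archive.org/cdx/search/cdx) with its output=json, filter and fl parameters, plus the id_ URL flag that asks for the archived bytes without the Wayback toolbar. The helper names are hypothetical. The idea is to pick the newest successful capture, i.e. the copy closest to the moment the hosting site went away.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(pad_url: str) -> str:
    """Build a CDX query listing successful (HTTP 200) captures of pad_url."""
    params = {
        "url": pad_url,
        "output": "json",
        "filter": "statuscode:200",
        "fl": "timestamp,original",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_capture(cdx_json: str):
    """Return (timestamp, original) of the newest capture, or None if there is none.

    With output=json the first row is a header, so it is skipped.
    """
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    captures = [tuple(r) for r in rows[1:]]
    if not captures:
        return None
    # Timestamps are YYYYMMDDhhmmss strings, so the lexicographic max is the newest.
    return max(captures, key=lambda r: r[0])


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """The id_ flag requests the original archived bytes, not the rewritten HTML page."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Plain HTTP GET; throttle your calls when running this over thousands of URLs."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

A typical loop would fetch cdx_query_url(url), pass the decoded body to latest_capture(), and, if a capture exists, fetch raw_snapshot_url(*capture). It is polite to add a delay between requests when doing this in bulk.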
Analyzing many PAD files in bulk can give us insight into many interesting aspects of a shrinking, yet still present, old-school software distribution model.
For starters, analyzing a repo of many PAD files gives us a quick-and-dirty software categorization list: whatever is listed inside the Program_Category_Class element is of interest. An example category list extracted from 14K PAD files is shown here. By mapping Program_Category_Class values to the directories that programs are installed to (these can be extracted with installer unpackers), one can build a simple categorization engine for these known combos.
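Bulk analysis starts with getting all the files parsed. A minimal loader, assuming the PAD files were saved locally with a .xml extension (an assumption about how you stored them). Real-world PAD files are often malformed, so failures are collected instead of aborting the run. Since these are untrusted files from the internet, defusedxml is a safer drop-in replacement for ElementTree outside of a throwaway analysis box.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_pad_files(directory: str):
    """Parse every *.xml file under directory (recursively).

    Returns (parsed, failed): parsed maps file path -> root Element,
    failed lists the paths that are not well-formed XML.
    """
    parsed, failed = {}, []
    for path in sorted(Path(directory).rglob("*.xml")):
        try:
            parsed[str(path)] = ET.parse(path).getroot()
        except ET.ParseError:
            failed.append(str(path))
    return parsed, failed
```

Keeping the failed list is useful on its own: a high failure rate usually means you saved HTML error pages instead of PAD files.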
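Both the category histogram and the category-to-install-directory idea can be sketched in a few lines. The (category, install_dir) pairs are assumed to come from your own pipeline (a PAD file joined with the target directory of its unpacked installer); build_dir_map and guess_category are hypothetical helpers that simply vote on an exact, case-insensitive directory match.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def pad_category(xml_text: str):
    """Return the Program_Category_Class text, or None if missing or unparseable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    # iter() searches the whole tree, so we do not depend on the exact parent element.
    for el in root.iter("Program_Category_Class"):
        if el.text and el.text.strip():
            return el.text.strip()
    return None


def category_histogram(xml_texts):
    """Count how many PAD files fall into each category."""
    return Counter(c for c in map(pad_category, xml_texts) if c)


def _norm(install_dir: str) -> str:
    return install_dir.lower().rstrip("\\/")


def build_dir_map(pairs):
    """pairs: iterable of (category, install_dir) tuples; returns dir -> Counter of categories."""
    mapping = defaultdict(Counter)
    for category, install_dir in pairs:
        mapping[_norm(install_dir)][category] += 1
    return mapping


def guess_category(install_dir, mapping):
    """Most frequently seen category for this directory, or None if the directory is unknown."""
    votes = mapping.get(_norm(install_dir))  # .get() does not create defaultdict entries
    return votes.most_common(1)[0][0] if votes else None
```

Exact directory matching is deliberately naive; in practice you would probably strip the version numbers and the Program Files prefix before comparing.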
The Primary_Download_URL, Secondary_Download_URL, Additional_Download_URL_1, Additional_Download_URL_2 and DP_Distributive_Primary_URL elements point to actual URLs that you can use to download the latest version of the software. One way to use these is to collect a list of (most likely) clean software installers that can be used to build your ‘good samples’ repo. This, in turn, can be used to tune and quality-test your YARA, YARA-X, and capa rules…
The Company_Info element and its children may help to collect useful info about (most likely) legitimate companies: emails, phone numbers, social media accounts, etc.
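Pulling the download URLs out of a PAD file takes only a few lines of ElementTree. The tag names are the ones listed above; the sketch uses iter() so it does not assume where exactly in the tree they sit, keeps only http(s)/ftp links, and deduplicates them while preserving order. download_urls is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Element names as listed in the post.
DOWNLOAD_TAGS = (
    "Primary_Download_URL",
    "Secondary_Download_URL",
    "Additional_Download_URL_1",
    "Additional_Download_URL_2",
    "DP_Distributive_Primary_URL",
)


def download_urls(xml_text: str) -> list:
    """Return unique download URLs in tag order, skipping empty or non-URL values."""
    root = ET.fromstring(xml_text)
    urls = []
    for tag in DOWNLOAD_TAGS:
        for el in root.iter(tag):
            url = (el.text or "").strip()
            if url.lower().startswith(("http://", "https://", "ftp://")) and url not in urls:
                urls.append(url)
    return urls
```

Primary and secondary URLs often point to the same file, which is why deduplication matters before you start downloading ‘good samples’.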
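Because sub-element names can differ between PAD versions, the sketch below does not hard-code them: it flattens every leaf element under Company_Info into a dictionary and then pulls out anything that looks like an email address. The regex is a deliberately loose, hypothetical pattern, and repeated tag names simply overwrite each other, which is fine for a quick survey but not for a complete export.

```python
import re
import xml.etree.ElementTree as ET

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def company_info(xml_text: str) -> dict:
    """Flatten all non-empty leaf elements under Company_Info into {tag: text}."""
    root = ET.fromstring(xml_text)
    info = {}
    for company in root.iter("Company_Info"):
        for el in company.iter():
            text = (el.text or "").strip()
            if len(el) == 0 and text:  # leaf element with content
                info[el.tag] = text
    return info


def emails(info: dict) -> set:
    """All email-looking strings found in the flattened values, lowercased."""
    return {m.lower() for value in info.values() for m in EMAIL_RE.findall(value)}
```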
Believe it or not, many of these software products still exist out there, in the wild. They are installed on actual endpoints, and the information provided inside these PAD files can help us do two things:
Last, but not least, here is an archive of 14K+ PAD files from http://repository.appvisor.com that I downloaded in 2021.
Now… for the bad news.
First, analysis of PAD files will give you a list of no-longer-existing domains that may still be considered trusted by security vendors due to past encounters, when the software was still alive. Second, some of the installed software packages that happen to be ‘dead’ by now may still include auto-update functionality. Yes, this opens up a supply-chain-attack possibility: one can re-register an expired domain and place a malicious updater on the new site. The next time the legitimate auto-update of the now-defunct software kicks in, it will resolve the domain, download the ‘updater’, and execute it.
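Defenders can turn the same observation into a watch list. This sketch extracts hostnames from PAD URLs and flags those that no longer resolve; the resolver is injectable so it can be tested offline. A failed lookup is only a hint (DNS may be temporarily down, or the host may be a dead subdomain of a live domain), so confirm with WHOIS/RDAP before treating a domain as lapsed. Reducing hostnames to registrable domains would additionally require the Public Suffix List, which is not attempted here.

```python
import socket
from urllib.parse import urlsplit


def hostnames(urls):
    """Unique, sorted, lowercase hostnames from a list of URLs; malformed URLs are skipped."""
    hosts = set()
    for url in urls:
        try:
            host = urlsplit(url.strip()).hostname  # .hostname is already lowercased
        except ValueError:
            continue
        if host:
            hosts.add(host)
    return sorted(hosts)


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts whose DNS lookup fails (candidates for a lapsed-domain check)."""
    dead = []
    for host in hosts:
        try:
            resolve(host)
        except (socket.gaierror, UnicodeError):
            dead.append(host)
    return dead
```

Feeding it the download and update URLs gathered from the PAD files gives a candidate list of domains worth monitoring for re-registration, or blocking outright if the software is still deployed in your environment.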
Take that, GenAI…