Over the years I have made many attempts to systematically extract Windows API information from various sources, but primarily, of course, from the Microsoft help documentation available at different times, in different forms and file formats. In case you’re wondering why… I really needed an ‘actionable’ dump of this information for my API monitor, and I also wanted to have it all available for quick & dirty reference, for both coding and reversing purposes. Plus, as I will explain later, for other purposes. Unsurprisingly, this strange journey ended up closely tracking the never-ending changes to Microsoft’s help system, and for many years it had me fighting a bitter battle against the odds, one that was ‘lost by default’…
~20 years ago, win32.hlp was THE file you needed and wanted. It included descriptions of many Windows API functions and was a gold mine when it came to understanding the myriad parameters, return values, and context required to use most of these popular Windows APIs properly. Interestingly, one could decompile the content of that .hlp file into a super large RTF file. The result was a bit difficult to parse, but lots of textual data could be made accessible this way, kinda programmatically, and kinda easily.
HLP files were the WinHelp files. Microsoft Help system 1.0.
Next, if I remember correctly, some of the Microsoft DDKs started including .chm files. One could decompile these to get access to raw, yet kinda uniformly formatted HTML files, and these could be parsed as well. I don’t recall this format really taking off, though I may be mistaken.
CHM files were the Microsoft Compiled HTML Help files. Microsoft Help system 1.x.
Then came the HxS files. I loved them very much, because these were JUICY. Decompiling them was not difficult, and as a result you would get lots of very nicely formatted data files to parse. I think it was also the first time XML was used for Windows API help, but again, I may be mistaken. Sadly, I don’t have many of my working files left from those times.
HxS files were the Microsoft Help 2 files. Microsoft Help system 2.0.
And then the Help files migrated one more time, this time to a local, web-based system…
http://127.0.0.1:47873/help/<version>/ms.help...
The address above was where all the juice was stored. By sending a series of additional requests one could enumerate all the pages, one by one, and many of these covered functions, methods, structures, etc… These could then be saved and parsed. Interestingly, when requesting these pages we could choose the format in which they were delivered, and XML was both a novelty at that time and something we very much wanted! That was probably the first time ever that Windows API information was stored, and made accessible, in such a consistent and parsable format to everyone who _knew_!
It was Microsoft Help Viewer aka Microsoft Help system 3.x.
Today, API help is no longer that interesting (okay, that’s a lie!), but thankfully, it is stored primarily online. Interestingly, after all the different formats of the past, it is now stored as Markdown (*.md) files.
Now, the main reason I am writing about the history of help files is to bring your attention to msdocsviewer. This is a new IDA plugin written by Alexander Hanel. Once you install this plugin, all you have to do is go to any Windows API referenced in the code you analyze in IDA and press CTRL+SHIFT+Z. A panel with all the information about that ‘highlighted’ API will pop up. You can dock that panel and then continue pressing CTRL+SHIFT+Z on other API functions to see their details as you go along. In my eyes, as of 2023, this is the best Windows API helper that has ever been written. IDAscope was cool, Mandiant’s plugin was cool, but now we have msdocsviewer and it’s TRULY COOL. It works like a charm and I highly recommend it.
I will end this post with a few data dumps.
You may think this is the end of the post, but it’s not. If you look at the content of 2013_apis.zip/list_final8 you will notice one thing: not only did I extract the function information that is typically available (a prototype), but I also tried to extract information about all the constants that a particular function’s parameters would refer to. For example, for CreateFile I would generate this information:
TITLE=CreateFile function FUN=CreateFile ARG=_In_ LPCTSTR lpFileName, ARG=_In_ DWORD dwDesiredAccess, ARG=_In_ DWORD dwShareMode, ARG=_In_opt_ LPSECURITY_ATTRIBUTES lpSecurityAttributes, ARG=_In_ DWORD dwCreationDisposition, ARG=_In_ DWORD dwFlagsAndAttributes, ARG=_In_opt_ HANDLE hTemplateFile RET=HANDLE WINAPI PAR=lpFileName [in] PAR=dwDesiredAccess [in] PAR=dwShareMode [in] VALUES= VAL=0 VAL=FILE_SHARE_DELETE VAL=FILE_SHARE_READ VAL=FILE_SHARE_WRITE PAR=lpSecurityAttributes [in, optional] PAR=dwCreationDisposition [in] VALUES= VAL=CREATE_ALWAYS VAL=CREATE_NEW VAL=OPEN_ALWAYS VAL=OPEN_EXISTING VAL=TRUNCATE_EXISTING PAR=dwFlagsAndAttributes [in] PAR=hTemplateFile [in, optional] LIB=Kernel32.lib DLL=Kernel32.dll HDR=FileAPI.h (include Windows.h); WinBase.h on Windows Server 2008 R2, Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, and Windows XP (include Windows.h) UNI=CreateFileW ANS=CreateFileA MINC=Windows XP MINS=Windows Server 2003
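To make the format above a bit more concrete, here is a minimal Python sketch of how such a record could be turned back into structured data. This is not the original parser that produced list_final8; it assumes fields are separated by whitespace, that the keys are exactly the uppercase tags visible in the CreateFile record, and that each VAL= entry belongs to the most recent PAR= entry. The names `parse_record`, `ApiRecord` and `Param` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Field tags seen in the CreateFile record; VALUES= is only a marker.
KEYS = ("TITLE", "FUN", "ARG", "RET", "PAR", "VALUES", "VAL",
        "LIB", "DLL", "HDR", "UNI", "ANS", "MINC", "MINS")
# A key must start the record or follow whitespace, and is followed by '='.
KEY_RE = re.compile(r"(?:^|\s)(" + "|".join(KEYS) + r")=")
# "lpFileName [in, optional]" -> ("lpFileName", "in, optional")
PAR_RE = re.compile(r"(\w+)\s*\[([^\]]*)\]")


@dataclass
class Param:
    name: str
    direction: str
    values: list = field(default_factory=list)  # constant names from VAL=


@dataclass
class ApiRecord:
    function: str = ""
    args: list = field(default_factory=list)
    ret: str = ""
    params: list = field(default_factory=list)
    dll: str = ""
    extra: dict = field(default_factory=dict)  # TITLE, LIB, HDR, UNI, ANS, ...


def tokenize(record):
    """Split a flat record into (key, value) pairs, preserving order."""
    matches = list(KEY_RE.finditer(record))
    pairs = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(record)
        pairs.append((m.group(1), record[m.end():end].strip()))
    return pairs


def parse_record(record):
    api = ApiRecord()
    for key, value in tokenize(record):
        if key == "FUN":
            api.function = value
        elif key == "ARG":
            api.args.append(value.rstrip(","))
        elif key == "RET":
            api.ret = value
        elif key == "PAR":
            m = PAR_RE.match(value)
            name, direction = (m.group(1), m.group(2)) if m else (value, "")
            api.params.append(Param(name, direction))
        elif key == "VAL":
            # Assumption: a VAL= belongs to the most recent PAR=.
            if api.params:
                api.params[-1].values.append(value)
        elif key == "DLL":
            api.dll = value
        elif key != "VALUES":
            api.extra[key] = value
    return api


# An excerpt of the CreateFile record above (most ARG/PAR fields omitted).
SAMPLE = ("TITLE=CreateFile function FUN=CreateFile "
          "ARG=_In_ LPCTSTR lpFileName, ARG=_In_ DWORD dwShareMode "
          "RET=HANDLE WINAPI PAR=lpFileName [in] "
          "PAR=dwShareMode [in] VALUES= VAL=0 VAL=FILE_SHARE_DELETE "
          "VAL=FILE_SHARE_READ VAL=FILE_SHARE_WRITE "
          "DLL=Kernel32.dll UNI=CreateFileW ANS=CreateFileA")

if __name__ == "__main__":
    api = parse_record(SAMPLE)
    for p in api.params:
        print(api.function, p.name, p.direction, p.values)
```

Matching only a fixed list of known tags, rather than any `WORD=`, keeps text such as HDR descriptions from being split by accident.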
Do you see where it is heading?
Yes, I was writing all these parsers with one thing in mind. If I could use this information not only to build a list of APIs, their arguments, and their in/out properties, but ALSO to reference the constants they refer to, or expect, then I would be in a position to generate stubs for handling some of the hooked APIs in my sandbox that, with only minor edits, could give me ‘string’ representations of immediate values, or of bit masks, for most APIs!
And it worked! It was a HUGE help at that time, as I could just generate these stubs, edit them a bit, and within minutes be in a position to support yet another API w/o going through the painful process of analysing each API individually. And by ‘handling’ I mean not only including the decimal/hexadecimal values passed to, or returned by, a function, but also showing their string equivalents, where applicable. Of course, I had to correct some of these automatically generated stubs, but it was far easier than doing everything from scratch for each and every API I wanted to hook.
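The ‘handling’ idea can be illustrated with a minimal Python sketch: given the VAL names extracted for a parameter, build a decoder that shows a value in decimal and hex plus its string equivalent. The numeric values below are the standard Windows SDK definitions of these constants (the dump itself only carries the names), and the flags-vs-enum decision uses a crude, hypothetical heuristic (every non-zero value is a power of two), which is exactly the kind of guess a generated stub may get wrong and need correcting by hand. `make_decoder` and `looks_like_flags` are hypothetical names, not code from my sandbox.

```python
# Numeric values as defined in the Windows SDK headers (winnt.h / fileapi.h).
# The dump only carries the names, so this table has to come from elsewhere.
CONSTANTS = {
    "FILE_SHARE_READ": 0x00000001,
    "FILE_SHARE_WRITE": 0x00000002,
    "FILE_SHARE_DELETE": 0x00000004,
    "CREATE_NEW": 1,
    "CREATE_ALWAYS": 2,
    "OPEN_EXISTING": 3,
    "OPEN_ALWAYS": 4,
    "TRUNCATE_EXISTING": 5,
}


def looks_like_flags(values):
    """Hypothetical heuristic: treat the set as a bit mask if every
    non-zero value is a power of two. It can guess wrong."""
    nonzero = [v for v in values if v]
    return bool(nonzero) and all(v & (v - 1) == 0 for v in nonzero)


def make_decoder(val_names):
    """Build a decoder for one parameter from its VAL= names."""
    # Names without a known numeric value (e.g. the literal "0") are skipped.
    table = sorted(((n, CONSTANTS[n]) for n in val_names if n in CONSTANTS),
                   key=lambda item: item[1])
    as_flags = looks_like_flags([v for _, v in table])

    def decode(value):
        if as_flags:
            # Collect every named bit that is set in the value.
            matched = [(n, v) for n, v in table if v and value & v == v]
            known = 0
            for _, v in matched:
                known |= v
            names = [n for n, _ in matched]
            rest = value & ~known
            if rest:
                names.append(f"0x{rest:X}")  # bits we have no name for
            text = "|".join(names) or "0"
        else:
            # Enumerated value: exact match or nothing.
            text = next((n for n, v in table if v == value), "UNKNOWN")
        return f"{value} / 0x{value:X} -> {text}"

    return decode


# VAL lists taken from the CreateFile record above.
decode_share_mode = make_decoder(
    ["0", "FILE_SHARE_DELETE", "FILE_SHARE_READ", "FILE_SHARE_WRITE"])
decode_disposition = make_decoder(
    ["CREATE_ALWAYS", "CREATE_NEW", "OPEN_ALWAYS", "OPEN_EXISTING",
     "TRUNCATE_EXISTING"])

if __name__ == "__main__":
    print(decode_share_mode(3))   # 3 / 0x3 -> FILE_SHARE_READ|FILE_SHARE_WRITE
    print(decode_disposition(3))  # 3 / 0x3 -> OPEN_EXISTING
```

Unknown leftover bits are printed in hex rather than dropped, so a mask that is only partially documented still shows up in the log.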
And when I worked on my Frida monitor, I used the very same principle, hence some of its code covers constants pretty well. In my eyes, a good sandbox is one that understands both arguments and return values well…