This is a guest entry written by Alexander Hanel from CrowdStrike. His views and opinions are his own and not those of Hex-Rays. Any technical or maintenance issues regarding the code herein should be directed to the author.
Msdocviewer: A simple tool for viewing Microsoft’s technical specifications
An invaluable resource when reverse engineering Portable Executable (PE) binaries is Microsoft’s Windows Application Programming Interface (API) technical documentation. Microsoft’s documentation (commonly referred to as MSDN) describes how each API interacts with the Windows operating system. By piecing together how individual APIs interact with Windows, an analyst can infer some or all of a binary's functionality. Having the documentation inside IDA speeds up reverse engineering because it is available without switching to a browser or another third-party document viewer. The following image shows the msdocviewer IDA plugin: the highlighted API GetProcessHeap appears on the left in graph view, and its documentation appears on the right.
Description
msdocviewer is a simple tool that parses Microsoft’s Win32 API and driver documentation so that it can be viewed in IDA. The tool consists of three parts: the first is two Git repositories, the second is the parser, and the third is an IDA plugin.
The first part uses two repositories from Microsoft Docs. To allow public contributions to its API documentation, Microsoft publishes its technical specifications in Markdown format on GitHub under Microsoft Docs. The first repository contains the Win32 API documentation (directory name sdk-api), and the second contains the Windows Driver DDI reference documentation (directory name windows-driver-docs-ddi).
The second part is a Python script that parses the documentation repositories and finds all documents related to function APIs. For each one, it copies the document to a directory, rearranges some of the text, and renames the file to its corresponding API name. For example, the file nf-fileapi-createfilea.md is renamed to CreateFileA.md.
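The find, copy, and rename step can be sketched in a few lines. This is a minimal illustration under assumptions, not the real run_me_first.py: function pages are assumed to be the files whose names start with nf- (as in nf-fileapi-createfilea.md), and the correctly cased API name is assumed to come from a title: line in the document's YAML front matter, such as title: CreateFileA function (fileapi.h). To stay dependency-free, the sketch scans the front matter with a regular expression instead of pyyaml and skips the text rearrangement. The names api_name_from_markdown and copy_function_docs are hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# YAML front matter: a leading block delimited by '---' lines (assumed layout).
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def api_name_from_markdown(text: str) -> Optional[str]:
    """Return the API name from a 'title:' front-matter line, or None.

    Assumes titles look like 'CreateFileA function (fileapi.h)'.
    """
    match = FRONT_MATTER.match(text)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        if line.startswith("title:"):
            title = line.split(":", 1)[1].strip().strip("'\"")
            # The first word of the title is taken as the API name.
            return title.split()[0] if title else None
    return None


def copy_function_docs(content_dir: Path, out_dir: Path) -> int:
    """Copy nf-*.md pages into out_dir as <ApiName>.md; return how many were copied."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for md_file in sorted(content_dir.rglob("nf-*.md")):
        name = api_name_from_markdown(md_file.read_text(encoding="utf-8"))
        if name:
            shutil.copyfile(md_file, out_dir / f"{name}.md")
            copied += 1
    return copied
```

For example, copy_function_docs(Path("sdk-api/sdk-api-src/content"), Path("apis_md")) would fill apis_md with files such as CreateFileA.md. Unlike the original parser, this sketch does not filter out API names that are invalid as file names.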
The third part is an IDA plugin written in IDAPython that leverages PyQt’s Markdown viewer to display the API’s documentation.
Installing Msdocviewer
The first step in installing msdocviewer is to clone the repository from GitHub using the command below.
git clone https://github.com/alexander-hanel/msdocsviewer.git
Once the repository has been cloned, the repositories hosting the documentation need to be downloaded. They are stored as Git submodules and can be downloaded by executing the following commands.
cd msdocsviewer
git submodule update --init --recursive
The submodule repositories are over 2 GB in total and can take a while to download. If the git submodule command fails partway through, re-running it should resolve the problem. Once downloaded, the documentation repositories need to be parsed by executing python run_me_first.py. Note: this script requires pyyaml, which can be installed by executing pip install -r requirements.txt. The following is an example output of run_me_first.py:
C:\Users\Admin\Documents\repo\msdocsviewer>python run_me_first.py
INFO - creating apis_md directory at C:\Users\Admin\Documents\repo\msdocsviewer\apis_md
INFO - starting the parsing, this can take a few minutes
INFO - parsing C:\Users\Admin\Documents\repo\msdocsviewer\sdk-api\sdk-api-src\content
INFO - parsing C:\Users\Admin\Documents\repo\msdocsviewer\sdk-api\sdk-api-src\content completed
INFO - parsing C:\Users\Admin\Documents\repo\msdocsviewer\windows-driver-docs-ddi\wdk-ddi-src\content
INFO - parsing C:\Users\Admin\Documents\repo\msdocsviewer\windows-driver-docs-ddi\wdk-ddi-src\content completed
INFO - finished parsing, if using IDA add path C:\Users\Admin\Documents\repo\msdocsviewer\apis_md to API_MD variable in idaplugin/msdocviewida.py
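The last log line asks you to point the plugin at the parsed documentation. Assuming API_MD is an ordinary module-level string in idaplugin/msdocviewida.py, the edit looks like the line below. The path is the one from the example output, so substitute your own. The raw-string prefix r matters: in a normal Python 3 string literal, the \U in C:\Users starts a Unicode escape, and the plugin file fails to load with a SyntaxError.

```python
# idaplugin/msdocviewida.py (excerpt; assumes API_MD is a plain string constant).
# Raw string so the backslashes in the Windows path are not treated as escapes.
API_MD = r"C:\Users\Admin\Documents\repo\msdocsviewer\apis_md"
```

Forward slashes (C:/Users/...) also work with Python's file APIs on Windows and avoid the escaping issue entirely.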
During parsing, some function names are not valid file names, so no file is created for them. To see which files and functions are skipped, add the optional command-line argument --log or -l, which writes the log to a file named debug-parser.log. There is also a command-line option, --overwrite or -o, that overwrites the existing documentation. Overwriting is recommended if an older version of msdocviewer is present or if Microsoft updates its documentation repositories.
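The skip-and-log behaviour and the two flags can be sketched with argparse and a file-name check. This is an illustration under assumptions, not the script's actual code: the helper names are hypothetical, and the check only covers the characters Windows forbids in file names (<>:"/\|?* and control characters) plus trailing dots or spaces. It does not catch reserved device names such as CON or NUL. A C++ method name such as IUnknown::QueryInterface is an example of a name that fails the check.

```python
import argparse
import logging
import re
from typing import List, Optional

# Characters Windows does not allow in file names, plus ASCII control characters.
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

log = logging.getLogger("msdoc-parser-sketch")


def is_valid_filename(name: str) -> bool:
    """True if `name` can be used as a Windows file name (reserved names not checked)."""
    if not name or name.endswith((".", " ")):
        return False
    return INVALID_FILENAME_CHARS.search(name) is None


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -l/--log and -o/--overwrite switches."""
    parser = argparse.ArgumentParser(description="Parse Microsoft docs into apis_md.")
    parser.add_argument("-l", "--log", action="store_true",
                        help="write skipped files and functions to debug-parser.log")
    parser.add_argument("-o", "--overwrite", action="store_true",
                        help="overwrite previously generated documentation")
    return parser.parse_args(argv)


def configure_logging(to_file: bool) -> None:
    """Send DEBUG records to debug-parser.log when --log is given."""
    if to_file:
        logging.basicConfig(filename="debug-parser.log", level=logging.DEBUG,
                            format="%(levelname)s - %(message)s")


def keep_api(name: str) -> bool:
    """Return False, and log the reason, for names that cannot become files."""
    if is_valid_filename(name):
        return True
    log.debug("skipping %r: not a valid file name", name)
    return False
```

A main function would call configure_logging(parse_args().log) before parsing and pass each API name through keep_api before copying its document.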
The plugin can be activated either through Edit > Plugins > msdocviewer or with the hotkey Ctrl-Shift-Z.
Writing Your Own Documentation Tool
Overall, the code behind msdocviewer is simple. Without the logging functionality, the parser and the IDA plugin each contain fewer than 100 lines of Python. While msdocviewer is useful for reverse engineering Windows binaries, it has little value for Linux or other platforms. What makes msdocviewer useful is the documentation, so all that is needed to build a valuable tool for another platform is that platform's documentation. Once the documentation is found and converted to a format PyQt supports (Markdown, HTML, or plain text), it is easy to use IDAPython to build your own custom documentation tool. If you don’t want to start from scratch, feel free to use the msdocviewer IDA plugin as a skeleton. The functions get_selected_api_name and load_markdown contain the core logic of the plugin: the first extracts the selected string using ida_kernwin.get_highlight, and the second uses that string to determine which file should be opened and displayed.
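The two core steps can be sketched as follows, generalised to all three formats PyQt's text widgets accept. This is a hedged outline, not msdocviewer's code: get_highlighted_name, find_document, renderer_for, and show_document are hypothetical names standing in for get_selected_api_name and load_markdown. The sketch assumes ida_kernwin.get_highlight returns a (text, flags) tuple or None, as in IDA 7.x and later. It also assumes the target widget is a QTextEdit or QTextBrowser, whose setMarkdown (Qt 5.14+), setHtml, and setPlainText methods do the rendering. The IDA import is deferred so the lookup logic can run and be tested outside IDA.

```python
import os
from typing import Optional

# Qt text-widget methods that render each supported file type.
# QTextEdit/QTextBrowser.setMarkdown requires Qt 5.14 or later.
RENDERERS = {
    ".md": "setMarkdown",
    ".html": "setHtml",
    ".htm": "setHtml",
    ".txt": "setPlainText",
}
# Extensions tried, in order, when looking up a highlighted name.
SEARCH_ORDER = (".md", ".html", ".htm", ".txt")


def get_highlighted_name() -> Optional[str]:
    """Return the identifier highlighted in IDA's current view, or None (IDA only)."""
    import ida_kernwin  # deferred so the helpers below also run outside IDA

    viewer = ida_kernwin.get_current_viewer()
    if viewer is None:
        return None
    highlight = ida_kernwin.get_highlight(viewer)
    return highlight[0] if highlight else None


def find_document(name: str, doc_dir: str) -> Optional[str]:
    """Return the first existing <doc_dir>/<name><ext> file, or None."""
    for ext in SEARCH_ORDER:
        path = os.path.join(doc_dir, name + ext)
        if os.path.isfile(path):
            return path
    return None


def renderer_for(path: str) -> str:
    """Name of the widget method that should display this file."""
    ext = os.path.splitext(path)[1].lower()
    return RENDERERS.get(ext, "setPlainText")


def show_document(widget, name: str, doc_dir: str) -> bool:
    """Load the page for `name` into a QTextEdit/QTextBrowser; False if none exists."""
    path = find_document(name, doc_dir)
    if path is None:
        return False
    with open(path, encoding="utf-8") as handle:
        getattr(widget, renderer_for(path))(handle.read())
    return True
```

Inside an IDA plugin, show_document(browser, get_highlighted_name(), API_MD) would load GetProcessHeap.md into a QTextBrowser when GetProcessHeap is highlighted. Trying several extensions lets a documentation set for another platform ship as HTML or plain text without changes to the lookup.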
Since msdocviewer looks up file names on disk, it is easy to add your own documentation. For example, I always forget the enum values for SYSTEM_INFORMATION_CLASS. One fix is to create a file named SYSTEM_INFORMATION_CLASS.md in the apis_md directory containing the relevant documentation in Markdown. Now I only need to highlight the text SYSTEM_INFORMATION_CLASS and open the plugin (I prefer the hotkey; it is super convenient).
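Custom pages like this can also be generated instead of written by hand. The sketch below renders an enum as a Markdown table and writes it as <name>.md, the naming scheme the plugin looks up. The helper names are hypothetical. The value list is deliberately partial: the members declared in winternl.h plus SystemModuleInformation (11), which winternl.h does not document. Extend it with the values you need.

```python
import os

# Partial SYSTEM_INFORMATION_CLASS listing; extend as needed.
SYSTEM_INFORMATION_CLASS = {
    "SystemBasicInformation": 0,
    "SystemPerformanceInformation": 2,
    "SystemTimeOfDayInformation": 3,
    "SystemProcessInformation": 5,
    "SystemProcessorPerformanceInformation": 8,
    "SystemModuleInformation": 11,  # not documented in winternl.h
}


def render_enum_markdown(name: str, values: dict) -> str:
    """Render an enum as a Markdown heading plus a name/value/hex table, sorted by value."""
    lines = [f"# {name}", "", "| Name | Value | Hex |", "| --- | --- | --- |"]
    for member, value in sorted(values.items(), key=lambda item: item[1]):
        lines.append(f"| {member} | {value} | {value:#x} |")
    return "\n".join(lines) + "\n"


def write_custom_doc(doc_dir: str, name: str, values: dict) -> str:
    """Write <doc_dir>/<name>.md so the plugin finds it by the highlighted name."""
    path = os.path.join(doc_dir, f"{name}.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_enum_markdown(name, values))
    return path
```

Calling write_custom_doc with the apis_md path reported by run_me_first.py, the name "SYSTEM_INFORMATION_CLASS", and the dictionary above creates SYSTEM_INFORMATION_CLASS.md next to the generated API pages.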
Closing
As previously mentioned, msdocviewer is a simple plugin because IDAPython makes the selected text easy to access and PyQt makes it straightforward to display Microsoft's documentation. If your focus isn’t on reverse engineering Windows binaries, I hope this blog post motivates you to build your own documentation viewer for other platforms.
The msdocviewer plugin is available on GitHub: https://github.com/alexander-hanel/msdocsviewer