The IDA patfind plugin

Just raw binary data at address 0x00000AC

While IDA excels at extracting useful information from all sorts of binary files, some unstructured binary files (e.g., firmware images, raw memory dumps, …) can throw it off the rails, and the user then needs to kickstart autoanalysis by figuring out some sort of entry point. To help with that, IDA has for decades had a feature called “code startup sequences,” which spots binary patterns that indicate the start of functions.

As it turns out, those lack flexibility and do not systematically produce the best results. That is what prompted one of the 2021 plugin contestants to come up with a plugin that would improve on that idea by providing stronger, richer, more powerful startup sequences. He called his plugin IDAPatternSearch. His idea was so good that it, in turn, inspired us at Hex-Rays to build upon it and improve it even further: this eventually became the patfind plugin. Thanks to patfind, that same binary file now looks like this (out of the box):

Pattern found at address 0x00000AC and function created

patfind was released as part of IDA 8.0 (https://hex-rays.com/products/ida/news/8_0/#better-firmware-analysis-thanks-to-the-function-finder-plugin-patfind)

Configuration

The patfind plugin runs automatically every time an unstructured binary file is loaded. It can also be run manually from the menu ‘Edit->Plugins->Find Functions’.

The IDA patfind plugin can be configured through the patfind.cfg file. There is not much to configure, really: just where to look for pattern files and whether the plugin should run automatically or not.

In-depth

Binary and binary-like files are difficult to analyze if you don’t know what part (if any) of the binary data contains code. The patfind plugin searches the binary file for places where functions start and tells IDA to analyze the data at each address found as a function. That, in combination with IDA’s ability to check cross-references and follow function calls, may result in a fully analyzed file. Still, a lot depends on the quality of the provided patterns.
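
To make the idea concrete, here is a minimal sketch in plain Python (not IDAPython, and not patfind's actual code) that scans a raw buffer for one fixed byte sequence and reports the candidate function starts. The helper name `find_candidates`, the sample bytes, and the choice of the classic 32-bit x86 prologue `55 89 e5` (push ebp; mov ebp, esp) are all assumptions made for illustration.

```python
from typing import List


def find_candidates(data: bytes, prologue: bytes, base: int = 0) -> List[int]:
    """Return every address (base + offset) at which `prologue` occurs in `data`."""
    hits = []
    pos = data.find(prologue)
    while pos != -1:
        hits.append(base + pos)
        # Resume the search one byte further so no occurrence is skipped.
        pos = data.find(prologue, pos + 1)
    return hits


# Illustrative choice: push ebp; mov ebp, esp on 32-bit x86.
X86_PROLOGUE = bytes.fromhex("5589e5")

# Hypothetical raw blob: padding, a ret, then two tiny "functions".
blob = bytes.fromhex("0000c3" "5589e5" "90c3" "5589e5" "c3")

candidates = find_candidates(blob, X86_PROLOGUE, base=0x1000)
print([hex(a) for a in candidates])
```

In a real tool, each candidate address would then be handed to the disassembler for function creation, and the cross-references found there would pull in the rest of the code.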

Every compiler has its own architecture-specific idioms when it comes to code generation. Corresponding patterns are defined in XML files. A very strict pattern may result in a perfect match, but if there’s a slight variation in the generated code, it will fail to match. On the other hand, a pattern that’s too loose will produce many matches, and many of those will be false positives. The trick then is to find the right balance: using wildcards in patterns, trying different combinations of instructions, or checking the binary data just before a candidate function start to see whether it matches the ending of a previous function, for example.
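
The trade-off between strict and loose patterns can be illustrated with a small, self-contained Python sketch. It is not patfind's implementation and does not use its XML format: the textual pattern syntax (`??` for a wildcard byte), the helper names, and the sample x86 bytes are assumptions chosen for illustration. The optional `pre` pattern plays the role of "the ending of a previous function".

```python
from typing import List, Optional, Tuple

Masked = List[Tuple[int, int]]  # one (value, mask) pair per byte


def compile_pattern(text: str) -> Masked:
    """Turn e.g. '55 89 e5 ??' into (value, mask) pairs; '??' matches any byte."""
    return [(0, 0) if tok == "??" else (int(tok, 16), 0xFF) for tok in text.split()]


def matches_at(data: bytes, pos: int, pat: Masked) -> bool:
    """True if `pat` matches `data` starting at `pos` (out-of-range never matches)."""
    if pos < 0 or pos + len(pat) > len(data):
        return False
    return all((data[pos + i] & mask) == value for i, (value, mask) in enumerate(pat))


def scan(data: bytes, post: str, pre: Optional[str] = None) -> List[int]:
    """Offsets where `post` matches and, if given, `pre` matches just before it."""
    post_pat = compile_pattern(post)
    pre_pat = compile_pattern(pre) if pre else []
    hits = []
    for pos in range(len(data)):
        if not matches_at(data, pos, post_pat):
            continue
        if pre_pat and not matches_at(data, pos - len(pre_pat), pre_pat):
            continue
        hits.append(pos)
    return hits


# Hypothetical blob: two functions with different stack sizes, then a stray 0x55.
blob = bytes.fromhex("90c3" "5589e583ec10c9c3" "5589e583ec28c9c3" "00" "5589e500")

strict = scan(blob, "55 89 e5 83 ec 10")        # misses the second function
wildcard = scan(blob, "55 89 e5 83 ec ??")      # finds both functions
loose = scan(blob, "55")                        # also hits the stray bytes
loose_with_pre = scan(blob, "55", pre="c3")     # a preceding ret filters them out
print(strict, wildcard, loose, loose_with_pre)
```

Here the strict pattern misses a function whose stack frame size differs, the one-byte pattern also matches bytes that start no function, and either a wildcard or a check on the preceding bytes recovers exactly the two real starts.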

Support for a new architecture can be added simply by providing a new XML file modeled on the existing ones. It’s also possible to add, remove or change existing patterns for better matching.

The XML files used by patfind initially came from a Ghidra distribution and were augmented with a set of attributes that make them even more useful (e.g., automatic support for endianness).
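
How patfind handles endianness internally is not described here. As one plausible reading, a pattern written once for big-endian code could be reused on little-endian code by reversing the bytes of each instruction word. The sketch below shows only that byte-swapping step; the function name `swap_words`, the fixed 4-byte word size, and the MIPS example (`27 bd ff e0`, i.e. addiu sp, sp, -32) are assumptions for illustration.

```python
def swap_words(pattern: bytes, word_size: int = 4) -> bytes:
    """Reverse the byte order inside each fixed-size instruction word."""
    if len(pattern) % word_size:
        raise ValueError("pattern length must be a multiple of the word size")
    return b"".join(
        pattern[i:i + word_size][::-1] for i in range(0, len(pattern), word_size)
    )


# Assumed example: big-endian MIPS 'addiu sp, sp, -32'.
big_endian = bytes.fromhex("27bdffe0")
little_endian = swap_words(big_endian)
print(little_endian.hex())
```

With such a step applied automatically, a single pattern definition can serve both byte orders of the same architecture instead of being duplicated by hand.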