
Custom data types and formats – Hex Rays

Written by   Elias Bachaalany | Feb 24, 2010

Another new feature that will be available in the upcoming version of IDA Pro is the ability to create and render custom data types and formats.


(Embedded instructions disassembled and rendered alongside x86 code)

What are custom types and formats

  • Custom data type: A custom type is a way to tag some bytes for later display with a custom format, when the built-in IDA types (dt_byte, dt_word, etc.) are not enough.
    For example: an XMM vector, a Pascal string, a half-precision (16-bit) floating-point number, a 16:32 far pointer (fword), a uleb128 number, and so on.
    To define a custom type, you need to provide its name, size (fixed or dynamically calculated), keyword for disassembly, and a few other attributes.
  • Custom data format:
    The custom data format allows you to display a custom or built-in data type in any way you like. You can register several formats for each type and switch between the representations.
    For example, you might want to switch the display of the same 16-byte XMM vector between four floats or two doubles.
    A format definition includes callbacks for printing (to display the value) and scanning (used during debugging to change register values).
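
To make the print/scan pairing concrete, here is a minimal sketch in plain Python of two formats for the same 16-byte XMM value. It deliberately does not use the IDA API: the function names and the `FORMATS` table are hypothetical, and in a real plugin these functions would sit behind the format's print and scan callbacks.

```python
import struct

def print_as_floats(raw):
    """Render 16 bytes as four little-endian single-precision floats."""
    if len(raw) != 16:
        raise ValueError("an XMM value is exactly 16 bytes")
    return ", ".join(repr(v) for v in struct.unpack("<4f", raw))

def print_as_doubles(raw):
    """Render the same 16 bytes as two little-endian doubles."""
    if len(raw) != 16:
        raise ValueError("an XMM value is exactly 16 bytes")
    return ", ".join(repr(v) for v in struct.unpack("<2d", raw))

def scan_floats(text):
    """Inverse of print_as_floats: parse 'a, b, c, d' back into 16 bytes."""
    values = [float(part) for part in text.split(",")]
    if len(values) != 4:
        raise ValueError("expected four comma-separated floats")
    return struct.pack("<4f", *values)

# Hypothetical registry: several formats attached to one type,
# so the user can switch the representation of the same bytes.
FORMATS = {
    "4 x float": print_as_floats,
    "2 x double": print_as_doubles,
}
```

The scan function is what lets the user type a new value in the chosen representation, for example while editing a register during debugging; it must accept whatever the matching print function produces.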

For example, here is a custom MAKE_DWORD format applied to the built-in dword type:

Its implementation is very simple:
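
As a rough idea of what such a format's print logic involves, here is a minimal sketch in plain Python. It is an assumption, not the code from the original listing: the function name, the little-endian layout, and the exact output text are all chosen for illustration, and the step that registers the format with IDA is omitted.

```python
import struct

def format_make_dword(raw):
    """Return 'MAKE_DWORD(high, low)' for a 4-byte little-endian value.

    Returns None when the input is not exactly 4 bytes, which is how this
    sketch signals that the format does not apply to the item.
    """
    if len(raw) != 4:
        return None
    # Little-endian: the low word is stored first, the high word second.
    low, high = struct.unpack("<HH", raw)
    return "MAKE_DWORD(0x%04X, 0x%04X)" % (high, low)
```

For instance, the dword 0x12345678 is stored as the bytes 78 56 34 12, and this sketch renders it as `MAKE_DWORD(0x1234, 0x5678)`.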

Next we illustrate some possible uses of custom types and formats. Other uses are possible too; it is up to your imagination.

Decoding embedded bytecodes

Imagine you are debugging an x86 program that implements its own VM and embeds the VM's bytecode in the program.
The classical solutions to this problem are:

  • Write a dedicated processor module and then load the extracted bytecodes separately
  • Or define the bytecodes as bytes and then use comments to describe the real meaning of those bytecodes.

With this new addition, one can just write a custom data type to handle the situation:
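
The two pieces such a data type needs are a dynamic size calculation (how many bytes one bytecode instruction occupies) and a renderer for it. The sketch below shows both in plain Python for a made-up instruction set: the opcode table, the mnemonics, and all function names are hypothetical and unrelated to the simplevm used in this article, and the IDA registration step is left out.

```python
# Hypothetical instruction set: opcode -> (mnemonic, operand size in bytes).
OPCODES = {
    0x00: ("halt", 0),
    0x01: ("push", 4),
    0x02: ("pop", 0),
    0x03: ("add", 0),
    0x04: ("jmp", 2),
}

def insn_size(code, offset):
    """Size in bytes of the instruction at offset, or 0 if it cannot be decoded."""
    if offset >= len(code):
        return 0
    entry = OPCODES.get(code[offset])
    if entry is None:
        return 0
    size = 1 + entry[1]
    # Reject instructions whose operand runs past the end of the buffer.
    return size if offset + size <= len(code) else 0

def insn_text(code, offset):
    """Text of the instruction at offset, or None if it cannot be decoded."""
    size = insn_size(code, offset)
    if size == 0:
        return None
    mnemonic, operand_size = OPCODES[code[offset]]
    if operand_size == 0:
        return mnemonic
    operand = int.from_bytes(code[offset + 1:offset + size], "little")
    return "%s 0x%X" % (mnemonic, operand)

def disassemble(code):
    """Linear sweep; undecodable bytes fall back to plain 'db' lines."""
    lines, offset = [], 0
    while offset < len(code):
        size = insn_size(code, offset)
        if size == 0:
            lines.append("db 0x%02X" % code[offset])
            offset += 1
        else:
            lines.append(insn_text(code, offset))
            offset += size
    return lines
```

Returning a size of 0 for bytes that do not decode is a design choice of this sketch: it gives the caller a clear signal that the type cannot be applied at that address.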

And if you happen to have a situation where the bytecodes are operands to instructions (as a means of obfuscation), you can still apply the custom format to those operands:

The previous blog entry showed how to write processor modules using Python. What if one simply uses the “import” statement to import a full-blown processor module script and use it in the custom data types/formats? 😉

Displaying resource strings

When reversing MS Windows applications, one often encounters string IDs. How can the corresponding string be fetched and displayed in the disassembly listing easily?
Normally, one would have to use a resource editor to extract the string value corresponding to the string ID, then create an enum in IDA for each string ID with a repeatable comment:

That works, but what about writing your own custom format instead:

You can then apply it directly: instead of using a resource editor to extract the string value, the custom format does that programmatically for you:

This is what a resource string custom format handler can look like:
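
The core of such a handler is the mapping from a string ID to the string itself. Windows stores string resources in blocks of 16: the block ID is `(id >> 4) + 1`, the index within the block is `id & 0xF`, and each entry is a 16-bit length followed by that many UTF-16LE code units. The plain-Python sketch below implements only that lookup. The function names, the `blocks` dictionary, and the `RES_STR(...)` output text are hypothetical, and fetching the raw blocks from the module being analyzed is left out.

```python
import struct

def locate_string(string_id):
    """Map a string ID to (string table block ID, index within the block)."""
    return (string_id >> 4) + 1, string_id & 0x0F

def read_block_entry(block, index):
    """Return entry `index` (0..15) of a raw string table block, or None."""
    offset = 0
    for i in range(16):
        if offset + 2 > len(block):
            return None
        (length,) = struct.unpack_from("<H", block, offset)
        offset += 2
        end = offset + 2 * length  # length counts UTF-16 code units
        if end > len(block):
            return None
        if i == index:
            return block[offset:end].decode("utf-16-le")
        offset = end
    return None

def format_string_id(string_id, blocks):
    """Render a string ID with its text; `blocks` maps block ID -> raw bytes.

    Returns None when the ID has no string, so the caller can fall back
    to the default numeric display.
    """
    block_id, index = locate_string(string_id)
    block = blocks.get(block_id)
    text = read_block_entry(block, index) if block else None
    if not text:
        return None
    return 'RES_STR(%d) ; "%s"' % (string_id, text)
```

Parsing the block directly keeps the lookup independent of the host operating system; a handler running on Windows could instead ask the system to load the string from the module.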

To take a closer look at it, you can download the custom data type handler script along with the source code of the simplevm assembler/disassembler and the C program that was used in this article.