State-of-the-art binary code analysis tools

Many features of IDA and other disassemblers are taken for granted nowadays, but this has not always been the case. As one example, let’s consider automatic variable naming.

A little bit of history

In its first versions, IDA did not differ much from a dumb disassembler with comments and renaming: it showed pretty much raw instructions with numerical offsets. To keep track of those offsets, users often had to add manual comments.

A few versions later, support for stack variables appeared. They initially had dummy names (var_4, var_C etc.) but could be renamed by the user, which eased the reverse engineering process. However, this could still be tedious in big programs.

Next, FLIRT was added, which helped identify standard library functions. Now the user did not need to analyze boilerplate code from the compiler runtime libraries but only the code written by the programmer. Having identified library functions also helped in picking names for variables: most library functions had known prototypes so the variables used for their arguments could be renamed accordingly.

However, this process was still manual. Could it not be automated?

And indeed, this is what happened in IDA 4.10, with the addition of the type system and standard type libraries. Now the identified library or imported functions could be matched to their prototypes in the type library and their arguments commented and/or renamed. For the arguments using a complex type (e.g. a structure), the stack variable could also be changed to use that type.
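The matching step can be illustrated with a small sketch. This is a toy model, not IDA's actual algorithm: it assumes a 32-bit call where every argument is passed by a single push (so the last push before the call is the first parameter), and the PROTOTYPES table and annotate_call helper are hypothetical names standing in for a type library lookup.

```python
# Toy model of prototype-driven argument annotation (not IDA's real algorithm).
# Assumption: every argument is passed with exactly one `push`, right to left,
# so the last push before the call carries the first parameter.

# Hypothetical miniature "type library": function name -> parameter names.
PROTOTYPES = {
    "GetClientRect": ["hWnd", "lpRect"],
}


def annotate_call(listing, prototypes):
    """Return a copy of `listing` with parameter names added as comments."""
    annotated = list(listing)
    for call_index, line in enumerate(listing):
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] != "call":
            continue
        # Drop a segment prefix such as "ds:" to get the bare function name.
        callee = parts[1].split(":")[-1].strip()
        params = prototypes.get(callee)
        if params is None:
            continue  # unknown function: nothing to propagate
        remaining = list(params)
        # Walk backwards from the call, giving each push the next parameter.
        for i in range(call_index - 1, -1, -1):
            words = listing[i].split()
            if not remaining or (words and words[0] == "call"):
                break  # all parameters assigned, or a previous call reached
            if words and words[0] == "push":
                annotated[i] = f"{listing[i]:<28}; {remaining.pop(0)}"
    return annotated


LISTING = [
    "lea     eax, [ebp+var_2C]",
    "push    eax",
    "push    ebx",
    "call    ds:GetClientRect",
]

for row in annotate_call(LISTING, PROTOTYPES):
    print(row)
```

Instructions between the pushes (mov, sub, lea) are simply skipped, which is why interleaved arithmetic does not disturb the parameter order in this model.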

In practice

As a current example, let’s have a look at a Win32 program which calls CreateWindowExA.

First, with everything disabled:

mov     eax, [ebp-20h]
push    dword ptr [ebp+8]
sub     eax, [ebp-28h]
push    dword ptr [ebx+1Ch]
push    eax
mov     eax, [ebp-24h]
sub     eax, [ebp-2Ch]
push    eax
push    dword ptr [ebp-28h]
push    dword ptr [ebp-2Ch]
push    dword ptr [ebp-8]
push    edi
push    offset aEdit    ; "edit"
push    edi
call    ds:CreateWindowExA

Next, with stack variables:

mov     eax, [ebp+var_20]
push    [ebp+arg_0]
sub     eax, [ebp+var_28]
push    dword ptr [ebx+1Ch]
push    eax
mov     eax, [ebp+var_24]
sub     eax, [ebp+var_2C]
push    eax
push    [ebp+var_28]
push    [ebp+var_2C]
push    [ebp+var_8]
push    edi
push    offset aEdit    ; "edit"
push    edi
call    ds:CreateWindowExA

Stack variables are created but use dummy names. We could consult the function’s documentation and rename and retype them manually. But instead we can enable argument propagation and reanalyze the function:

mov     eax, [ebp+Rect.bottom]
push    [ebp+hMenu]     ; hMenu
sub     eax, [ebp+Rect.top]
push    dword ptr [ebx+1Ch] ; hWndParent
push    eax             ; nHeight
mov     eax, [ebp+Rect.right]
sub     eax, [ebp+Rect.left]
push    eax             ; nWidth
push    [ebp+Rect.top] ; Y
push    [ebp+Rect.left] ; X
push    [ebp+dwStyle]   ; dwStyle
push    edi             ; lpWindowName
push    offset aEdit    ; "edit"
push    edi             ; dwExStyle
call    ds:CreateWindowExA

Now, all arguments are renamed and all instructions initializing them are commented. The Rect variable was renamed and typed thanks to another place in the same function:

lea     eax, [ebp+Rect]
push    eax             ; lpRect
push    ebx             ; hWnd
call    ds:GetClientRect

Here, IDA recognized that the lea instruction effectively takes the address of a struct, so the stack variable should be the struct itself and not just a pointer. Thanks to this, the field references are clearly identified in the other snippet.
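To see why typing one stack variable names four operands at once, here is a minimal sketch of the offset-to-field mapping. It is an illustration, not IDA's implementation: the helper names are hypothetical, the RECT layout (four 32-bit fields) comes from the Windows SDK, and the dummy-name rule assumes a standard ebp frame where the first argument sits at ebp+8.

```python
# RECT layout from the Windows SDK: four 32-bit LONG fields.
RECT_FIELDS = [("left", 0), ("top", 4), ("right", 8), ("bottom", 12)]


def retype_stack_var(base_offset, var_name, fields):
    """Map every ebp-relative offset covered by the struct to 'Var.field'.

    base_offset is the ebp-relative offset of the struct start, e.g. -0x2C
    for the operand of `lea eax, [ebp+var_2C]`.
    """
    return {base_offset + off: f"{var_name}.{field}" for field, off in fields}


def render(offset, names):
    """Render an ebp-relative operand, preferring typed names over dummies."""
    if offset in names:
        return f"[ebp+{names[offset]}]"
    if offset < 0:
        return f"[ebp+var_{-offset:X}]"    # local variable dummy name
    return f"[ebp+arg_{offset - 8:X}]"     # assumes saved ebp + return address


names = retype_stack_var(-0x2C, "Rect", RECT_FIELDS)

for offset in (-0x2C, -0x28, -0x24, -0x20, -0x8, 8):
    print(render(offset, names))
```

With the struct placed at ebp-2Ch, the operands formerly shown as var_2C, var_28, var_24 and var_20 become Rect.left, Rect.top, Rect.right and Rect.bottom, while unrelated slots such as var_8 keep their dummy names.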

Recursive propagation

In fact, PIT (Parameter Identification and Tracking, the analysis behind this argument propagation) is not limited to single functions: if any of the function’s own arguments are renamed or retyped thanks to the type information, this information is propagated up the call tree. For example, arg_0 from the second snippet is a function argument which was renamed to hMenu, so this information is used by the caller: