Many features of IDA and other disassemblers are taken for granted nowadays, but this was not always the case. As one example, let’s consider automatic variable naming.
A little bit of history
In its first versions, IDA did not differ much from a plain disassembler with comments and renaming: it showed mostly raw instructions with numerical offsets. To keep track of those offsets, users often had to add comments manually.
A few versions later, support for stack variables appeared. They initially had dummy names (var_4, var_C, etc.) but could be renamed by the user, which eased the reverse engineering process. However, this could still be tedious in big programs.
Next, FLIRT was added, which helped identify standard library functions. Now the user did not need to analyze boilerplate code from the compiler runtime libraries but only the code written by the programmer. Having identified library functions also helped in picking names for variables: most library functions had known prototypes so the variables used for their arguments could be renamed accordingly.
However, this process was still manual. Could it not be automated?
And indeed, this is what happened in IDA 4.10, with the addition of the type system and standard type libraries. Now the identified library or imported functions could be matched to their prototypes in the type library and their arguments commented and/or renamed. For the arguments using a complex type (e.g. a structure), the stack variable could also be changed to use that type.
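The matching step can be illustrated with a small toy model. This is only a sketch of the idea, not IDA's actual implementation: the `PROTOTYPES` table is a hand-written stand-in for a type library, and `annotate_pushes` is a hypothetical helper that assumes a stdcall callee whose arguments are all passed with `push` instructions directly before the call.

```python
# Hand-written stand-in for a type library: parameter names in declaration order.
PROTOTYPES = {
    "GetClientRect": ["hWnd", "lpRect"],
    "CreateWindowExA": [
        "dwExStyle", "lpClassName", "lpWindowName", "dwStyle",
        "X", "Y", "nWidth", "nHeight",
        "hWndParent", "hMenu", "hInstance", "lpParam",
    ],
}


def annotate_pushes(callee, pushed_operands):
    """Pair each pushed operand with the parameter it initializes.

    pushed_operands is in program order. stdcall pushes arguments right to
    left, so the last push before the call is the first parameter.
    """
    params = PROTOTYPES[callee]
    # Only the last len(params) pushes can belong to this call.
    relevant = pushed_operands[-len(params):]
    # If fewer pushes are visible than there are parameters, the visible
    # ones initialize the leading parameters.
    covered = params[:len(relevant)]
    return list(zip(relevant, reversed(covered)))


# Hypothetical input: the ten pushes visible before a CreateWindowExA call.
PUSHES = [
    "[ebp+arg_0]", "dword ptr [ebx+1Ch]", "eax", "eax", "[ebp+var_28]",
    "[ebp+var_2C]", "[ebp+var_8]", "edi", "offset aEdit", "edi",
]

for operand, name in annotate_pushes("CreateWindowExA", PUSHES):
    print(f"push {operand:<22}; {name}")
```

A real analyzer also has to track the stack pointer across intervening instructions and handle arguments written with `mov` or passed in registers, all of which this sketch ignores.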
In practice
As a current example, let’s have a look at a Win32 program which calls CreateWindowExA.
First, with everything disabled:
mov     eax, [ebp-20h]
push    dword ptr [ebp+8]
sub     eax, [ebp-28h]
push    dword ptr [ebx+1Ch]
push    eax
mov     eax, [ebp-24h]
sub     eax, [ebp-2Ch]
push    eax
push    dword ptr [ebp-28h]
push    dword ptr [ebp-2Ch]
push    dword ptr [ebp-8]
push    edi
push    offset aEdit            ; "edit"
push    edi
call    ds:CreateWindowExA
Next, with stack variables:
mov     eax, [ebp+var_20]
push    [ebp+arg_0]
sub     eax, [ebp+var_28]
push    dword ptr [ebx+1Ch]
push    eax
mov     eax, [ebp+var_24]
sub     eax, [ebp+var_2C]
push    eax
push    [ebp+var_28]
push    [ebp+var_2C]
push    [ebp+var_8]
push    edi
push    offset aEdit            ; "edit"
push    edi
call    ds:CreateWindowExA
Stack variables are created but use dummy names. We could consult the function’s documentation and rename and retype them manually. But instead we can enable argument propagation and reanalyze the function:
mov     eax, [ebp+Rect.bottom]
push    [ebp+hMenu]             ; hMenu
sub     eax, [ebp+Rect.top]
push    dword ptr [ebx+1Ch]     ; hWndParent
push    eax                     ; nHeight
mov     eax, [ebp+Rect.right]
sub     eax, [ebp+Rect.left]
push    eax                     ; nWidth
push    [ebp+Rect.top]          ; Y
push    [ebp+Rect.left]         ; X
push    [ebp+dwStyle]           ; dwStyle
push    edi                     ; lpWindowName
push    offset aEdit            ; "edit"
push    edi                     ; dwExStyle
call    ds:CreateWindowExA
Now, all arguments are renamed and all instructions initializing them are commented. The Rect variable was renamed and typed thanks to another place in the same function:
lea     eax, [ebp+Rect]
push    eax                     ; lpRect
push    ebx                     ; hWnd
call    ds:GetClientRect
Here, IDA recognized that the lea instruction effectively takes the address of a struct, so the stack variable should be the struct itself and not just a pointer. Thanks to this, the field references are clearly identified in the other snippet.
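To see why typing the variable as a struct clarifies the other references, here is a minimal sketch of the offset arithmetic involved. It is an illustration only, not how IDA is implemented: `name_frame_reference` is a hypothetical helper, the member layout is the standard Win32 RECT (four 32-bit LONG fields), and the start of Rect at [ebp-2Ch] is inferred from the listings, where var_2C becomes Rect.left.

```python
# Win32 RECT layout: four 32-bit LONG members, 16 bytes in total.
RECT_MEMBERS = [("left", 0), ("top", 4), ("right", 8), ("bottom", 12)]
RECT_SIZE = 16


def name_frame_reference(ebp_offset, var_name, var_offset, members, size):
    """Return 'Var.member' if ebp_offset hits a member of the struct that
    starts at var_offset in the frame, otherwise None."""
    delta = ebp_offset - var_offset
    if not 0 <= delta < size:
        return None  # the reference lies outside the struct
    for member, member_offset in members:
        if member_offset == delta:
            return f"{var_name}.{member}"
    return None  # inside the struct but not at a member boundary


# Assumption: Rect occupies [ebp-2Ch] .. [ebp-1Dh], as inferred above.
for off in (-0x2C, -0x28, -0x24, -0x20, -0x8):
    print(hex(off), name_frame_reference(off, "Rect", -0x2C, RECT_MEMBERS, RECT_SIZE))
```

The four offsets that fall inside the struct resolve to Rect.left, Rect.top, Rect.right and Rect.bottom, matching the renamed operands in the listing, while [ebp-8] lies outside it and stays a separate variable.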
Recursive propagation
In fact, PIT (Parameter Identification and Tracking) is not limited to single functions: if any of the function’s own arguments are renamed or retyped thanks to the type information, this information is propagated up the call tree. For example, arg_0 from the second snippet is a function argument which was renamed to hMenu, so this information is used by the caller: