
Previously we’ve covered how to start using the decompiler, but unmodified decompiler output is not always easy to read, especially if the binary doesn’t have symbols or debug information. However, with just a few small amendments you can improve the results substantially. Let’s look at some basic interactive operations available in the pseudocode view.

Renaming

Although it sounds trivial, renaming can dramatically improve readability. Even something as simple as renaming v3 to counter can bring immediate clarity to what’s going on in a function. Coupled with the auto-renaming feature added in IDA 7.6, this can help you propagate nice names through the pseudocode as you analyze it. The following items can be renamed directly in the pseudocode view:

  • local variables
  • function arguments
  • function names
  • global variables (data items)
  • structure members

Renaming is very simple: put the cursor on the item to rename and press N – the same shortcut as the one used in the disassembly listing. Of course, the command is also available in the context menu.

You can also choose to do your renaming in the disassembly view instead of the pseudocode. This can be useful if you plan to rename many items in a big function and don’t want to wait for decompilation to finish every time. Once you have finished renaming, press F5 to refresh the pseudocode and see all the new names. Note that register-allocated local variables cannot be renamed in the disassembly; they can only be managed in the pseudocode view.

Retyping

Type recovery is one of the hardest problems in decompilation. Once the code is converted to machine instructions, there are no more types, just bits being shuffled around. The decompiler can nevertheless make some guesses, such as the size of the data being processed and, in some cases, whether it’s being treated as a signed value or not. In general, though, high-level type recovery remains a challenge in which a human brain can be of great help.
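
To illustrate why size and signedness are about all that can be read off the bits, here is a minimal C sketch. The helper names are made up for this example; they mimic what a zero-extending byte load (such as LDRB on ARM) and a sign-extending one (LDRSB) do with the same low 8 bits of a register.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any tool or library. */

/* Zero-extend the low byte, as a zero-extending load would. */
uint32_t byte_as_unsigned(uint32_t reg)
{
    return reg & 0xFFu;
}

/* Sign-extend the low byte, as a sign-extending load would. */
int32_t byte_as_signed(uint32_t reg)
{
    int32_t low = (int32_t)(reg & 0xFFu);
    /* Values 0x80..0xFF represent -128..-1 in two's complement. */
    return low >= 0x80 ? low - 0x100 : low;
}
```

The same bit pattern 0xFF comes out as 255 from the first helper and as -1 from the second. Only the choice of instruction hints at which reading the original source intended, and nothing in either helper says whether the value was a character, a flag or part of a larger structure.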

For example, consider this small ARM function:

sub_4FF203A8
  SUB R2, R0, #1
loc_4FF203AC
  LDRB R3, [R1],#1
  CMP R3, #0
  STRB R3, [R2,#1]!
  BNE loc_4FF203AC
  BX LR

Its initial decompilation looks like this:

We see that the decompiler could guess the type of the second argument (a2, passed in R1) because it is used in the LDRB instruction (load byte). However, v2 remains a simple int because the first operation done on it is a simple arithmetic SUB (subtraction). Now, after some thinking it is pretty obvious that both v2 and result are also byte pointers and the subtraction is simply pointer math (since pointers are just numbers on the CPU level).
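
To make the "pointers are just numbers" point concrete, here is a hand-written C sketch of the instruction sequence above, written twice. It is not actual decompiler output: the function names sub_int_typed and sub_ptr_typed are invented for this example, and v3 is a hypothetical name for the byte held in R3. The first version keeps the address in an integer, so each store needs a cast; the second uses byte pointers throughout.

```c
#include <stdint.h>

/* Address kept as a plain integer: the store needs a cast. */
uintptr_t sub_int_typed(uintptr_t result, unsigned char *a2)
{
    uintptr_t v2 = result - 1;          /* SUB  R2, R0, #1   */
    unsigned char v3;
    do {
        v3 = *a2++;                     /* LDRB R3, [R1],#1  */
        v2 += 1;                        /* pre-index writeback */
        *(unsigned char *)v2 = v3;      /* STRB R3, [R2,#1]! */
    } while (v3 != 0);                  /* CMP R3, #0 / BNE  */
    return result;                      /* R0 is never modified */
}

/* Same logic with byte pointers: the casts disappear. */
unsigned char *sub_ptr_typed(unsigned char *result, unsigned char *a2)
{
    unsigned char *v2 = result - 1;     /* SUB  R2, R0, #1   */
    unsigned char v3;
    do {
        v3 = *a2++;                     /* LDRB R3, [R1],#1  */
        *++v2 = v3;                     /* STRB R3, [R2,#1]! */
    } while (v3 != 0);                  /* CMP R3, #0 / BNE  */
    return result;                      /* R0 is never modified */
}
```

Both functions perform the same memory accesses; only the declared types differ. One caveat of mirroring the assembly so literally: the pointer version forms result - 1, which strictly portable C only allows when result does not point at the first byte of its buffer.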

We can fix things by changing the type of both variables to the same unsigned __int8 * (or the equivalent unsigned char *). To do this, put the cursor on the variable and press Y, or use “Set lvar type” from the context menu.

Alternatively, instead of fixing the local variable and then the argument, you can edit the function prototype directly by using the same shortcut (Y) on the function’s name in the first line.

In that case, the first argument’s type will be automatically propagated into the local variable and you won’t need to change it manually (user-provided types have priority over guessed ones).

In the final version there are no more casts and it’s clearer what’s happening. We’ll solve the mystery of the function’s purpose next week. Stay tuned!