
Although the Hex-Rays decompiler was originally written to deal with compiler-generated code, it can still do a decent job with manually written assembly. However, such code may use non-standard instructions or use them in non-standard ways, in which case the decompiler may fail to produce equivalent C code and has to fall back to _asm statements.

Analyzing system code

As an example, let’s look at this function from a PowerPC firmware image.

ROM:00000C8C sub_C8C:                                # CODE XREF: ROM:00000B1C↑p
ROM:00000C8C                                         # sub_CF0+44↓p ...
ROM:00000C8C
ROM:00000C8C .set back_chain, -0x18
ROM:00000C8C .set var_C, -0xC
ROM:00000C8C .set sender_lr,  4
ROM:00000C8C
ROM:00000C8C     stwu      r1, back_chain(r1)
ROM:00000C90     mflr      r0
ROM:00000C94     stmw      r29, 0x18+var_C(r1)
ROM:00000C98     stw       r0, 0x18+sender_lr(r1)
ROM:00000C9C     addi      r31, r3, 0
ROM:00000CA0     mflr      r3
ROM:00000CA4     addi      r30, r3, 0
ROM:00000CA8     bl        sub_1264
ROM:00000CAC     lis       r29, 0x40 # '@'
ROM:00000CB0     lhz       r29, -0x2C(r29)
ROM:00000CB4     mtsprg0   r29
ROM:00000CB8     not       r11, r31
ROM:00000CBC     slwi      r11, r11, 16
ROM:00000CC0     or        r31, r11, r31
ROM:00000CC4     mtsprg1   r31
ROM:00000CC8     mtsprg2   r30
ROM:00000CCC     mftb      r3
ROM:00000CD0     addi      r30, r3, 0
ROM:00000CD4     mtsprg3   r30
ROM:00000CD8     bl        sub_1114
ROM:00000CD8 # End of function sub_C8C

The code seems to be using the Special Purpose Registers General (SPRG0 to SPRG3) for its own purposes, probably to store some information for exception processing. Because system instructions are generally not encountered in user-mode code, the decompiler does not support them out of the box, and the default output looks like this:

void __fastcall __noreturn sub_C8C(int a1)
{
  int v1; // lr

  _R30 = v1;
  sub_1264();
  _R29 = (unsigned __int16)word_3FFFD4;
  __asm { mtsprg0   r29 }
  _R31 = (~a1 << 16) | a1;
  __asm
  {
    mtsprg1   r31
    mtsprg2   r30
    mftb      r3
  }
  _R30 = _R3;
  __asm { mtsprg3   r30 }
  sub_1114();
}

Although the instructions themselves are shown as _asm statements, the decompiler was able to detect the registers they use and created pseudo variables (_R3, _R29, _R30, _R31) to represent the operations performed. With a bit of manual work, it is possible to get rid of the _asm blocks altogether.
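As an aside, the value computed for r31 in the listing above, (~a1 << 16) | a1, can be checked on a host machine. The following is a minimal sketch in standard C and not part of the firmware: the function names are hypothetical, and it assumes a1 fits in 16 bits. In that case the upper half of the result is the bitwise complement of the lower half, which may be meant as a self-check on the stored value, although the firmware's actual intent is only a guess.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in the decompiled output: (~a1 << 16) | a1.
   Using uint32_t keeps the shift well defined in standard C. */
uint32_t pack_with_complement(uint32_t a1)
{
    return (~a1 << 16) | a1;
}

/* Hypothetical check: is the upper half the complement of the lower half? */
int halves_are_complementary(uint32_t packed)
{
    uint32_t low = packed & 0xFFFFu;
    uint32_t high = packed >> 16;
    return high == (~low & 0xFFFFu);
}

/* Hypothetical reader: recover the original 16-bit value after checking it. */
uint16_t unpack_low_half(uint32_t packed)
{
    assert(halves_are_complementary(packed));
    return (uint16_t)(packed & 0xFFFFu);
}
```

For example, pack_with_complement(0x1234) yields 0xEDCB1234, since ~0x1234 shifted left by 16 bits is 0xEDCB0000.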

Decompile as call

It is possible to tell the decompiler that specific instructions should be treated as if they were function calls. You can even use a custom calling convention to specify the exact input/output registers of the pseudo function. Let’s try it for the unhandled instructions.

  1. In the disassembly view, place the cursor on the instruction (e.g. mtsprg0 r29);
  2. Invoke Edit > Other > Decompile as call…
  3. Enter the prototype, taking into account input/output registers. In our example we’ll use:
    void __usercall mtsgpr0(unsigned int value<r29>);
  4. Repeat for the remaining instructions, for example:
    void __usercall mtsgpr1(unsigned int<r31>);
    void __usercall mtsgpr2(unsigned int<r30>);
    void __usercall mtsgpr3(unsigned int<r30>);
    int __usercall mftb<r3>();
  5. Refresh the decompilation if it’s not done automatically.

We get something like this:

void __fastcall __noreturn sub_C8C(int a1)
{
  unsigned int v1; // lr

  sub_1264();
  mtsgpr0((unsigned __int16)word_3FFFD4);
  mtsgpr1((~a1 << 16) | a1);
  mtsgpr2(v1);
  mtsgpr3(mftb());
  sub_1114();
}

No more _asm blocks! The only remaining wrinkle is the mysterious variable v1, which is marked in orange (“value may be undefined”).

If we look at the assembly, we’ll see that the r30 value passed to mtsprg2 originates from r3, which is set by the mflr r3 instruction. This instruction reads the value of lr (the link register), which contains the return address to the caller and thus, from the function’s point of view, has no determined value. However, we can use a pseudo function such as GCC’s __builtin_return_address by specifying this prototype for the mflr r3 instruction:
void * __builtin_return_address ();

NB: We do not need to use __usercall here because r3 is already the default location for a return value in the PPC ABI.
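For comparison, here is a minimal sketch of how the real builtin is used in code compiled with GCC or Clang. Unlike the simplified prototype above, the compiler builtin takes a constant level argument, where 0 means the return address of the current function. The function name report_caller is hypothetical, and the sketch assumes a hosted GCC-compatible toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* noinline keeps this a real call, so there is a return address to read. */
__attribute__((noinline))
void *report_caller(void)
{
    /* Level 0: the address this function will return to. */
    void *ra = __builtin_return_address(0);
    assert(ra != NULL);
    printf("called from %p\n", ra);
    return ra;
}
```

Calling report_caller() from two different places prints two different addresses, which is the same kind of caller-dependent value that the firmware stores with mtsprg2.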

Finally, the decompilation is looking nice and tidy.

Complex situations

If you want to automate the process of applying prototypes to instructions, you can use a decompiler plugin or script. For example, see the vds8 decompiler SDK sample (also shipped with IDA), which handles some of the SVC calls in ARM code.

In more complicated cases, such as when some arguments can’t be represented by a custom calling convention, or when the semantics are better represented by something other than a function call (e.g. the instruction affects multiple registers), you can use a “microcode filter” to generate custom microcode, which is then optimized and converted to C code by the decompiler engine. A great example is the microAVX plugin by Markus Gaasedelen.

See also: Decompile as call in the decompiler manual.