Although the Hex-Rays decompiler was originally written to deal with compiler-generated code, it can still do a decent job with manually written assembly. However, such code may use non-standard instructions or use them in non-standard ways, in which case the decompiler may fail to produce equivalent C code and has to fall back to _asm statements.
Analyzing system code
As an example, let’s have a look at this function from a PowerPC firmware image.
ROM:00000C8C sub_C8C:                  # CODE XREF: ROM:00000B1C↑p
ROM:00000C8C                           # sub_CF0+44↓p ...
ROM:00000C8C
ROM:00000C8C .set back_chain, -0x18
ROM:00000C8C .set var_C, -0xC
ROM:00000C8C .set sender_lr, 4
ROM:00000C8C
ROM:00000C8C     stwu    r1, back_chain(r1)
ROM:00000C90     mflr    r0
ROM:00000C94     stmw    r29, 0x18+var_C(r1)
ROM:00000C98     stw     r0, 0x18+sender_lr(r1)
ROM:00000C9C     addi    r31, r3, 0
ROM:00000CA0     mflr    r3
ROM:00000CA4     addi    r30, r3, 0
ROM:00000CA8     bl      sub_1264
ROM:00000CAC     lis     r29, 0x40 # '@'
ROM:00000CB0     lhz     r29, -0x2C(r29)
ROM:00000CB4     mtsprg0 r29
ROM:00000CB8     not     r11, r31
ROM:00000CBC     slwi    r11, r11, 16
ROM:00000CC0     or      r31, r11, r31
ROM:00000CC4     mtsprg1 r31
ROM:00000CC8     mtsprg2 r30
ROM:00000CCC     mftb    r3
ROM:00000CD0     addi    r30, r3, 0
ROM:00000CD4     mtsprg3 r30
ROM:00000CD8     bl      sub_1114
ROM:00000CD8 # End of function sub_C8C
The code seems to be using the Special Purpose Register General registers (sprg0/1/2/3) for its own purposes, probably to store some information for exception processing. Because system instructions are generally not encountered in user-mode code, the decompiler does not support them out of the box, and the default output looks like this:
void __fastcall __noreturn sub_C8C(int a1)
{
  int v1; // lr

  _R30 = v1;
  sub_1264();
  _R29 = (unsigned __int16)word_3FFFD4;
  __asm { mtsprg0 r29 }
  _R31 = (~a1 << 16) | a1;
  __asm
  {
    mtsprg1 r31
    mtsprg2 r30
    mftb    r3
  }
  _R30 = _R3;
  __asm { mtsprg3 r30 }
  sub_1114();
}
Although the instructions themselves are shown as _asm statements, the decompiler detected the registers they use and created pseudo variables (_R29, _R30, _R31) to represent the operations performed. However, it is possible to get rid of the _asm blocks with a bit of manual work.
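As an aside, the value assigned to _R31 is easy to check by hand. The sketch below is a host-side C model of the three instructions not r11, r31 / slwi r11, r11, 16 / or r31, r11, r31. The function name pack_with_complement is made up for illustration and is not part of the firmware or of IDA, and unsigned arithmetic is used here instead of the signed int that appears in the decompiler output.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the decompiled expression (~a1 << 16) | a1.
 * uint32_t stands in for the 32-bit PowerPC register r31. */
uint32_t pack_with_complement(uint32_t a1)
{
    uint32_t inverted = ~a1;            /* not  r11, r31      */
    uint32_t shifted  = inverted << 16; /* slwi r11, r11, 16  */
    return shifted | a1;                /* or   r31, r11, r31 */
}
```

For a 16-bit input such as 0x1234, the result is 0xEDCB1234: the low half holds the value itself and the high half holds its bitwise complement. Why the firmware stores the value this way is not stated in the code; a later consistency check is one plausible guess.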
Decompile as call
It is possible to tell the decompiler that specific instructions should be treated as if they were function calls. You can even use a custom calling convention to specify the exact input/output registers of the pseudo function. Let’s try it for the unhandled instructions.
- In the disassembly view, place the cursor on the instruction (e.g. mtsprg0 r29);
- Invoke Edit > Other > Decompile as call…
- Enter the prototype, taking into account input/output registers. In our example we’ll use:
void __usercall mtsgpr0(unsigned int value<r29>);
- Repeat for the remaining instructions, for example:
void __usercall mtsgpr1(unsigned int<r31>);
void __usercall mtsgpr2(unsigned int<r30>);
void __usercall mtsgpr3(unsigned int<r30>);
int __usercall mftb<r3>();
- Refresh the decompilation if it’s not done automatically.
We get something like this:
void __fastcall __noreturn sub_C8C(int a1)
{
  unsigned int v1; // lr

  sub_1264();
  mtsgpr0((unsigned __int16)word_3FFFD4);
  mtsgpr1((~a1 << 16) | a1);
  mtsgpr2(v1);
  mtsgpr3(mftb());
  sub_1114();
}
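One side benefit of the call-style output is that it is close to compilable C. The sketch below is a host-side model built on several assumptions, not firmware code: the sprg array, the stub bodies, the constant returned by mftb, and the extra parameters standing in for word_3FFFD4 and the link register value are all invented for illustration. The calls to sub_1264 and sub_1114 are left out because their behavior is not known from the listing.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the four SPRG registers. */
uint32_t sprg[4];

/* Stubs named after the pseudo functions entered in IDA. */
void mtsgpr0(uint32_t value) { sprg[0] = value; }
void mtsgpr1(uint32_t value) { sprg[1] = value; }
void mtsgpr2(uint32_t value) { sprg[2] = value; }
void mtsgpr3(uint32_t value) { sprg[3] = value; }

/* The real mftb reads the time base register; a constant keeps
 * this model deterministic. */
uint32_t mftb(void) { return 0x1000u; }

/* Model of sub_C8C. The global word_3FFFD4 and the caller's return
 * address (v1 in the decompilation) are passed in explicitly so the
 * model does not depend on firmware memory or on lr. */
void sub_C8C_model(uint32_t a1, uint16_t word_3FFFD4, uint32_t v1)
{
    mtsgpr0(word_3FFFD4);
    mtsgpr1((~a1 << 16) | a1);
    mtsgpr2(v1);
    mtsgpr3(mftb());
}
```

Calling sub_C8C_model with chosen inputs and then inspecting sprg makes it easy to see which value ends up in which register.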
No more _asm blocks! The only remaining wrinkle is the mysterious variable v1, which is marked in orange (“value may be undefined”).
If we look at the assembly, we’ll see that the r30 passed to mtsprg2 originates from r3, which is set by the mflr r3 instruction. The instruction reads the value of lr (the link register), which contains the return address to the caller and thus by definition has no determined value. However, we can use a pseudo function such as GCC’s __builtin_return_address by specifying this prototype for the mflr r3 instruction:
void * __builtin_return_address ();
NB: We do not need to use __usercall here because r3 is already the default location for a return value in the PPC ABI.
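For comparison, this is how the real builtin behaves in compiled C. The sketch assumes GCC or Clang. Unlike the argument-less prototype entered in IDA above, the actual builtin takes a constant level argument, where 0 means the return address of the current function. The function name who_called_me is hypothetical.

```c
#include <stddef.h>

/* noinline keeps the function from being merged into its caller,
 * so there is a real return address to report. */
__attribute__((noinline))
void *who_called_me(void)
{
    /* Level 0: the address the current function will return to.
     * On PowerPC this is the value held in lr on function entry. */
    return __builtin_return_address(0);
}
```

The returned pointer lies inside whichever function made the call, which is the same information the firmware captures with mflr r3.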
Finally, the decompilation looks nice and tidy, with the undefined v1 now coming from a call to __builtin_return_address().
Complex situations
If you want to automate the process of applying prototypes to instructions, you can use a decompiler plugin or script. For example, see the vds8 decompiler SDK sample (also shipped with IDA), which handles some of the SVC calls in ARM code. In even more complicated cases, such as when some arguments can’t be represented by a custom calling convention, or the semantics are better represented by something other than a function call (e.g. the instruction affects multiple registers), you can use a “microcode filter” to generate custom microcode, which is then optimized and converted to C code by the decompiler engine. A great example is the excellent microAVX plugin by Markus Gaasedelen.
See also: Decompile as call in the decompiler manual.