The decompiler makes some assumptions about the input code: for example, that call instructions usually return, that the memory model is flat, and that the function frame is set up properly. When these assumptions hold, the output is good. When they do not, the output does not correspond to the input. Take, for example, the following snippet:
The decompiler produces the following pseudocode:
Apparently, the v3 variable (it corresponds to edx) is not initialized at all. Why?
This happens because called functions usually spoil some registers. The calling conventions on x86 stipulate that only the esi, edi, ebx, and ebp registers are saved across calls. In other words, other registers may change their values (or be spoiled) by a function call. Since the decompiler assumes that functions obey the regular calling conventions, it separates edx before the call and after the call into two variables. The first variable gets optimized away and is replaced by a1. The second variable (v3) becomes uninitialized.
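The splitting rule described above can be sketched in a few lines of Python. This is only a toy model of the reasoning, not the decompiler's actual algorithm; the names `CALLEE_SAVED` and `same_variable_across_call` are made up for the illustration, and the model ignores esp and all non-general-purpose registers.

```python
# Registers that the callee must preserve under the default x86 conventions
# described above. Every other general-purpose register may be spoiled.
# (Illustrative model only: esp and non-general-purpose registers are ignored.)
CALLEE_SAVED = {"esi", "edi", "ebx", "ebp"}


def same_variable_across_call(reg, spoiled=None):
    """Return True if `reg` holds the same value before and after a call.

    `spoiled` is the set of registers the callee may change. When it is
    None, the default convention is assumed: anything not callee-saved
    is treated as spoiled.
    """
    if spoiled is None:
        return reg in CALLEE_SAVED
    return reg not in spoiled


# Default assumption: edx may be spoiled, so it is split into two variables.
print(same_variable_across_call("edx"))   # False
# esi is callee-saved, so one variable covers both sides of the call.
print(same_variable_across_call("esi"))   # True
```

Passing an explicit `spoiled` set overrides the default, which is the role that type information plays for the decompiler.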
In fact, there are three possible cases. After the call, the edx register could be:
1. unmodified
2. used to return a value
3. spoiled

The decompiler chose the default case (#3). Let’s check whether it was right. Here’s the disassembly of sub_2A795:
As we can see, the edx register is not referenced at all, so this is case #1. If the decompiler could figure this out by itself, without our help, our life would be much easier (maybe it will in the future!). Meanwhile, we have to add the required information ourselves, using the Edit, Functions, Set function type command in IDA. We specify that the callee does not spoil any registers:
The decompiler produces different pseudocode:
Since it knows that edx is not modified by the call, it creates just one variable for both edx instances (before and after the call).
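How the spoiled-register information changes the number of variables can be illustrated with a small Python sketch. It is a simplified model written for this explanation, not the decompiler's real data-flow analysis: the operation tuples, the function `name_register_values`, and the callee name `"callee"` are all hypothetical, and the model does not perform the later optimization that replaces the first variable with a1.

```python
def name_register_values(ops, spoiled, returns_in=None):
    """Name the value that each use of a register sees.

    ops        -- list of ("def", reg), ("use", reg) or ("call", name) tuples
    spoiled    -- set of registers the called function may change
    returns_in -- register holding the call result, or None
    Returns the variable name observed by each "use", in order.
    """
    current = {}   # register -> name of the variable it currently holds
    counter = 0
    uses = []
    for kind, arg in ops:
        if kind == "def":
            counter += 1
            current[arg] = f"v{counter}"
        elif kind == "call":
            # The callee name itself is not needed by this toy model.
            for reg in spoiled:
                current.pop(reg, None)      # the old value is lost
            if returns_in is not None:
                counter += 1
                current[returns_in] = f"v{counter}"   # result of the call
        elif kind == "use":
            uses.append(current.get(arg, "<uninitialized>"))
    return uses


# edx is set, used, then a call happens, then edx is used again.
ops = [("def", "edx"), ("use", "edx"), ("call", "callee"), ("use", "edx")]

# Default assumption: edx is spoiled, so the second use is uninitialized.
print(name_register_values(ops, {"eax", "ecx", "edx"}))
# ['v1', '<uninitialized>']

# Callee spoils nothing: both uses refer to the same variable.
print(name_register_values(ops, set()))
# ['v1', 'v1']

# Callee returns its value in edx: a second, initialized variable.
print(name_register_values(ops, {"eax", "ecx", "edx"}, returns_in="edx"))
# ['v1', 'v2']
```

The three calls correspond to the spoiled, unmodified, and return-value cases from the list above.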
If the called function returned its value in edx (case #2), we would set its type like this:
(this prototype means: function with one argument on the stack, the argument will be popped by the callee; the result is returned in edx)
The decompiler would create two separate variables for edx, as in case #3. The first one would be optimized away, but the second one would be initialized with the returned value:
As you can see, type information plays a very important role in decompilation. To get correct output, correct input (or correct assumptions) must be given. Otherwise the decompiler works in “garbage in, garbage out” mode.
Always pay attention to the types: it is a good habit.