Comparisons of ARM disassembly and decompilation
Here are some side-by-side comparisons of disassembly and decompiler output for ARM. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
Simple case
Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function logic is obvious from a glance at the decompiler output, the assembly listing is noisy and takes some study to follow.
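As a rough idea of what the pseudocode for such a function looks like, here is a minimal C sketch. The structure layout and the names item_t and reset_item are hypothetical, chosen only for illustration; the actual decompiler output depends on the binary.

```c
#include <stdint.h>

/* Hypothetical structure: only the first three fields matter here. */
typedef struct {
    int32_t field_0;
    int32_t field_4;
    int32_t field_8;
    int32_t untouched;   /* not modified by the function */
} item_t;

/* Zero out the first three fields of the structure. */
void reset_item(item_t *p)
{
    p->field_0 = 0;
    p->field_4 = 0;
    p->field_8 = 0;
}
```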
The decompiler saves you time and allows you to concentrate on more exciting aspects of reverse engineering.
64-bit arithmetic
Sorry for the long code snippet; ARM code tends to be longer than x86 code. This makes our comparison even more impressive: look at how concise the decompiler output is!
Conditional instructions
The ARM processor has conditional instructions that can shorten the code but
demand close attention from the reader. The case above is very simple; just note that
there is a pair of instructions: MOVNE and LDREQSH. Only one of them will
be executed. This is how a simple if-then-else looks in ARM.
A quiz question: did you notice that MOVNE loads zero into R0? (Because I didn't :)
Also note that in the disassembly listing we see var_8, but the location really used
is var_A, which corresponds to v4.
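In C terms, the MOVNE / LDREQSH pair amounts to something like the sketch below. The function name, the parameters, and the comparison that sets the flags are assumptions made for illustration; only the shape of the if-then-else comes from the text.

```c
#include <stdint.h>

/* Hypothetical example: the compare that sets the condition flags is assumed. */
int32_t load_or_zero(int32_t a, int32_t b, const int16_t *p)
{
    if (a == b)
        return *p;   /* LDREQSH: runs on "equal", loads a sign-extended halfword */
    else
        return 0;    /* MOVNE: runs on "not equal", puts zero into R0 */
}
```

Exactly one branch runs for any input, which mirrors the fact that only one of the two conditional instructions takes effect.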
Conditional instructions - 2
Look, the decompiler output is longer! This is a rare case when the pseudocode
is longer than the disassembly listing, but it is for a good cause: to keep
it readable. There are so many conditional instructions here that it is very easy
to misunderstand the dependencies. For example, did you notice that the first MOVEQ
may use the condition codes set by CMP? The subtle detail is that the flags set by CMP
may reach the MOVEQs.
Complex instructions
Conditional instructions are just part of the story. ARM is also famous for having a plethora
of data movement instructions. They come with a set of possible suffixes that subtly change
the meaning of the instruction. Take STMCSIA, for example. It is an STM instruction, but then
you have to remember that CS means "carry set" and IA means "increment after".
In short, the disassembly listing reads like a foreign language. The pseudocode is longer but takes much less time to understand.
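To make the suffixes concrete, here is a small C model of what a store-multiple with the CS condition and IA addressing does. It is a sketch, not decompiler output: stm_cs_ia and its parameters are hypothetical, and returning the advanced pointer models only the write-back form of the instruction (the one written with "!").

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a conditional store-multiple (CS + IA). */
uint32_t *stm_cs_ia(bool carry, uint32_t *base, const uint32_t *regs, size_t count)
{
    if (!carry)
        return base;            /* CS: carry clear, the instruction does nothing */
    for (size_t i = 0; i < count; i++)
        base[i] = regs[i];      /* IA: first register at base, addresses ascend */
    return base + count;        /* base register value after write-back */
}
```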
Compiler helper functions
Sorry for another long code snippet. Just wanted to show you that the decompiler
handles compiler helper functions (like __divdi3) and 64-bit arithmetic
quite well.
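For context, a helper such as __divdi3 typically appears when signed 64-bit division is compiled for a 32-bit target that has no single instruction for it. Below is a minimal sketch of source code that can give rise to such a call. The function name div64 is hypothetical, and which helper is actually emitted depends on the compiler and ABI.

```c
#include <stdint.h>

/* Signed 64-bit division. On a 32-bit target the compiler usually
   lowers the '/' below to a call into its runtime support library. */
int64_t div64(int64_t a, int64_t b)
{
    return a / b;
}
```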
Immediate constants
Since ARM instructions cannot have big immediate constants, sometimes they
are loaded with two instructions. There are many 0xFA (250 decimal) constants
in the disassembly listing, but all of them are shifted to the left by 2 before
use. The decompiler saves you from these petty details.
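The arithmetic behind those constants is easy to check. The sketch below mirrors the two-step load in C; the function name is hypothetical, and the two statements merely stand in for the move and shift instructions.

```c
#include <stdint.h>

/* Hypothetical model of loading a constant in two steps. */
uint32_t load_constant(void)
{
    uint32_t r = 0xFA;  /* small immediate: 250 */
    r <<= 2;            /* shift left by 2, i.e. multiply by 4 */
    return r;           /* 1000 (0x3E8) */
}
```

So every 0xFA in the listing really stands for 1000, which is the value a reader would expect to see directly in the pseudocode.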
Also, an aside: the decompiler can handle ARM mode as well as Thumb mode instructions. It does not care about the instruction encoding because that is already handled by IDA.
Position independent code
In some cases the disassembly listing can be misleading, especially with PIC (position independent code).
While the address of a constant string is loaded into R12, the code does not
care about it. It is just how variable addresses are calculated in PIC code (it is .got-someoffset).
Such calculations are very frequent in shared objects, and unfortunately IDA cannot
handle all of them. But the decompiler did a great job of tracing R12.
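The kind of address arithmetic involved can be modelled in plain C. This is only an illustration of the ".got minus some offset" idea: the buffer standing in for the module image, the offsets, and all names are made up, and real PIC code derives the base address at run time instead of reading it from a global array.

```c
#include <stdint.h>

/* Stand-in for a loaded module; purely hypothetical layout. */
static uint8_t image[64];

enum { GOT_POS = 48, VAR_OFFSET = 16 };

/* Where the (pretend) .got section starts inside the image. */
uint8_t *got_base_of_image(void)
{
    return image + GOT_POS;
}

/* Address of a variable, computed as got_base - offset. */
uint8_t *pic_var_address(uint8_t *got_base)
{
    return got_base - VAR_OFFSET;
}
```

The intermediate base value is just a stepping stone, which is why whatever happens to live at that address (a constant string in the example above) is irrelevant to the code.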