Decompilation gets real

Analyzing binary executables can be a very boring activity, especially when you get used to the
regular patterns. You see the same things again and again. A tool to automate the analysis or
diminish the amount of text to browse quickly becomes a dream.

On the other hand, if you are new to the binary analysis, the assembly language is very unusual
at first. You need to learn not only the processor instructions but also the standard code
sequences, calling conventions, and lots of other stuff. It is understandable that you might
nostalgically remember the high level languages you know.

Here is the tool that can help you in both cases: a decompiler that can handle the real world
code. Take a binary file, analyze it, and get a nicely formatted C text as the output.
I’m tempted to say that you could even recompile it but not, recompilation is not the goal
– the program analysis is.

Let’s see how it works. Take a file, say, a virus – these days many of them are written in a high
level language. One could say that the virus writers have more comfortable environment than the
virus analysts. Hopefully the decompiler will change the situation a bit.

Go to the WinMain, check the code.

Here we call several functions, apparently the worm works
with the Internet (see WSAStartup). We could switch to the graph view to display the logic more
explicitly.

As we see, the first block checks for a condition, if it is not satisfied, we go to the end
of the function, otherwise there are some checks and some actions. We have too zoom in
and check each block to understand how the worm works.

Or we could use the decompiler to get this nice text:

The decompiler as it is today can handle compiler generated code. While it produces nice results, it lacks many features, like floating point support, exception handling, proper type derivation (and I’m sure there are hundreds of bugs are there) but eventually these things will be implemented and fixed.

The beta testing will be open soon. If you want to participate, please apply (only professional email addresses, please; keep in mind that the decompiler works only with IDA v5.1)

For more news, subscribe to the
mailing list. It is a read only list.

I’d like to say a couple of words about the decompiler internals. As anything else in reverse
engineering, the decompiler uses many probabilistic methods and heuristics. This means that
its output can not be made 100% reliable. This is just a caveat to anyone who wants a
decompiler to recover a lost source code. If you want an automatic recovery, try something else,
this decompiler won’t work for you.

It heavily uses the data-flow analysis methods to analyze the program. In fact the decompiler
consists of two parts: the first part is an engine which works with a microcode. This engine can
reason about the microcode and optimize it with the goal of making it
as concise as possible.
The second part converts microcode to a human readable form, to a C text.

The second part is quite simple: it just displays a nicely formatted text on the screen. The
first part, the optimization engine (I haven’t come up with a nice name for it yet, neither for the
decompiler), it much more interesting. It can be developed into something bigger. Something
which is capable of answering questions about the variable ranges, code and data coverage (it
will need to use inter-function analysis for that), and other things. It could check if some
invariants hold at the given program locations. A program verification tool can be built on the top
of such an engine quite easily. Imagine an automatically generated nice report about not only
trivial buffer overflows but also about other logic flaws in the program. This engine can evolve
into such a platform. This is how I see its bright and promising future 🙂