
# Hex-Rays v1.3 vs. v1.2 Decompiler Comparison Page

Below you will find side-by-side comparisons of v1.2 and v1.3 decompilations. Please maximize the window to see both columns simultaneously.

NOTE: these are just some selected examples that can be illustrated as a side-by-side difference. Hex-Rays Decompiler v1.3 includes many other improvements and new features that are not mentioned on this page, simply because there was nothing to compare them with. Also, some improvements have already been illustrated in the previous comparisons. Please refer to the news page for more details.

## Better 64-bit arithmetic

It seems that 64-bit support is a never-ending story. The previous version of the decompiler could not recognize the 64-bit addition because it was interleaved with other operations and the value of an operand changed midway. The new version handles this case, and the output is much simpler.

Pseudocode v1.2
v3 = (_DWORD)v28 >= (unsigned int)-v27; v4 = v28 + v27; v27 = 0xB23199F3u; LODWORD(v25) = v4; HIDWORD(v25) = HIDWORD(v28) + v3; if ( !((HIDWORD(v28) + v3) | v4) )
Pseudocode v1.3
v3 = v27; v27 = 0xB23199F3u; v25 = v28 + (unsigned int)v3; if ( !v25 )
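
To make the pattern concrete, here is a minimal sketch of how a 64-bit addition looks when it is carried out with 32-bit operations, next to the single expression it is equivalent to. The function names are ours and purely illustrative; the carry is computed with the common `sum < operand` test rather than the exact comparison shown in the v1.2 output.

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit operations, roughly the way
   32-bit code does it (add, then add-with-carry). */
uint64_t add64_by_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;       /* wraps modulo 2^32 */
    uint32_t carry = lo < a_lo;      /* 1 if the low addition overflowed */
    uint32_t hi = a_hi + b_hi + carry;

    return ((uint64_t)hi << 32) | lo;
}

/* The same computation once the pattern is recognized as one operation. */
uint64_t add64_direct(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Both functions return the same value for every input; the second form is what a reader wants to see.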

## Better 64-bit arithmetic - 2

An unrecognized 64-bit addition may lead to other complications. On the left, we have `v9` and `v10` 32-bit variables, on the right there is one simple 64-bit `v10` variable. Note the difference.

Pseudocode v1.2
v9 = a2 + v29; v8 = (int)&v5[(_DWORD)a2 >= (unsigned int)-v29]; if ( ReadPtr(a2 + v29, &v5[(_DWORD)a2 >= (unsigned int)-v29], &v39) ) ExtensionApis.lpOutputRoutine( "Cannot read DebugInfo adddress at 0x%p.\n", v9, v8);
Pseudocode v1.3

## 64-bit comparisons

We added more rules to recognize 64-bit comparisons. The results are pleasing.

Pseudocode v1.2
if ( (_DWORD)xll != (_DWORD)yll || HIDWORD(xll) != HIDWORD(yll) ) result = 2; else result = 1;
Pseudocode v1.3
if ( xll == yll ) result = 1; else result = 2;
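
As an illustration of what such rules have to match, the sketch below spells out 64-bit comparisons in terms of 32-bit halves. The helper names are hypothetical, and the unsigned "less than" variant is our own addition to show that ordering comparisons follow the same idea.

```c
#include <stdint.h>

/* Equality as 32-bit code performs it: compare low and high halves. */
int equal64_by_halves(uint64_t x, uint64_t y)
{
    return (uint32_t)x == (uint32_t)y
        && (uint32_t)(x >> 32) == (uint32_t)(y >> 32);
}

/* Unsigned "less than": the high halves decide unless they are equal,
   in which case the low halves decide. */
int less64_by_halves(uint64_t x, uint64_t y)
{
    uint32_t x_hi = (uint32_t)(x >> 32), y_hi = (uint32_t)(y >> 32);

    if (x_hi != y_hi)
        return x_hi < y_hi;
    return (uint32_t)x < (uint32_t)y;
}
```

These are equivalent to `x == y` and `x < y` on the full 64-bit values.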

## Nested pointer, array, and structure references

Complex references such as pointers to arrays of pointers (and the list may go on) were not always recognized and represented nicely. Only one level of indirection was handled well; deeper references could look ugly. Now the decompiler does a much better job. (The type of `a3g` is `char (**a3g)`, so the expression on the left is correct too.)

Pseudocode v1.2
char __cdecl fa3g(int i, int j, int k) { return *(&(*a3g[i])[5 * j] + k); }
Pseudocode v1.3
char __cdecl fa3g(int i, int j, int k) { return a3g[i][j][k]; }
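
The two notations can be compared with a small sketch. It assumes, purely for illustration, that each table entry points to rows of five `char`s; the typedef `row5` and both function names are hypothetical and not taken from the decompiled program.

```c
#include <stddef.h>

/* Assumed layout for illustration: each entry points to rows of 5 chars. */
typedef char row5[5];

/* Flat form: manual offset arithmetic on a byte pointer. */
char get_flat(row5 **tbl, int i, int j, int k)
{
    const char *base = (const char *)tbl[i];
    return *(base + 5 * j + k);
}

/* Typed form: the compiler derives the same offsets from the types. */
char get_indexed(row5 **tbl, int i, int j, int k)
{
    return tbl[i][j][k];
}
```

Both functions read the same byte; the indexed form simply lets the type carry the multiplication by the row size.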

## Assignments and comma operators

Nobody likes comma operators, but the decompiler has to resort to them to get rid of `goto`s. In some cases they can still be eliminated, and that's what the new version does.

Pseudocode v1.2
if ( !ptr || (v3 = *ptr, !*ptr) ) do_something...;
Pseudocode v1.3
if ( !ptr || (v3 = *ptr) == 0 ) do_something...;
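
The rewrite is safe because an assignment expression yields the value that was stored. A minimal sketch, with hypothetical function names and a made-up return convention of `-1` for "nothing there":

```c
#include <stddef.h>

/* Comma-operator form: assign, then test the same location again. */
int value_or_minus1_comma(const int *ptr)
{
    int v;
    if (!ptr || (v = *ptr, !*ptr))
        return -1;
    return v;
}

/* Assignment form: test the value of the assignment expression itself. */
int value_or_minus1_assign(const int *ptr)
{
    int v;
    if (!ptr || (v = *ptr) == 0)
        return -1;
    return v;
}
```

The two functions behave identically as long as `*ptr` does not change between the two reads, which is the case for ordinary (non-volatile) memory.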

## Global propagation of calculated values

Note that the decompiler replaced the `result` variable with its known value, zero. Knowing the value of a variable enables many other optimizations and can simplify the output considerably. It also removes false dependencies: for example, the previous version had to introduce a cast to `LPCSTR`.

Pseudocode v1.2
if ( !result ) { if ( !v5 ) return result; v4 = SysAllocStringByteLen((LPCSTR)result, v5 - 2);
Pseudocode v1.3
if ( !result ) { if ( !v5 ) return 0; v4 = SysAllocStringByteLen(0, v5 - 2);

## Calculated values - 2

Since we know the value of `v1` in the `if`-branch, we can replace it with zero, which leads to further simplifications. The output is much cleaner.

Pseudocode v1.2
v1 = RegEnumValueW(hKey, 0, &ValueName, &cbValueName, 0, 0, 0, 0); if ( !v1 && ValueName ) return v1 + 1;
Pseudocode v1.3
if ( !RegEnumValueW(hKey, 0, &ValueName, &cbValueName, 0, 0, 0, 0) && ValueName ) return 1;
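
The idea behind this kind of value propagation can be sketched in a few lines. `query_status` is a hypothetical stand-in for an API that returns zero on success; it is not part of any real library.

```c
/* Hypothetical stand-in for an API that returns 0 on success. */
static int query_status(int ok)
{
    return ok ? 0 : 5;
}

/* Before propagation: the variable is kept and used in the branch. */
int before_propagation(int ok, int has_name)
{
    int v1 = query_status(ok);
    if (!v1 && has_name)
        return v1 + 1;   /* v1 can only be 0 on this path */
    return -1;
}

/* After propagation: v1 is replaced by 0 and 0 + 1 is folded to 1,
   so the variable is no longer needed at all. */
int after_propagation(int ok, int has_name)
{
    if (!query_status(ok) && has_name)
        return 1;
    return -1;
}
```

Inside the branch guarded by `!v1` the variable can hold only zero, so every use of it there can be replaced by the constant.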

## Improved register argument detection

The heuristics for detecting register arguments have been improved. The output needs no further comment.

Pseudocode v1.2
result = ((int (__thiscall *)(void *, IStream *, void *, signed int, _DWORD))pStm->lpVtbl->Write)( pvarSrc, pStm, pvarSrc, 2, 0);
Pseudocode v1.3
result = pStm->lpVtbl->Write(pStm, pvarSrc, 2u, 0);

## Simpler arithmetic operations

Pseudocode v1.2
bytevar = ai[arg0] + LOBYTE(ai[arg0]) + 2 * LOBYTE(uai[arg0]);
Pseudocode v1.3
bytevar = ai[arg0] + ai[arg0] + 2 * uai[arg0];

## References to arrays of structures

First, references to arrays of structures are now shown as indexed member accesses instead of pointer arithmetic. Second, the decompiler could determine that `v20` is used only to access the array and divided it by the array element size (12).

Pseudocode v1.2
do { if ( v29 & *(int *)((char *)&GlobalFlagInfo.flags + v20) ) { v23 = *(int *)((char *)&GlobalFlagInfo.cmd + v20); if ( v23 ) ExtensionApis.lpOutputRoutine(" %s - %s\n", v23, *(void **)((char *)&GlobalFlagInfo.desc + v20)); } v20 += 12; } while ( v20 < 0x180 );
Pseudocode v1.3
do { if ( v27 & GlobalFlagInfo[v20].flags ) { v21 = GlobalFlagInfo[v20].cmd; if ( v21 ) ExtensionApis.lpOutputRoutine(" %s - %s\n", v21, GlobalFlagInfo[v20].desc); } ++v20; } while ( v20 < 32 );
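
The relation between the two loops (0x180 bytes divided by 12 bytes per element gives 32 elements) can be shown with a small sketch. The structure below is a hypothetical 12-byte layout with three 32-bit fields, chosen only so that the element size matches; it is not the real definition of `GlobalFlagInfo`.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 12-byte element: three 32-bit fields (hypothetical layout). */
struct flag_info {
    uint32_t flags;
    uint32_t cmd;
    uint32_t desc;
};

/* Byte-offset style: the loop variable advances by the element size. */
uint32_t sum_cmd_by_offset(const struct flag_info *tbl, size_t count)
{
    uint32_t sum = 0;
    for (size_t off = 0; off < count * sizeof *tbl; off += sizeof *tbl) {
        uint32_t cmd;
        /* Read the field at "address of first cmd + byte offset". */
        memcpy(&cmd, (const char *)&tbl->cmd + off, sizeof cmd);
        sum += cmd;
    }
    return sum;
}

/* Index style: the same variable divided by the element size. */
uint32_t sum_cmd_by_index(const struct flag_info *tbl, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += tbl[i].cmd;
    return sum;
}
```

The index form is only valid when the variable has no other uses, which is exactly what the decompiler has to establish before dividing it.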

## Improved optimizer

It is difficult to say exactly which improvement in the decompiler led to this result, but we like it anyway. The decompiler could get rid of intermediate variables and simplify the code to the maximum.

Pseudocode v1.2
while ( 1 ) { v7 = *v3; if ( !*v3 ) break; *v6++ = v7; ++v3; } *v6 = v7;
Pseudocode v1.3
while ( *v2 ) *v5++ = *v2++; *v5 = 0;
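
For readers who want to convince themselves that the two loops do the same thing, here they are as complete functions. The names are ours; the bodies follow the two outputs above.

```c
/* v1.2-style loop: an intermediate variable and an explicit break. */
void copy_verbose(char *dst, const char *src)
{
    char c;
    while (1) {
        c = *src;
        if (!c)
            break;
        *dst++ = c;
        ++src;
    }
    *dst = c;   /* c is 0 here, so this writes the terminator */
}

/* v1.3-style loop: the temporary is gone and the final store is a constant. */
void copy_compact(char *dst, const char *src)
{
    while (*src)
        *dst++ = *src++;
    *dst = 0;
}
```

The final `*dst = c` in the first version can only ever store zero, which is why the second version may write the constant directly.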

## Improved optimizer - 2

Yet another example of improved output. There are many other improvements, such as inlined `strcpy`, `strlen`, and other functions, but we are getting too many examples anyway.

Pseudocode v1.2
return (char)(a1 - -107 * (unsigned __int16)(a1 / 661));
Pseudocode v1.3
return (char)(a1 % 661);
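
The odd-looking constant comes from reducing `a1 - 661 * (a1 / 661)` modulo 256: since only the low byte is returned and -661 is congruent to 107 modulo 256, the compiler can multiply by a small constant instead. The sketch below checks this, assuming an unsigned 32-bit `a1` and an `unsigned char` result (the original uses `char`) to keep the C arithmetic fully defined.

```c
/* Remainder in its expanded, byte-reduced form, as in the v1.2 output. */
unsigned char mod661_expanded(unsigned int a1)
{
    return (unsigned char)(a1 - -107 * (unsigned short)(a1 / 661));
}

/* The same low byte computed from the remainder directly. */
unsigned char mod661_direct(unsigned int a1)
{
    return (unsigned char)(a1 % 661);
}
```

For example, `1000 % 661` is 339, whose low byte is 83, and `1000 + 107 * 1` is 1107, whose low byte is also 83.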

## Fast structural analysis

We tweaked the structural analysis: it is now faster (especially on big functions) and produces more concise output. Note that there is only one `if` statement now. Since not everyone likes dense code, this is configurable.

Pseudocode v1.2
if ( dword_8066AEC ) { if ( filename ) sub_805F18C(16, (int)"using configuration file %s", (char)filename); }
Pseudocode v1.3
if ( dword_8066AEC && filename ) sub_805F18C(16, (int)"using configuration file %s", (char)filename);

## Floating point constants

Floating point constants are detected even if they are moved around using integer instructions (a simple `mov` instruction).

Pseudocode v1.2
LODWORD(v63) = 1065353216; LODWORD(v62) = 1065353216;
Pseudocode v1.3
v63 = 1.0; v62 = 1.0;
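
The integer 1065353216 is 0x3F800000, which is the bit pattern of 1.0 in IEEE 754 single precision. A minimal sketch of the reinterpretation, assuming `float` is a 32-bit IEEE 754 type on the target (true on mainstream platforms, but still an assumption); the function name is ours.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision value. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* copies the bytes without conversion */
    return f;
}
```

Calling `float_from_bits(1065353216u)` yields `1.0f` under that assumption.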

## More precise variable creation

The decompiler determined that even though the `v5` and `v6` variables are initialized as 32-bit entities, only 16 bits are used. It declared them as 16-bit variables. This leads to better output.

Pseudocode v1.2
int __cdecl sub_8049CA0(struct_a1 *a1, __int16 a2, __int16 a3, int a4) { int v4; int v5; int v6; v4 = a4; a1->dword0 = a2 | (a1->dword0 << a3); if ( a3 ) v5 = a1->word4 + a3; else LOWORD(v5) = 0; a1->word4 = v5; if ( (_WORD)v5 >= 0 ) { do { *(_BYTE *)v4++ = a1->dword0 >> a1->word4; v6 = a1->word4 - 8; a1->word4 = v6; } while ( (_WORD)v6 >= 0 ); } return v4; }
Pseudocode v1.3
int __cdecl sub_8049CA0(struct_a1 *a1, __int16 a2, __int16 a3, int a4) { int v4; __int16 v5; __int16 v6; v4 = a4; a1->dword0 = a2 | (a1->dword0 << a3); if ( a3 ) v5 = a1->word4 + a3; else v5 = 0; a1->word4 = v5; if ( v5 >= 0 ) { do { *(_BYTE *)v4++ = a1->dword0 >> a1->word4; v6 = a1->word4 - 8; a1->word4 = v6; } while ( v6 >= 0 ); } return v4; }

## Postincrement/decrement with comparisons

Postincrement/decrement operators combined with comparisons used to produce ugly output. Now the output is simpler and ready to be simplified even more.

Pseudocode v1.2
v5 = a1-- == 1; if ( v5 ) break;
Pseudocode v1.3
--a1; if ( !a1 ) break;
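
Comparing the old value with 1 and then decrementing is the same as decrementing first and comparing with 0. A small sketch with hypothetical counting loops, assuming the counter starts at 1 or more (an unsigned type is used to keep the arithmetic well defined):

```c
/* v1.2 style: compare the old value with 1, then decrement. */
unsigned int iterations_old(unsigned int a1)
{
    unsigned int n = 0;
    for (;;) {
        ++n;
        int v5 = a1-- == 1;
        if (v5)
            break;
    }
    return n;
}

/* v1.3 style: decrement first, then test for zero. */
unsigned int iterations_new(unsigned int a1)
{
    unsigned int n = 0;
    for (;;) {
        ++n;
        --a1;
        if (!a1)
            break;
    }
    return n;
}
```

Both loops run exactly `a1` times for any starting value of at least 1.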

## Constant strings

References into the middle of constant strings were not recognized, forcing the user to jump to the string to learn its value. Now life is simpler.

Pseudocode v1.2
fwrite(&aPimGraft, 1u, 3u, _stderrp);
Pseudocode v1.3
fwrite("OK\n", 1u, 3u, _stderrp);

## Shorter output

It seems that the fast structural analysis combined with the improved loop recognition made it possible to shorten the output. Whatever the reason is, we like the output on the right.

Pseudocode v1.2
if ( a1[v11] == 9 ) { if ( v9 & 7 ) { do { ++v9; sub_804A350(v12, *(_WORD *)(a2 + 49224), v20); } while ( a1[v11] == 9 && v9 & 7 ); } }
Pseudocode v1.3
while ( a1[v11] == 9 && v8 & 7 ) { ++v8; sub_804A350(v10, *(_WORD *)(a2 + 49224), v20); }
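
The transformation is valid because an `if` guarding a `do`-`while` with the same condition is just a `while` loop. The sketch below uses a hypothetical tab-padding routine (character code 9, columns rounded up to a multiple of 8) that only loosely mirrors the decompiled code.

```c
/* Nested form: two guards around a do-while that repeats both tests. */
int pad_nested(char c, int col)
{
    if (c == 9) {
        if (col & 7) {
            do {
                ++col;
            } while (c == 9 && (col & 7));
        }
    }
    return col;
}

/* Collapsed form: one while loop with the combined condition. */
int pad_while(char c, int col)
{
    while (c == 9 && (col & 7))
        ++col;
    return col;
}
```

Both functions return the same column for every input, and the second one says so in a third of the lines.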