Igor’s tip of the week #67: Decompiler helpers

We’ve already described custom types used in the decompiled code, but you may also encounter some unusual keywords resembling function calls. They are used by the decompiler to represent operations which it was unable to map to nice C code, or just to make the output more compact. They are listed in the defs.h header file that is provided with the decompiler (it can be found in plugins/hexrays_sdk/include in your IDA directory), but here is a high-level overview of the commonly seen ones.

Partial access macros

Sometimes the code may access smaller parts of a big variable. To avoid polluting the code with a multitude of casts, the decompiler uses helper macros for this purpose.

LOBYTE(x), LOWORD(x), LODWORD(x) return the lowest byte/word/dword of the variable x as an unsigned value;
HIBYTE(x), HIWORD(x), HIDWORD(x) return the corresponding high part;
BYTE1(x), BYTE2(x) etc. return individual bytes in the memory order. The variable is considered to start at byte 0 in memory;
the same macros with the S prefix (SLOBYTE, SBYTE1 etc.) return signed values.
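To make the list above more concrete, here is a minimal sketch of what these helpers compute. The my_ functions are hypothetical stand-ins written for this illustration, not the definitions from defs.h (which operate on the variable's memory rather than on its value). The shifts below match the description only under the assumption of a little-endian target, where memory order and significance order coincide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value-based stand-ins; assumption: little-endian target. */
uint8_t  my_lobyte(uint32_t x)  { return (uint8_t)(x & 0xFF); }
uint8_t  my_byte1(uint32_t x)   { return (uint8_t)((x >> 8) & 0xFF); }
uint8_t  my_byte2(uint32_t x)   { return (uint8_t)((x >> 16) & 0xFF); }
uint8_t  my_hibyte(uint32_t x)  { return (uint8_t)(x >> 24); }
uint16_t my_loword(uint32_t x)  { return (uint16_t)(x & 0xFFFF); }
uint16_t my_hiword(uint32_t x)  { return (uint16_t)(x >> 16); }
uint32_t my_lodword(uint64_t x) { return (uint32_t)(x & 0xFFFFFFFFu); }
uint32_t my_hidword(uint64_t x) { return (uint32_t)(x >> 32); }

/* S-prefixed variant: the same bits, reinterpreted as a signed value.
   The narrowing conversion is implementation-defined in C, but wraps
   on the usual two's complement compilers. */
int8_t my_slobyte(uint32_t x) { return (int8_t)my_lobyte(x); }

/* A few checks documenting the expected values. */
void partial_access_demo(void)
{
    uint32_t v = 0x1A2B3C4D;
    assert(my_lobyte(v) == 0x4D);
    assert(my_byte1(v)  == 0x3C);
    assert(my_byte2(v)  == 0x2B);
    assert(my_hibyte(v) == 0x1A);
    assert(my_hiword(v) == 0x1A2B);
    assert(my_slobyte(0x12345680) == -128);
}
```

Functions are used here instead of macros only to keep the types explicit; the decompiler output uses the macro spelling.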

Note: this approach may lead to somewhat confusing situations on big-endian processors like PPC. Because big-endian data is stored starting from the high byte, the low-order byte is stored at the highest memory address and so is accessed using the HIBYTE macro. For example, consider a 32-bit variable containing the value 0x1A2B3C4D. It will be stored in memory in a different order on little-endian (LE) and big-endian (BE) platforms:

 LE BE
┌──┬──┐
│4D│1A├◄───LOBYTE
├──┼──┤
│3C│2B├◄───BYTE1
├──┼──┤
│2B│3C├◄───BYTE2
├──┼──┤
│1A│4D├◄───HIBYTE
└──┴──┘
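The two columns of the diagram can be reproduced with a small sketch that serializes the value by hand, so it gives the same result regardless of the machine it runs on. The function names store_le32 and store_be32 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value with the low-order byte first (little-endian). */
void store_le32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 32-bit value with the high-order byte first (big-endian). */
void store_be32(uint8_t out[4], uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Index 0 is the lowest memory address, index 3 the highest. */
void layout_demo(void)
{
    uint8_t le[4], be[4];
    store_le32(le, 0x1A2B3C4D);
    store_be32(be, 0x1A2B3C4D);
    assert(le[0] == 0x4D && le[1] == 0x3C && le[2] == 0x2B && le[3] == 0x1A);
    assert(be[0] == 0x1A && be[1] == 0x2B && be[2] == 0x3C && be[3] == 0x4D);
}
```

Reading the same memory index therefore yields bytes of different significance on the two kinds of platforms, which is the source of the confusion described in the note.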

Combining values

Sometimes the decompiler needs to represent the opposite operation: two values are combined to make a larger one. For this, “pair” macros are used:

__PAIR16__(high, low) creates a 16-bit value from two 8-bit ones. Unlike the partial access macros, it does not depend on the memory order but uses simple bit shifts, so the result is the same for little- and big-endian code. For example, __PAIR16__(0x1A, 0x2B) results in 0x1A2B in either situation;
__PAIR32__, __PAIR64__, __PAIR128__ perform the corresponding operation for bigger-sized values;
__SPAIR16__ etc. return signed values.
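As a rough illustration of the shift-based behavior described above, the pair helpers could be sketched as follows. The my_ names are hypothetical and the operand types are fixed for simplicity; this is not the defs.h implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: combine a high and a low half using shifts only,
   so the result does not depend on the byte order of the machine. */
uint16_t my_pair16(uint8_t high, uint8_t low)
{
    return (uint16_t)(((unsigned)high << 8) | low);
}

uint32_t my_pair32(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}

uint64_t my_pair64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Signed variant: the same bits, viewed as a signed 16-bit value
   (assumes the usual two's complement wrap on conversion). */
int16_t my_spair16(uint8_t high, uint8_t low)
{
    return (int16_t)my_pair16(high, low);
}

void pair_demo(void)
{
    assert(my_pair16(0x1A, 0x2B) == 0x1A2B);
    assert(my_pair32(0x1A2B, 0x3C4D) == 0x1A2B3C4Du);
    assert(my_spair16(0xFF, 0xFE) == -2);
}
```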

Bit and flag manipulations

Some assembly instructions do not have simple C representation so custom helper functions are used.

__ROLn__(value, count) and __RORn__(value, count) (n=1,2,4,8) represent n-byte left and right bit rotates;
__OFADD__ and __OFSUB__ return the overflow flag of an addition (subtraction) operation on two values;
__CFADD__ and __CFSUB__ do the same for the carry flag;
__SETP__(x, y) is used to represent the parity flag generated by expression x-y.
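For readers who want to see what these helpers mean in plain C, below is a minimal sketch for 32-bit operands. All my_ names are hypothetical. The flag semantics are an assumption based on x86 behavior (carry as unsigned wrap-around or borrow, overflow as signed wrap-around, parity computed over the low 8 bits of the result); other processors may define the flags differently.

```c
#include <assert.h>
#include <stdint.h>

/* 4-byte rotates; the count is reduced modulo 32 to avoid undefined shifts. */
uint32_t my_rol4(uint32_t value, unsigned count)
{
    count &= 31;
    return (value << count) | (value >> ((32 - count) & 31));
}

uint32_t my_ror4(uint32_t value, unsigned count)
{
    count &= 31;
    return (value >> count) | (value << ((32 - count) & 31));
}

/* Carry: the unsigned sum wrapped around / the subtraction needed a borrow. */
int my_cfadd32(uint32_t x, uint32_t y) { return (uint32_t)(x + y) < x; }
int my_cfsub32(uint32_t x, uint32_t y) { return x < y; }

/* Overflow: the signed result has an impossible sign. Computed on unsigned
   copies because signed overflow is undefined in C. */
int my_ofadd32(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y, sum = ux + uy;
    return (int)(((ux ^ sum) & (uy ^ sum)) >> 31);
}

int my_ofsub32(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y, diff = ux - uy;
    return (int)(((ux ^ uy) & (ux ^ diff)) >> 31);
}

/* Parity (x86 assumption): 1 if the low byte of x-y has an even number
   of set bits. */
int my_setp32(uint32_t x, uint32_t y)
{
    uint8_t low = (uint8_t)(x - y);
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    return !(low & 1);
}

void flags_demo(void)
{
    assert(my_rol4(0x80000001u, 1) == 0x00000003u);
    assert(my_ror4(0x00000001u, 1) == 0x80000000u);
    assert(my_cfadd32(0xFFFFFFFFu, 1) == 1);
    assert(my_ofadd32(INT32_MAX, 1) == 1);
    assert(my_ofsub32(INT32_MIN, 1) == 1);
    assert(my_setp32(3, 0) == 1);
}
```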

Overflow-checking multiplications

Recent compilers started adding overflow checks in common situations. For example, when calling operator new[], behind the scenes the compiler has to multiply the size of the elements by their count. If this operation overflows, a wrong value may be produced, leading to under-allocation or allocation failure. Programmers may also add manual overflow checks. The following helper functions are used to represent such code patterns:

is_mul_ok(count, elsize) represents an overflow check on the result of count*elsize. It is presumed to return true if the overflow does not happen.
saturated_mul(count, elsize) returns either the result of multiplication if it can be calculated safely, or the maximum unsigned integer value of the corresponding size (e.g. 0xFFFFFFFF). The latter should ensure that the allocation fails in case of overflow. This pattern is commonly used in calls to operator new[] in recent versions of Visual C++.
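The behavior described for these two helpers can be sketched as below for 32-bit unsigned operands. The my_ names and the fixed operand width are assumptions made for the illustration; the helpers seen in pseudocode are not tied to this width.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if count*elsize fits into 32 bits, 0 if it would overflow.
   The check uses a division so that it cannot overflow itself. */
int my_is_mul_ok(uint32_t count, uint32_t elsize)
{
    if (count == 0 || elsize == 0)
        return 1;
    return count <= UINT32_MAX / elsize;
}

/* Returns the product if it is safe, otherwise the maximum value, which
   an allocator is expected to reject. */
uint32_t my_saturated_mul(uint32_t count, uint32_t elsize)
{
    return my_is_mul_ok(count, elsize) ? count * elsize : UINT32_MAX;
}

void mul_demo(void)
{
    assert(my_is_mul_ok(0xFFFF, 0x10000) == 1);
    assert(my_is_mul_ok(0x10000, 0x10000) == 0);
    assert(my_saturated_mul(10, 4) == 40);
    assert(my_saturated_mul(0x40000000u, 4) == 0xFFFFFFFFu);
}
```

Without the saturation, 0x40000000 elements of 4 bytes would wrap around to a size of 0, and the allocation would succeed with a buffer that is far too small.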

Value coercion

Sometimes the code treats the same underlying value as different types. For example, the famous inverse square root function from Quake treats a 32-bit float as an integer and vice versa:

float InvSqrt (float x){
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

Although in the source code this conversion is represented using casts and dereferences, in the optimized code they may be replaced by simple moves between registers, especially when using SSE or AVX instructions which use the same registers to store both floating-point and integer values. Thus the decompiler has to use special macros to represent such code:

COERCE_FLOAT(v), COERCE_DOUBLE(v), COERCE_LONG_DOUBLE(v) are used to treat the bit pattern of v as the corresponding floating-point type.
COERCE_UNSIGNED_INT(v) and COERCE_UNSIGNED_INT64(v) are used for the opposite conversions.
You may also see SLODWORD when a floating-point value is treated as a signed integer.

For example, here’s what the pseudocode for the above function looks like when decompiled:

double __cdecl InvSqrt(float a1)
{
  float v2; // [esp+0h] [ebp-8h]

  v2 = a1 * 0.5;
  return (float)((1.5
                - v2 * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1)) * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1)))
               * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1)));
}
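If you need to recompile such pseudocode, the coercion helpers can be approximated with memcpy, which copies the bit pattern without converting the value. The sketch below uses hypothetical my_ functions (not the defs.h macros) and assumes a 32-bit IEEE 754 float and a positive input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Treat an integer bit pattern as a float (stand-in for COERCE_FLOAT). */
float my_coerce_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Treat a float's bit pattern as an unsigned integer
   (stand-in for COERCE_UNSIGNED_INT). */
uint32_t my_coerce_unsigned_int(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Same bits, viewed as a signed 32-bit integer (stand-in for SLODWORD). */
int32_t my_slodword(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The decompiled function above, rewritten with the stand-ins. */
float inv_sqrt(float a1)
{
    float v2 = a1 * 0.5f;
    float y = my_coerce_float((uint32_t)(0x5F3759DF - (my_slodword(a1) >> 1)));
    return (1.5f - v2 * y * y) * y;
}

void coerce_demo(void)
{
    assert(my_coerce_unsigned_int(1.0f) == 0x3F800000u);
    assert(my_coerce_float(0x40000000u) == 2.0f);
    /* The result is only an approximation of 1/sqrt(4) = 0.5. */
    assert(inv_sqrt(4.0f) > 0.49f && inv_sqrt(4.0f) < 0.51f);
}
```

memcpy is used instead of the pointer casts from the original source because it avoids strict aliasing problems; optimizing compilers typically turn it into a plain register move.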
