We’ve already described the custom types used in the decompiled code, but you may also encounter some unusual keywords resembling function calls. The decompiler uses them to represent operations which it was unable to map to nice C code, or just to make the output more compact. They are listed in the defs.h header file that is provided with the decompiler (it can be found in plugins/hexrays_sdk/include in your IDA directory), but here is a high-level overview of the commonly seen ones.
Partial access macros
Sometimes the code may access smaller parts of a big variable. To avoid polluting the code with a multitude of casts, the decompiler uses helper macros for this purpose.
- LOBYTE(x), LOWORD(x), LODWORD(x) return the lowest byte/word/dword of the variable x as an unsigned value;
- HIBYTE(x), HIWORD(x), HIDWORD(x) return the corresponding high part;
- BYTE1(x), BYTE2(x) etc. return individual bytes in memory order. The variable is considered to start at byte 0 in memory.
- The same macros with the S prefix (SLOBYTE, SBYTE1 etc.) return signed values.
Note: this approach may lead to somewhat confusing situations on big-endian processors like PPC. Because big-endian data is stored starting from the high byte, its low-order byte is stored at the highest memory address and so is accessed using the HIBYTE macro. For example, consider a 32-bit variable containing the value 0x1A2B3C4D. It will be stored in memory in a different order on little-endian (LE) and big-endian (BE) platforms:
 LE BE
┌──┬──┐
│4D│1A├◄───LOBYTE
├──┼──┤
│3C│2B├◄───BYTE1
├──┼──┤
│2B│3C├◄───BYTE2
├──┼──┤
│1A│4D├◄───HIBYTE
└──┴──┘
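The memory-order behavior described in the note can be reproduced with a small sketch. The helpers below (store_le32, store_be32, byte_in_memory) are hypothetical names invented for illustration and are not the defs.h definitions; they simply lay out a 32-bit value the way each kind of processor would store it and then read bytes by memory index, with index 0 standing in for LOBYTE and index 3 for HIBYTE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from defs.h): write a 32-bit value into a
   4-byte buffer the way a little- or big-endian CPU would store it. */
void store_le32(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));        /* low-order byte first */
}

void store_be32(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));  /* high-order byte first */
}

/* Byte n in memory order: index 0 plays the role of LOBYTE and index 3
   the role of HIBYTE for a 32-bit variable. */
uint8_t byte_in_memory(const uint8_t mem[4], unsigned n)
{
    return mem[n];
}

/* Self-check for the value 0x1A2B3C4D used in the note. */
void check_memory_order(void)
{
    uint8_t le[4], be[4];
    store_le32(0x1A2B3C4Du, le);
    store_be32(0x1A2B3C4Du, be);
    assert(byte_in_memory(le, 0) == 0x4D && byte_in_memory(be, 0) == 0x1A);
    assert(byte_in_memory(le, 3) == 0x1A && byte_in_memory(be, 3) == 0x4D);
}
```

Because the buffers are filled with explicit shifts, the sketch gives the same results regardless of the endianness of the machine it runs on.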
Combining values
Sometimes the decompiler needs to represent the opposite operation: two values are combined to make a larger one. For this, “pair” macros are used:
- __PAIR16__(high, low) creates a 16-bit value from two 8-bit ones. Unlike the partial access macros, it does not depend on the memory order but uses simple bit shifts, so the result is the same for little- and big-endian code. For example, __PAIR16__(0x1A, 0x2B) returns 0x1A2B in either case;
- __PAIR32__, __PAIR64__, __PAIR128__ perform the corresponding operation for bigger-sized values;
- __SPAIR16__ etc. return signed values.
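As a rough illustration of the shift-based combination described above, here is a minimal sketch. The functions pair16, pair32 and pair64 are hypothetical stand-ins written for this example, not the actual defs.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pair macros, not the defs.h code.
   Each one places `high` in the upper half and `low` in the lower half. */
uint16_t pair16(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)high << 8) | low);
}

uint32_t pair32(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}

uint64_t pair64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Self-check using the example value from the text. */
void check_pairs(void)
{
    assert(pair16(0x1A, 0x2B) == 0x1A2B);
    assert(pair32(0x1A2B, 0x3C4D) == 0x1A2B3C4Du);
}
```

Since only shifts and ORs on values are involved, and no memory is reinterpreted, the result does not depend on the byte order of the target.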
Bit and flag manipulations
Some assembly instructions do not have a simple C representation, so custom helper functions are used.
- __ROLn__(value, count) and __RORn__(value, count) (n=1,2,4,8) represent n-byte left and right bit rotates;
- __OFADD__ and __OFSUB__ return the overflow flag of an addition (subtraction) operation on two values;
- __CFADD__ and __CFSUB__ do the same for the carry flag;
- __SETP__(x, y) is used to represent the parity flag generated by the expression x-y.
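To make the rotate and flag helpers more concrete, here is a minimal sketch of 32-bit versions. The names rol32, ror32, carry_add32 and overflow_add32 are hypothetical and chosen for this example; the real helpers are defined in defs.h and may be implemented differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-ins; the real helpers live in defs.h. */

/* Rotate left: bits shifted out at the top re-enter at the bottom. */
uint32_t rol32(uint32_t value, unsigned count)
{
    count &= 31;
    return (value << count) | (value >> ((32 - count) & 31));
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31;
    return (value >> count) | (value << ((32 - count) & 31));
}

/* Carry flag of a + b: set when the unsigned sum wraps around. */
int carry_add32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

/* Overflow flag of a + b: set when both operands have the same sign
   and the wrapped result has the opposite one. */
int overflow_add32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;   /* unsigned arithmetic wraps without UB */
    return (int)((~(ua ^ ub) & (ua ^ sum)) >> 31);
}

/* Self-check with a few boundary values. */
void check_flags(void)
{
    assert(rol32(0x80000001u, 1) == 0x00000003u);
    assert(carry_add32(0xFFFFFFFFu, 1) == 1);
    assert(overflow_add32(INT32_MAX, 1) == 1);
}
```

The carry flag concerns the unsigned interpretation of the operands and the overflow flag the signed one, which is why the two checks take differently typed arguments here.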
Overflow-checking multiplications
Recent compilers started adding overflow checks in common situations. For example, when calling operator new[], behind the scenes the compiler has to multiply the size of the elements by their count. If this operation overflows, a wrong value may be produced, leading to under-allocation or allocation failure. Programmers may also add manual overflow checks. The following helper functions are used to represent such code patterns:
- is_mul_ok(count, elsize) represents an overflow check on the result of count*elsize. It is presumed to return true if the overflow does not happen.
- saturated_mul(count, elsize) returns either the result of the multiplication if it can be calculated safely, or the maximum unsigned integer value of the corresponding size (e.g. 0xFFFFFFFF). The latter should ensure that the allocation fails in case of overflow. This pattern is commonly used in calls to operator new[] in recent versions of Visual C++.
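The semantics of these two helpers can be sketched for 32-bit unsigned operands as follows. The functions mul_ok32 and saturated_mul32 are hypothetical names for this example and only approximate the behavior described above; they are not the defs.h implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit sketch of the is_mul_ok idea:
   nonzero when count * elsize fits into 32 bits. */
int mul_ok32(uint32_t count, uint32_t elsize)
{
    return count == 0 || elsize <= UINT32_MAX / count;
}

/* Hypothetical 32-bit sketch of the saturated_mul idea:
   the exact product when it fits, otherwise the maximum value. */
uint32_t saturated_mul32(uint32_t count, uint32_t elsize)
{
    return mul_ok32(count, elsize) ? count * elsize : UINT32_MAX;
}

/* Self-check: 0x10000 * 0x10000 needs 33 bits and so saturates. */
void check_mul(void)
{
    assert(mul_ok32(1000, 4));
    assert(!mul_ok32(0x10000u, 0x10000u));
    assert(saturated_mul32(0x10000u, 0x10000u) == 0xFFFFFFFFu);
}
```

Passing the saturated value to an allocator requests an impossibly large block, so the allocation fails instead of silently returning a buffer that is too small.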
Value coercion
Sometimes the code treats the same underlying value as different types. For example, the famous inverse square root function from Quake treats a 32-bit float as an integer and vice versa:
float InvSqrt (float x){
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
Although in the source code this conversion is represented using casts and dereferences, in the optimized code they may be replaced by simple moves between registers, especially when using SSE or AVX instructions which use the same registers to store both floating-point and integer values. Thus the decompiler has to use special macros to represent such code:
- COERCE_FLOAT(v), COERCE_DOUBLE(v), COERCE_LONG_DOUBLE(v) are used to treat the bit pattern of v as the corresponding floating-point type.
- COERCE_UNSIGNED_INT(v) and COERCE_UNSIGNED_INT64(v) are used for the opposite conversions.
- You may also see SLODWORD when a floating-point value is treated as a signed integer.
For example, here’s what the pseudocode for the InvSqrt function above looks like when decompiled:
double __cdecl InvSqrt(float a1)
{
  float v2; // [esp+0h] [ebp-8h]

  v2 = a1 * 0.5;
  return (float)((1.5 - v2 * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1)) * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1))) * COERCE_FLOAT(0x5F3759DF - (SLODWORD(a1) >> 1)));
}
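If you want to recompile such pseudocode, one way to think of the coercion macros is as bit-pattern copies. The sketch below uses memcpy for that purpose; the names coerce_float, coerce_unsigned_int, coerce_signed_int and inv_sqrt are hypothetical, the code assumes a 32-bit IEEE 754 float, and it is not how defs.h defines the macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the coercion macros (assumes a 32-bit
   IEEE 754 float). memcpy copies the bit pattern without converting
   the value. */
float coerce_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

uint32_t coerce_unsigned_int(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Plays the role SLODWORD has in the pseudocode: float bits as signed. */
int32_t coerce_signed_int(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The inverse square root approximation rewritten with the helpers.
   The input is assumed to be positive, so the shift acts on a
   non-negative integer. */
float inv_sqrt(float x)
{
    float xhalf = 0.5f * x;
    float y = coerce_float((uint32_t)(0x5F3759DF - (coerce_signed_int(x) >> 1)));
    return y * (1.5f - xhalf * y * y);
}

/* Self-check: 1.0f has the bit pattern 0x3F800000. */
void check_coercion(void)
{
    assert(coerce_unsigned_int(1.0f) == 0x3F800000u);
    assert(coerce_float(0x3F800000u) == 1.0f);
}
```

For an input of 4.0f, inv_sqrt returns a value close to 0.5, as expected from a single Newton iteration on the initial guess.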