In C, a union is a type similar to a struct, except that all members (possibly of different types) occupy the same memory and overlap each other. Unions are used, for example, when the same data needs to be interpreted in different ways, or to save memory when storing data of different types (this is common in scripting engines, among others). IDA and the decompiler fully support unions and include definitions of commonly used ones in the standard type libraries, so they may already be present in the analyzed binaries.
Assembly-level unions can be created in the Structures window by enabling the “Create union” checkbox when adding a new “structure”.
You can also use the Local Types editor to create a union using C syntax.
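As an illustration of the C syntax, here is a minimal sketch of the two uses mentioned above: a tagged value of the kind a scripting engine might store, and a helper that reinterprets the same bytes as another type. All names (value_kind, value_data, value, make_int, float_bits) are hypothetical and not taken from any real engine or from IDA's type libraries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values, for illustration only. */
enum value_kind { KIND_INT, KIND_FLOAT, KIND_STR };

/* All three members start at offset 0 and share the same storage. */
union value_data {
    int32_t     i;
    float       f;
    const char *s;
};

/* A tag stored next to the union records which member is currently valid. */
struct value {
    enum value_kind  kind;
    union value_data u;
};

/* The union is as large as its largest member, not the sum of all of them. */
static_assert(sizeof(union value_data) <
              sizeof(int32_t) + sizeof(float) + sizeof(const char *),
              "union members overlap");

/* Build an integer value: set the tag and the matching member. */
static struct value make_int(int32_t n)
{
    struct value v;
    v.kind = KIND_INT;
    v.u.i = n;
    return v;
}

/* Interpret the same bytes in a different way: write a float, read its bits.
   The returned pattern assumes a 32-bit IEEE 754 float. */
static uint32_t float_bits(float f)
{
    union { float f; uint32_t bits; } pun;
    pun.f = f;
    return pun.bits;
}
```

Only the union value_data and struct value declarations are relevant as type definitions; the two functions just show how such types are typically used by the code you will be analyzing.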
Using unions in disassembly
In disassembly, unions can be used similarly to structures. For example, when a member is referenced as an offset from a register, you can use the context menu’s “Structure offset” submenu or the T hotkey. The difference is that you may see multiple “paths” for the same offset, representing alternative union members, so you can pick the one most suitable for the specific use case.
Example: OLE automation
OLE Automation is a COM-based set of APIs commonly used to implement scripting in Microsoft and other applications. One of the basic types used in it is the VARIANTARG structure, which can contain different types of values by embedding a union of differently typed fields inside it.
For example, if we have an instruction mov eax, [edx+8] and we know that edx points to an instance of VARIANTARG, using T on the second operand shows us multiple versions of the union field, so we can pick the one most relevant to the specific code path taken.
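To see why a single offset yields several candidates, consider the following simplified stand-in for the structure. MOCK_VARIANTARG and read_i16_at are hypothetical names, and only a handful of the real union members are listed; the point of the sketch is that the 8-byte header (vt plus three reserved words) places every union member at offset 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for VARIANTARG; the real union has many more members. */
typedef struct MOCK_VARIANTARG {
    uint16_t vt;                 /* type tag, e.g. 11 for a boolean */
    uint16_t wReserved1;
    uint16_t wReserved2;
    uint16_t wReserved3;
    union {                      /* every member below starts at offset 8 */
        int32_t   lVal;
        int16_t   iVal;
        int16_t   boolVal;
        double    dblVal;
        uint16_t *bstrVal;       /* stand-in for BSTR */
    };
} MOCK_VARIANTARG;

static_assert(offsetof(MOCK_VARIANTARG, iVal) == 8, "iVal expected at offset 8");
static_assert(offsetof(MOCK_VARIANTARG, bstrVal) == 8, "bstrVal expected at offset 8");

/* Read 16 bits at a raw byte offset, the way an instruction using
   [reg+8] would, without knowing which union member is meant. */
static int16_t read_i16_at(const void *base, size_t off)
{
    int16_t v;
    memcpy(&v, (const unsigned char *)base + off, sizeof v);
    return v;
}
```

Since offset 8 is equally valid for iVal, boolVal, bstrVal and the rest, the displacement alone cannot tell IDA which member the code means, and the choice is left to you.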
Changing the union field used
After you (or IDA) have selected a union field, you can change it by going through the struct selection again (e.g. with the T hotkey). But if the parent structure should remain the same, you can change only the union member by using the command Edit > Structs > Select union member… (hotkey Alt–Y). This can be especially useful when a structure with an embedded union is placed on the stack, because you can’t use the normal structure offset commands there (the offset inside the instruction is based on the stack or frame pointer, which does not point to the beginning of the structure).
Unions in decompiler
Because the decompiler can do dataflow analysis, in many cases it can pick the most suitable union field by matching the expected type of the variable used by the code. For example, in the snippet below the decompiler picked the correct field for the argument passed to SysAllocString, because it knows that the function expects an argument of type const OLECHAR *, which is compatible with the BSTR bstrVal field of the union.
However, for the other reference the iVal field was selected. While it is compatible with the use case (comparing against zero), looking at the code makes it obvious that it is interpreting a boolean variant value (this can be made clearer by replacing the number 11 with the symbolic constant VT_BOOL). This means that boolVal is a more logical choice, and we can pick it by using “Select union field…” from the context menu, or the same Alt–Y hotkey as for disassembly.
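The difference between the two choices can be sketched in plain C. This is not actual decompiler output: MOCK_VARIANTARG, MOCK_VT_BOOL and both functions are hypothetical, the structure is a trimmed stand-in for VARIANTARG, and the only values taken from the real headers are 8 and 11 for VT_BSTR and VT_BOOL.

```c
#include <stdint.h>

/* Same numeric values as VT_BSTR and VT_BOOL in the Windows headers. */
enum { MOCK_VT_BSTR = 8, MOCK_VT_BOOL = 11 };

/* Trimmed stand-in for VARIANTARG, keeping only the fields discussed here. */
typedef struct MOCK_VARIANTARG {
    uint16_t vt;
    uint16_t wReserved1;
    uint16_t wReserved2;
    uint16_t wReserved3;
    union {
        int16_t   iVal;
        int16_t   boolVal;       /* 0 is false, -1 is true */
        uint16_t *bstrVal;       /* stand-in for BSTR */
    };
} MOCK_VARIANTARG;

/* Before cleanup: a magic number and the generic 16-bit member. */
static int is_true_raw(const MOCK_VARIANTARG *arg)
{
    return arg->vt == 11 && arg->iVal != 0;
}

/* After cleanup: the symbolic constant and the member that matches the tag. */
static int is_true_clean(const MOCK_VARIANTARG *arg)
{
    return arg->vt == MOCK_VT_BOOL && arg->boolVal != 0;
}
```

Both functions compile to the same comparison, because iVal and boolVal are 16-bit values at the same offset; the second version simply documents the intent, which is what selecting boolVal achieves in the pseudocode.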