Sometimes you may see mysterious align keywords in the disassembly, which can appear in both code and data areas.
Usually they’re only apparent in the text view.
These directives are used by many assemblers to indicate alignment to a specific address boundary, usually a power of two. IDA uses them to replace potentially irrelevant bytes with a short one-liner, both to make the listing more compact and to indicate that this part of the binary is probably not interesting.
Depending on the processor and the assembler chosen, different keywords can be used (e.g. align or .align), and the number after the directive can mean either the number of bytes or the exponent of a power of two (i.e. 1 means aligning to two bytes, 2 to four, 4 to sixteen and so on).
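The two interpretations of the operand can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the operand_is_exponent flag are made up for this example and do not correspond to any IDA or assembler setting.

```python
def alignment_in_bytes(operand, operand_is_exponent):
    """Return the boundary, in bytes, that an align directive refers to.

    operand_is_exponent is a hypothetical flag: True when the assembler
    treats the operand as a power-of-two exponent, False when the operand
    is already a byte count.
    """
    if operand_is_exponent:
        return 1 << operand  # 2 ** operand
    return operand


# Exponent-style operands from the text: 1 -> 2 bytes, 2 -> 4, 4 -> 16.
for operand in (1, 2, 4):
    print(operand, "->", alignment_in_bytes(operand, True), "bytes")

# Byte-count style: the operand is the boundary itself.
print(16, "->", alignment_in_bytes(16, False), "bytes")
```

So the same boundary may be written with operand 16 in one assembler syntax and operand 4 in another.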
The alignment items can appear in the following situations:
Code alignment padding
Many processors use instruction caches which speed up execution of frequently executed code (for example, loops). This is why it may be useful to ensure that the start of a loop is aligned on a specific address boundary (usually 16 bytes). For this, the compiler needs to insert instructions which do not affect the behavior of the function, i.e. NOP (no-operation) instructions. Which specific instructions are used depends on the processor and the compiler.
For example, here GCC used a so-called “long NOP” to align the loop on 16 bytes (obvious thanks to the hexadecimal address ending with 0). Because this instruction is actually executed, IDA shows it as code and not as an align expression (which is considered non-executable and would break disassembly), but you can still convert it manually.
There may also be hardware requirements. On some processors the interrupt handlers must be aligned, like in this example from PowerPC:
Here, 4 is a power-of-two exponent, i.e. alignment to a 16-byte boundary.
Function padding
Similarly to loops, whole functions can benefit from alignment, so they’re commonly (but not always!) aligned to at least four bytes. Because functions are usually placed one after the other but the function size is not always a multiple of the alignment, extra padding has to be inserted by the compiler and/or the linker. Two common approaches are used:
- executable NOP instructions, just like for the loop alignment. This is the approach commonly used by GCC and derived compilers:
- invalid or trapping instructions. This can be useful to catch cases where execution is diverted to an address between functions, for example due to a bug or an exploit. Microsoft Visual C++, for example, tends to use 0xCC (breakpoint instruction) to pad the space between functions on x86.
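Whichever filler is used, the amount of padding follows from the end address of the previous function and the chosen alignment. Here is a minimal Python sketch of that calculation. The function names and the sample addresses are hypothetical, and real compilers and linkers may pick multi-byte NOPs rather than repeating a single byte.

```python
def padding_needed(end_address, alignment):
    """Number of filler bytes between end_address and the next boundary.

    alignment is a byte count and is assumed to be a power of two.
    """
    return (-end_address) % alignment


def make_padding(end_address, alignment, filler=0xCC):
    """Build single-byte padding: 0xCC is INT3 and 0x90 is NOP on x86."""
    return bytes([filler]) * padding_needed(end_address, alignment)


# A hypothetical function ending at 0x401005, next function aligned to 16:
print(padding_needed(0x401005, 16))        # 11 bytes of padding
print(make_padding(0x401005, 16).hex())    # eleven 0xCC bytes
print(padding_needed(0x401010, 16))        # 0, already aligned
```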
Data alignment padding
Many processors have alignment requirements: some can’t even load data from unaligned addresses, and others can usually fetch aligned data faster. So compilers often try to ensure that data items are placed on an aligned address boundary (usually at least 4 bytes). Most commonly, zero-fill padding is used:
However, some compilers may use NOP-like fillers too, especially for constant data placed in executable areas:
Converting alignment items
While rare, it may be necessary for you to change IDA’s decision concerning an alignment item. Because alignment items are mostly equivalent to data items, you can use the usual shortcut U to undefine them (convert to plain bytes), and then C to convert to code (in case they correspond to valid instructions).
To go the other way (convert instructions or undefined bytes to an alignment item), use Edit > Other > Create alignment directive…, or just the shortcut L. IDA checks the address of the next defined instruction or data item and may offer several alignment options depending on that address. For example, in this situation:
The current address is divisible by 4, so an alignment of 4 or less would not cover any bytes. The following defined address (7FF674A1A20) is divisible by 32, so IDA offers options 8, 16 and 32. Note that if you choose 8, the alignment item will only cover the first 4 bytes (up to 7FF674A1A18), so in this situation 16 or 32 makes the most sense.
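The arithmetic behind this example can be reproduced with a short Python sketch. This is not IDA's actual implementation, only one rule that matches the numbers above: an alignment is a candidate when the current address is not yet on that boundary but the next defined address is. The starting address 0x7FF674A1A14 is inferred from the text (divisible by 4, four bytes before 7FF674A1A18), and the function names and the max_alignment cap are assumptions of the sketch.

```python
def alignment_options(current, next_defined, max_alignment=4096):
    """List power-of-two alignments that would make sense for the gap.

    An alignment qualifies if current is not already on that boundary
    and next_defined is. max_alignment is an arbitrary cap for the sketch.
    """
    options = []
    alignment = 2
    while alignment <= max_alignment:
        if current % alignment != 0 and next_defined % alignment == 0:
            options.append(alignment)
        alignment *= 2
    return options


def covered_end(current, alignment):
    """First address at or after current that lies on the boundary."""
    return (current + alignment - 1) & ~(alignment - 1)


current = 0x7FF674A1A14        # inferred start of the padding bytes
next_defined = 0x7FF674A1A20   # next defined item, from the example

print(alignment_options(current, next_defined))   # [8, 16, 32]
for alignment in alignment_options(current, next_defined):
    print(alignment, "covers up to", format(covered_end(current, alignment), "X"))
```

With 8 the item ends at 7FF674A1A18, leaving eight bytes uncovered, while 16 and 32 both reach 7FF674A1A20.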