Previously, we have covered offset expressions which fit into a single instruction operand or data value. But this is not always the case, so let’s see how IDA can handle offsets which may be built out of multiple parts.
Although slowly dying out, 8-bit processors (especially the venerable 8051) can still appear in current hardware, and of course we’ll be dealing with legacy systems for many years to come. Even though their registers can store only 8 bits of data, most of them can address 64KiB (16-bit addresses) or more of memory, which means that addresses may need to be built in parts.
For example, consider this sequence of instructions from an 8051 firmware:
code:CF22 mov R3, #0xFF
code:CF24 mov R2, #0xF6
code:CF26 mov R1, #0xA6
code:CF28 sjmp code_CF36
Code for the 8051 is often compiled with the Keil C51 compiler, and this pattern is a typical way of initializing a generic pointer to code memory. The address being referenced is 0xF6A6, but can we make the instructions look “nice” and create cross-references to it?
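The arithmetic behind that address can be sketched in a few lines of Python. This is only an illustration meant to run outside IDA: the helper name `combine_bytes` is made up for this example, and the sketch assumes that R2 carries the upper byte and R1 the lower byte of the address (R3 is left out).

```python
def combine_bytes(high: int, low: int) -> int:
    """Join two 8-bit values into one 16-bit address (hypothetical helper)."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("both parts must fit in 8 bits")
    return (high << 8) | low


r2 = 0xF6  # from: mov R2, #0xF6
r1 = 0xA6  # from: mov R1, #0xA6

address = combine_bytes(r2, r1)
print(hex(address))  # 0xf6a6
```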
One possibility is to use offset with custom base on the last move and specify the base of
This does calculate the final address and create a cross-reference but the code is not quite “nice looking” and the other instruction remains a plain number:
In fact, a better option is to use the HIGH8/LOW8 offsets for the two instructions. Because each instruction provides only a part of the full offset, IDA cannot calculate the full address from it alone, so the address needs to be provided by the user.
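To see why a single instruction is not enough, here is a small Python sketch of what the two parts represent. The helper names `high8` and `low8` are chosen for this illustration and are not IDA API calls. The point is only that many different targets share the same byte, so the full target has to come from somewhere else.

```python
def high8(target: int) -> int:
    """Upper byte of a 16-bit target (illustrative helper, not an IDA call)."""
    return (target >> 8) & 0xFF


def low8(target: int) -> int:
    """Lower byte of a 16-bit target (illustrative helper, not an IDA call)."""
    return target & 0xFF


target = 0xF6A6
assert high8(target) == 0xF6  # matches the operand of: mov R2, #0xF6
assert low8(target) == 0xA6   # matches the operand of: mov R1, #0xA6

# The low byte alone is ambiguous: every 16-bit address ending in 0xA6 fits.
candidates = [t for t in range(0x10000) if low8(t) == 0xA6]
print(len(candidates))  # 256
```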
R2 provides the top 8 bits of the address, so we should use the HIGH8 offset type for it. We also need to fill in the full address (0xF6A6) in the Target address field. The base address should be reset to 0.
For R1, which provides the low 8 bits, LOW8 and the same target can be used:
After applying both offsets, IDA displays them using matching assembler operators:
RISC processors often use fixed-width instructions and may not be able to reach the full range of the address space with the limited space for the immediate operand in the instruction. This includes SPARC, MIPS, PowerPC and some others. As an example, let’s look at this PowerPC VLE snippet:
seg001:0000C156 e_lis r3, 1 # Load Immediate Shifted
seg001:0000C15A e_add16i r3, r3, -0x1650 # 0xE9B0
seg001:0000C15E se_mtlr r3
seg001:0000C160 se_blrl
The code calculates the address of a function in r3 and then calls it. IDA helpfully shows the final address in a comment, but we can also use custom offsets to represent it nicely. For the e_add16i instruction, we can use the LOW16 type, as expected, but in the case of e_lis, the processor-specific type HIGHA16 should be used instead of HIGH16. This is because the low 16 bits are used here not as-is but as a sign-extended addend, with the high 16 bits of the final address becoming 0 after the addition (0x10000-0x1650=0xE9B0).
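The difference between the plain and the adjusted high half can be checked with a short Python sketch. The function names below are illustrative only (they are not IDA API calls), and `higha16` uses the common definition of an adjusted high half, which adds 0x8000 before shifting to compensate for the sign extension of the low half.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def low16(target: int) -> int:
    return target & 0xFFFF


def high16(target: int) -> int:
    """Plain upper half: correct only if the low half is added unsigned."""
    return (target >> 16) & 0xFFFF


def higha16(target: int) -> int:
    """Adjusted upper half: compensates for a sign-extended low half."""
    return ((target + 0x8000) >> 16) & 0xFFFF


target = 0xE9B0
assert high16(target) == 0    # does not match the operand of: e_lis r3, 1
assert higha16(target) == 1   # matches the operand of: e_lis r3, 1
assert sign_extend16(low16(target)) == -0x1650  # the e_add16i immediate

# Replay the two instructions: r3 = 1 << 16, then add the signed low half.
rebuilt = ((higha16(target) << 16) + sign_extend16(low16(target))) & 0xFFFFFFFF
print(hex(rebuilt))  # 0xe9b0
```

When the low half is below 0x8000, both definitions give the same value; they differ only when the sign bit of the low half is set, as it is here.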
After converting both parts, IDA uses special assembler operators to show the final address:
Now we can go to the target and create a function there.
Note: specifically for PowerPC, IDA will automatically convert such sequences to offset expressions if the target address exists and has instructions or data. But the manual approach can still be useful for other processors or complex situations (for example, when the two instructions are too far apart).