Previously, we have covered offset expressions which fit into a single instruction operand or data value. But this is not always the case, so let’s see how IDA can handle offsets which may be built out of multiple parts.

8-bit processors

Although slowly dying out, 8-bit processors (especially the venerable 8051) can still appear in current hardware, and of course we’ll be dealing with legacy systems for many years to come. Even though their registers can store only 8 bits of data, most of them can address 64 KiB (16-bit addresses) or more of memory, which means that addresses may need to be built from parts.

For example, consider this sequence of instructions from an 8051 firmware:

code:CF22    mov     R3, #0xFF
code:CF24    mov     R2, #0xF6
code:CF26    mov     R1, #0xA6
code:CF28    sjmp    code_CF36

Code for the 8051 is often compiled with the Keil C51 compiler, and this pattern is a typical way of initializing a generic pointer to code memory. The address being referenced is 0xF6A6, but can we make the instructions look “nice” and create cross-references to it?

One possibility is to use offset with custom base on the last move and specify the base of 0xF600:

This does calculate the final address and create a cross-reference but the code is not quite “nice looking” and the other instruction remains a plain number:

In fact, a better option is to use the high8/low8 offsets for the two instructions. Because each instruction provides only a part of the full offset, IDA cannot calculate the full address from that instruction alone, so the user needs to provide it.

R2 provides the top 8 bits of the address, so we should use the HIGH8 offset type for it. We also need to fill in the full address (0xF6A6) in the Target address field. Base address should be reset to 0.
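To make the arithmetic concrete, here is a minimal Python sketch of how the two byte-sized operands relate to the target address, and how the earlier custom-base variant arrives at the same value. It is plain arithmetic only and does not call any IDA API; the function names are hypothetical and chosen for illustration.

```python
def split_high8_low8(target):
    """Split a 16-bit address into its (high byte, low byte) parts.

    These are the values the HIGH8 and LOW8 operands are expected to hold.
    """
    if not 0 <= target <= 0xFFFF:
        raise ValueError("expected a 16-bit address")
    return (target >> 8) & 0xFF, target & 0xFF


def join_high8_low8(high, low):
    """Rebuild the 16-bit address from the two 8-bit parts."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def custom_base_target(base, operand):
    """Target of an offset with a custom base: base plus operand value."""
    return base + operand


high, low = split_high8_low8(0xF6A6)
print(hex(high), hex(low))                    # 0xf6 0xa6 (the values loaded into R2 and R1)
print(hex(join_high8_low8(high, low)))        # 0xf6a6
print(hex(custom_base_target(0xF600, 0xA6)))  # 0xf6a6
```

Neither byte determines the address on its own: 0xF6 could be the high byte of any address from 0xF600 to 0xF6FF, which is why the full target has to be entered by hand.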

For R1, LOW8 and the same target can be used:

After applying both offsets, IDA displays them using matching assembler operators:

RISC processors

RISC processors often use fixed-width instructions and may not be able to reach the full range of the address space with the limited space for the immediate operand in the instruction. This includes SPARC, MIPS, PowerPC and some others. As an example, let’s look at this PowerPC VLE snippet:

seg001:0000C156         e_lis     r3, 1 # Load Immediate Shifted
seg001:0000C15A         e_add16i  r3, r3, -0x1650 # 0xE9B0
seg001:0000C15E         se_mtlr   r3
seg001:0000C160         se_blrl

The code calculates the address of a function in r3 and then calls it. IDA helpfully shows the final address in a comment, but we can also use custom offsets to represent it nicely. For the e_add16i instruction, we can use the LOW16 type, as expected, but in the case of e_lis, the processor-specific type HIGHA16 should be used instead of HIGH16. This is because the low 16 bits are used here not as-is but as a sign-extended addend, with the high 16 bits of the final address becoming 0 after the addition (0x10000-0x1650=0xE9B0).
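The difference between a plain high half and an adjusted one can be checked with a few lines of Python. This is a sketch of the arithmetic only, assuming 32-bit addresses and the usual "add 0x8000 before shifting" adjustment; the function names are hypothetical and are not part of IDA.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def split_higha16_low16(target):
    """Return (adjusted high half, low half) for a 32-bit address.

    The high half is bumped by one whenever the low half will be
    treated as negative, compensating for the sign extension.
    """
    low = target & 0xFFFF
    higha = ((target + 0x8000) >> 16) & 0xFFFF
    return higha, low


def load_shifted_then_add(high, low):
    """Model a 'load immediate shifted' followed by a signed 16-bit add."""
    return ((high << 16) + sign_extend16(low)) & 0xFFFFFFFF


higha, low = split_higha16_low16(0xE9B0)
print(higha, hex(low), hex(-sign_extend16(low)))  # 1 0xe9b0 0x1650
print(hex(load_shifted_then_add(higha, low)))     # 0xe9b0

# Using the unadjusted high half (0xE9B0 >> 16 == 0) gives the wrong address:
print(hex(load_shifted_then_add(0xE9B0 >> 16, low)))  # 0xffffe9b0
```

The adjusted high half is 1 even though the top 16 bits of 0xE9B0 are 0, which matches the operand of e_lis in the snippet.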

After converting both parts, IDA uses special assembler operators to show the final address:

Now we can go to the target and create a function there.

Note: specifically for PowerPC, IDA will automatically convert such sequences to offset expressions if the target address exists and has instructions or data. But the manual approach can still be useful for other processors or complex situations (for example, when the two instructions are too far apart).