Previously, we covered offset expressions that fit into a single instruction operand or data value. But this is not always the case, so let’s see how IDA can handle offsets that are built out of multiple parts.
8-bit processors
Although slowly dying out, 8-bit processors, especially the venerable 8051, can still appear in current hardware, and of course we’ll be dealing with legacy systems for many years to come. Even though their registers can store only 8 bits of data, most of them can address 64 KiB (16-bit addresses) or more of memory, which means that addresses may need to be built in parts.
For example, consider this sequence of instructions from an 8051 firmware:
code:CF22 mov R3, #0xFF
code:CF24 mov R2, #0xF6
code:CF26 mov R1, #0xA6
code:CF28 sjmp code_CF36
Code for the 8051 is often compiled with the Keil C51 compiler, and this pattern is a typical way of initializing a generic pointer to code memory. The address being referenced is 0xF6A6 (high byte in R2, low byte in R1), but can we make the instructions look “nice” and create cross-references to it?
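To make the arithmetic explicit, here is a minimal Python sketch of how the two byte-sized immediates combine into the 16-bit address. The helper name `combine_bytes` is made up for this illustration, and reading R3 as the memory-type byte of a Keil generic pointer is an assumption based on the usual C51 register convention, not something taken from the listing itself.

```python
def combine_bytes(high, low):
    """Concatenate two 8-bit values into one 16-bit address."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

# Immediates from the three mov instructions above.
# Assumption: R3 is the memory-type byte, R2 the high byte, R1 the low byte.
r3, r2, r1 = 0xFF, 0xF6, 0xA6

address = combine_bytes(r2, r1)
print(f"0x{address:04X}")  # 0xF6A6
```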
One possibility is to use an offset with a custom base on the last move and specify a base of 0xF600:
This does calculate the final address and create a cross-reference, but the code is not quite “nice looking” and the other instruction remains a plain number:
In fact, a better option is to use the high8/low8 offsets for the two instructions. Because each instruction provides only a part of the full offset, IDA cannot calculate the full address from that instruction alone, so the address needs to be provided by the user.
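As a rough illustration of why one part is not enough, the sketch below splits 0xF6A6 into its two bytes and counts how many 16-bit addresses share the same low byte. This is plain Python rather than IDA's API, and the helper names `high8` and `low8` are invented for the example.

```python
def high8(target):
    """Top 8 bits of a 16-bit address (the value loaded into R2)."""
    return (target >> 8) & 0xFF

def low8(target):
    """Bottom 8 bits of a 16-bit address (the value loaded into R1)."""
    return target & 0xFF

TARGET = 0xF6A6
parts = (high8(TARGET), low8(TARGET))
print([hex(p) for p in parts])  # ['0xf6', '0xa6']

# Every 16-bit address ending in 0xA6 is consistent with the R1 operand,
# so the operand value alone cannot identify the target.
candidates = [t for t in range(0x10000) if low8(t) == low8(TARGET)]
print(len(candidates))  # 256
```

The same ambiguity applies to the high byte, which is why both operands are given the same explicit target.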
R2 provides the top 8 bits of the address, so we should use the HIGH8 offset type for it. We also need to fill in the full address (0xF6A6) in the Target address field. The Base address should be reset to 0.
For R1, LOW8 and the same target can be used:
After applying both offsets, IDA displays them using matching assembler operators:
RISC processors
RISC processors often use fixed-width instructions, so the immediate operand of a single instruction may be too small to cover the full address space. This includes SPARC, MIPS, PowerPC and some others. As an example, let’s look at this PowerPC VLE snippet:
seg001:0000C156 e_lis r3, 1 # Load Immediate Shifted
seg001:0000C15A e_add16i r3, r3, -0x1650 # 0xE9B0
seg001:0000C15E se_mtlr r3
seg001:0000C160 se_blrl
The code calculates the address of a function in r3 and then calls it. IDA helpfully shows the final address in a comment, but we can also use custom offsets to represent the address nicely. For the e_add16i instruction, we can use the LOW16 type, as expected, but in the case of e_lis, the processor-specific type HIGHA16 should be used instead of HIGH16. This is because the low 16 bits are used here not as-is but as a sign-extended addend: e_lis loads 0x10000 into r3, and adding the negative low part borrows from it, so the high 16 bits of the final address become 0 (0x10000-0x1650=0xE9B0).
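The difference between the two high-part types can be checked with a few lines of Python. This is a sketch of the usual "high adjusted" arithmetic, not IDA's internal implementation, and the function names are invented for the example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def low16(target):
    return target & 0xFFFF

def high16(target):
    """Plain upper half, valid when the low half is OR-ed in unsigned."""
    return (target >> 16) & 0xFFFF

def higha16(target):
    """Adjusted upper half, compensating for a sign-extended low half."""
    return ((target + 0x8000) >> 16) & 0xFFFF

def rebuild(high, target):
    """Mimic e_lis followed by e_add16i for the given high part."""
    return ((high << 16) + sign_extend16(low16(target))) & 0xFFFFFFFF

TARGET = 0xE9B0
print(higha16(TARGET), hex(sign_extend16(low16(TARGET))))  # 1 -0x1650
print(hex(rebuild(higha16(TARGET), TARGET)))  # 0xe9b0
print(hex(rebuild(high16(TARGET), TARGET)))   # 0xffffe9b0
```

With the plain high part (0) the sum lands at 0xFFFFE9B0, while the adjusted high part (1) matches the operand of e_lis and gives the expected 0xE9B0.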
After converting both parts, IDA uses special assembler operators to show the final address:
Now we can go to the target and create a function there.
Note: specifically for PowerPC, IDA will automatically convert such sequences to offset expressions if the target address exists and has instructions or data. But the manual approach can still be useful for other processors or complex situations (for example, when the two instructions are too far apart).