Igor’s tip of the week #95: Offsets

As we’ve mentioned before, the same numerical value can be represented in different ways even though it is the same bit pattern on the binary level. One of the representations used in IDA is the offset.

Offsets

In IDA, an offset is a numerical value which is used as an address (either directly or as part of an expression) to refer to another location in the program.

The term comes from the keyword used in MASM (Microsoft Assembler) to distinguish an address expression from a variable.

For example:

mov eax, g_var1

Loads the value from the location g_var1 into register eax. In C, this would be equivalent to using the variable’s value.

While

mov eax, offset g_var1

Loads the address of the location g_var1 into eax. In C, this would be equivalent to taking the variable’s address.

On the binary level, the second instruction is equivalent to moving a simple integer, e.g.:

mov eax, 0x40002000

However, during analysis the offset form is preferred, both for readability and because it lets you see cross-references to the variable and quickly find other places where it is used.
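To make the point about the binary level concrete, here is a minimal sketch that encodes the instruction by hand. It assumes 32-bit x86, where mov eax, imm32 is the opcode byte 0xB8 followed by the immediate in little-endian order, and it reuses the address 0x40002000 from the example above; the helper name encode_mov_eax_imm32 is made up for this illustration.

```python
import struct


def encode_mov_eax_imm32(value: int) -> bytes:
    """Encode `mov eax, imm32` (32-bit x86): opcode 0xB8, then the
    32-bit immediate in little-endian byte order."""
    return struct.pack("<BI", 0xB8, value)


# Address of g_var1 as used in the example above (illustrative value).
G_VAR1_ADDRESS = 0x40002000

as_offset = encode_mov_eax_imm32(G_VAR1_ADDRESS)  # mov eax, offset g_var1
as_integer = encode_mov_eax_imm32(0x40002000)     # mov eax, 0x40002000

print(as_offset.hex(" "))       # b8 00 20 00 40
print(as_offset == as_integer)  # True
```

Both forms produce exactly the same five bytes, so nothing in the instruction itself says whether the immediate is meant as an address or as a plain number.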

In general, distinguishing integer values used in instructions from addresses is impossible without whole-program analysis or runtime tracing. However, the majority of cases can be handled by relatively simple heuristics, so IDA is usually able to recover offset expressions and add cross-references. In some cases these heuristics may fail or produce false positives, so you may need to fix the operand manually.
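As a rough illustration of what such a heuristic might look like, the sketch below treats an immediate as an offset candidate when it falls inside one of the program’s segments. This is not IDA’s actual algorithm; the Segment type, the segment layout, and the sample values are all assumptions made up for the example.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    name: str
    start: int  # first address belonging to the segment
    end: int    # one past the last address


def find_segment(value: int, segments: Sequence[Segment]) -> Optional[Segment]:
    """Return the segment containing `value`, or None if no segment does."""
    for seg in segments:
        if seg.start <= value < seg.end:
            return seg
    return None


def describe(value: int, segments: Sequence[Segment]) -> str:
    """Classify an immediate using only the address-range check."""
    seg = find_segment(value, segments)
    if seg is None:
        return f"{value:#x}: plain number"
    return f"{value:#x}: offset candidate ({seg.name}+{value - seg.start:#x})"


# Hypothetical memory layout, chosen to match the 0x40002000 example.
SEGMENTS = [
    Segment(".text", 0x40001000, 0x40002000),
    Segment(".data", 0x40002000, 0x40003000),
]

for imm in (0x40002000, 0x10, 0x40002F00):
    print(describe(imm, SEGMENTS))
```

A range check like this cannot tell a genuine pointer from a bit mask or constant that merely happens to land in a segment (0x40002F00 here could be either), which is why false positives are possible.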

Converting values to offsets

All options for converting to offsets are available under Edit > Operand type > Offset:

In most modern binaries that use a flat memory model, such as ELF, PE, or Mach-O files, the first two commands, Offset (data segment) and Offset (current segment), are equivalent, so you can usually use the shortcut O or Ctrl+O.

The most common/applicable options are also shown in the context (right-click) menu:

Fixing false positives

There may be cases when IDA’s heuristics convert a value to an offset when it’s not actually being used as one. One common example is bitwise operations on values which happen to be in the range of the program’s address space, but it can also happen for data values or simple data movement, as in the screenshot below.

In this example, IDA has converted the second operand of the mov instruction to an offset because it turned out to match a program address. However, we can see that it is being moved into a location returned by the call to the __errno function. This is a common way compilers implement setting of the errno pseudo-variable (which can be thread-specific instead of a global), so that operand should clearly be a number and not an offset. Besides being a wrong representation, this also leads to bogus cross-references:

You have the following options to fix the false positive:

  1. Press O or Ctrl+O to reset the “offset” attribute of the operand and let IDA show the default representation (hex). Note that the number will be printed in orange to hint that its value falls into the address space of the program, i.e. it is suspicious;
  2. Use Q/# (for hex), H (for decimal), or select the corresponding option from the context menu to explicitly mark the operand as a number and also avoid flagging it as suspicious;
  3. If you have created an enumeration to represent such numbers as symbolic constants, you can use the M shortcut or the context menu to convert the operand to a symbolic constant.
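The three options differ only in how the same operand value is displayed. The sketch below mimics that outside of IDA: the Errno enumeration is a hypothetical stand-in for an enum you might create yourself (the values shown are the usual Linux ones), and render is an illustrative helper, not an IDA API. The hex formatting here also follows Python’s convention rather than IDA’s.

```python
from enum import IntEnum


class Errno(IntEnum):
    """Hypothetical user-defined enumeration of error codes (Linux values)."""
    ENOENT = 2
    ENOMEM = 12
    EINVAL = 22


def render(value: int, style: str) -> str:
    """Show one operand value in the requested representation."""
    if style == "hex":
        return f"{value:#x}"
    if style == "dec":
        return str(value)
    if style == "enum":
        return Errno(value).name  # raises ValueError if no member matches
    raise ValueError(f"unknown style: {style}")


operand = 22  # the underlying bits never change, only the display does
for style in ("hex", "dec", "enum"):
    print(f"{style}: {render(operand, style)}")
```

Whichever representation you choose, marking the operand as a number (or as a symbolic constant) instead of an offset is what keeps it from generating a bogus cross-reference.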

See also:

IDA Help: Edit|Operand types|Offset submenu

IDA Help: Edit|Operand types|Number submenu