Virtual Machine used by Microcode

We can imagine a virtual micro machine that executes microcode. This virtual micro machine has many registers. Each register is 8 bits wide. During translation of processor instructions into microcode, multibyte processor registers are mapped to adjacent microregisters. Processor condition codes are also represented by microregisters. The microregisters are divided into the following groups:

  • 0..7: condition codes
  • 8..n: all processor registers (including FPU registers, if necessary); this range may also include temporary registers used during the initial microcode generation
  • n.. : so-called kernel registers; they are used during optimization (see is_kreg())
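The grouping above can be sketched as a small classifier. This is a toy model, not the SDK's API: the function names are hypothetical, and first_kreg stands in for the processor-dependent boundary 'n', which is assumed here to be the first kernel register.

```python
# Toy model of the microregister groups listed above; not the SDK's API.
CC_LAST = 7        # condition codes occupy microregisters 0..7
PROC_FIRST = 8     # processor registers start at microregister 8


def mreg_group(mreg, first_kreg):
    """Name the group that a microregister number belongs to.

    first_kreg is a hypothetical stand-in for 'n'; this sketch assumes
    that 'n' itself is the first kernel register.
    """
    if mreg < 0:
        raise ValueError("microregister numbers are non-negative")
    if mreg <= CC_LAST:
        return "condition code"
    if mreg < first_kreg:
        return "processor register"
    return "kernel register"


def mreg_block(first, size):
    """Microregisters covered by a processor register of 'size' bytes
    that is mapped starting at microregister 'first'."""
    return list(range(first, first + size))
```

For example, a 4-byte processor register mapped at microregister 8 covers microregisters 8 through 11 in this model.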

Each micro-instruction (minsn_t) has zero to three operands. Some of the possible operand types are:

  • immediate value
  • register
  • memory reference
  • result of another micro-instruction

The operands (mop_t) are l (left), r (right), d (destination). An example of a microinstruction:

    add r0.4, #8.4, r2.4

which means 'add constant 8 to r0 and place the result into r2', where

  • the left operand is 'r0', its size is 4 bytes (r0.4)
  • the right operand is a constant '8', its size is 4 bytes (#8.4)
  • the destination operand is 'r2', its size is 4 bytes (r2.4)

Note that 'd' is almost always the destination but there are exceptions. See mcode_modifies_d(). For example, stx does not modify 'd'. See the opcode map below for the list of microinstructions and their operands. Most instructions are very simple and do not need detailed explanations. There are no side effects in microinstructions.
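To make the l/r/d reading of the 'add' example concrete, here is a minimal sketch that parses the textual form and evaluates it. It is not how the decompiler represents or executes microcode: the helper names are hypothetical, register names such as 'r0' are treated as opaque keys rather than microregister numbers, and wrapping the result to the destination size is an assumption of this sketch.

```python
import re


def parse_operand(text):
    """Split 'r0.4' or '#8.4' into (kind, value, size_in_bytes)."""
    m = re.fullmatch(r"(r|#)(\d+)\.(\d+)", text.strip())
    if m is None:
        raise ValueError(f"unsupported operand: {text!r}")
    kind = "reg" if m.group(1) == "r" else "imm"
    return kind, int(m.group(2)), int(m.group(3))


def parse_insn(line):
    """Split 'add r0.4, #8.4, r2.4' into (opcode, l, r, d)."""
    opcode, rest = line.split(None, 1)
    l, r, d = (parse_operand(part) for part in rest.split(","))
    return opcode, l, r, d


def run_add(line, regs):
    """Evaluate one toy 'add'; regs maps names like 'r0' to integers."""
    opcode, l, r, d = parse_insn(line)
    if opcode != "add" or d[0] != "reg":
        raise ValueError("this sketch only handles 'add' into a register")

    def value(op):
        kind, number, _size = op
        return regs[f"r{number}"] if kind == "reg" else number

    # Assumption: the result wraps to the size of the destination operand.
    mask = (1 << (8 * d[2])) - 1
    regs[f"r{d[1]}"] = (value(l) + value(r)) & mask
    return regs
```

With r0 holding 5, evaluating the example stores 13 into r2 and leaves r0 unchanged.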

Each operand has a size specifier. The following sizes can be used in practically all contexts: 1, 2, 4, 8, 16 bytes. Floating types may have other sizes. Functions may return objects of arbitrary size, as may operations upon UDTs (user-defined types, i.e. structs and unions).

Memory is considered to consist of several segments. A memory reference is made using a (selector, offset) pair. A selector is always 2 bytes long. An offset can be 4 or 8 bytes long, depending on the bitness of the target processor. Currently the selectors are not used very much. The decompiler tries to resolve (selector, offset) pairs into direct memory references at each opportunity and then operates on mop_v operands. In other words, while the decompiler can handle segmented memory models, internally it still uses simple linear addresses.

The following memory regions are recognized:

  • GLBLOW global memory: low part, everything below the stack
  • LVARS stack: local variables
  • RETADDR stack: return address
  • SHADOW stack: shadow arguments
  • ARGS stack: regular stack arguments
  • GLBHIGH global memory: high part, everything above the stack

Any stack region may be empty. Objects residing in one memory region are considered to be completely distinct from objects in other regions. We allocate the stack frame in some memory region, which is not allocated for any purposes in IDA. This permits us to use linear addresses for all memory references, including the stack frame.
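As an illustration of how a linear address falls into exactly one of these regions, the sketch below looks an address up in a hypothetical layout. It assumes the regions are contiguous and ordered by ascending address as listed above; the function name and the addresses are made up for the example and do not come from the SDK.

```python
from bisect import bisect_right

# Region names as listed above, assumed to be in ascending address order.
REGIONS = ["GLBLOW", "LVARS", "RETADDR", "SHADOW", "ARGS", "GLBHIGH"]


def region_of(addr, starts):
    """Return the region containing linear address 'addr'.

    starts holds the (hypothetical) start addresses of LVARS, RETADDR,
    SHADOW, ARGS and GLBHIGH in ascending order. Two equal consecutive
    entries model an empty stack region.
    """
    starts = list(starts)
    if len(starts) != len(REGIONS) - 1 or starts != sorted(starts):
        raise ValueError("expected five ascending start addresses")
    return REGIONS[bisect_right(starts, addr)]


# Hypothetical layout: SHADOW is empty because it starts where ARGS starts.
LAYOUT = [0x1000, 0x1020, 0x1028, 0x1028, 0x1040]
```

Because each address maps to a single region, two objects in different regions never overlap in this model, which mirrors the statement that objects in different regions are considered completely distinct.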

If the operand size is bigger than 1 then the register operand references a block of registers. For example:

    ldc   #1.4, r8.4

loads the constant 1 to registers 8, 9, 10, 11:

     #1  ->  r8
     #0  ->  r9
     #0  ->  r10
     #0  ->  r11

This example uses little-endian byte ordering. Big-endian byte ordering is supported too. Registers are always little-endian, regardless of the memory endianness.
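The byte-by-byte effect of the 'ldc' example can be reproduced with a few lines. This is a sketch only: the register file is modeled as a plain dictionary from microregister number to byte value, and the function names are hypothetical.

```python
def store_to_mregs(mregs, first, size, value):
    """Write 'value' into 'size' adjacent byte-wide microregisters
    starting at microregister 'first'."""
    # Registers are little-endian: the least significant byte goes first.
    data = value.to_bytes(size, "little")
    for i, byte in enumerate(data):
        mregs[first + i] = byte
    return mregs


def load_from_mregs(mregs, first, size):
    """Read 'size' adjacent microregisters back as one integer."""
    data = bytes(mregs[first + i] for i in range(size))
    return int.from_bytes(data, "little")
```

Storing the 4-byte constant 1 at microregister 8 sets register 8 to 1 and registers 9, 10 and 11 to 0, matching the listing above.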

Each instruction has 'next' and 'prev' fields that are used to form a doubly linked list. Such lists are present for each basic block (mblock_t). Basic blocks have other attributes, including:

  • dead_at_start: list of dead locations at the block start
  • maybuse: list of locations the block may use
  • maybdef: list of locations the block may define (or spoil)
  • mustbuse: list of locations the block will certainly use
  • mustbdef: list of locations the block will certainly define
  • dnu: list of locations the block will certainly define but will not use (registers or non-aliasable stack vars)

These lists are represented by the mlist_t class. It consists of 2 parts:

  • rlist_t: list of microregisters (possibly including virtual stack locations)
  • ivlset_t: list of memory locations represented as intervals; we use linear addresses in this list

The mlist_t class is used quite often. For example, to find what an operand can spoil, we build its 'maybe-use' list. Then we can find out if this list is accessed using the is_accessed() or is_accessed_globally() functions.
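The two-part structure of such a location list can be illustrated with a toy class that holds a set of microregisters and a list of memory intervals, plus an overlap check between two lists. It is a simplified stand-in written for this explanation: the class, its attributes, and its method are illustrative and do not reproduce the SDK's mlist_t, rlist_t, or ivlset_t interfaces.

```python
class ToyMlist:
    """Toy location list: microregisters plus memory intervals."""

    def __init__(self, regs=(), ivls=()):
        # Set of microregister numbers.
        self.regs = set(regs)
        # Memory intervals as (linear address, size in bytes) pairs.
        self.ivls = [(off, size) for off, size in ivls]

    def has_common(self, other):
        """True if the lists share a register or overlapping memory bytes."""
        if self.regs & other.regs:
            return True
        return any(
            a_off < b_off + b_size and b_off < a_off + a_size
            for a_off, a_size in self.ivls
            for b_off, b_size in other.ivls
        )
```

For instance, a list covering microregisters 8..11 has a location in common with a list that contains only microregister 10, while the intervals (0x1000, 4) and (0x1004, 2) do not overlap.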

All basic blocks of the decompiled function constitute an array called mba_t (array of microblocks). This is a huge class that has too many fields to describe here (some of the fields are not visible in the SDK). The most important ones are:

  • stack frame: frregs, stacksize, etc
  • memory: aliased, restricted, and other ranges
  • type: type of the current function, its arguments (argidx) and local variables (vars)
  • natural: array of pointers to basic blocks; the basic blocks are also accessible as a doubly linked list starting from 'blocks'
  • bg: control flow graph; the graph gives access to the use-def chains that describe data dependencies between basic blocks

Facilities for debugging decompiler plugins: Many decompiler objects have a member function named dstr(). These functions create a text representation of the object and return a pointer to it. They are very convenient to use in a debugger instead of inspecting class fields manually. The mba_t object does not have the dstr() function because its text representation is very long. Instead, we provide the mba_t::dump_mba() and mba_t::dump() functions.

To ensure that your plugin manipulates the microcode in a correct way, please call mba_t::verify() before returning control to the decompiler.