Igor’s tip of the week #04: More selection!

In the previous post we talked about the basic usage of selection in IDA. This week we’ll describe a few more examples of actions affected by selection.

Firmware/raw binary analysis

When disassembling a raw binary, IDA is not always able to detect code fragments, and you may have to resort to trial and error to find the code within the loaded range, which can be time-consuming. In such a situation, the following simple approach may work for initial reconnaissance:

  1. Go to the start of the database (Ctrl-PgUp);
  2. Start selection (Alt-L);
  3. Go to the end (Ctrl-PgDn). You can also go to a specific point that you think may be the end of the code region (e.g. just before a big chunk of zeroes or FF bytes);
  4. Select Edit > Code or press C. IDA shows a dialog asking which specific action to perform;
  5. Click “Force” if you’re certain there are mostly instructions in the selected range, or “Analyze” if there may be data between instructions;
  6. IDA will go through the selected range and try to convert any undefined bytes to instructions. If there is indeed valid code in the selected area, you might see functions being added to the Functions window (probably including some false positives).
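
To pick the end point in step 3 without scrolling, you can look for the padding outside IDA first. The following is a minimal sketch in plain Python (not an IDA API): it scans a raw image for the first long run of 00 or FF bytes. The function name `find_padding_start` and the 256-byte threshold are assumptions made for illustration.

```python
def find_padding_start(data: bytes, min_run: int = 256) -> int:
    """Return the offset of the first run of at least min_run identical
    0x00 or 0xFF bytes, or len(data) if there is no such run."""
    run_start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the buffer or when the byte value changes.
        if i == len(data) or data[i] != data[run_start]:
            if data[run_start] in (0x00, 0xFF) and i - run_start >= min_run:
                return run_start
            run_start = i
    return len(data)


# Hypothetical image: 256 bytes of "code" followed by erased flash (FF bytes).
image = bytes([0x55, 0x48, 0x89, 0xE5]) * 64 + b"\xff" * 512
print(hex(find_padding_start(image)))  # candidate end of the code region
```

The returned offset is only a candidate: add the load address of the binary to get the address to jump to before finishing the selection.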

Structure offsets

Another useful application of selection is applying structure offsets to multiple instructions. For example, let’s consider this function from a UEFI module:

.text:0000000000001A64 sub_1A64        proc near               ; CODE XREF: sub_15A4+EB↑p
.text:0000000000001A64                                         ; sub_15A4+10E↑p
.text:0000000000001A64
.text:0000000000001A64 var_28          = qword ptr -28h
.text:0000000000001A64 var_18          = qword ptr -18h
.text:0000000000001A64 arg_20          = qword ptr  28h
.text:0000000000001A64
.text:0000000000001A64                 push    rbx
.text:0000000000001A66                 sub     rsp, 40h
.text:0000000000001A6A                 lea     rax, [rsp+48h+var_18]
.text:0000000000001A6F                 xor     r9d, r9d
.text:0000000000001A72                 mov     rbx, rcx
.text:0000000000001A75                 mov     [rsp+48h+var_28], rax
.text:0000000000001A7A                 mov     rax, cs:gBS
.text:0000000000001A81                 lea     edx, [r9+8]
.text:0000000000001A85                 mov     ecx, 200h
.text:0000000000001A8A                 call    qword ptr [rax+50h]
.text:0000000000001A8D                 mov     rax, cs:gBS
.text:0000000000001A94                 mov     r8, [rsp+48h+arg_20]
.text:0000000000001A99                 mov     rdx, [rsp+48h+var_18]
.text:0000000000001A9E                 mov     rcx, rbx
.text:0000000000001AA1                 call    qword ptr [rax+0A8h]
.text:0000000000001AA7                 mov     rax, cs:gBS
.text:0000000000001AAE                 mov     rcx, [rsp+48h+var_18]
.text:0000000000001AB3                 call    qword ptr [rax+68h]
.text:0000000000001AB6                 mov     rax, [rsp+48h+var_18]
.text:0000000000001ABB                 add     rsp, 40h
.text:0000000000001ABF                 pop     rbx
.text:0000000000001AC0                 retn
.text:0000000000001AC0 sub_1A64        endp

If we know that gBS is a pointer to EFI_BOOT_SERVICES, we can convert accesses to it (in the call instructions) to structure offsets. This can be done manually for each access, but that is tedious. In such a situation, selection can help: if we select the instructions accessing the structure and press T (structure offset), a new dialog pops up.

In this dialog, you can choose which register is used as the base, which structure to apply, and even which specific instructions to convert.

After selecting rax and EFI_BOOT_SERVICES, we get a nice-looking listing:

.text:0000000000001A64 sub_1A64        proc near               ; CODE XREF: sub_15A4+EB↑p
.text:0000000000001A64                                         ; sub_15A4+10E↑p
.text:0000000000001A64
.text:0000000000001A64 Event           = qword ptr -28h
.text:0000000000001A64 var_18          = qword ptr -18h
.text:0000000000001A64 Registration    = qword ptr  28h
.text:0000000000001A64
.text:0000000000001A64                 push    rbx
.text:0000000000001A66                 sub     rsp, 40h
.text:0000000000001A6A                 lea     rax, [rsp+48h+var_18]
.text:0000000000001A6F                 xor     r9d, r9d        ; NotifyContext
.text:0000000000001A72                 mov     rbx, rcx
.text:0000000000001A75                 mov     [rsp+48h+Event], rax ; Event
.text:0000000000001A7A                 mov     rax, cs:gBS
.text:0000000000001A81                 lea     edx, [r9+8]     ; NotifyTpl
.text:0000000000001A85                 mov     ecx, 200h       ; Type
.text:0000000000001A8A                 call    [rax+EFI_BOOT_SERVICES.CreateEvent]
.text:0000000000001A8D                 mov     rax, cs:gBS
.text:0000000000001A94                 mov     r8, [rsp+48h+Registration] ; Registration
.text:0000000000001A99                 mov     rdx, [rsp+48h+var_18] ; Event
.text:0000000000001A9E                 mov     rcx, rbx        ; Protocol
.text:0000000000001AA1                 call    [rax+EFI_BOOT_SERVICES.RegisterProtocolNotify]
.text:0000000000001AA7                 mov     rax, cs:gBS
.text:0000000000001AAE                 mov     rcx, [rsp+48h+var_18] ; Event
.text:0000000000001AB3                 call    [rax+EFI_BOOT_SERVICES.SignalEvent]
.text:0000000000001AB6                 mov     rax, [rsp+48h+var_18]
.text:0000000000001ABB                 add     rsp, 40h
.text:0000000000001ABF                 pop     rbx
.text:0000000000001AC0                 retn
.text:0000000000001AC0 sub_1A64        endp
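
To see where these names come from, here is a small sketch in plain Python (it does not use any IDA API) that rebuilds the mapping by hand. It assumes a 64-bit module with 8-byte pointers and the member order of EFI_BOOT_SERVICES from the UEFI specification; only the leading members are listed, and the helper name `annotate` is made up for illustration.

```python
POINTER_SIZE = 8          # assumption: 64-bit UEFI module
TABLE_HEADER_SIZE = 0x18  # EFI_TABLE_HEADER: one UINT64 plus four UINT32

# Leading members of EFI_BOOT_SERVICES, in declaration order.
LEADING_MEMBERS = [
    "RaiseTPL", "RestoreTPL",
    "AllocatePages", "FreePages", "GetMemoryMap", "AllocatePool", "FreePool",
    "CreateEvent", "SetTimer", "WaitForEvent", "SignalEvent", "CloseEvent",
    "CheckEvent",
    "InstallProtocolInterface", "ReinstallProtocolInterface",
    "UninstallProtocolInterface", "HandleProtocol", "Reserved",
    "RegisterProtocolNotify",
]

# Each member is a pointer, so offsets grow by POINTER_SIZE after the header.
OFFSET_TO_MEMBER = {
    TABLE_HEADER_SIZE + index * POINTER_SIZE: name
    for index, name in enumerate(LEADING_MEMBERS)
}


def annotate(displacement: int) -> str:
    """Render a [rax+disp] operand with a member name when one is known."""
    name = OFFSET_TO_MEMBER.get(displacement)
    if name is None:
        return f"[rax+{displacement:X}h]"
    return f"[rax+EFI_BOOT_SERVICES.{name}]"


# The three displacements used by the call instructions in the listing.
for disp in (0x50, 0xA8, 0x68):
    print(annotate(disp))
```

This is the lookup IDA performs for every selected instruction once the base register and the structure are chosen, which is why doing it through a selection saves so much manual work.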

Forced string literals

When code references a string, IDA is usually smart enough to detect it and convert the referenced bytes to a literal item. However, in some cases the automatic conversion does not work, for example:

  • string contains non-ASCII characters
  • string is not null-terminated

A common example of the former is the Linux kernel, which uses a special byte sequence to mark different categories of kernel messages. For example, consider this function from the joydev.ko module:

IDA did not automatically create a string at 1BC8 because it starts with a non-ASCII character. However, if we select the string’s bytes and press A (Convert to string), a string literal is created anyway.
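
The sketch below shows why a simple detector gives up on such data. It is a toy heuristic written for this post, not IDA’s actual algorithm: it accepts only printable ASCII followed by a NUL terminator. The sample message is hypothetical, and it assumes the kernel’s log-level prefix is the byte 0x01 followed by a level digit.

```python
import string

# Byte values that count as printable ASCII (includes common whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_c_string(data: bytes, min_len: int = 4) -> bool:
    """Toy check: at least min_len printable bytes, then a NUL terminator."""
    end = data.find(b"\x00")
    if end < min_len:
        return False  # no terminator (find returned -1) or string too short
    return all(byte in PRINTABLE for byte in data[:end])


# Hypothetical kernel message with an assumed "\x01" + level prefix.
kernel_message = b"\x016example: device registered\n\x00"

print(looks_like_c_string(b"example: device registered\n\x00"))
print(looks_like_c_string(kernel_message))
print(looks_like_c_string(b"not terminated"))
```

Selecting the bytes explicitly sidesteps this kind of check, because you are telling IDA exactly where the literal starts and ends.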

Creating structures from data

This action is useful when dealing with structured data in binaries. Let’s consider a table with approximately this layout of entries:

struct copyentry {
    void *source;
    void *dest;
    int size;
    void *copyfunc;
};

While such a structure can always be created manually in the Structures window, it’s often easier to format the data first and then create a structure that describes it. After creating the four data items, select them and choose “Create struct from selection” from the context menu.

IDA will create a structure representing the selected data items. It can then be used to format other entries in the program, or applied in the disassembly to better understand the code working with this data.
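
If you want to inspect such a table outside IDA, the same layout can be described with Python’s standard `struct` module. This is only a sketch under stated assumptions: a 32-bit little-endian target, so each pointer and the int occupy 4 bytes, and a hypothetical helper name `parse_copy_table`. The field values in the example are invented.

```python
import struct

# Assumed layout of copyentry on a 32-bit little-endian target:
# source, dest (unsigned 32-bit), size (signed 32-bit), copyfunc (unsigned 32-bit).
COPYENTRY = struct.Struct("<IIiI")


def parse_copy_table(blob: bytes, count: int) -> list:
    """Decode `count` consecutive copyentry records from the start of blob."""
    entries = []
    for index in range(count):
        source, dest, size, copyfunc = COPYENTRY.unpack_from(
            blob, index * COPYENTRY.size
        )
        entries.append(
            {"source": source, "dest": dest, "size": size, "copyfunc": copyfunc}
        )
    return entries


# Two made-up entries, packed and then decoded again.
table = COPYENTRY.pack(0x08001000, 0x20000000, 64, 0x08000101)
table += COPYENTRY.pack(0x08001040, 0x20000040, 128, 0x08000101)

for entry in parse_copy_table(table, 2):
    print({name: hex(value) for name, value in entry.items()})
```

For a 64-bit target the format string would need 8-byte pointer fields (and possibly padding after the int), so adjust it to match what the data items in IDA actually show.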