Hex-Rays' blog

Igor’s tip of the week #87: Function chunks and the decompiler

Written by Igor Skochinsky | Apr 28, 2022

We covered function chunks last week, and today we’ll show an example of how to use them in practice to handle a common compiler optimization.

Shared function tail optimization

When working with some ARM firmware, you may sometimes run into the following situation:

The decompilation of sub_8098C ends with a strange JUMPOUT statement, and the disassembly shows that it corresponds to a branch to a POP.W instruction in another function (sub_8092C). What happened here?

This is an example of a code size optimization. The POP.W instruction is four bytes long, while the B branch is only two, so by branching to an existing POP.W instead of emitting another copy, the compiler saves two bytes. It may not sound like much, but such savings can add up to something substantial over all functions of the binary. Sometimes longer sequences of several instructions are reused as well, leading to bigger savings.
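The arithmetic behind the saving can be sketched in a few lines of Python. The instruction sizes come from the example above; the helper name `bytes_saved` is made up for illustration, and the sketch assumes every sharing function reaches the tail with a 2-byte B.

```python
# Sizes in bytes, taken from the example above.
POP_W_SIZE = 4  # POP.W
B_SIZE = 2      # B branch


def bytes_saved(sharing_functions, tail_size=POP_W_SIZE, branch_size=B_SIZE):
    """Bytes saved when `sharing_functions` functions use one shared tail.

    One function keeps the tail itself; every other function replaces its
    own copy (tail_size bytes) with a branch (branch_size bytes).
    """
    if sharing_functions < 1:
        raise ValueError("at least one function must contain the tail")
    return (sharing_functions - 1) * (tail_size - branch_size)


print(bytes_saved(2))  # two functions, as with sub_8092C and sub_8098C: 2
print(bytes_saved(7))  # seven functions sharing one POP.W: 12
```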

Can we fix the database to get clean decompilation and get rid of JUMPOUT? Of course, the answer is yes, but the specific steps may not be obvious, so let’s describe some approaches.

Creating a chunk for the shared tail instructions

First we need to create a chunk for the shared instructions (in our example, the POP.W instruction). A chunk can be created only from instructions which do not yet belong to any function, so the easiest way is to delete the function that currently contains them (here, sub_8092C) so that the instructions become “free”. This can be done from the Functions window, via the Edit > Functions > Delete function menu entry, or from the modal “jump to function” list (Ctrl-P, Del).

Once deleted, the shared tail instructions can be added as a chunk to the other function. This can be done manually:

  1. select the instruction(s),
  2. invoke Edit > Functions > Append function tail…
  3. pick the referencing function (in our case, sub_8098C). Normally IDA should suggest it automatically.

Or (semi)automatically:

  1. jump to the referencing branch (e.g. by double-clicking the CODE XREF: sub_8098C+3E↓j comment)
  2. reanalyze the branch (press C). IDA will detect that execution continues outside the current function bounds and automatically create and add the chunk for the shared tail instructions.

Either solution will create the chunk and mark it as belonging to the referencing function.
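The manual steps can also be scripted. Below is a minimal IDAPython sketch, assuming that `ida_funcs.get_func`, `ida_funcs.del_func` and `ida_funcs.append_func_tail` behave as in recent IDA versions. The helper name `move_tail_to_function` and the addresses in the usage comment are hypothetical.

```python
def move_tail_to_function(tail_start, tail_end, referrer_ea):
    """Free the range [tail_start, tail_end) and attach it as a tail chunk
    to the function containing referrer_ea.

    Returns True if the chunk was attached.
    """
    # Imported lazily so the file can also be loaded outside IDA.
    import ida_funcs

    # Step 1: delete the function that currently contains the tail, so the
    # instructions no longer belong to any function.
    owner = ida_funcs.get_func(tail_start)
    if owner is not None:
        if not ida_funcs.del_func(owner.start_ea):
            return False

    # Step 2: attach the freed range to the referencing function.
    referrer = ida_funcs.get_func(referrer_ea)
    if referrer is None:
        return False
    return bool(ida_funcs.append_func_tail(referrer, tail_start, tail_end))


# Usage inside IDA (placeholder addresses, not taken from the example):
#   move_tail_to_function(tail_start, tail_end, address_of_the_branch)
```

Note that `tail_end` is exclusive, so for a single 4-byte POP.W it would be `tail_start + 4`.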

We can check that it is contained in the function graph:

And the pseudocode no longer has a JUMPOUT:

Attaching the chunk to the original function

We “solved” the problem for one function, but in the process we’ve destroyed the function which contained the shared tail. If we need to decompile it too, we can try to recreate it:

However, IDA ends it before the chunk, because it’s now a part of another function:

And if we decompile it, we get the same JUMPOUT issue:

The solution is simple: as mentioned in the previous post, a chunk may belong to multiple functions, so we just need to attach the chunk to this function too:

  1. select the instructions of the tail,
  2. invoke Edit > Functions > Append function tail…
  3. select the recreated function (in our example, sub_8092C).

The chunk gains one more owner, appears in the function graph, and the decompilation is fixed:
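Recreating the function and adding the existing chunk to it can be sketched in IDAPython as well. This assumes `ida_funcs.add_func` (with IDA choosing the function end itself) and `ida_funcs.append_func_tail` work as in recent IDA versions; `share_tail_with` is a hypothetical helper name.

```python
def share_tail_with(func_start, tail_start, tail_end):
    """Make sure a function starts at func_start, then add the range
    [tail_start, tail_end), which may already belong to another function,
    as one of its tail chunks.

    Returns True if the tail was attached.
    """
    # Imported lazily so the file can also be loaded outside IDA.
    import ida_funcs

    pfn = ida_funcs.get_func(func_start)
    if pfn is None or pfn.start_ea != func_start:
        # Recreate the deleted function; IDA determines where it ends.
        if not ida_funcs.add_func(func_start):
            return False
        pfn = ida_funcs.get_func(func_start)
        if pfn is None:
            return False

    # The chunk gains one more owner.
    return bool(ida_funcs.append_func_tail(pfn, tail_start, tail_end))
```

Since the chunk keeps its earlier owner, the same call can be repeated for each function that branches to the tail.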

Complex situations

The above example had a tail shared by two functions, but of course this is not the limit. Consider this example:

Here, the POP.W instruction is shared by seven functions, and two of them also reuse the ADD SP, SP, #0x10 instruction preceding it. There is also a chunk which belongs to only one function; it had to be split off because that function was no longer contiguous. Still, IDA’s approach to fragmented functions is flexible enough to handle this with some manual help, and all the functions involved have proper control flow graphs and nice decompilation.
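With this many owners, it can help to list which functions a given chunk belongs to. The following IDAPython sketch assumes `ida_funcs.get_fchunk`, `ida_funcs.is_func_tail` and `ida_funcs.func_parent_iterator_t` are available as in recent IDA releases; `tail_owners` is a hypothetical helper name.

```python
def tail_owners(ea):
    """Return the start addresses of all functions owning the chunk at ea.

    Returns an empty list if ea does not belong to any function chunk.
    """
    # Imported lazily so the file can also be loaded outside IDA.
    import ida_funcs

    chunk = ida_funcs.get_fchunk(ea)
    if chunk is None:
        return []
    if not ida_funcs.is_func_tail(chunk):
        # An entry chunk belongs to exactly one function: its own.
        return [chunk.start_ea]

    owners = []
    parents = ida_funcs.func_parent_iterator_t(chunk)
    ok = parents.first()
    while ok:
        owners.append(parents.parent())
        ok = parents.next()
    return owners
```

For the shared POP.W in the example above, such a helper would be expected to return seven addresses, one per owning function.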

To summarize, the suggested algorithm for handling the shared tail optimization is as follows:

  1. Delete the function containing the shared tail instructions.
  2. Attach the shared tail instructions to the other function(s), either manually or by reanalyzing the branches to the tail.
  3. Recreate the deleted function and attach the shared tail(s) to it too.