An attempt to reconstruct the call stack – Hex Rays

Written by Elias Bachaalany | Jul 16, 2008

Walking the stack to reconstruct the call stack is a challenge, especially when little or no symbolic information is present, and several questions must be answered to get a correct call stack:

Determining return address
Determining the boundary of the caller function
Distinguishing between pointers to callbacks and return addresses
Determining stack frames
…

In this post, we are going to implement the method entitled “Manually Walking a Stack” described on MSDN.
While this approach does not always give accurate results, it is still possible to get a fairly correct call stack.

In short, this is how manual stack walking works:

Start by retrieving the stack pointer register value (for the current thread) and its associated segment
From the stack pointer to the upper limit of the stack segment:
1. Take a Dword
2. Check if it belongs to an executable segment, if so then it is probably a code pointer (exception handler, callback pointer, or return address)
3. Try to determine if the value at the stack pointer is a return address (we try to find the beginning of the previous instruction and we decode it to see if it is a CALL instruction)
4. Once we have a CALL instruction we will try to build a nice expression to represent the call stack:
  - If it belongs to a function then use the following name: function name+offset
  - Otherwise try to check nearest debug name (exported names) and use the following name: nearest_debug_name+offset
5. Save the address (for later use)
Finally render the results (in a chooser, message window, etc…)
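
The steps above can be sketched outside IDA with a toy memory model. This is only an illustration under stated assumptions: Segment, find_segment, nearest_name and walk_stack are hypothetical names, not IDAPython APIs, the stack is simulated with a dictionary of dwords, and the "is it a return address" test is passed in as a callback.

```python
import bisect
from collections import namedtuple

# Hypothetical stand-in for a segment: [start, end) plus an executable flag.
Segment = namedtuple("Segment", "start end executable")

def find_segment(segments, ea):
    """Return the segment containing ea, or None."""
    for seg in segments:
        if seg.start <= ea < seg.end:
            return seg
    return None

def nearest_name(symbols, ea):
    """Format ea as name+offset using the closest symbol at or below ea.

    symbols is a list of (address, name) pairs sorted by address (assumed input).
    """
    addrs = [addr for addr, _ in symbols]
    idx = bisect.bisect_right(addrs, ea) - 1
    if idx < 0:
        return "0x%X" % ea
    base, name = symbols[idx]
    return "%s+0x%X" % (name, ea - base)

def walk_stack(read_dword, sp, stack_end, segments, is_return_address, symbols):
    """Scan [sp, stack_end) dword by dword and collect likely return addresses."""
    frames = []
    while sp < stack_end:
        ptr = read_dword(sp)                        # 1. take a dword
        seg = find_segment(segments, ptr)
        if seg is not None and seg.executable:      # 2. probably a code pointer
            if is_return_address(ptr):              # 3. preceded by a CALL?
                # 4. build the name, 5. save the address
                frames.append((ptr, nearest_name(symbols, ptr)))
        sp += 4
    return frames

# Simulated inputs (all values are made up for the example).
stack = {0x1000: 0x12345678, 0x1004: 0x401005, 0x1008: 0x402000, 0x100C: 0x401020}
segments = [Segment(0x401000, 0x402000, True), Segment(0x402000, 0x403000, False)]
symbols = [(0x401000, "main"), (0x401010, "helper")]
call_sites = {0x401005, 0x401020}   # addresses assumed to follow a CALL

frames = walk_stack(stack.get, 0x1000, 0x1010, segments,
                    lambda ea: ea in call_sites, symbols)
for ea, name in frames:
    print("%08X  %s" % (ea, name))
```

Passing the memory reader and the return-address test as callbacks keeps the loop independent of the debugger, so the same structure could be driven by real IDA calls later.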

Retrieving pointers from the stack

First we need to retrieve the value of the ESP register:

esp = cpu.Esp

Now we dereference the stack pointer, fetch the associated segment and check the segment protection attributes:

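    # sp walks the stack upward, starting at the esp value read above
    ptr = idc.Dword(sp)
    seg = idaapi.getseg(ptr)
    # only accept executable segments
    if (not seg) or ((seg.perm & idaapi.SEGPERM_EXEC) == 0):
        continue  # not a code pointer, skip this stack slot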

Determining the return address

From the previous step we managed to filter out any pointer that does not belong to an executable segment, but that’s not enough: we need to determine whether it is a return address or not.
In compiler-generated code, most calls are carried out with a CALL instruction (direct or indirect), so we will not consider any other code pattern that could act like a CALL (for instance the push/ret sequence).
To get the address of the previous instruction:

prev_ea = idc.PrevHead(current_ea, idc.MinEA())

This works only if IDA has already analyzed the area in question and items are already defined there. We could analyze (AnalyzeArea()) the area surrounding the pointer we retrieved from the stack, but that would be overkill.
Since we are looking for the previous instruction and specifically a CALL instruction, we shall use a pattern table:

CallPattern = \
[
    [-2, [0xFF] ],
    [-3, [0xFF] ],
    [-5, [0xE8] ],
    [-6, [0xFF] ]
]

Each item in this table is defined as a list where the first element is the distance from the return address to the beginning of the CALL instruction and the second element is a list of values denoting the CALL opcode(s).
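
To make the offsets concrete, here is a small sketch that applies the same table to a raw byte buffer instead of the IDA database. The function candidate_calls and its code/base parameters are hypothetical names introduced for illustration only.

```python
CallPattern = [
    [-2, [0xFF]],
    [-3, [0xFF]],
    [-5, [0xE8]],
    [-6, [0xFF]],
]

def candidate_calls(code, base, ret_addr):
    """Return the addresses before ret_addr whose bytes match a pattern.

    code is a bytes-like buffer assumed to be loaded at address base.
    A match is only a candidate: the bytes still have to be decoded.
    """
    data = bytearray(code)
    hits = []
    for distance, opcode in CallPattern:
        start = ret_addr + distance - base
        if start < 0 or start + len(opcode) > len(data):
            continue  # the pattern would fall outside the buffer
        if list(data[start:start + len(opcode)]) == opcode:
            hits.append(ret_addr + distance)
    return hits

# E8 rel32 (a direct near CALL, 5 bytes) followed by a NOP.
code = b"\xE8\x10\x00\x00\x00\x90"
print([hex(ea) for ea in candidate_calls(code, 0x401000, 0x401005)])

# B8 FF 00 00 00 is "mov eax, 0FFh": the 0xFF byte matches the -2 pattern
# even though no CALL is present.
print(candidate_calls(b"\xB8\xFF\x00\x00\x00", 0, 3))
```

The second call returns a hit for an instruction that is not a CALL at all, which is the kind of false positive a byte pattern alone cannot rule out.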
Matching the pattern alone is also not enough since other instructions can contain 0xFF or 0xE8, so we will ask the processor module to decode what we think is a CALL instruction:

    cmd = idautils.DecodeInstruction(some_address_ea)
    if (cmd.itype == idaapi.NN_call):
        print "found a call"

After the instruction is decoded, we can inspect its opcode number.
In case you did not know, a list of opcodes for various processors is available in the SDK (check the allins.hpp file). The same opcode constants are defined in the idaapi Python module.

    (...from allins.hpp...)
    NN_call,                // Call Procedure
    NN_callfi,              // Indirect Call Far Procedure
    NN_callni,              // Indirect Call Near Procedure
    (...)

Notice that the pc processor module can report three different opcode numbers for a CALL instruction, so our previous code snippet is not quite correct: it does not check for NN_callfi and NN_callni. For this reason, using the is_call_insn() function is more correct:

def IsPrevInsnCall(ea):
    """Return the address of the CALL that ends at ea, or False."""
    global CallPattern
    for p in CallPattern:
        # assume caller's ea
        caller = ea + p[0]
        # get the bytes
        found = [x for x in idautils.GetDataList(caller, len(p[1]), 1)]
        # do we have a match? is it a call instruction?
        if found == p[1] and idaapi.is_call_insn(caller):
            return caller
    return False
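
Outside IDA, where is_call_insn() is not available, the same idea can be approximated by checking the x86 encoding by hand. The sketch below is a stand-in under stated assumptions, not the processor module's logic: it handles only 32-bit code without prefixes, and call_length, is_prev_insn_call and the buffer parameters are hypothetical names.

```python
CALL_PATTERNS = [(-2, 0xFF), (-3, 0xFF), (-5, 0xE8), (-6, 0xFF)]

def call_length(data, off):
    """Return the length of the CALL encoded at data[off], or None.

    Covers E8 rel32 and FF /2 (near) or FF /3 (far) with 32-bit addressing.
    data must be a bytearray so that indexing yields integers.
    """
    if off < 0 or off >= len(data):
        return None
    if data[off] == 0xE8:
        return 5
    if data[off] != 0xFF or off + 1 >= len(data):
        return None
    modrm = data[off + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg not in (2, 3):
        return None            # other FF forms: INC, DEC, JMP, PUSH
    if reg == 3 and mod == 3:
        return None            # a far CALL needs a memory operand
    length = 2                 # opcode + ModRM
    if mod != 3 and rm == 4:   # a SIB byte follows
        if off + 2 >= len(data):
            return None
        length += 1
        if mod == 0 and (data[off + 2] & 7) == 5:
            length += 4        # SIB without a base register: disp32
    elif mod == 0 and rm == 5:
        length += 4            # absolute disp32
    if mod == 1:
        length += 1            # disp8
    elif mod == 2:
        length += 4            # disp32
    return length

def is_prev_insn_call(code, base, ret_addr):
    """Return the address of the CALL ending at ret_addr, or None."""
    data = bytearray(code)
    for distance, opcode in CALL_PATTERNS:
        off = ret_addr + distance - base
        if 0 <= off < len(data) and data[off] == opcode:
            # the decoded length must land exactly on the return address
            if call_length(data, off) == -distance:
                return ret_addr + distance
    return None

# FF 15 disp32 is "call dword ptr [402000h]" (6 bytes), followed by a NOP.
code = b"\xFF\x15\x00\x20\x40\x00\x90"
print(is_prev_insn_call(code, 0x401000, 0x401006))

# "mov eax, 0FFh" contains 0xFF but is rejected by the encoding check.
print(is_prev_insn_call(b"\xB8\xFF\x00\x00\x00", 0, 3))
```

Requiring the decoded length to equal the pattern distance plays the role that is_call_insn() plays in the IDA version: a matching byte is accepted only if a complete CALL really ends at the return address.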

Putting it all together

We wrote a small Python script to implement this logic and tested it by attaching to a running Notepad instance with the WinDbg debugger module (symbols configured):

As you noticed, the call stack boils down to RtlUserThreadStart(). One can use this call stack information to try to locate the original entry point of packed executables!
Download the script from here. Please note that the script will use debug names only if IDAPython r232 or later is detected.
