Walking the stack to reconstruct the call stack is a challenge, especially when little or no symbolic information is present, and many questions must be answered to obtain a correct call stack.
In this post, we are going to implement the method entitled "Manually Walking a Stack" described on MSDN.
This approach does not always give accurate results, but it can still produce a fairly correct call stack.
In short, this is how manual stack walking works:
First we need to retrieve the value of the ESP register:
```python
from idautils import cpu

# read the current stack pointer value from the debugger
sp = cpu.Esp
```
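Next, we dereference the stack pointer, look up the segment that contains the value we read, and check that segment's permissions:
```python
import idc
import idaapi

ptr = idc.Dword(sp)       # value stored in the current stack slot
seg = idaapi.getseg(ptr)  # segment containing that value, or None
# only accept pointers into executable segments; any other slot is skipped
is_candidate = seg is not None and (seg.perm & idaapi.SEGPERM_EXEC) != 0
```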
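The same filtering step can be sketched outside IDA in plain Python. This is a minimal sketch under assumptions: the stack is given as a raw little-endian byte dump starting at ESP, and executable segments are given as a list of (start, end) ranges. These stand in for idc.Dword() and idaapi.getseg(), and the names `stack_candidates` and `exec_ranges` are hypothetical.

```python
import struct

def stack_candidates(stack_dump, esp, exec_ranges):
    """Return (slot_address, value) for every dword on the stack that points
    into one of the executable ranges (start inclusive, end exclusive)."""
    candidates = []
    # walk the dump one 32-bit slot at a time, from ESP towards higher addresses
    for offset in range(0, len(stack_dump) - 3, 4):
        (value,) = struct.unpack_from("<I", stack_dump, offset)
        if any(start <= value < end for start, end in exec_ranges):
            candidates.append((esp + offset, value))
    return candidates

# three stack slots; only the middle one points into the (hypothetical) code range
dump = struct.pack("<III", 0x00000001, 0x00401005, 0x7FFD0000)
print([(hex(a), hex(v)) for a, v in stack_candidates(dump, 0x0012FF00, [(0x00401000, 0x00402000)])])
```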
The previous step filters out every pointer that does not point into an executable segment, but that is not enough: we still need to determine whether the value is actually a return address.
A CALL instruction pushes the address of the instruction that follows it, so a genuine return address must be immediately preceded by a CALL. In compiler-generated code, most calls are made with a CALL instruction (direct or indirect), so we will not consider other code patterns that can act like a CALL (for instance, a push/ret sequence).
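To make this concrete: a direct near CALL (opcode E8 followed by a signed 32-bit displacement) is 5 bytes long, pushes the address of the next instruction, and jumps to that address plus the displacement. The hypothetical helper below, an illustration rather than part of the original script, decodes such an instruction from raw bytes and computes both values.

```python
import struct

def decode_direct_call(code, call_ea):
    """Decode an E8 rel32 CALL located at call_ea.

    Returns (return_address, target), or None if code is not an E8 call."""
    if len(code) < 5 or code[0] != 0xE8:
        return None
    (rel32,) = struct.unpack_from("<i", code, 1)     # signed displacement
    return_address = call_ea + 5                     # address of the next instruction
    target = (return_address + rel32) & 0xFFFFFFFF   # wrap to 32 bits
    return return_address, target

# E8 10 00 00 00 at 0x401000 pushes 0x401005 and jumps to 0x401015
print([hex(x) for x in decode_direct_call(bytes([0xE8, 0x10, 0, 0, 0]), 0x401000)])
```

The value pushed on the stack is always the call site plus the instruction length, which is why the next step looks backwards from each candidate return address.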
To get the address of the previous instruction:
```python
import idc

# address of the item (instruction) that precedes current_ea
prev_ea = idc.PrevHead(current_ea, idc.MinEA())
```
This only works if IDA has already analyzed the area in question and defined items there. We could analyze the area surrounding the pointer we retrieved from the stack (with AnalyzeArea()), but that would be overkill.
Since we only need to know whether the previous instruction is a CALL, we can use a pattern table instead:
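```python
# [distance from the return address to the CALL, [expected opcode byte(s)]]
CallPattern = [
    [-2, [0xFF]],  # FF /2 with a register or [reg] operand, e.g. call eax
    [-3, [0xFF]],  # FF /2 with a [reg+disp8] operand, e.g. call [ebp+8]
    [-5, [0xE8]],  # E8 rel32, direct near call
    [-6, [0xFF]],  # FF /2 with a [disp32] operand, e.g. call dword ptr [import]
]
```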
Each entry in this table is a two-element list: the first element is the distance from the return address back to the start of the CALL instruction (negative, since the CALL precedes the return address), and the second is the list of expected CALL opcode byte(s).
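The table lookup itself needs nothing more than byte comparisons. The sketch below assumes memory is available as a plain dict mapping addresses to byte values (a stand-in for GetDataList()); the function name `match_call_patterns` is hypothetical. Note that it reports a candidate for `push dword ptr [eax]` (FF 30), which is not a CALL at all.

```python
CallPattern = [[-2, [0xFF]], [-3, [0xFF]], [-5, [0xE8]], [-6, [0xFF]]]

def match_call_patterns(memory, ret_ea, patterns=CallPattern):
    """Return the candidate CALL addresses whose opcode bytes match the table.

    memory maps address -> byte value; missing addresses never match."""
    matches = []
    for distance, opcodes in patterns:
        caller = ret_ea + distance
        actual = [memory.get(caller + i) for i in range(len(opcodes))]
        if actual == opcodes:
            matches.append(caller)
    return matches

# FF 30 is "push dword ptr [eax]", not a CALL, yet it matches the -2 pattern
mem = {0x401000: 0xFF, 0x401001: 0x30}
print([hex(a) for a in match_call_patterns(mem, 0x401002)])
```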
Matching the pattern alone is not enough either, since other instructions can also contain 0xFF or 0xE8 bytes, so we ask the processor module to decode what we think is a CALL instruction:
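```python
import idaapi
import idautils

cmd = idautils.DecodeInstruction(some_address_ea)
# DecodeInstruction() returns None if no instruction can be decoded there
if cmd and cmd.itype == idaapi.NN_call:
    print("found a call")
```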
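Why is 0xFF so ambiguous? On x86, 0xFF is a group opcode: the reg field (bits 3 to 5) of the following ModRM byte selects the actual operation, and only /2 (near indirect call) and /3 (far indirect call) are CALLs. The hypothetical helpers below illustrate the check a decoder performs; in IDA, the processor module does this work for you.

```python
# operations selected by the ModRM reg field for opcode 0xFF (32-bit x86)
FF_GROUP = {
    0: "inc", 1: "dec",
    2: "call near indirect", 3: "call far indirect",
    4: "jmp near indirect", 5: "jmp far indirect",
    6: "push",  # /7 is undefined
}

def ff_operation(modrm):
    """Return the operation encoded by 0xFF followed by this ModRM byte."""
    return FF_GROUP.get((modrm >> 3) & 7, "invalid")

def ff_is_call(modrm):
    """True if 0xFF followed by this ModRM byte is a (near or far) CALL."""
    return ((modrm >> 3) & 7) in (2, 3)

print(ff_operation(0xD0))  # bytes FF D0: call eax
print(ff_operation(0x30))  # bytes FF 30: push dword ptr [eax]
```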
Once the instruction is decoded, we can inspect its instruction code (the itype field).
In case you did not know, the list of instruction codes for each processor is available in the SDK (see the allins.hpp file), and the same constants are defined in the idaapi Python module.
```cpp
// (...from allins.hpp...)
NN_call,     // Call Procedure
NN_callfi,   // Indirect Call Far Procedure
NN_callni,   // Indirect Call Near Procedure
// (...)
```
As we can see, the pc processor module can report three different instruction codes for a CALL, so our previous snippet is not quite correct: it does not check for NN_callfi and NN_callni. Using the is_call_insn() function is more robust:
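```python
import idaapi
import idautils
from idc import GetDataList

def IsPrevInsnCall(ea):
    """Return the address of the CALL instruction that ends at ea, or False."""
    for p in CallPattern:
        # assume the CALL starts p[0] bytes before the return address
        caller = ea + p[0]
        # fetch the candidate opcode bytes (itemsize 1 = bytes)
        opcodes = list(GetDataList(caller, len(p[1]), 1))
        # do we have a match? is it a call instruction?
        if opcodes == p[1] and idaapi.is_call_insn(caller):
            # make sure the decoded CALL ends exactly at the return address
            insn = idautils.DecodeInstruction(caller)
            if insn and insn.size == -p[0]:
                return caller
    return False
```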
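The whole check can also be exercised outside IDA. The sketch below is a standalone re-implementation under stated assumptions, not the original script: memory is a dict of address to byte holding only code (so an unmapped value never matches, which plays the role of the executable-segment filter), and a small ModRM length decoder stands in for is_call_insn() and DecodeInstruction(). It requires the decoded CALL to end exactly at the return address, which rejects a shorter CALL that merely happens to start at a larger distance. All names (`call_length`, `is_prev_insn_call`, `walk`) are hypothetical.

```python
import struct

# same table as above: [distance to the CALL, [expected opcode byte(s)]]
CallPattern = [[-2, [0xFF]], [-3, [0xFF]], [-5, [0xE8]], [-6, [0xFF]]]

def call_length(memory, ea):
    """Length of a 32-bit x86 CALL at ea (E8 rel32, FF /2 or FF /3), else None."""
    op = memory.get(ea)
    if op == 0xE8:
        return 5                          # E8 + rel32
    modrm = memory.get(ea + 1)
    if op != 0xFF or modrm is None:
        return None
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg not in (2, 3):
        return None                       # inc/dec/jmp/push, not a CALL
    if mod == 3:
        return 2 if reg == 2 else None    # call reg; a far call needs a memory operand
    length = 2                            # opcode + ModRM
    if rm == 4:                           # a SIB byte follows the ModRM byte
        sib = memory.get(ea + 2)
        if sib is None:
            return None
        length += 1
        if mod == 0 and (sib & 7) == 5:
            return length + 4             # [index*scale + disp32], no base register
    elif mod == 0 and rm == 5:
        return length + 4                 # [disp32]
    return length + (0, 1, 4)[mod]        # no displacement, disp8 or disp32

def is_prev_insn_call(memory, ea):
    """Return the address of the CALL that ends exactly at ea, or False."""
    for distance, opcodes in CallPattern:
        caller = ea + distance
        if [memory.get(caller + i) for i in range(len(opcodes))] != opcodes:
            continue                      # opcode bytes do not match this pattern
        if call_length(memory, caller) == -distance:
            return caller                 # decodes as a CALL ending right at ea
    return False

def walk(stack_dump, esp, memory):
    """Return (slot_address, return_address, call_site) for each stack slot
    whose value is immediately preceded by a CALL in memory."""
    frames = []
    for offset in range(0, len(stack_dump) - 3, 4):
        (value,) = struct.unpack_from("<I", stack_dump, offset)
        caller = is_prev_insn_call(memory, value)
        if caller is not False:
            frames.append((esp + offset, value, caller))
    return frames

# 0x401000: FF 15 00 20 40 00  call dword ptr [0x402000]  (returns to 0x401006)
# 0x401006: FF 30              push dword ptr [eax]       (not a CALL)
code = {0x401000 + i: b for i, b in enumerate(bytes.fromhex("FF1500204000FF30"))}
stack = struct.pack("<II", 0x401008, 0x401006)
print([tuple(hex(x) for x in frame) for frame in walk(stack, 0x12FF00, code)])
```

The table has no -4 or -7 entries, so SIB-based encodings such as call [esp+8] (FF 54 24 08) are missed, exactly as with the original pattern table.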
We wrote a small Python script implementing this logic and tested it by attaching to a running notepad process with the WinDbg debugger module (with symbols configured).
As you can see, the call stack bottoms out at RtlUserThreadStart(). This call stack information can also be used to try to locate the original entry point of packed executables!
Download the script from here. Please note that the script uses debug names only if IDAPython r232 or later is detected.