This is a guest entry written by Can Bölük. His views and opinions are his own and not those of Hex-Rays. Any technical or maintenance issues regarding the code herein should be directed to the author.
NtRays: Reversing Windows kernel, simplified
The Windows kernel has changed a lot in the past few years. With the addition of hypervisor enhancements, security mitigations, scheduler hints, and general performance optimizations, it has become much snappier and more secure. However, combined with inlining, this also means that it has become more complicated to understand with each passing week, seemingly with no end.
NtRays is an open-source IDA plugin that uses Hex-Rays' powerful microcode hooks to help you read through the inlined boilerplate code, producing simplified pseudo-code output reminiscent of the Windows XP era, with a few extra features to help in any kind of kernel-mode reverse engineering.
Using NtRays
You can install NtRays with a single drag and drop into the plugins folder, either by using the pre-compiled release from the GitHub repository or by compiling a DLL from source.
Following the installation, the next time you launch IDA Pro, you will find its entry under Edit > Plugins > NtRays, from which you can simply toggle it on or off.
Features
Scheduler assist & Perf instrumentations
Take the function KeReleaseInterruptSpinLock as an example. It does sound like a simple one, doesn't it? Here's a comparison between the Windows 7 and the latest Windows 11 kernels.
Now imagine this was inlined into a much more complex subroutine with multiple calls. You'd basically be combing through the boilerplate to follow the actual logic, which is, unfortunately, what looking at the NT kernel feels like these days.
By utilizing Hex-Rays' microcode optimizations, NtRays turns the Windows 11 version into a measly 7 lines.
Memory manager: Dynamic relocations
The NT kernel has two very special constants, namely the PTE base and the PFN database. Nowadays, with KASLR (Kernel Address Space Layout Randomization), they aren't constants at all, but for performance reasons, the MS compiler keeps them as constants (propagating any arithmetic as well) and instead puts the relocation information in the PE header for the bootloader to patch during startup.
Naturally, this becomes a nightmare when you try to understand how the memory manager works. Consider MmGetVirtualForPhysical, for instance, which does nothing more than look up a field in the PFN database and subtract it from the PTE base.
This time, NtRays modifies the C-level tree to give you type information as well as clear names for the constants.
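To make the arithmetic behind these two constants concrete, here is a minimal sketch of how a PTE address relates to a virtual address, and how a lookup in the spirit of MmGetVirtualForPhysical can reverse it. It is a simplified model and not the kernel's actual code: the helper names are made up for illustration, the PFN database is stood in for by a plain dictionary, and the default base is the historical pre-randomization value, which on modern builds is replaced by whatever the loader patches in at boot.

```python
# Simplified model of x64 4-level paging address arithmetic (illustrative only).
MASK64 = (1 << 64) - 1
PAGE_SHIFT = 12   # 4 KiB pages
PTE_SIZE = 8      # one page table entry is 8 bytes

# Historical, pre-randomization PTE base. With KASLR this value is no longer
# fixed; the default here is only for demonstration.
LEGACY_PTE_BASE = 0xFFFFF68000000000


def pte_address(va, pte_base=LEGACY_PTE_BASE):
    """Return the address of the PTE that maps the virtual address `va`."""
    page_index = (va & 0x0000FFFFFFFFFFFF) >> PAGE_SHIFT  # 36-bit page number
    return (pte_base + page_index * PTE_SIZE) & MASK64


def va_from_pte(pte_addr, pte_base=LEGACY_PTE_BASE):
    """Reverse of pte_address: recover the page-aligned virtual address."""
    page_index = ((pte_addr - pte_base) & MASK64) // PTE_SIZE
    va = page_index << PAGE_SHIFT
    if va & (1 << 47):            # sign-extend to a canonical address
        va |= 0xFFFF000000000000
    return va


def get_virtual_for_physical(pa, pfn_database, pte_base=LEGACY_PTE_BASE):
    """Hypothetical stand-in for the lookup described above.

    `pfn_database` is assumed to map a page frame number to the address of
    the PTE that maps it; the real database is an array of structures.
    """
    pte_addr = pfn_database[pa >> PAGE_SHIFT]
    return va_from_pte(pte_addr, pte_base) + (pa & 0xFFF)
```

Because every function takes the base as a parameter, the same arithmetic works for any base value, which is roughly what naming the constants buys you when reading the decompiled output.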
KUSER_SHARED_DATA
On the topic of constants, another one is the address of KUSER_SHARED_DATA, a structure mapped at two constant addresses, one for user-mode and one for kernel-mode, holding global system information such as the time.
This is lifted as a global variable with the proper type applied instead of another annoying constant.
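As a rough illustration of why the typed global reads better than the raw constant, the sketch below resolves a numeric address into a field of KUSER_SHARED_DATA. The two base addresses are the well-known x64 mappings, and only a handful of fields are listed; the `describe` helper and its output format are assumptions made up for this example, not anything NtRays emits.

```python
# Well-known x64 addresses of the two KUSER_SHARED_DATA mappings.
KUSER_SHARED_DATA_USER = 0x7FFE0000
KUSER_SHARED_DATA_KERNEL = 0xFFFFF78000000000
PAGE_SIZE = 0x1000  # the mapping is treated as a single page here

# A few fields as (name, offset, size); the real structure has many more.
FIELDS = [
    ("TickCountMultiplier", 0x04, 4),
    ("InterruptTime", 0x08, 12),
    ("SystemTime", 0x14, 12),
]


def describe(address):
    """Turn a raw constant into a readable field reference, if it matches."""
    mappings = (("user", KUSER_SHARED_DATA_USER),
                ("kernel", KUSER_SHARED_DATA_KERNEL))
    for mode, base in mappings:
        if base <= address < base + PAGE_SIZE:
            offset = address - base
            for name, start, size in FIELDS:
                if start <= offset < start + size:
                    delta = offset - start
                    suffix = f"+{delta:#x}" if delta else ""
                    return f"KUSER_SHARED_DATA.{name}{suffix} ({mode}-mode mapping)"
            # Inside the page, but not one of the fields listed above.
            return f"KUSER_SHARED_DATA+{offset:#x} ({mode}-mode mapping)"
    return None  # not a KUSER_SHARED_DATA address


print(describe(0xFFFFF78000000014))
print(describe(0x7FFE0008))
```

A constant such as 0xFFFFF78000000014 says very little on its own, while the resolved form makes it obvious that the code is reading the system time.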
Mitigations
Following the Spectre and Meltdown vulnerabilities, every OS vendor targeting chips with speculative execution features (hint: all of them) had to implement certain mitigations. Naturally, these ended up in the NT kernel as well.
KPTI: Shadow page tables
To secure kernel-mode memory, NT started keeping two sets of page tables per process. This means every page table access is now followed by a very long branch doing the exact same operation.
A comparison of MiInitializePfn with and without NtRays demonstrates how distracting it can be.
RSB flushes
The return stack buffer (RSB) was another concern, so the kernel started inserting RSB flush gadgets at the start of every mode switch, such as the one below.
This used to prevent decompilation entirely. NtRays instead lifts this sequence into an imaginary intrinsic.
Interrupt Service Routines
Interrupt service routines are the bread and butter of any operating system. However, they are usually written in assembly with an irregular calling convention, and a C decompilation may not represent what's going on very clearly.
To help clear things up, once again, we use the Hex-Rays microcode coupled with the powerful adjusted pointer primitive in the IDA type system.
In this example, you can also see the additional non-standard intrinsics NtRays implements for OS-level code, such as __swapgs.
Although the IDA inline assembly representation is often adequate, from time to time it does become problematic due to variables being forced into register names, instructions like int 2c not being marked noreturn, etc.
This functionality is extended to many system instructions, such as:
- __assert_fail: int 2C
- __cpuid: cpuid
- __xgetbv: xgetbv
- __xsetbv: xsetbv
- __clac: clac
- __stac: stac
- __swapgs: swapgs
- __saveprevssp: saveprevssp
- __setssbsy: setssbsy
- __endbr64: endbr64
- __endbr32: endbr32
- __incsspq: incsspq
- __incsspd: incsspd
- __rstorssp: rstorssp
- __wrssd: wrssd
- __wrssq: wrssq
- __wrussd: wrussd
- __wrussq: wrussq
- __clrssbsy: clrssbsy
- _mm_clflushopt: clflushopt
- _mm_clwb: clwb
- __vmclear: vmclear
- __vmlaunch: vmlaunch
- __vmptrld: vmptrld
- __vmptrst: vmptrst
- __vmwrite: vmwrite
- __vmxoff: vmxoff
- __vmxon: vmxon
- _invept: invept
- _invvpid: invvpid
- _invpcid: invpcid
- _invlpga: invlpga
- _xsaves: xsaves
- _xrstors: xrstors
- _mm_prefetcht0: prefetcht0
- _mm_prefetcht1: prefetcht1
- _mm_prefetcht2: prefetcht2
- _mm_prefetchnta: prefetchnta
- _rdrand: rdrand
- _rdseed: rdseed
- __rdsspd: rdsspd
- __rdsspq: rdsspq
- __vmread: vmread
- __iretq: iretq
- __sysretq: sysretq
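The list above is essentially a lookup table from instruction to intrinsic name. The sketch below shows that idea on a small subset of the entries. The `lift` helper, its handling of operands (none), and the inline-assembly fallback format are simplifications invented for illustration; the actual plugin works on Hex-Rays microcode rather than on instruction text.

```python
# A subset of the instruction -> intrinsic mapping from the list above.
INTRINSICS = {
    "int 2c": "__assert_fail",
    "swapgs": "__swapgs",
    "clac": "__clac",
    "stac": "__stac",
    "invpcid": "_invpcid",
    "rdrand": "_rdrand",
    "iretq": "__iretq",
    "sysretq": "__sysretq",
}


def lift(instruction):
    """Render an instruction as an intrinsic call, or fall back to inline asm.

    Operands are ignored here; only the bare mnemonic text is matched.
    """
    key = " ".join(instruction.lower().split())  # normalize case and spacing
    name = INTRINSICS.get(key)
    if name is None:
        return f"__asm {{ {key} }}"
    return f"{name}();"


for text in ("SWAPGS", "int  2C", "nop"):
    print(lift(text))
```

Presenting these instructions as ordinary calls is what lets the decompiler treat them like any other expression instead of pinning variables to register names.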
Closing
NtRays can be very useful for Windows reverse engineering both in kernel-mode and user-mode. However, it's important to note that some optimizations may discard important information depending on the area of focus in your research.
If you find yourself trying to understand how the shadow table implementation works and end up confused due to all the missing code, it might be a good idea to hit the global toggle to see the real implementation!
The NtRays source code comes with HexSuite, a very easy-to-use wrapper around the IDA SDK, and an average optimizer for the scenarios demonstrated above ends up somewhere between 20 and 30 lines.
If you find yourself dealing with more boilerplate code that could be lifted in a similar fashion, I highly encourage modifying the code base and, if you are willing, sending a pull request, which is always very welcome!