Memory corruption exploits have historically been one of the strongest accessories in a good red teamer’s toolkit. They present an easy win for offensive security engineers, as well as adversaries, by allowing the attacker to execute payloads without relying on any user interaction.
Fortunately for defenders, but unfortunately for researchers and adversaries, these types of exploits have become increasingly difficult to execute, thanks largely to a wide array of operating system mitigations that have been implemented directly within the systems we use every day. This vast apparatus of mitigations makes formerly trivial exploitation expensive and arduous on more modern hardware and software.
This two-part blog series walks through the evolution of exploit development and vulnerability research on Windows systems. It addresses questions such as “How does this affect the landscape of future breaches?” and “Is the price for developing a reliable, portable and effective binary exploit still worth it?”
How Did We Get Here?
From its inception, computing garnered curiosity, which eventually led to the discovery of the “computer bug,” or unintended behavior from systems as a result of user interaction. This, in turn, led to the use of these bugs by bad actors with malign intent and launched the era of binary exploitation. Since then, security researchers, red teamers and adversaries alike have never looked back.
The onset of binary exploitation has led vendors, most notably Microsoft and Apple (with a special mention to grsecurity on Linux who led the charge over two decades ago), to thwart these exploits with various mitigations. These exploitation mitigations, many of which are enabled by default, have reduced the impact of modern exploitation.
Akin to the massive use of Active Directory in enterprise environments, which has forced red team research to place heavy focus on Microsoft products, adversaries and researchers have made Windows a focal point, due to its widespread use in both corporate and non-corporate environments. As a result, this blog will be Windows-centric, focusing on both user-mode and kernel-mode mitigations.
Vulnerability Classes: Then and Now
Researchers and adversaries have always had to answer an age-old question when it comes to binary exploitation: “How can code be executed on a target without any user interaction?” The answer came in the form of various vulnerability classes. Although not an exhaustive list, some common vulnerabilities include:
- Classic stack overflow (yes, even in 2020): This is the ability to overwrite existing content on the stack and use a controlled write to locate and corrupt the return address of a function to jump to an arbitrary location.
- Use-after-free: An object is allocated in memory on the heap in user mode (or in kernel-mode pool memory). A premature “free” of this object occurs, though a reference/handle to this freed object remains. Using another primitive, a new object is created in the freed object’s place and the reference to the old object is utilized to execute or otherwise modify the new object, which is acting in place of the old object. The expectation is that these unexpected changes to the new object somehow result in privilege escalation or other malicious capability.
- Arbitrary writes: This is the ability to arbitrarily write data, such as one or more pointers, to an arbitrary location. This also could be the result of another vulnerability class. Arbitrary write primitives can also be leveraged as arbitrary read primitives depending on the precision one has over the write primitive.
- Type-confusion: An object is of one type, but later on that type is referenced as another type. Due to the layout in memory of various data types, this can lead to unexpected behavior.
Matt Miller of Microsoft gave a talk at BlueHat IL in 2019 outlining the top vulnerability classes since 2016: out-of-bounds read, use-after-free, type confusion and uninitialized use. These bug classes have been and are still leveraged by adversaries and security researchers.
It is also worth noting that each of these vulnerability classes can be located in user mode, kernel mode, and these days — the hypervisor. User-mode vulnerabilities historically have been exploited remotely or through common desktop applications like browsers, office productivity suites, and PDF readers. However, kernel-mode vulnerabilities are primarily exploited locally — once access has been gained to a system in order to elevate privileges. Often, such vulnerabilities are combined with a user-mode vulnerability, achieving what is often called a local remote. Additionally, there are cases such as MS17-010 (commonly referred to as EternalBlue), CVE-2019-0708 (commonly referred to as BlueKeep), and CVE-2020-0796 (commonly referred to as SMBGhost) where kernel remote code execution is possible.
Due to the rise in exploitation, vendors had to provide some way of preventing these exploits from executing. Hence, the exploit mitigation was born.
Exploit Mitigations: Then and Now
While security researchers and adversaries historically have had the upper hand with various methods of delivering payloads through vulnerabilities, vendors slowly started to level the playing field by implementing various mitigations, in hopes of eliminating bug classes completely or breaking common exploitation methods. At the very least, the hope is that a mitigation will make the technique too expensive, or unreliable, for mass-market use, such as in a drive-by exploit kit.
Exploit mitigations, as defined here, have come a long way since the early days of Windows. Legacy mitigations — the initial mitigations released on Microsoft operating systems — will be addressed first. Contemporary mitigations, which comprise more prevalent and documented instruments of exploit thwarting, will be the second pillar outlined in this series. Lastly, mitigations that are less documented and not as widely adopted — referred to here as “Modern,” or cutting-edge mitigations — will wrap up the series.
Legacy Mitigation #1: DEP a.k.a No-eXecute (NX)
Data Execution Prevention (DEP), also referred to as No-eXecute (NX), was one of the first mitigations that forced researchers and adversaries to adopt additional methods of exploitation. DEP prevents arbitrary code from being executed in non-executable portions of memory. It was introduced both in user mode and kernel mode with Windows XP SP2, although only for the user-mode heap and stack, and the kernel-mode stack plus pageable kernel memory (paged pool). It took many more releases, up to and including Windows 8, for most kernel-mode heap memory, including resident memory (nonpaged pool), to become non-executable. Although considered to be an “older” mitigation, it remains one that all vulnerability researchers and adversaries have to take into consideration.
DEP’s implementation in kernel mode and in user mode is very similar in that DEP is enforced on a per-page memory basis via a page table entry. A page table entry, or PTE, refers to the lowest-level entry in the paging structures used for virtual memory translation. PTEs, at a very high level, contain bits that are responsible for enforcing various permissions and properties for a given range of virtual addresses. Each chunk of virtual memory, referred to as a page (and typically 4KB), is marked as either executable or writable — but not both at the same time — via its page table entry in the kernel.
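To make the role of the PTE control bits concrete, here is a minimal sketch that decodes the permission-related bits of a 64-bit PTE value. The bit positions are the architectural x86-64 ones (present, read/write, user/supervisor and no-execute); the example PTE value and the helper name `describe_pte` are hypothetical and only serve as an illustration.

```python
# Architectural x86-64 PTE bit positions relevant to DEP and page permissions.
PTE_VALID = 1 << 0   # P: the page is present
PTE_WRITE = 1 << 1   # R/W: the page is writable
PTE_USER = 1 << 2    # U/S: set = user-mode page, clear = supervisor page
PTE_NX = 1 << 63     # NX/XD: set = instruction fetches are not allowed


def describe_pte(pte: int) -> dict:
    """Return the permission-related properties encoded in a PTE value."""
    return {
        "valid": bool(pte & PTE_VALID),
        "writable": bool(pte & PTE_WRITE),
        "user": bool(pte & PTE_USER),
        "executable": not (pte & PTE_NX),
    }


# Hypothetical PTE for a user-mode stack page: present, writable, user, NX set.
example_pte = PTE_NX | 0x1234000 | PTE_USER | PTE_WRITE | PTE_VALID
print(describe_pte(example_pte))
```

With DEP enforced, a stack page like the hypothetical one above decodes as writable but not executable.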
!pte commands in WinDbg can provide greater insight into DEP’s implementation.
Figure 1: A user mode stack only has read/write permissions with DEP enabled
Figure 2: With DEP enabled, the user-mode address 00007ffe`54a40000 doesn’t have the executable PTE control bit set
Before kernel-mode DEP was extended to cover the resident kernel heap on Windows operating systems, the PTEs for such allocations, which come from the NonPagedPool, were marked as RWX, meaning that this type of kernel-mode memory was executable and writable. Resident memory refers to the fact that memory owned by this allocation type will never be “paged-out” of memory, meaning this type of virtual memory will always be mapped to a valid physical address.
With the release of Windows 8, the NonPagedPoolNx pool became the default kernel-mode heap for resident memory allocations. This captures all of the properties of NonPagedPool but makes it non-executable. Just like for user-mode addresses, the executable bit is enforced by the page table entry of a kernel-mode virtual address.
Figure 3: With kernel-mode DEP enabled, the static kernel-mode structure KUSER_SHARED_DATA doesn’t have the executable PTE control bit set
User-mode DEP can be bypassed with common exploitation techniques such as return-oriented programming, call-oriented programming and jump-oriented programming. These “code-reuse” techniques are used to dynamically call Windows API functions such as VirtualProtect() or WriteProcessMemory() to either change permissions of memory pages to RWX or write shellcode to an already existing executable memory region using pointers from different modules loaded at runtime. In addition to altering permissions of memory, it is also possible to utilize VirtualAlloc() or similar routines to allocate executable memory.
Kernel-mode DEP can be bypassed using an arbitrary read/write primitive to extract the page table entry control bits for a particular page in memory and modify them to allow both write and execute access. It is also bypassable by redirecting execution flow into user-mode memory that has already been marked as RWX, since by default, kernel-mode code can call into user-mode code at will.
Legacy Mitigation #2: ASLR/KASLR
With the addition of DEP, vulnerability researchers and adversaries quickly adopted code reuse techniques. The implementation of Address Space Layout Randomization (ASLR) and Kernel Address Space Layout Randomization (KASLR) caused exploitation to be less straightforward.
ASLR and its kernel-mode implementation KASLR randomize the base addresses of various DLLs, modules, and structures. For instance, this particular version of Windows 10 loads the kernel at one virtual memory address before reboot (Figure 4).
Figure 4: Base address of ntoskrnl.exe before reboot
Upon reboot, the kernel is loaded at a different virtual address (Figure 5).
Figure 5: Base address of ntoskrnl.exe after reboot
Historically, before the implementation of ASLR, defeating DEP was as trivial as disassembling an application or DLL into its raw assembly instructions and utilizing pointers to these instructions, which were static before ASLR, to bypass DEP. However, with the implementation of ASLR, one of three actions is generally required:
- Utilize DLLs and applications that are not compiled with ASLR
- Utilize an out-of-bounds vulnerability or some other type of information/memory leak
- Brute force the address space (not feasible on 64-bit systems)
In today’s modern exploitation environment, information leak vulnerabilities are the standard for bypassing ASLR. Depending on varying circumstances, an information leak can generally be classified as another zero-day vulnerability, in addition to the memory corruption primitive. This means modern exploits can require two zero-days.
Because Windows only performs ASLR on a per-boot basis, all processes share the same address space layout once the system has started. Therefore, ASLR is not effective against a local attacker that already has achieved code execution. Similarly, because the kernel provides introspection APIs to non-privileged users, which provide kernel memory addresses, KASLR isn’t effective against this class of attack either. For this reason, ASLR and KASLR on Windows are only effective mitigations against remote exploitation vectors.
With the rise of the local remote, however, it was recognized that KASLR was ineffective against remote attackers that first achieved a user RCE because, as mentioned, certain Windows API functions, such as
NtQuerySystemInformation(), can be leveraged to enumerate the base addresses of all loaded kernel modules.
Since a local remote attacker would first begin with a user-mode RCE targeting an application such as a browser, Microsoft began heavily enforcing that such applications run in sandboxed environments and introduced Mandatory Integrity Control (MIC), and later, AppContainer, as a way to lower the privileges of these applications through, among other things, running them with a low integrity level. Then, in Windows 8.1, it restricted access to such introspective API functions to processes running at medium integrity level and above.
Therefore, a low integrity level process, such as a browser sandbox, will require an information leak vulnerability to circumvent KASLR.
Figure 6: Call to EnumDeviceDrivers() is blocked from low integrity
Several other primitives for leaking the base address of the kernel have been mitigated throughout various builds of Windows 10. Notably, the Hardware Abstraction Layer (HAL) heap, which contains multiple pointers to the kernel, was also located at a fixed location. This was because the HAL heap is needed very early in the boot process, even before the actual Windows memory manager has initialized. At the time, the best solution was to reserve memory for the HAL heap at a perfectly fixed location. This was mitigated with the Windows 10 Creators Update (RS2) build.
Even though ASLR is almost as old as DEP and they are both among some of the first mitigations implemented, they must be taken into consideration during modern exploitation.
Contemporary Mitigation #1: CFG/kCFG
Control Flow Guard (CFG), and its implementation in the kernel known as kCFG, is Microsoft’s version of Control Flow Integrity (CFI). CFG works by performing checks on indirect function calls made inside of modules and applications compiled with CFG. Additionally, the Windows kernel has been compiled with kCFG starting with the Windows 10 1703 (RS2) release. Note, however, that in order for kCFG to be enabled, VBS (Virtualization Based Security) needs to be enabled. VBS will be discussed in more detail in part 2 of this blog series.
With a nod to efficiency for users, indirect calls that are protected by CFG are validated using a bitmap, with a set of bits indicating if a target is “valid” or if the target is “invalid.” A target is considered “valid” if it represents the starting location of a function within a module loaded in the process. This means that the bitmap represents the entire process address space. Each module that is compiled with CFG has its own set of bits in the bitmap, based on where it was loaded in memory. As described in the ASLR section, Windows only randomizes the address space per-boot, so this bitmap is typically mostly shared among all processes, which saves significant amounts of memory.
Generally, at a very high level, indirect user-mode function calls are passed to a guard_check_icall function (or guard_dispatch_icall in other cases). This function then dereferences the function pointer _guard_check_icall_fptr and jumps to the routine it points to, such as LdrpValidateUserCallTarget (a different routine is used in other cases).
Figure 7: Implementation of user-mode CFG
A series of bitwise operations and assembly instructions is performed, which results in checking the bitmap to determine if the target of the indirect function call is marked as a valid function within the bitmap. An invalid target will result in process termination.
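The bitmap check described above can be illustrated with a deliberately simplified model. This is a toy sketch, not the actual CFG bitmap layout or the real validation routine: it assumes one bit per 16-byte slot of a small, made-up address space, and the names `ToyCfgBitmap` and `guarded_indirect_call` are hypothetical.

```python
SLOT_SIZE = 16  # toy granularity: one bit per 16-byte slot of address space


class ToyCfgBitmap:
    """Simplified stand-in for a CFG bitmap covering a small address space."""

    def __init__(self, address_space_size: int) -> None:
        slots = address_space_size // SLOT_SIZE
        self.bits = bytearray((slots + 7) // 8)

    def mark_valid(self, function_start: int) -> None:
        # Called for every function start the (toy) loader considers valid.
        slot = function_start // SLOT_SIZE
        self.bits[slot // 8] |= 1 << (slot % 8)

    def is_valid_target(self, target: int) -> bool:
        slot = target // SLOT_SIZE
        # Toy rule: targets must be slot-aligned and inside the bitmap.
        if target % SLOT_SIZE != 0 or slot // 8 >= len(self.bits):
            return False
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


def guarded_indirect_call(bitmap, target, functions):
    """Validate the target against the bitmap before transferring control."""
    if not bitmap.is_valid_target(target):
        raise RuntimeError("invalid indirect call target, process terminated")
    return functions[target]()


bitmap = ToyCfgBitmap(address_space_size=0x10000)
functions = {0x1000: lambda: "legitimate function ran"}
bitmap.mark_valid(0x1000)
print(guarded_indirect_call(bitmap, 0x1000, functions))
```

Note that the check only answers "is this the start of some valid function?", which is the property the later discussion of CFG's shortcomings relies on.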
kCFG has a very similar implementation: indirect function calls made in the kernel are validated before they are taken. Most notably, this “breaks” the [nt!HalDispatchTable+0x8] primitive that adversaries and researchers have used to execute code in context of the kernel by invoking nt!KeQueryIntervalProfile, which performs an indirect function call to [nt!HalDispatchTable+0x8] on 64-bit systems.
Figure 8: [nt!HalDispatchTable+0x8] is now guarded by kCFG via the nt!KeQueryIntervalProfile indirect call
kCFG uses a slightly modified version of CFG: the bitmap is stored inside a kernel variable, and nt!_guard_dispatch_icall itself starts the routine to validate a target, so no other function calls are needed.
Figure 9: Implementation of kCFG
CFG addresses the fact that at some point during the exploit development lifecycle, an adversary may need to overwrite a function pointer so that it points to a different function that is beneficial to the adversary (such as VirtualProtect).
CFG is a forward edge CFI mitigation. This means that it does not take
ret instructions into account, which is a backward edge case. Since CFG doesn’t check return addresses, CFG could be bypassed by utilizing an information leak, which may allow an action such as parsing the Thread Environment Block (TEB) to leak the stack. Utilizing this knowledge, it may be possible to overwrite the return address of a function on the stack with malign intent.
CFG has been found, over time, to have a few shortcomings. For example, note that modules make use of an Import Address Table (IAT) for imports, such as Windows API functions. These IAT entries are essentially virtual addresses within a specific module that point to Windows API functions.
Figure 10: IAT of
The IAT is read-only by default and generally cannot be modified. Microsoft has deemed these functions as “safe” due to their read-only state, meaning CFG/kCFG does not protect these functions. If an adversary could modify an entry in the IAT, or add a malicious one, it would be possible to call a user-defined pointer.
Additionally, adversaries could leverage additional OS functions for code execution. By design, CFG/kCFG only validates if a function begins at the location indicated by the bitmap — not that a function is what it claims to be. If an adversary or researcher could locate additional functions marked as valid in the CFG/kCFG bitmap, it may be possible to overwrite a function’s pointer with another function’s pointer to “proxy” code execution. This could lead to, for example, a type-confusion attack, where a different, unexpected, function is now running with the parameters/objects of the original expected function.
As mentioned earlier, kCFG is only enabled when VBS is enabled. One interesting characteristic of kCFG is that even when VBS is not enabled, kCFG dispatch functions and routines are still present and function calls are still passed through them. With or without VBS enabled, kCFG performs a bitwise check on the “upper” bits of a virtual address to determine if an address is sign-extended (also known as a kernel-mode address). If a user-mode address is detected, regardless of HVCI being enabled, kCFG will cause a bug check of
KERNEL_SECURITY_CHECK_FAILURE. This is one mitigation against kernel-mode code being coerced into calling user-mode code, which we saw was a potential technique to bypass DEP. In the next section, we’ll talk about Supervisor Mode Execution Prevention (SMEP), which is a modern mitigation against this attack as well.
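The “upper bits” check described above can be sketched as follows. This is a simplified illustration that assumes 48-bit virtual addressing, where canonical kernel-mode addresses have bits 47 through 63 all set; the function names are hypothetical, and the real kCFG dispatch routine is implemented in assembly and may test the address differently.

```python
def is_kernel_mode_address(virtual_address: int) -> bool:
    """True if bits 47..63 are all set (sign-extended, 48-bit addressing assumed)."""
    return (virtual_address >> 47) == 0x1FFFF


def toy_kcfg_dispatch(target: int) -> int:
    """Refuse to dispatch an indirect kernel call to a user-mode address."""
    if not is_kernel_mode_address(target):
        # Stand-in for the bug check described in the text.
        raise RuntimeError("KERNEL_SECURITY_CHECK_FAILURE")
    return target


# The user-mode address from Figure 2 and the well-known KUSER_SHARED_DATA address.
print(is_kernel_mode_address(0x00007FFE54A40000))
print(is_kernel_mode_address(0xFFFFF78000000000))
```

The first address is rejected as user mode and the second is accepted as a kernel-mode address.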
It is also worth noting that the kCFG bitmap is protected by HVCI, or Hypervisor-Protected Code Integrity. HVCI will be referenced in the second part of this blog series.
Contemporary Mitigation #2: SMEP
Supervisor Mode Execution Prevention (SMEP) is a hardware-based CPU mitigation that was implemented specifically against kernel exploits. When NonPagedPoolNx was introduced, researchers and adversaries could no longer write shellcode directly to kernel-mode memory and execute it. This led to the idea that a Windows API function like VirtualAlloc() could be used to allocate shellcode in user mode, and then pass the returned pointer to the shellcode back to kernel mode. The kernel would then execute the user-mode code “in context” of the kernel, meaning the shellcode would run with full kernel privileges.
SMEP works to mitigate this attack by disallowing execution of user-mode code from the kernel. More specifically, x86-based CPUs have an internal state known as the Current Privilege Level (CPL). These CPUs have four different CPLs known as rings. Windows only utilizes two of these rings: ring 3, relating to anything residing in user mode, and ring 0, which relates to anything residing in kernel mode. SMEP prevents code residing in user-mode (CPL 3) pages from being executed while the CPU is running at CPL 0.
SMEP is enabled via bit 20 (counting from zero) of the CR4 control register. A control register is a register used to change or enable certain features of the CPU, such as the implementation of virtual memory through paging. Although SMEP is enabled via this CR4 bit, it is enforced through the PTE of a memory address. SMEP enforcement works by checking the User vs. Supervisor (U/S) bit in the paging-structure entries for an address. If the bit is set to U(ser), the page is treated as a user-mode page. If the bit is cleared, meaning the bit is represented as S(upervisor), the page is treated as a supervisor (kernel-mode) page.
As Alex Ionescu explained at Infiltrate 2015, if even one of the paging-structure entries is set to “S,” SMEP won’t cause a crash. This realization is important, as SMEP can be bypassed by “tricking” the CPU into executing shellcode from user mode through an arbitrary write.
First, locate the PTE for an allocation in user mode, and then clear the U/S bit to cause it to be set to S. When this occurs, the CPU will treat this user-mode address as a kernel-mode page, allowing execution to occur.
Figure 11: Leveraging an arbitrary write vulnerability to clear the U/S bit, resulting in a user-mode page becoming a kernel-mode page
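The bit manipulation behind this technique is small enough to show directly. The sketch below only computes the new PTE value; it assumes the architectural position of the U/S bit (bit 2), and the PTE value used is hypothetical. In a real exploit the value would be read and written back with an arbitrary read/write primitive, which is not modeled here.

```python
PTE_USER = 1 << 2  # U/S bit: set = user page, clear = supervisor page


def is_user_page(pte: int) -> bool:
    return bool(pte & PTE_USER)


def make_supervisor(pte: int) -> int:
    """Return the PTE value with the U/S bit cleared (page now treated as kernel-mode)."""
    return pte & ~PTE_USER


# Hypothetical PTE for an executable user-mode allocation holding shellcode.
user_pte = 0x0000000012345867
corrupted_pte = make_supervisor(user_pte)
print(hex(corrupted_pte), is_user_page(corrupted_pte))
```

Only a single bit changes, which is why a constrained arbitrary write is often enough for this bypass.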
An older technique for bypassing SMEP is to disable it systemwide by leveraging ROP. An adversary could find a kernel-mode ROP gadget that writes an attacker-controlled value into the CR4 register, and use it to load a value with bit 20 (the bit that enables SMEP) cleared.
The downside to this method is that you must use kernel-mode ROP gadgets to keep execution in the kernel, in order to adhere to SMEP’s rules. Additionally, as with all code reuse attacks, the offset between gadgets may change between versions of Windows, and control of the stack is a must for ROP to work. A protection called HyperGuard, which is beyond the scope of this blog, also protects against CR4 modification on modern systems.
Figure 12: Disabling SMEP completely system wide via CR4 manipulation
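As a small worked example of the CR4 manipulation, the sketch below computes the value a ROP chain would need to load into CR4 to turn SMEP off. Bit 20 as the SMEP enable bit is architectural; the starting CR4 value is a hypothetical example, and the real value varies by system and must be preserved apart from the SMEP bit.

```python
CR4_SMEP = 1 << 20  # CR4 bit 20 enables SMEP


def smep_enabled(cr4: int) -> bool:
    return bool(cr4 & CR4_SMEP)


def cr4_without_smep(cr4: int) -> int:
    """Return the CR4 value with only the SMEP bit cleared."""
    return cr4 & ~CR4_SMEP


# Hypothetical CR4 value with SMEP enabled.
current_cr4 = 0x1506F8
target_cr4 = cr4_without_smep(current_cr4)
print(hex(target_cr4), smep_enabled(target_cr4))
```

Clearing only bit 20 and leaving the other bits intact matters, since the remaining CR4 bits control unrelated CPU features.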
kd> bp nt!part_2
In this blog, legacy mitigations were revisited along with contemporary mitigations such as CFG and SMEP that look to challenge vulnerability researchers and raise the bar for the quality of exploits. These topics set the stage for more modern mitigations, such as ACG, XFG, CET, and VBS, which add complexity, increasing the cost of exploitation and challenging readers to become more inquisitive about the return on investment of modern exploit development.
In Part 1 of this two-part blog series, we addressed binary exploitation on Windows systems, including some legacy and contemporary mitigations that exploit writers and adversaries must deal with in today’s cyber landscape. In Part 2, we will walk through more of the many mitigations Microsoft has put in place.
Modern Mitigation #1: Page Table Randomization
As explained in Part 1, page table entries (or PTEs) are very important when it comes to modern-day exploitation. You may recall that PTEs are responsible for enforcing various permissions and properties of memory. Historically, calculating the PTE for a virtual address was trivial, as the base of the PTEs was static for quite some time. The process for obtaining the PTE for a virtual address is:
- Convert the virtual address into a Virtual Page Number (VPN), by dividing by the size of a page (usually 4KB)
- Multiply the VPN by the size of a PTE (8 bytes on 64-bit systems)
- Add the base of the PTEs to the result of the previous operation
In programming terminology, this essentially equates to indexing into an array of PTEs by virtual page number.
On previous versions of Windows, the base of the PTEs was located at the static virtual address fffff680`00000000. However, after Windows 10 1607 (RS1), the base of the PTEs was randomized, meaning this process is now not so trivial.
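The three-step calculation above can be written out as a short sketch. It assumes 4KB pages, 8-byte PTEs and 48-bit virtual addresses (so the sign-extension bits are masked off before computing the page number); the function name `pte_address` is hypothetical, and the default base is the legacy static address mentioned in the text.

```python
PAGE_SHIFT = 12                       # 4KB pages: divide by 0x1000
PTE_SIZE = 8                          # bytes per PTE on 64-bit systems
VA_MASK = 0x0000FFFFFFFFFFFF          # keep the 48 translated address bits
LEGACY_PTE_BASE = 0xFFFFF68000000000  # static base on older Windows versions


def pte_address(virtual_address: int, pte_base: int = LEGACY_PTE_BASE) -> int:
    """Return the virtual address of the PTE that maps virtual_address."""
    vpn = (virtual_address & VA_MASK) >> PAGE_SHIFT  # step 1: virtual page number
    offset = vpn * PTE_SIZE                          # step 2: scale by PTE size
    return pte_base + offset                         # step 3: add the PTE base


# PTE location for KUSER_SHARED_DATA (0xFFFFF78000000000) with the legacy base.
print(hex(pte_address(0xFFFFF78000000000)))
```

With a randomized base, the same function still applies once the correct `pte_base` for the running system is known.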
One of the ways to bring back the “trivial” method of calculating the PTE for a given virtual address is to derandomize the base of the PTEs. The Windows kernel contains an internal function called nt!MiGetPteAddress, which has been used in previous exploitation research by Morten Schenk in his BlackHat talk in 2017.
This function performs the exact same routine described above to access the PTE of a virtual address. However, it dynamically fills the base of the PTEs at an offset of 0x13 inside the function.
Figure 1: Page table de-randomization via
With an arbitrary read primitive, it is possible to extract the base of the page table entries using this technique. With the base of the PTEs in hand, the aforementioned trivial calculation primitive remains valid.
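The extraction step can be sketched as reading an 8-byte little-endian value at offset 0x13 of the function's bytes. This is an illustration only: the bytes below are synthetic rather than real nt!MiGetPteAddress code, the randomized base is made up, and the 0x13 offset is the one stated above, which may differ between builds. In practice the bytes would come from an arbitrary read primitive.

```python
import struct

PTE_BASE_OFFSET = 0x13  # offset of the embedded PTE base, as described in the text


def extract_pte_base(function_bytes: bytes, offset: int = PTE_BASE_OFFSET) -> int:
    """Read the 64-bit little-endian PTE base embedded in the function body."""
    (pte_base,) = struct.unpack_from("<Q", function_bytes, offset)
    return pte_base


# Synthetic stand-in for bytes leaked from the function: 0x13 filler bytes,
# then a made-up randomized PTE base, then a trailing ret instruction.
hypothetical_base = 0xFFFFB00000000000
leaked_bytes = bytes(PTE_BASE_OFFSET) + struct.pack("<Q", hypothetical_base) + b"\xc3"
print(hex(extract_pte_base(leaked_bytes)))
```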
Note that Windows 10 1607 (RS1) not only randomized the PTE base address, but the base address of 14 other regions of kernel memory as well. While the PTE base was the most significant change, these other randomizations also helped curb certain kinds of kernel exploits, which are outside the scope of this post.
Modern Mitigation #2: ACG
Arbitrary Code Guard (ACG), which was introduced in Windows 10, is an optional memory corruption mitigation meant to stop arbitrary code execution. Although ACG was designed with Microsoft Edge in mind, it can be applied to most processes.
ROP, a well-documented technique to bypass DEP, is most commonly used to return into a Windows API function, such as
VirtualProtect(). Utilizing this function and user-supplied arguments, adversaries and researchers are able to dynamically change permissions of the memory, in which malicious shellcode resides, to
RWX. With ACG, this is not possible.
Figure 2: A process protected by ACG
ACG prevents existing executable code from being modified, and prevents data, such as malicious shellcode that waits to be made RWX, from being made executable. If an individual has a read and a write primitive and has bypassed CFG and ASLR, ACG mitigates the ability to utilize ROP to bypass DEP via dynamically manipulating memory permissions.
Additionally, ACG prevents the ability to allocate new executable memory.
VirtualAlloc(), another popular API to return into for ROP, cannot allocate executable memory for malicious purposes. Essentially, memory cannot dynamically be made executable.
ACG, although a user-mode mitigation, is implemented in the kernel through a kernel function called nt!MiArbitraryCodeBlocked. This function essentially checks a process to see if ACG is enabled.
Figure 3: nt!MiArbitraryCodeBlocked checks processes for the ACG mitigation
The EPROCESS object for a process, which is the kernel’s representation of a process, has a member of the union data type known as MitigationFlags that keeps track of the various mitigations enabled for the process.
EPROCESS also contains another member known as MitigationFlagsValues that provides a human-readable variant of these flags.
Let’s examine an Edge content process (
MicrosoftEdgeCP.exe) where ACG is enabled.
Figure 4: MicrosoftEdgeCP.exe
Looking at MitigationFlagsValues, we can see that DisableDynamicCode, which is ACG, is set, meaning ACG is enabled for this process.
Figure 5: DisableDynamicCode is set in an Edge content process
At this point, if executable code is dynamically created for a process and this flag is set, a STATUS_DYNAMIC_CODE_BLOCKED failure is returned from the function check, resulting in a crash.
Additionally, it is possible to obtain a list of all running processes that have ACG enabled by checking this flag for each running process.
Figure 6: A list of processes with ACG enabled
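The flag check and the per-process listing can be sketched with plain bit operations. This is a hedged illustration: the bit position used for DisableDynamicCode (bit 8) is an assumption that should be verified against the symbols of the target build, and the process names and MitigationFlags values below are made up rather than taken from a real system.

```python
# Assumed position of DisableDynamicCode within MitigationFlags; verify per build.
DISABLE_DYNAMIC_CODE_BIT = 8


def acg_enabled(mitigation_flags: int) -> bool:
    """True if the DisableDynamicCode bit is set in a MitigationFlags value."""
    return bool((mitigation_flags >> DISABLE_DYNAMIC_CODE_BIT) & 1)


def processes_with_acg(processes: dict) -> list:
    """Return the names of processes whose MitigationFlags have ACG set."""
    return sorted(name for name, flags in processes.items() if acg_enabled(flags))


# Hypothetical snapshot of process names mapped to MitigationFlags values.
snapshot = {
    "MicrosoftEdgeCP.exe": 0x00000121,
    "notepad.exe": 0x00000021,
    "hypothetical_service.exe": 0x00000100,
}
print(processes_with_acg(snapshot))
```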
Alex Ionescu explained in a talk at Ekoparty that prior to the 1703 (RS2) update, Edge had one thread responsible for JIT because of ACG. Since JIT isn’t compatible with ACG, this “JIT thread” did not have ACG enabled, meaning that if compromising this thread was possible, it would then be possible to circumvent ACG. To address this, Microsoft created a separate process for Edge JIT compilation entirely in Windows 10 1703 (RS2). In order for an Edge Content process (a non-JIT process) to utilize JIT compilation, the JIT process utilizes a handle to the Edge Content process in order to perform JIT work inside of each non-JIT process.
ACG has a “universal bypass” in that researchers and adversaries can avoid introducing new executable code entirely. By utilizing code reuse techniques, it is possible to write an entire payload in ROP, JOP or COP, which will “adhere” to ACG’s rules. Instead of using code reuse techniques to return into an API, an option would be to just use them to construct the entire payload. Additionally, exploits against compromised browsers will need to utilize a full code reuse sandbox escape. This is not ideal, as writing payloads in ROP, JOP or COP is very time-consuming.
ACG has also been bypassed using Edge’s JIT structure. Ivan Fratric of Google Project Zero gave a talk at Infiltrate 2018 explaining that the way Content processes of Edge obtain handles to the JIT process is risky.
An Edge Content process utilizes the Windows API function DuplicateHandle() to create a handle to itself that the JIT process can utilize. The issue with this is that the DuplicateHandle() function requires an already established handle to the target process with PROCESS_DUP_HANDLE permissions. Edge Content processes utilize these permissions to obtain a handle to the JIT process with a great amount of access, as PROCESS_DUP_HANDLE allows a process with a handle to another process to duplicate a pseudo handle (e.g., -1) that has maximum access. This gives an Edge Content process access to the JIT process, where ACG is disabled. An adversary who has compromised a Content process could then pivot to the non-ACG-protected JIT process for exploitation, which could lead to a compromise of the system.
These issues were eventually fixed in Windows 10 RS4. Edge now uses the Chromium engine, which, it is important to note, also leverages ACG and an out-of-process JIT compiler.
Modern Mitigation #3: CET
Due to CFG not taking into account return edge cases, Microsoft needed to quickly develop a solution to protect return addresses. As mentioned by Joe Bialek of the Microsoft Security Response Center in his OffensiveCon 2018 talk, Microsoft initially addressed this problem with a software-based mitigation known as RFG, or Return Flow Guard.
RFG aimed to address the problem by utilizing additional code in function prologues to push the return address of a function onto something known as a “shadow stack,” which contains only copies of the legitimate return pointers for functions and does not hold any parameters. This shadow stack was not accessible from user mode and therefore “protected by the kernel.” In the epilogue of a function, the shadow stack’s copy of the return address was compared to the in-scope return address. If they were different, a crash would ensue. RFG, although a nice concept, was eventually defeated by Microsoft’s internal red team, which found a universal bypass that came down to the implementation of any shadow stack solution implemented in software. Due to the limitations of any software implementation of control-flow hijacking protection, a hardware-based solution was needed.
Enter Intel CET, or Control-Flow Enforcement Technology. CET is a hardware-based mitigation that implements a shadow stack to protect return addresses on the stack, as well as forward edge cases such as calls/jumps through Indirect Branch Tracking (IBT). However, Microsoft has opted to use CFG (and XFG, which will be referenced later within this post) to protect forward edge cases instead of CET’s IBT capabilities, which work similarly to Clang’s CFI implementation, according to Alex Ionescu and Yarden Shafir.
CET’s main talking point is its protection of return addresses, essentially thwarting ROP. CET has a similar approach to RFG, in that a shadow stack is used.
When CET determines a target return address is a mismatch with its associated preserved return address on the shadow stack, a fault is generated.
Figure 7: A look at a “pseudo” check of a return address through CET
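To make the comparison concrete, here is a minimal Python sketch of the shadow stack idea shared by RFG and CET. It is a toy model under stated assumptions, not how the hardware or Windows implements the check: the class, method names and addresses are hypothetical, and the “shadow stack” is simply a second list that the simulated attacker never touches.

```python
class ControlFlowViolation(Exception):
    """Raised when a return address does not match its shadow-stack copy."""


class ToyCpu:
    """Toy model: an attacker-writable stack plus a protected shadow stack."""

    def __init__(self):
        self.stack = []         # normal stack (writable by the attacker here)
        self.shadow_stack = []  # holds only copies of return addresses

    def call(self, return_address):
        # On a call, the return address is recorded on both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # On a return, both copies are popped and compared.
        target = self.stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlFlowViolation(
                f"return to {target:#x}, shadow stack expected {expected:#x}"
            )
        return target


def demo():
    cpu = ToyCpu()

    cpu.call(0x401000)
    clean = cpu.ret()            # untouched return address passes the check

    cpu.call(0x401000)
    cpu.stack[-1] = 0xDEADBEEF   # simulated stack overwrite (e.g., a ROP chain)
    try:
        cpu.ret()
        blocked = False
    except ControlFlowViolation:
        blocked = True
    return clean, blocked
```

Calling `demo()` shows an unmodified return passing and a corrupted return address raising the simulated fault. The protection only holds as long as the attacker cannot write to the shadow stack itself, which is where a software-only design such as RFG ran into trouble.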
Although CET, which is a part of the Intel Tiger Lake CPU family, has not hit mainstream consumer hardware, some possible bypasses have been conceptualized.
Modern Mitigation #4: XFG
Xtended Control Flow Guard, popularized as XFG, is Microsoft’s “enhanced” implementation of CFG. By design, CFG only validates whether a function exists in the CFG bitmap, meaning that if a function pointer is overwritten with the address of another function that exists in the CFG bitmap, it is still treated as a valid target. Figure 8 below shows that [nt!HalDispatchTable+0x8], which normally points to hal!HaliQuerySystemInformation, has been overwritten with nt!RtlGetVersion.
Figure 8: [nt!HalDispatchTable+0x8] has been overwritten with nt!RtlGetVersion
Just before reaching execution, the kCFG bitmap check takes in the value of RAX, which holds nt!RtlGetVersion rather than the original contents of [nt!HalDispatchTable+0x8], to determine whether the function is valid.
Figure 9: Pointer to nt!HalDispatchTable+0x8 is loaded into RAX in preparation for the call
Figure 10: Actual value in RAX is nt!RtlGetVersion, not the intended value of hal!HaliQuerySystemInformation
The bitwise checks occur and the function call is still allowed, even though [nt!HalDispatchTable+0x8] has been overwritten with another function.
Figure 11: A jmp to RAX occurs, which contains the preserved value of nt!RtlGetVersion
Figure 12: Call to nt!RtlGetVersion
Although CFG does thwart some indirect calls through overwritten function pointers, an attacker can still redirect a call to any function that is marked valid in the bitmap and use that call with malign intent.
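The weakness described above can be sketched in a few lines of Python. This is a simplified model, assuming the bitmap can be represented as a set of valid call targets; the addresses, the `cfg_allows` helper and the dispatch table layout are hypothetical stand-ins for the symbols discussed above, not the real kernel structures.

```python
# Hypothetical addresses standing in for the functions discussed above.
HALI_QUERY_SYSTEM_INFORMATION = 0x1000
RTL_GET_VERSION = 0x2000
MID_FUNCTION_GADGET = 0x2004  # not a function start, so not a valid target

# Toy "bitmap": the set of addresses that are valid indirect call targets.
VALID_CALL_TARGETS = {HALI_QUERY_SYSTEM_INFORMATION, RTL_GET_VERSION}


def cfg_allows(target):
    """CFG-style check: is the target marked valid? Nothing else is asked."""
    return target in VALID_CALL_TARGETS


def demo():
    dispatch_table = {"slot_0x8": HALI_QUERY_SYSTEM_INFORMATION}

    # Simulated pointer overwrite with a different, but still valid, function.
    dispatch_table["slot_0x8"] = RTL_GET_VERSION
    overwritten_allowed = cfg_allows(dispatch_table["slot_0x8"])

    # A jump into the middle of a function is rejected.
    gadget_allowed = cfg_allows(MID_FUNCTION_GADGET)
    return overwritten_allowed, gadget_allowed
```

Here `demo()` returns `(True, False)`: the mid-function gadget is rejected, but the swapped-in function passes because the check never asks whether it is the function the call site was meant to reach.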
XFG addresses this lack of robustness, as mentioned by David Weston of Microsoft. In David’s talk at BlueHat Shanghai 2019, he explains that XFG implements a “type-based hash” of a protected function, which is placed 0x8 bytes above a call to one of the XFG dispatch functions.
XFG essentially takes a function’s prototype, made up of the return type and argument types, and creates a roughly 55-bit hash of it. The hash is stored 8 bytes above the function itself, and when the dispatch function is called, this hash is used as an additional check before control flow transfer.
Figure 13: An XFG hash is loaded into R10 before control flow transfer to the XFG dispatch function
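The extra check can be modeled with a short Python sketch. Everything here is an assumption made for illustration: the real hashing algorithm is not described in this post, so a truncated SHA-256 of a prototype string stands in for it, and the function table, addresses and prototypes are hypothetical.

```python
import hashlib


def prototype_hash(return_type, arg_types):
    """Toy stand-in for the compiler's type hash (NOT the real XFG algorithm)."""
    signature = f"{return_type}({','.join(arg_types)})"
    digest = hashlib.sha256(signature.encode()).digest()
    # Keep 55 bits to mirror the approximate hash size mentioned above.
    return int.from_bytes(digest[:8], "little") & ((1 << 55) - 1)


# Hypothetical functions: address -> hash stored just above the function.
FUNCTIONS = {
    0x1000: prototype_hash("long", ["int", "void*", "unsigned long"]),
    0x2000: prototype_hash("long", ["void*"]),
    0x3000: prototype_hash("long", ["void*"]),  # same prototype as 0x2000
}


def xfg_dispatch(target, expected_hash):
    """Allow the call only if the target's stored hash matches the call site's."""
    return FUNCTIONS.get(target) == expected_hash


def demo():
    # The call site was compiled to call a function with this prototype.
    call_site_hash = prototype_hash("long", ["int", "void*", "unsigned long"])

    intended = xfg_dispatch(0x1000, call_site_hash)      # intended target
    overwritten = xfg_dispatch(0x2000, call_site_hash)   # different prototype

    # A call site expecting long(void*) accepts any function of that type.
    same_type_swap = xfg_dispatch(0x3000, prototype_hash("long", ["void*"]))
    return intended, overwritten, same_type_swap
```

In this model the overwritten pointer from the CFG example is rejected because the prototypes differ, while a swap between two functions with identical prototypes still passes.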
If an XFG function hash, which is generated by the compiler, is not rigorous and complete, hashes may not be unique. If the sequence of bytes that makes up a hash is not unique, ordinary opcodes may happen to contain the same bytes; when calling into the middle of a function, for instance, the 8 bytes above the target are opcodes rather than a genuine hash. Although unlikely, this may result in XFG declaring an overwritten function pointer “valid” because the comparison between the expected hash and those bytes succeeds, resulting in XFG being bypassed. However, the compiler team has specifically implemented code to try to prevent this from happening. Similarly, because the hashing for C functions uses primitive types such as void*, function pointers could potentially be overwritten with functions that have identical or similar prototypes.
Modern Mitigation #5: VBS and HVCI
In order to provide additional security boundaries for the Windows OS, Microsoft opted to utilize the existing virtualization capabilities of modern hardware. Among these mitigations are Hypervisor-Protected Code Integrity (HVCI) and Virtualization-Based Security (VBS).
VBS is responsible for enabling HVCI and is enabled by default on compatible hardware after Windows 10 1903 (19H1) on “Secured Core” systems. It can also be enabled by default on Windows 10 2004 (20H1) systems for vendors that opt in through system configuration, provided the hardware is modern enough to conform to Microsoft’s “Security Level 3” baseline. VBS aims to isolate user-mode and kernel-mode code by having it run on top of the Hyper-V hypervisor.
The following image from Windows Internals, Part 1, 7th Edition (Ionescu, et al.) provides a high-level view of the implementation of VBS.
Figure 14: VBS implementation (Windows Internals, Part 1, 7th Edition)
VTLs, or Virtual Trust Levels, prevent processes running in one VTL from accessing resources of another VTL. This is because resources located within the normal kernel are actually managed by a more “trusted” boundary — VTL 1.
One of the main components of VBS mentioned in this blog is HVCI. HVCI is essentially ACG in the kernel: it thwarts dynamically created executable code in the kernel. Additionally, HVCI prevents allocating kernel pool memory that is RWX, similar to ACG’s user mode protection against RWX pages.
HVCI leverages Second Level Address Translation, known as SLAT, to enforce Extended Page Tables, or EPTs, whose entries carry additional permission bits that are immutable from VTL 0 and are set by VTL 1 for VTL 0 pages. This means that even if an adversary or researcher can manipulate a PTE control bit in VTL 0 kernel mode, the VTL 1 EPT bits will still not permit execution of the manipulated pages in VTL 0 kernel mode.
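The layering can be reduced to a simple rule: a page in VTL 0 is executable only if both the VTL 0 PTE and the VTL 1 controlled second-level entry allow it. The Python sketch below is a toy model of that rule, assuming each layer is reduced to a single boolean; the `Page` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pte_executable: bool  # VTL 0 page table bit, writable by the VTL 0 kernel
    ept_executable: bool  # second-level permission, owned by VTL 1 in this model


def can_execute(page):
    """Execution requires permission from both translation layers."""
    return page.pte_executable and page.ept_executable


def demo():
    # A signed code page: both layers allow execution.
    code_page = Page(pte_executable=True, ept_executable=True)

    # A data page holding attacker-supplied bytes: neither layer allows it.
    data_page = Page(pte_executable=False, ept_executable=False)

    # Simulated PTE manipulation from VTL 0 kernel mode.
    data_page.pte_executable = True

    return can_execute(code_page), can_execute(data_page)
```

`demo()` returns `(True, False)`: flipping the PTE bit alone changes nothing, because the second-level permission is outside the reach of code running in VTL 0.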
Bypasses for HVCI could include data-only attacks, similar to those used against ACG. Staying away from executing new code, and instead using code reuse techniques that don’t require PTE manipulation or other forbidden actions, is still a viable option. Additionally, if an adversary or researcher can leverage a vulnerability in the hypervisor, or in the secure kernel that operates in VTL 1, it may be possible to compromise the integrity of VTL 1.
By no means are the vulnerability classes and mitigations in these two blog posts an exhaustive list. The aforementioned mitigations are commonly enabled by default on many installations of Windows and must, at a bare minimum, be taken into consideration from an adversarial or research perspective.
Many adversaries choose the “path of least resistance,” such as sending a malicious document or a malicious HTA to an unsuspecting list of targeted users. Generally, this will be enough to get the job done. The counterpoint, however, is this: does anything top a no-user-interaction, unauthenticated, remote kernel code execution exploit in a common service such as SMB, RDP, or DNS? Social engineering relies on uncontrollable factors, such as whether the end users who receive the phishing emails are security-aware. Binary exploitation takes the people factor out of the code execution process, leaving less to worry about.
A researcher or adversary may spend weeks or months to develop a reliable, portable exploit that bypasses all of the mitigations in place. An exploit, such as a browser exploit, may require one user mode arbitrary read zero-day to bypass ASLR; an arbitrary write zero-day to bypass DEP, CFG, ACG and other mitigations; a kernel arbitrary read zero-day to bypass kASLR/page table randomization from a restricted caller to prep the kernel exploit to break out of the browser sandbox; and a kernel arbitrary write zero-day for the kernel exploit. That is a total of four zero-days. Is the return on investment worth it? These are the questions research firms and nation-state adversaries must take into consideration.
- What Makes It Page? The Windows 7 (x64) Virtual Memory Manager
- Windows Internals, Part 1, 7th Edition