An out-of-bounds write vulnerability exists in the REL section header parsing functionality of NVIDIA nvdisasm 12.9.88. A specially crafted ELF file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
NVIDIA nvdisasm 12.9.88
nvdisasm - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-122 - Heap-based Buffer Overflow
The nvdisasm tool provided by the NVIDIA CUDA Toolkit is used to display information about CUDA ELF files. Apart from the disassembly of CUDA compiled binary code, it is capable of displaying control flow graphs, register life-ranges, debug information, etc.
The REL section of an ELF file contains relocation information needed when the binary is loaded into memory for execution. The section header format is described in the ELF specification. In our case, nvdisasm handles 32-bit ELF files, so we use the relevant definition:
typedef uint16 Elf32_Half;
typedef uint32 Elf32_Word;
typedef uint32 Elf32_Addr;
typedef uint32 Elf32_Off;

typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;
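As an illustration of the layout above, the following Python sketch unpacks one 40-byte Elf32_Shdr with the standard struct module. Little-endian byte order is an assumption (the report does not state it), the value SHT_REL = 9 comes from the ELF specification, and the helper name parse_shdr32 is hypothetical. It is not taken from nvdisasm.

```python
import struct
from collections import namedtuple

# Ten consecutive 32-bit words, matching the Elf32_Shdr definition above.
# Little-endian ("<") is an assumption; the report does not state byte order.
SHDR32 = struct.Struct("<10I")
SHT_REL = 9  # section type of REL sections, per the ELF specification

Elf32Shdr = namedtuple(
    "Elf32Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

def parse_shdr32(data, offset=0):
    """Hypothetical helper: decode one section header from raw bytes."""
    return Elf32Shdr(*SHDR32.unpack_from(data, offset))

if __name__ == "__main__":
    # Round-trip a made-up REL header whose sh_size is 0x18.
    raw = SHDR32.pack(0, SHT_REL, 0, 0, 0x200, 0x18, 0, 0, 4, 8)
    hdr = parse_shdr32(raw)
    print(hdr.sh_type == SHT_REL, hex(hdr.sh_size))
```

The sh_size field read this way is the attacker-controlled value that the rest of the analysis follows.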
nvdisasm parses the REL sections of an ELF file in the function at 0x45a670. At offset 0x45bf69, the sh_size for the current REL section is moved to the rsi register. Then, at (1) and (2), a right shift by 3 and a left shift by 4 are performed to calculate the allocation size parameter for the internal allocation function called at (3). The instructions at (1) and (2) effectively clear the low 3 bits of sh_size and then multiply the result by 2.
0045bf69 mov rsi, qword [rsp {var_b8_1}]
0045bf6d mov rdi, qword [rax+0x18]
0045bf71 shr rsi, 0x3 (1)
0045bf75 shl rsi, 0x4 (2)
0045bf79 call sub_410af0 (3)
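The arithmetic at (1) and (2) can be sketched in Python as follows. This models only the value placed in rsi before the call at (3); what sub_410af0 does with that value is not modeled. The function name alloc_size_arg is hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register width

def alloc_size_arg(sh_size):
    """Model of (1) and (2): shr rsi, 0x3 followed by shl rsi, 0x4."""
    rsi = sh_size & MASK64
    rsi >>= 3                    # (1) drop the low 3 bits
    rsi = (rsi << 4) & MASK64    # (2) multiply by 16
    return rsi

if __name__ == "__main__":
    for sh_size in (0x17, 0x18, 0x21):
        arg = alloc_size_arg(sh_size)
        # Same result as clearing the low 3 bits and doubling.
        assert arg == (sh_size & ~0x7) * 2
        print(f"sh_size={sh_size:#x} -> size argument={arg:#x}")
```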
The code then calculates the size of the buffer again. At offset 0x045bfbf we have:
0045b94b lea r14, [rbx+rsi] (4.a)
...
0045bfbf lea rax, [r14+0x7] (4.b)
0045bfc3 lea rsi, [rbx+0x8] (4.c)
0045bfc7 sub rax, rsi (4.d)
Simplifying the sub operation at (4.d) by taking into account the previous instructions, we have:
buffer_size = rax - rsi = (r14 + 0x7) - (rbx + 0x8) = (rbx + rsi + 0x7) - (rbx + 0x8) = rsi - 1 = sh_size - 1
In short, sh_size is decremented by 1. Then, a shift and an add operation follow:
0045bfca mov r8, rax
0045bfcd shr r8, 0x3
0045bfd1 add r8, 0x1
Simplifying, we have:
offset = (((sh_size - 1) >> 3) + 1)
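To double-check the simplification, the register operations from (4.a) onward can be replayed step by step and compared against the closed form. This is a sketch under two assumptions: rsi holds sh_size at (4.a), as the derivation above states, and sh_size is at least 1 (for 0, the unsigned 64-bit subtraction wraps, which the Python closed form does not reproduce). The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register width

def emulate_offset(rbx, sh_size):
    """Replay the instructions at (4.a)-(4.d) and the shr/add that follow."""
    r14 = (rbx + sh_size) & MASK64   # (4.a) lea r14, [rbx+rsi]
    rax = (r14 + 0x7) & MASK64       # (4.b) lea rax, [r14+0x7]
    rsi = (rbx + 0x8) & MASK64       # (4.c) lea rsi, [rbx+0x8]
    rax = (rax - rsi) & MASK64       # (4.d) sub rax, rsi  -> sh_size - 1
    r8 = rax >> 3                    # shr r8, 0x3
    r8 = (r8 + 1) & MASK64           # add r8, 0x1
    return r8

def closed_form(sh_size):
    """The simplified expression from the text."""
    return ((sh_size - 1) >> 3) + 1

if __name__ == "__main__":
    # rbx is an arbitrary placeholder pointer; it cancels out in (4.d).
    for sh_size in (0x17, 0x18, 0x21):
        print(hex(sh_size), emulate_offset(0x1000, sh_size), closed_form(sh_size))
```

Because rbx cancels in the subtraction, the result depends only on sh_size.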
Later, the code at 0x045c092 performs an AND operation on the offset with the value 0xfffffffffffffffc at (6), effectively clearing the low 2 bits of the value, that is, rounding it down to a multiple of 4. At (7), this value is shifted left by 4 and then added at (8) to the rcx register, which holds the pointer to the buffer allocated previously at (3).
0045c092 mov rax, r8 // offset
0045c095 and rax, 0xfffffffffffffffc (6)
0045c099 mov rdx, rax (7.a)
...
0045c0a0 shl rdx, 0x4 (7.b)
0045c0a4 add rcx, rdx (8)
0045c0a7 cmp rax, r8 (9)
0045c0aa je 0x45b960 (10)
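The instructions at (6) through (10) can be modeled with the short Python sketch below. It returns the amount added to rcx at (8) and whether the jump at (10) is taken. The function name tail_check is hypothetical, and the sketch covers only this arithmetic, not the surrounding loop.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register width

def tail_check(offset):
    """Model of (6)-(10): return (pointer_advance, jump_taken)."""
    rax = offset & 0xfffffffffffffffc   # (6) round down to a multiple of 4
    rdx = (rax << 4) & MASK64           # (7) scale by 16
    jump_taken = (rax == offset)        # (9)/(10) je when nothing was masked off
    return rdx, jump_taken

if __name__ == "__main__":
    for offset in (3, 4, 5):
        advance, taken = tail_check(offset)
        print(f"offset={offset} advance={advance:#x} jump_taken={taken}")
```

For the offsets 3 and 5 the jump is not taken, so execution falls through to the code that follows.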
Effectively, this code adds the size of the buffer to the pointer at (8). Then, if the offset is the same before and after the AND operation at (6), the jump at (10) is taken. Otherwise, meaning that the offset value is not a multiple of 4, execution proceeds to 0x45c0b0.
0045c0b0 lea rsi, [rbx+0x8]
0045c0b4 mov eax, dword [rbx] (11)
0045c0b9 mov qword [rcx], rax (12)
At (11), the code copies a 4-byte value from the input to the rax register, and at (12) it stores that value to the memory location pointed to by rcx. Note, however, that this pointer was previously incremented by the total size of the allocated buffer at (8), meaning that rcx points beyond the allocated buffer, leading to a heap out-of-bounds write with controlled data.
The code continues to store input data at various offsets from the rcx register, which leaves ample room for successful exploitation.
0045c0ce mov qword [rcx+0x8], rax
...
0045c0d8 mov eax, dword [rbx+0x8]
0045c0db mov edx, dword [rbx+0xc]
0045c0de mov qword [rcx+0x10], rax
...
0045c0ee add rax, rdx
0045c0f1 mov qword [rcx+0x18], rax
As we saw earlier, in order for the vulnerable code to be reached, the jump instruction at (10) must not be executed. This means that for a REL section, the following condition must be true:
((((sh_size - 1) >> 3) + 1) & 0x3) != 0
It is trivial to write code that finds the appropriate values for this condition:
#!/usr/bin/env python3

def calc_offset(sh_size):
    # Offset computed by the code: ((sh_size - 1) >> 3) + 1
    return ((sh_size - 1) >> 3) + 1

count = 0
for sh_size in range(0xff):
    offset = calc_offset(sh_size)
    # The vulnerable path is reached when offset is not a multiple of 4
    if (offset & 0x3) != 0:
        count += 1
        print(f"sh_size: {sh_size:#04x}, offset: {offset:#04x}")

print(f"Total valid sh_size vulnerable values: {count}")
Executing the code we get:
...
sh_size: 0x17, offset: 0x03
sh_size: 0x18, offset: 0x03
sh_size: 0x21, offset: 0x05
...
Using valgrind we get:
==141955== Memcheck, a memory error detector
==141955== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==141955== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==141955== Command: ./nvdisasm-12.9.88 --life-range-mode wide --print-instr-offsets-cfg --print-life-ranges --print-line-info --print-line-info-inline --print-line-info- +++ptx --sort-sections inputs/queue/id:002880,time:0,execs:0,orig:id:001568,src:001545+000635,time:4890772,execs:963632,op:splice,rep:4
==141955== Parent PID: 115760
==141955==
==141955== VALGRIND_ERROR_START
==141955== Invalid write of size 8
==141955== at 0x45C0B9: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45C775: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x403041: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x488B249: (below main) (libc_start_call_main.h:58)
==141955== Address 0x4a55298 is 0 bytes after a block of size 328 alloc'd
==141955== at 0x48407B4: malloc (vg_replace_malloc.c:381)
==141955== by 0x45D619: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x410E79: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45BF7D: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45C775: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x403041: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x488B249: (below main) (libc_start_call_main.h:58)
==141955==
==141955== VALGRIND_ERROR_END
==141955== VALGRIND_ERROR_START
==141955== Invalid write of size 8
==141955== at 0x45C0CE: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45C775: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x403041: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x488B249: (below main) (libc_start_call_main.h:58)
==141955== Address 0x4a552a0 is 8 bytes after a block of size 328 alloc'd
==141955== at 0x48407B4: malloc (vg_replace_malloc.c:381)
==141955== by 0x45D619: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x410E79: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45BF7D: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x45C775: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x403041: ??? (in /home/dtatsis/nvdisasm-fuzz/nvdisasm-12.9.88)
==141955== by 0x488B249: (below main) (libc_start_call_main.h:58)
==141955==
==141955== VALGRIND_ERROR_END
==141955==
==141955== HEAP SUMMARY:
==141955== in use at exit: 26,737 bytes in 153 blocks
==141955== total heap usage: 244 allocs, 91 frees, 42,944 bytes allocated
==141955==
==141955== For a detailed leak analysis, rerun with: --leak-check=full
==141955==
==141955== For lists of detected and suppressed errors, rerun with: -s
==141955== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
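The addresses in the Valgrind report can be cross-checked with a little arithmetic. The sketch below uses only values copied from the log above (block size 328, and the two invalid write addresses); the function name bytes_past_block is hypothetical.

```python
BLOCK_SIZE = 328            # "a block of size 328 alloc'd"
FIRST_WRITE = 0x4a55298     # reported as 0 bytes after the block
SECOND_WRITE = 0x4a552a0    # reported as 8 bytes after the block

# The first write lands exactly at the end of the block, so the block
# starts BLOCK_SIZE bytes before it.
BLOCK_START = FIRST_WRITE - BLOCK_SIZE

def bytes_past_block(addr, start=BLOCK_START, size=BLOCK_SIZE):
    """Distance of addr beyond the last valid byte range [start, start+size)."""
    return addr - (start + size)

if __name__ == "__main__":
    print(f"block: {BLOCK_START:#x}..{BLOCK_START + BLOCK_SIZE:#x}")
    for addr in (FIRST_WRITE, SECOND_WRITE):
        print(f"write at {addr:#x}: {bytes_past_block(addr)} bytes past the block")
```

The two faulting instructions, 0x45C0B9 and 0x45C0CE, are the stores to [rcx] and [rcx+0x8] shown in the disassembly, which matches the 8-byte distance between the two reported addresses.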
2025-06-30 - Vendor Disclosure
2025-09-23 - Vendor Patch Release
2025-09-24 - Public Release
Discovered by Dimitrios Tatsis of Cisco Talos.