A heap-based buffer overflow vulnerability exists in the REL section header parsing functionality of NVIDIA nvdisasm 12.8.90. A specially crafted ELF file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
NVIDIA nvdisasm 12.8.90
nvdisasm - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-122 - Heap-based Buffer Overflow
The nvdisasm tool provided by the NVIDIA CUDA Toolkit is used to display information about CUDA ELF files. Apart from disassembling CUDA compiled binary code, it can display control flow graphs, register life ranges, debug information, etc.
The REL section of an ELF file contains relocation information needed when the binary is loaded into memory for execution. The layout of a section header is described in the ELF specification. In our case, nvdisasm handles 32-bit ELF files, so we use the relevant definition:
typedef uint16 Elf32_Half;
typedef uint32 Elf32_Word;
typedef uint32 Elf32_Addr;
typedef uint32 Elf32_Off;

typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;
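As an illustration of the layout above, the following minimal sketch unpacks one section header with Python's `struct` module. It assumes little-endian byte order, and the names `parse_shdr` and `Elf32Shdr` are hypothetical helpers introduced for this example; they are not part of nvdisasm.

```python
import struct
from collections import namedtuple

# Ten 32-bit words, matching the Elf32_Shdr definition above.
# Little-endian byte order is an assumption of this sketch.
SHDR_FORMAT = "<10I"
SHDR_SIZE = struct.calcsize(SHDR_FORMAT)  # 40 bytes

# Section type for relocation entries without addends (ELF specification).
SHT_REL = 9

Elf32Shdr = namedtuple(
    "Elf32Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset "
    "sh_size sh_link sh_info sh_addralign sh_entsize",
)


def parse_shdr(data, offset=0):
    """Decode one Elf32_Shdr from `data` starting at `offset`."""
    return Elf32Shdr(*struct.unpack_from(SHDR_FORMAT, data, offset))


# Usage: build a synthetic REL header and read back its sh_size.
raw = struct.pack(SHDR_FORMAT, 1, SHT_REL, 0, 0, 0x200, 0x19, 2, 3, 4, 8)
hdr = parse_shdr(raw)
print(hdr.sh_type == SHT_REL, hex(hdr.sh_size))
```

The `sh_size` field is the attacker-controlled value that matters in the rest of this analysis.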
nvdisasm parses the REL sections of an ELF file in the function at 0x45ada0. At address 0x45c699, the sh_size of the current REL section is moved into the rsi register. Then at (1) and (2), a right shift by 3 and a left shift by 4 are performed to calculate the allocation size parameter for the internal allocation function called at (3). The instructions at (1) and (2) effectively discard the lowest 3 bits of sh_size and then multiply the result by 2.
0045c699 mov rsi, qword [rsp {var_b8_1}]
0045c69d mov rdi, qword [rax+0x18]
0045c6a1 shr rsi, 0x3 (1)
0045c6a5 shl rsi, 0x4 (2)
0045c6a9 call sub_40a610 (3)
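The effect of (1) and (2) can be modelled in a few lines of Python. This is only a sketch of the two shift instructions on a 64-bit register; `alloc_size` is a hypothetical name, and whatever sub_40a610 does internally with the value is not modelled.

```python
MASK64 = (1 << 64) - 1  # rsi is a 64-bit register


def alloc_size(sh_size):
    """Size passed to the allocator, following instructions (1) and (2)."""
    rsi = sh_size & MASK64
    rsi >>= 3                    # (1) shr rsi, 0x3: drop the lowest 3 bits
    rsi = (rsi << 4) & MASK64    # (2) shl rsi, 0x4: multiply by 16
    return rsi


# Sizes that are not a multiple of 8 are rounded down before doubling.
for s in (0x18, 0x19, 0x1f, 0x20):
    print(hex(s), "->", hex(alloc_size(s)))
```

Note that every sh_size from 0x18 to 0x1f yields the same request of 0x30 bytes.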
The code then continues to calculate a loop counter for a set of vectorized shuffle operations. Here, rsi at (4.a) holds the sh_size field of the REL section header.
0045c07b lea r14, [rbx+rsi] (4.a)
...
0045c6ef lea rax, [r14+0x7] (4.b)
0045c6f3 lea rsi, [rbx+0x8] (4.c)
0045c6f7 sub rax, rsi (4.d)
Simplifying the sub operation at (4.d) by taking the previous instructions into account, we have:
rax = (r14 + 0x7) - (rbx + 0x8) = (rbx + sh_size + 0x7) - (rbx + 0x8) = sh_size - 1
In short, sh_size is decremented by 1. Then, we have a series of shift operations:
0045c6fa mov r8, rax (5.a)
0045c6fd shr r8, 0x3 (5.b)
0045c701 add r8, 0x1 (5.c)
...
0045c70f mov rdi, r8 (5.d)
...
0045c718 shr rdi, 0x2 (5.e)
As a result, the loop counter is calculated from the above instructions as:
loop_counter = (((sh_size-1) >> 3) + 1) >> 2
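The same computation can be followed instruction by instruction in Python. This is a sketch under the assumption that sh_size is nonzero when this code is reached; `loop_counter` is a hypothetical helper name, and the elided instructions between (5.c) and (5.e) are not modelled.

```python
MASK64 = (1 << 64) - 1  # all registers involved are 64 bits wide


def loop_counter(sh_size):
    """Iteration count derived from instructions (4.d) and (5.a)-(5.e)."""
    rax = (sh_size - 1) & MASK64   # (4.d) simplifies to sh_size - 1
    r8 = rax                       # (5.a) mov r8, rax
    r8 >>= 3                       # (5.b) shr r8, 0x3
    r8 = (r8 + 1) & MASK64         # (5.c) add r8, 0x1
    rdi = r8                       # (5.d) mov rdi, r8
    rdi >>= 2                      # (5.e) shr rdi, 0x2
    return rdi


for s in (0x18, 0x19, 0x20, 0x39):
    print(hex(s), "->", loop_counter(s))
```

In other words, the counter is the number of 8-byte entries (rounded up) divided by 4 (rounded down).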
Then the code enters the loop, where it performs a series of SSE shuffle operations on xmm registers.
0045c720 movdqu xmm0, xmmword [rdx+0x10]
0045c725 pxor xmm6, xmm6
0045c729 add rsi, 0x1 (6)
0045c72d add rdx, 0x20 (7)
0045c731 add rax, 0x40 (8)
...
0045c79d movups xmmword [rax-0x30], xmm5 (9.a)
0045c7a1 movups xmmword [rax-0x40], xmm1 (9.b)
...
0045c7b1 movups xmmword [rax-0x10], xmm4 (9.c)
0045c7b5 movups xmmword [rax-0x20], xmm1 (9.d)
0045c7b9 cmp rdi, rsi
0045c7bc ja 0x45c720
At (6), the current iteration counter is incremented. At (7), the rdx register, used as the data source, is advanced by 0x20. Notably, rdx points to data from the input file. At (8), the rax register, used as the write destination, is advanced by 0x40, implying that each iteration writes 0x40 bytes to the memory pointed to by rax. Indeed, the instructions (9.a) to (9.d) write a total of 0x40 bytes through rax, since an xmm register is 0x10 bytes wide.
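The bookkeeping of the loop can be summarized with a small model. It is a sketch only: it tracks how far the source and destination pointers advance rather than the shuffled data, it assumes the iteration counter in rsi starts at zero (the initialization is not shown in the listing), and it assumes the loop is only entered with a nonzero count. `run_loop` is a hypothetical name.

```python
XMM_BYTES = 0x10  # width of one xmm register


def run_loop(loop_counter):
    """Return (iterations, bytes_read, bytes_written) for the vector loop."""
    rsi = 0            # iteration counter (assumed initial value)
    bytes_read = 0     # how far rdx has advanced into the file data
    bytes_written = 0  # how far rax has advanced into the heap buffer
    while True:        # the check sits at the bottom, as in the listing
        rsi += 1                          # (6) add rsi, 0x1
        bytes_read += 0x20                # (7) add rdx, 0x20
        bytes_written += 4 * XMM_BYTES    # (8), stores (9.a)-(9.d)
        if not loop_counter > rsi:        # cmp rdi, rsi ; ja 0x45c720
            break
    return rsi, bytes_read, bytes_written


print([hex(v) for v in run_loop(1)])
print([hex(v) for v in run_loop(2)])
```

Each iteration therefore consumes 0x20 bytes of input and produces 0x40 bytes of output, the same factor of 2 used in the allocation size.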
If we compare the size calculation for the heap-allocated buffer with the number of iterations that will be performed, it is easy to see that there is a discrepancy between the size of the allocated buffer and the size that the loop assumes. Note that from (1) and (2), the actual allocation size is:
buffer_size = (sh_size >> 3) << 4
Whereas the size implied by the instructions at (5) is:
loop_counter = (((sh_size-1) >> 3) + 1) >> 2
assumed_size = loop_counter * 0x40
If the assumed size of the buffer is larger than the actual size of the allocated buffer, a heap buffer overflow can occur. Indeed, for many values of sh_size this condition holds:
#!/usr/bin/env python3
# List the sh_size values for which the loop writes past the allocation.
for sh_size in range(256):
    buf_size = (sh_size >> 3) << 4                    # from (1) and (2)
    loop_counter = (((sh_size - 1) >> 3) + 1) >> 2    # from (5)
    assumed = loop_counter * 0x40                     # 0x40 bytes per iteration
    if assumed > buf_size:
        print(f"sh_size: {hex(sh_size)}, buffer size: {hex(buf_size)}, assumed size: {hex(assumed)}")
Running this short Python script prints:
sh_size: 0x19, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1a, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1b, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1c, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1d, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1e, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1f, buffer size: 0x30, assumed size: 0x40
sh_size: 0x39, buffer size: 0x70, assumed size: 0x80
...
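The pattern in this output can also be stated in closed form. Writing q = sh_size >> 3, the allocation is 0x10 * q bytes while the loop writes 0x40 * ((q + 1) >> 2) bytes whenever sh_size is not a multiple of 8, so the two differ exactly when q % 4 == 3. The sketch below checks that derivation by brute force. It models only the two formulas given above (not any scalar tail handling or allocator overhead), and the helper names are hypothetical.

```python
def overflow_bytes(sh_size):
    """Bytes written past the requested allocation by the vector loop."""
    buf_size = (sh_size >> 3) << 4
    loop_counter = (((sh_size - 1) >> 3) + 1) >> 2
    return max(0, loop_counter * 0x40 - buf_size)


def is_overflowing(sh_size):
    """Closed form: sh_size mod 32 must fall in 25..31."""
    return 25 <= sh_size % 32 <= 31


# Brute-force check of the closed form over a range of nonzero sizes.
for sh_size in range(1, 0x1000):
    assert (overflow_bytes(sh_size) > 0) == is_overflowing(sh_size)
    assert overflow_bytes(sh_size) in (0, 0x10)

print(hex(overflow_bytes(0x19)), hex(overflow_bytes(0x20)))
```

Under this model, every triggering sh_size overruns the requested size by one xmm store, that is, 0x10 bytes.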
As a result, by choosing a suitable value for the sh_size field of a REL section header in an ELF file, an attacker can trigger a heap buffer overflow with controlled data.
==152542== Memcheck, a memory error detector
==152542== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==152542== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==152542== Command: ./nvdisasm-12.8.90 ./ac0bcb07
==152542== Parent PID: 132852
==152542==
==152542== Invalid write of size 8
==152542== at 0x45C7B1: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x45CEA5: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x40307A: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x48A21C9: (below main) (libc_start_call_main.h:58)
==152542== Address 0x4a99368 is 0 bytes after a block of size 184 alloc'd
==152542== at 0x484680F: malloc (vg_replace_malloc.c:446)
==152542== by 0x465719: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x40A999: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x45C6AD: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x45CEA5: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x40307A: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542== by 0x48A21C9: (below main) (libc_start_call_main.h:58)
==152542==
==152542==
==152542== HEAP SUMMARY:
==152542== in use at exit: 24,556 bytes in 154 blocks
==152542== total heap usage: 228 allocs, 74 frees, 39,566 bytes allocated
==152542==
==152542== LEAK SUMMARY:
==152542== definitely lost: 248 bytes in 8 blocks
==152542== indirectly lost: 8,062 bytes in 123 blocks
==152542== possibly lost: 16,246 bytes in 23 blocks
==152542== still reachable: 0 bytes in 0 blocks
==152542== suppressed: 0 bytes in 0 blocks
==152542== Rerun with --leak-check=full to see details of leaked memory
==152542==
==152542== For lists of detected and suppressed errors, rerun with: -s
==152542== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
2025-05-06 - Vendor Disclosure
2025-09-23 - Vendor Patch Release
2025-09-24 - Public Release
Discovered by Dimitrios Tatsis of Cisco Talos.