An arbitrary code execution vulnerability exists in the DWARF parsing functionality of NVIDIA cuobjdump 12.8.55. A specially crafted fatbin file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
NVIDIA cuobjdump 12.8.55
cuobjdump - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#cuobjdump
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-121 - Stack-based Buffer Overflow
cuobjdump is a command-line utility included in the CUDA Toolkit provided by NVIDIA. Similar to the standard objdump utility, it parses CUDA executable files and displays information such as PTX disassembly, section headers, and relocations.
cuobjdump takes fatbin files as input. A CUDA binary file (cubin) is a custom ELF-like binary format that contains code compiled by the NVIDIA CUDA compiler (nvcc) for a specific CUDA architecture. fatbin files can contain multiple cubin files for compatibility with different CUDA devices.
The ELF-like cubin files may contain debugging symbols implementing the standard .debug sections from the DWARF specification [1]. The basic building block of DWARF information is the Debugging Information Entry, or DIE. The general structure of a DIE is:
struct DIE {
    leb128 index;
    leb128 DW_TAG;
    byte has_children;
    struct attributes {
        leb128 DW_AT;
        leb128 DW_FORM;
    } children[];
}
Note that the data is in LEB128 encoding [2], a variable-length integer encoding used to reduce binary size.
The .debug_abbrev section contains the abbreviation table that complements the DWARF debugging information in an ELF file. It consists of DIE entries. cuobjdump parses these entries when the --dump-elf flag is provided.
$ ./cuobjdump --dump-elf ./poc
...
64bit elf: type=12642, abi=7, sm=52, toolkit=126, flags = 0x340534
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 40 fb 0 ff0001 STRTAB 0 0 0 .shstrtab
2 100 10f 0 1 STRTAB 0 0 0 .strtab
...
5 31c 1178 0 4 40 3 9 .debug_abbrev
...
.section .debug_abbrev
Contents of the .debug_abbrev section:
Number TAG
4 0x33 DW_TAG_variant_part [has children]
1 0x30 DW_TAG_template_value_parameter [has no children]
42 0x00 [has no children]
Unknown Attribute value 4
(0x4) DW_FORM_block1(0xa)
The function at 0x0045fd00 is responsible for decoding a LEB128 integer back to a regular integer representation. The core decoding logic starts at 0x0045fd10.
0045fd10 movzx r9d, byte [rdx]
0045fd14 add rdx, 0x1
0045fd18 mov r8, r9
0045fd1b and r8d, 0x7f
0045fd1f shl r8, cl
0045fd22 add ecx, 0x7
0045fd25 or rax, r8
0045fd28 mov r8d, edx
0045fd2b sub r8d, edi
0045fd2e test r9b, r9b
0045fd31 js 0x45fd10
0045fd38 mov dword [rsi], r8d
0045fd3b retn {__return_addr}
In short, the code reads a byte from the input, masks it with 0x7f (bitwise AND), and shifts the result left by 7*i bits, where i is the zero-based loop iteration. It then ORs the shifted value into RAX, which accumulates the result. If the high bit of the byte is set (that is, the byte is 0x80 or greater), it continues with the next byte. Otherwise, it returns the decoded integer in RAX and stores the number of bytes consumed at the address pointed to by RSI.
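The decoding routine described above can be modeled in a few lines of Python. This is a minimal sketch of unsigned LEB128 decoding, not a reconstruction of the cuobjdump source: the function name decode_uleb128 and its signature are assumptions made for illustration, and the model does not reproduce the 64-bit register width or the out-of-bounds reads the native code would perform on truncated input (Python raises IndexError instead).

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer starting at data[offset].

    Returns (value, bytes_consumed), mirroring the native routine, which
    returns the value in RAX and stores the byte count through RSI.
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]                   # movzx r9d, byte [rdx]
        pos += 1                           # add rdx, 0x1
        result |= (byte & 0x7F) << shift   # and / shl / or
        shift += 7                         # add ecx, 0x7
        if byte & 0x80 == 0:               # high bit clear: last byte
            break
    return result, pos - offset


if __name__ == "__main__":
    # 0xE5 0x8E 0x26 is the usual textbook encoding of 624485.
    print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))
```

Note that the routine itself never checks how many bytes remain in the input; it stops only when it sees a byte with the high bit clear.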
A notable use of this decoding function takes place at offset 0x00443f20, where the .debug_abbrev table is parsed. The code iterates through the DW_FORM and DW_AT entries in the table in a loop.
00443f20 lea rsi, [rsp+0x5c {var_85c}]
00443f25 mov rdi, r12
00443f28 call 0045fd00 // Decode LEB128 from input (1)
00443f2d mov r15, rax
00443f30 movsxd rax, dword [rsp+0x5c {var_85c}]
...
00443f39 add r12, rax
...
00443f65 lea rsi, [rsp+0x5c {var_85c}]
00443f6a mov dword [rbp+r14*8-0x8 {var_840}], r13d (2)
00443f6f mov dword [rbp+r14*8-0x4 {var_840+0x4}], r15d (3)
...
00443f7a add r14, 0x1
00443f7e call 0045fd00 // Decode LEB128 from input (4)
00443f83 mov r13, rax
00443f86 movsxd rax, dword [rsp+0x5c {var_85c}]
00443f8b add r12, rax
00443f8e test r13, r13
00443f91 jne 0x443f20 (5)
At (1) and (4), LEB128 values are decoded and saved to R15 and R13 respectively. At (2) and (3), these decoded values are written to the stack. Note that R14 serves as the loop counter and as the index for the stack writes.
At (5), the loop's exit condition is the value decoded at (4): as long as that integer is nonzero, the loop keeps consuming bytes and writing to the stack. Nothing bounds the number of iterations, so a crafted input can keep the loop running past the end of the local buffer, overwriting the stack in a classic stack-based buffer overflow.
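To make the missing check concrete, the following Python sketch models the loop's control flow and adds the kind of bound the native code lacks. It is an illustration under stated assumptions, not the vendor's fix: parse_attr_pairs, decode_uleb128, and the capacity parameter are hypothetical names, and the real size of the destination buffer is not given in this report.

```python
def decode_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos - offset


def parse_attr_pairs(data: bytes, capacity: int) -> list[tuple[int, int]]:
    """Read pairs of LEB128 values until the first value of a pair is zero.

    `capacity` is a hypothetical limit on the number of pairs the
    destination can hold; exceeding it raises instead of writing on.
    """
    pairs = []
    pos = 0
    first, n = decode_uleb128(data, pos)       # value that ends up in R13
    pos += n
    while first != 0:                          # exit condition, as at (5)
        second, n = decode_uleb128(data, pos)  # decode at (1), value in R15
        pos += n
        if len(pairs) >= capacity:             # the check the loop lacks
            raise ValueError("too many attribute pairs for the buffer")
        pairs.append((first, second))          # stores at (2) and (3)
        first, n = decode_uleb128(data, pos)   # decode at (4)
        pos += n
    return pairs
```

Without the capacity test, the only thing that stops the loop is a zero value in the input, which the author of the file controls.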
In the function prologue, six 8-byte registers are pushed on the stack and a region of 0x888 bytes is allocated on the stack for local variables:
00443980 push r15 {__saved_r15}
00443982 push r14 {__saved_r14}
00443984 push r13 {__saved_r13}
00443986 push r12 {__saved_r12}
0044398b push rbp {__saved_rbp}
0044398c push rbx {__saved_rbx}
00443995 sub rsp, 0x888
As a result, from the current stack pointer, the offset of the return address saved on the stack is calculated as:
6*8 + 0x888 = 0x8b8
RBP is set to:
00443a91 lea rbp, [rsp+0x80 {var_838}]
RBP therefore points 0x80 bytes above RSP. Since the memory stores use RBP as the base pointer, reaching the return address requires overwriting:
0x8b8 - 0x80 = 0x838 bytes
Recall that the memory store at (2) applies a displacement of -8 to the pointer, so the number of iterations needed to overwrite the return address is:
(0x838 / 8) + 1 = 0x108
Since each loop iteration processes 2 LEB128 integers, we need to provide 0x210 LEB128-encoded values in order to overwrite the return address.
The values written on the stack however will be decoded from the LEB128 representation. As such, an attacker has to encode the new return address in the LEB128 representation.
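As an illustration of that encoding step, here is a minimal Python sketch of an unsigned LEB128 encoder. The function name encode_uleb128 is an assumption made for this example, and 0x41414141 is simply the placeholder value visible in the crash output in this report.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


if __name__ == "__main__":
    print(encode_uleb128(0x41414141).hex())
```

A 32-bit value such as 0x41414141 needs five encoded bytes (c1 82 85 8a 04), because each byte carries only 7 bits of payload.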
[1] https://dwarfstd.org/doc/DWARF5.pdf
[2] https://en.wikipedia.org/wiki/LEB128
Running the PoC under gdb shows that we have full control of the return address:
$ gdb --args cuobjdump --dump-elf ./poc
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000041414141 in ?? ()
────────────────────────────────────────────────────────────────────── registers ────
$rax : 0x0
$rbx : 0x3800000038
$rcx : 0x7
$rdx : 0x1
$rsp : 0x00007fffffffde40 → 0x00000000006d3128 → 0x00000000006ca898 → 0x65722e766e2e0065 ("e"?)
$rbp : 0x3800000038
$rsi : 0x00007fffffffd5dc → 0x0000000000000001
$rdi : 0x00000000006d672f → 0x0000000000000000
$rip : 0x41414141
$r8 : 0x1
$r9 : 0x0
$r10 : 0x0
$r11 : 0x202
$r12 : 0x3200000031
$r13 : 0x3400000033
$r14 : 0x3600000035
$r15 : 0x198a00000037
$eflags: [zero carry parity adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00
──────────────────────────────────────────────────────────────────── code:x86:64 ────
[!] Cannot disassemble from $PC
2025-03-03 - Vendor Disclosure
2025-09-23 - Vendor Patch Release
2025-09-24 - Public Release
Discovered by Dimitrios Tatsis of Cisco Talos.