NVIDIA cuobjdump DWARF debug abbreviations parsing arbitrary code execution vulnerability

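NVIDIA cuobjdump 12.8.55 contains an arbitrary code execution vulnerability caused by a stack-based buffer overflow in its DWARF parsing. An attacker can trigger the vulnerability with a specially crafted fatbin file. The CVSSv3 score is 7.8 and the CWE is CWE-121. 2025-09-24 00:01:00 Author: talosintelligence.com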

SUMMARY

An arbitrary code execution vulnerability exists in the DWARF parsing functionality of NVIDIA cuobjdump 12.8.55. A specially crafted fatbin file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

NVIDIA cuobjdump 12.8.55

PRODUCT URLS

cuobjdump - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#cuobjdump

CVSSv3 SCORE

7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-121 - Stack-based Buffer Overflow

DETAILS

cuobjdump is a command-line utility included in the CUDA Toolkit provided by NVIDIA. Similar to the standard objdump utility, it parses CUDA executable files and displays information such as PTX disassembly, section headers, and relocations.

cuobjdump takes fatbin files as input. A CUDA binary file (cubin) is a custom ELF-like binary format that contains code compiled by the NVIDIA CUDA compiler (nvcc) for a specific CUDA architecture. fatbin files can contain multiple cubin files for compatibility with different CUDA devices.

The ELF-like cubin files may contain debugging symbols implementing the standard .debug sections from the DWARF specification [1]. The basic building block of DWARF information is the Debugging Information Entry or DIE. The general structure of a DIE is:

struct DIE {
    leb128 index;
    leb128 DW_TAG;
    byte has_children;

    struct attributes {
        leb128 DW_AT;
        leb128 DW_FORM;
    } children[];
};

Note that the data is in LEB128 encoding [2], a variable-length integer encoding used to reduce binary size.

The .debug_abbrev section includes the abbreviation table complementing the DWARF debugging information in an ELF file. It consists of DIE entries. cuobjdump parses these entries if we provide the --dump-elf flag.

$ ./cuobjdump --dump-elf ./poc
... 
64bit elf: type=12642, abi=7, sm=52, toolkit=126, flags = 0x340534
Sections:
Index Offset   Size ES Align    Type    Flags Link     Info Name
    1     40     fb  0 ff0001   STRTAB        0    0        0 .shstrtab
    2    100    10f  0  1       STRTAB        0    0        0 .strtab
    ...
    5    31c   1178  0  4                    40    3        9 .debug_abbrev
...
.section .debug_abbrev
Contents of the .debug_abbrev section:
  Number  TAG
   4      0x33 DW_TAG_variant_part      [has children]
   1      0x30 DW_TAG_template_value_parameter  [has no children]
   42      0x00       [has no children]
Unknown Attribute value 4
   (0x4)          DW_FORM_block1(0xa)

The function at 0x0045fd00 is responsible for decoding a LEB128 integer back to a regular integer representation. At 0x0045fd10 we see the core logic of the decoding.

0045fd10  movzx   r9d, byte [rdx]
0045fd14  add     rdx, 0x1
0045fd18  mov     r8, r9
0045fd1b  and     r8d, 0x7f
0045fd1f  shl     r8, cl
0045fd22  add     ecx, 0x7
0045fd25  or      rax, r8
0045fd28  mov     r8d, edx
0045fd2b  sub     r8d, edi
0045fd2e  test    r9b, r9b
0045fd31  js      0x45fd10

0045fd38  mov     dword [rsi], r8d
0045fd3b  retn     {__return_addr}

In short, the code reads a byte from the input, masks it with 0x7f (binary AND), and shifts the result left by 7*i bits, where i is the current loop iteration. It then ORs the result into RAX, which accumulates the decoded value. If the most significant bit of the byte is set (that is, the byte is 0x80 or greater), the loop continues with the next byte. Otherwise, the function returns the decoded integer in RAX and stores the number of bytes processed at the address held in RSI.
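The decoding logic described above can be sketched in Python. This is a minimal model of unsigned LEB128 decoding, not a transcription of the binary: the function name and signature are illustrative assumptions, and Python integers are unbounded whereas the assembly accumulates into a 64-bit register.

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 value starting at `offset`.

    Returns (value, bytes_consumed), mirroring the RAX / [RSI] pair
    described in the text. The name and signature are hypothetical.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]                 # IndexError if input ends mid-value
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:             # high bit clear: last byte
            break
    return value, pos - offset


if __name__ == "__main__":
    # 0xE5 0x8E 0x26 is the well-known encoding of 624485.
    print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))
```

Each input byte contributes seven payload bits, so a value below 0x80 occupies exactly one byte.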

A very interesting use of this decoding function takes place at offset 0x00443f20 where the .debug_abbrev table is parsed.

The code iterates through the DW_FORM and DW_AT entries in the table in a loop.

00443f20  lea     rsi, [rsp+0x5c {var_85c}]
00443f25  mov     rdi, r12
00443f28  call    0045fd00                        // Decode LEB128 from input    (1)
00443f2d  mov     r15, rax
00443f30  movsxd  rax, dword [rsp+0x5c {var_85c}]
...
00443f39  add     r12, rax
...
00443f65  lea     rsi, [rsp+0x5c {var_85c}]
00443f6a  mov     dword [rbp+r14*8-0x8 {var_840}], r13d                          (2)
00443f6f  mov     dword [rbp+r14*8-0x4 {var_840+0x4}], r15d                      (3)
...
00443f7a  add     r14, 0x1
00443f7e  call    0045fd00                        // Decode LEB128 from input    (4)
00443f83  mov     r13, rax
00443f86  movsxd  rax, dword [rsp+0x5c {var_85c}]
00443f8b  add     r12, rax
00443f8e  test    r13, r13
00443f91  jne     0x443f20                                                       (5)

At (1) and (4) the decoded LEB128 values are calculated and saved to R15 and R13 respectively. At (2) and (3), these decoded values are written to the stack. Note that R14 is the loop counter and also serves as the index for the stack writes.

At (5), we see that the exit condition for the loop depends only on the decoded value returned at (4). As long as the decoded integer is not zero, the loop continues processing bytes and writing to the stack. Nothing constrains the number of times this loop executes. With a properly crafted input, the loop keeps running past the end of the local buffer, overwriting the stack and leading to a classic stack buffer overflow.
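For illustration, the loop shape described above can be modelled in Python together with the kind of bound that the listing lacks. This is a sketch under assumptions: `parse_attr_specs`, `MAX_ATTRS` and its value are hypothetical, and the code does not represent the vendor's actual fix. It models only the steps visible in the disassembly: read a value, loop while it is nonzero, read a second value, store the pair.

```python
MAX_ATTRS = 0x100  # hypothetical capacity of the fixed-size buffer


def decode_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, bytes_consumed)."""
    value, shift, pos = 0, 0, offset
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return value, pos - offset


def parse_attr_specs(data: bytes, offset: int, max_attrs: int = MAX_ATTRS):
    """Collect (DW_AT, DW_FORM) pairs until a zero DW_AT is decoded."""
    specs = []
    name, used = decode_uleb128(data, offset)      # corresponds to (4)
    offset += used
    while name != 0:                               # exit condition at (5)
        form, used = decode_uleb128(data, offset)  # corresponds to (1)
        offset += used
        if len(specs) >= max_attrs:                # the check the loop lacks
            raise ValueError("too many attribute specifications")
        specs.append((name, form))                 # stores at (2) and (3)
        name, used = decode_uleb128(data, offset)
        offset += used
    return specs, offset


if __name__ == "__main__":
    table = bytes([0x03, 0x08, 0x3A, 0x0B, 0x00, 0x00])
    print(parse_attr_specs(table, 0))
```

Without the length check, the number of stored pairs is dictated entirely by the input, which is the condition the report describes.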

In the function prologue, we see six 8-byte registers pushed on the stack and a region of 0x888 bytes allocated on the stack for local variables:

00443980  push    r15 {__saved_r15}
00443982  push    r14 {__saved_r14}
00443984  push    r13 {__saved_r13}
00443986  push    r12 {__saved_r12}
0044398b  push    rbp {__saved_rbp}
0044398c  push    rbx {__saved_rbx}
00443995  sub     rsp, 0x888

As a result, from the current stack pointer, the offset of the return address saved on the stack is calculated as:

6*8 + 0x888 = 0x8b8

RBP is set to:

00443a91  lea     rbp, [rsp+0x80 {var_838}]

So RBP is 0x80 bytes larger than RSP. Since the memory store uses RBP as its base pointer, in order to reach the saved return address, we have to overwrite

0x8b8 - 0x80 = 0x838 bytes

Remember that at the memory store at (2) we have a -8 added to the pointer, so the number of iterations we need to overwrite the return address is:

(0x838 / 8) + 1 = 0x108

And since each loop iteration processes 2 LEB128 integers, we need to provide 0x210 LEB128-encoded values in order to overwrite the saved return address.
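The arithmetic above can be re-derived with a few lines of Python. The constants come from the prologue and the RBP setup shown earlier; the variable names are illustrative only.

```python
SAVED_REGISTERS = 6   # r15, r14, r13, r12, rbp, rbx pushed in the prologue
REGISTER_SIZE = 8     # bytes per pushed register
LOCALS_SIZE = 0x888   # sub rsp, 0x888
RBP_BIAS = 0x80       # lea rbp, [rsp+0x80]
STORE_STRIDE = 8      # each iteration writes two 4-byte values

# Offset of the saved return address from RSP.
ret_offset_from_rsp = SAVED_REGISTERS * REGISTER_SIZE + LOCALS_SIZE

# The stores are RBP-relative, so subtract the 0x80 bias.
ret_offset_from_rbp = ret_offset_from_rsp - RBP_BIAS

# The store at (2) targets rbp + r14*8 - 8, hence the extra +1.
iterations = ret_offset_from_rbp // STORE_STRIDE + 1

# Two LEB128 values are consumed per iteration.
leb128_values = iterations * 2

if __name__ == "__main__":
    print(hex(ret_offset_from_rsp), hex(ret_offset_from_rbp),
          hex(iterations), hex(leb128_values))
```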

The values written on the stack however will be decoded from the LEB128 representation. As such, an attacker has to encode the new return address in the LEB128 representation.
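As a sketch of that encoding step, the following Python function produces the unsigned LEB128 representation of an integer. The function name is an assumption made for illustration. The example value 0x41414141 is the one visible in RIP in the crash output; because the stores at (2) and (3) write 32-bit registers, a 64-bit address would presumably be split across two encoded values, which is a reading of the listing rather than something the report states.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (hypothetical helper)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the next 7 payload bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)


if __name__ == "__main__":
    print(encode_uleb128(0x41414141).hex())
```

A 32-bit value such as this one needs five LEB128 bytes, since each byte carries only seven bits of payload.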

References

[1] https://dwarfstd.org/doc/DWARF5.pdf
[2] https://en.wikipedia.org/wiki/LEB128

Crash Information

Running under gdb we see that we have full control of the return address with the POC:

$ gdb --args cuobjdump --dump-elf ./poc
...
Program received signal SIGSEGV, Segmentation fault.                                  
0x0000000041414141 in ?? ()                                                           
────────────────────────────────────────────────────────────────────── registers ──── 
$rax   : 0x0                                                                          
$rbx   : 0x3800000038                                                                 
$rcx   : 0x7                                                                          
$rdx   : 0x1                                                                          
$rsp   : 0x00007fffffffde40  →  0x00000000006d3128  →  0x00000000006ca898  →  0x65722e766e2e0065 ("e"?)
$rbp   : 0x3800000038
$rsi   : 0x00007fffffffd5dc  →  0x0000000000000001
$rdi   : 0x00000000006d672f  →  0x0000000000000000
$rip   : 0x41414141                                                                   
$r8    : 0x1                                                                          
$r9    : 0x0                                                                          
$r10   : 0x0                                                                          
$r11   : 0x202                                                                        
$r12   : 0x3200000031                                                                 
$r13   : 0x3400000033                                                                 
$r14   : 0x3600000035                                                                 
$r15   : 0x198a00000037                                                               
$eflags: [zero carry parity adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]                                                                 
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00                           
──────────────────────────────────────────────────────────────────── code:x86:64 ──── 
[!] Cannot disassemble from $PC

TIMELINE

2025-03-03 - Vendor Disclosure
2025-09-23 - Vendor Patch Release
2025-09-24 - Public Release

Discovered by Dimitrios Tatsis of Cisco Talos.

Source: https://talosintelligence.com/vulnerability_reports/TALOS-2025-2155