NVIDIA nvdisasm REL section header parsing heap-based buffer overflow vulnerability

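A heap-based buffer overflow vulnerability exists in NVIDIA nvdisasm version 12.8.90. An attacker can trigger it with a specially crafted ELF file and achieve arbitrary code execution. 2025-09-24 00:01:00 Author: talosintelligence.com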

SUMMARY

A heap-based buffer overflow vulnerability exists in the REL section header parsing functionality of NVIDIA nvdisasm 12.8.90. A specially crafted ELF file can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

NVIDIA nvdisasm 12.8.90

PRODUCT URLS

nvdisasm - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html

CVSSv3 SCORE

7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-122 - Heap-based Buffer Overflow

DETAILS

The nvdisasm tool, provided with the NVIDIA CUDA Toolkit, displays information about CUDA ELF files. Apart from disassembling CUDA-compiled binary code, it can display control flow graphs, register life ranges, debug information, and more.

A REL section of an ELF file contains relocation information that is needed when the binary is loaded into memory for execution. The layout of each section header is defined by the ELF specification. In our case, nvdisasm handles 32-bit ELF files, so we use the 32-bit definition:

#include <stdint.h>

typedef uint16_t Elf32_Half;
typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Off;

typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;
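To make this layout concrete, the following minimal Python sketch decodes one 40-byte Elf32_Shdr with the standard struct module and checks whether it describes a REL section. It assumes a little-endian (ELFDATA2LSB) file. The helper name parse_elf32_shdr and the sample header values are hypothetical; SHT_REL = 9 is the value defined by the ELF specification.

```python
import struct
from typing import NamedTuple

SHT_REL = 9  # sh_type value for relocation sections without addends (ELF spec)

# Ten unsigned 32-bit words, little-endian: one Elf32_Shdr (40 bytes).
ELF32_SHDR_FMT = "<10I"


class Elf32Shdr(NamedTuple):
    sh_name: int
    sh_type: int
    sh_flags: int
    sh_addr: int
    sh_offset: int
    sh_size: int
    sh_link: int
    sh_info: int
    sh_addralign: int
    sh_entsize: int


def parse_elf32_shdr(data: bytes, offset: int = 0) -> Elf32Shdr:
    """Hypothetical helper: decode the section header that starts at `offset`."""
    return Elf32Shdr(*struct.unpack_from(ELF32_SHDR_FMT, data, offset))


# Made-up REL header: sh_size 0x20 (four 8-byte entries), sh_entsize 8.
raw = struct.pack(ELF32_SHDR_FMT, 1, SHT_REL, 0, 0, 0x200, 0x20, 2, 1, 4, 8)
hdr = parse_elf32_shdr(raw)
print(hdr.sh_type == SHT_REL, hex(hdr.sh_size))  # True 0x20
```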

nvdisasm parses the REL sections of an ELF file in the function at 0x45ada0. At address 0x45c699, the sh_size of the current REL section is loaded into the rsi register. Then, at (1) and (2), the value is shifted right by 3 and shifted left by 4 to compute the size argument for the internal allocation function called at (3). Together, (1) and (2) clear the lowest 3 bits of sh_size and then multiply the result by 2.

0045c699  mov     rsi, qword [rsp {var_b8_1}]
0045c69d  mov     rdi, qword [rax+0x18]
0045c6a1  shr     rsi, 0x3                     (1) 
0045c6a5  shl     rsi, 0x4                     (2)
0045c6a9  call    sub_40a610                   (3)
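The arithmetic at (1) and (2) can be checked with a few lines of Python. This sketch confirms that (sh_size >> 3) << 4 is sh_size rounded down to a multiple of 8 and then doubled. One possible reading, which is our assumption and is not confirmed by the disassembly, is that each 8-byte Elf32_Rel entry is widened to a 16-byte internal entry (the size of an Elf64_Rel); the function name alloc_size is hypothetical.

```python
ELF32_REL_SIZE = 8    # sizeof(Elf32_Rel): r_offset + r_info, two 32-bit words
WIDE_ENTRY_SIZE = 16  # assumed size of each internal entry (sizeof(Elf64_Rel))


def alloc_size(sh_size: int) -> int:
    """Size passed to sub_40a610, as computed by the shr/shl pair at (1)-(2)."""
    return (sh_size >> 3) << 4


for sh_size in (0x18, 0x19, 0x1F, 0x20):
    entries = sh_size // ELF32_REL_SIZE  # whole 8-byte entries; low 3 bits dropped
    assert alloc_size(sh_size) == (sh_size & ~7) * 2 == entries * WIDE_ENTRY_SIZE
    print(hex(sh_size), "->", hex(alloc_size(sh_size)))
```

Note that sh_size values 0x18 through 0x1f all yield the same 0x30-byte allocation, because the low 3 bits are discarded.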

The code then computes a loop counter for a set of vectorized shuffle operations. At (4.a), rsi holds the sh_size field of the REL section header.

0045c07b  lea     r14, [rbx+rsi]               (4.a)
...
0045c6ef  lea     rax, [r14+0x7]               (4.b)
0045c6f3  lea     rsi, [rbx+0x8]               (4.c)
0045c6f7  sub     rax, rsi                     (4.d)

Simplifying the sub operation at (4.d) by taking into account the previous instructions, we have:

rax = (r14 + 0x7) - (rbx + 0x8) = (rbx + sh_size + 0x7) - (rbx + 0x8) = sh_size - 1
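The simplification holds for any value of rbx, even when the additions wrap around. The following sketch emulates the lea/sub sequence at (4.a) to (4.d) with 64-bit modular arithmetic; the function name rax_after_4d is hypothetical and the register values are random test inputs, not values observed in nvdisasm.

```python
import random

MASK64 = (1 << 64) - 1  # x86-64 general-purpose registers are 64 bits wide


def rax_after_4d(rbx: int, sh_size: int) -> int:
    """Emulate (4.a)-(4.d) with 64-bit wraparound; return rax after the sub."""
    r14 = (rbx + sh_size) & MASK64  # (4.a) lea r14, [rbx+rsi]
    rax = (r14 + 0x7) & MASK64      # (4.b) lea rax, [r14+0x7]
    rsi = (rbx + 0x8) & MASK64      # (4.c) lea rsi, [rbx+0x8]
    return (rax - rsi) & MASK64     # (4.d) sub rax, rsi


for _ in range(1000):
    rbx = random.getrandbits(64)
    sh_size = random.randrange(1, 1 << 32)  # sh_size is an Elf32_Word
    assert rax_after_4d(rbx, sh_size) == sh_size - 1
print("rax == sh_size - 1 for all sampled inputs")
```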

In short, rax ends up holding sh_size - 1. A series of shift operations follows:

0045c6fa  mov     r8, rax      (5.a)
0045c6fd  shr     r8, 0x3      (5.b)
0045c701  add     r8, 0x1      (5.c)
...
0045c70f  mov     rdi, r8      (5.d)
...
0045c718  shr     rdi, 0x2     (5.e)

As a result, the loop counter computed by the instructions above is:

loop_counter = (((sh_size-1) >> 3) + 1) >> 2
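For sh_size of at least 1, ((sh_size - 1) >> 3) + 1 is sh_size / 8 rounded up, so the final >> 2 counts complete groups of four 8-byte units. The brute-force check below verifies this identity in Python; reading the 8-byte units as REL entries is our interpretation. The check starts at 1 because the identity relies on sh_size - 1 not wrapping around.

```python
def loop_counter(sh_size: int) -> int:
    """Iteration count computed by (5.a)-(5.e), given rax = sh_size - 1."""
    return (((sh_size - 1) >> 3) + 1) >> 2


# For sh_size >= 1, ((sh_size - 1) >> 3) + 1 is sh_size / 8 rounded up,
# so the counter is the number of complete groups of four 8-byte units.
for sh_size in range(1, 0x1000):
    units = (sh_size + 7) // 8  # ceil(sh_size / 8) in integer arithmetic
    assert ((sh_size - 1) >> 3) + 1 == units
    assert loop_counter(sh_size) == units // 4

print([loop_counter(s) for s in (0x18, 0x19, 0x20, 0x39)])  # [0, 1, 1, 2]
```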

The code then enters a loop that performs a series of SSE shuffle operations on xmm registers:

0045c720  movdqu  xmm0, xmmword [rdx+0x10]
0045c725  pxor    xmm6, xmm6
0045c729  add     rsi, 0x1                    (6)
0045c72d  add     rdx, 0x20                   (7)
0045c731  add     rax, 0x40                   (8)
...
0045c79d  movups  xmmword [rax-0x30], xmm5    (9.a)
0045c7a1  movups  xmmword [rax-0x40], xmm1    (9.b)
...
0045c7b1  movups  xmmword [rax-0x10], xmm4    (9.c)
0045c7b5  movups  xmmword [rax-0x20], xmm1    (9.d)
0045c7b9  cmp     rdi, rsi
0045c7bc  ja      0x45c720

At (6), the iteration counter is incremented. At (7), rdx, which serves as the source pointer, is advanced by 0x20; notably, rdx points to data taken from the input file. At (8), rax, the destination pointer, is advanced by 0x40, which implies that each iteration writes 0x40 bytes to the memory pointed to by rax. Indeed, instructions (9.a) to (9.d) perform four stores of 0x10 bytes each (the size of an xmm register), for a total of 0x40 bytes.
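The loop can be modelled at the level of byte counts. This Python sketch allocates a bytearray of the size computed at (1) and (2), then copies 0x40 bytes per iteration into it while advancing the source by 0x20; where the real code would silently write past the end, the model raises an error instead. It does not reproduce the shuffle itself: the function name simulate_copy and the zero-padding used as a stand-in for the shuffled output are hypothetical.

```python
def simulate_copy(sh_size: int, section: bytes) -> bytearray:
    """Byte-count model of the loop at 0x45c720; the shuffle is not modelled."""
    dst = bytearray((sh_size >> 3) << 4)          # allocation from (1)-(2)
    iterations = (((sh_size - 1) >> 3) + 1) >> 2  # counter from (5)
    src_off = dst_off = 0
    for _ in range(iterations):
        chunk = section[src_off:src_off + 0x20]   # (7): source advances by 0x20
        out = chunk.ljust(0x40, b"\x00")          # placeholder for the 4 xmm stores
        if dst_off + 0x40 > len(dst):
            raise IndexError(
                f"write of 0x40 bytes at offset {dst_off:#x} overflows "
                f"a {len(dst):#x}-byte buffer by {dst_off + 0x40 - len(dst):#x} bytes"
            )
        dst[dst_off:dst_off + 0x40] = out         # (9.a)-(9.d): 0x40 bytes written
        src_off += 0x20
        dst_off += 0x40                           # (8): destination advances by 0x40
    return dst


section = bytes(range(0x40))  # stand-in for REL section bytes read from the file
print(len(simulate_copy(0x20, section)))  # 64
try:
    simulate_copy(0x19, section)
except IndexError as err:
    print(err)  # write of 0x40 bytes at offset 0x0 overflows a 0x30-byte buffer by 0x10 bytes
```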

Comparing the size of the heap-allocated buffer with the number of iterations the loop performs reveals a discrepancy between the actual size of the allocated buffer and the size the loop assumes. From (1) and (2), the actual allocation size is:

buffer_size = (sh_size >> 3) << 4

whereas the size assumed by the loop, derived from the instructions at (5), is:

loop_counter = (((sh_size-1) >> 3) + 1) >> 2
assumed_size = loop_counter * 0x40

If the assumed size is larger than the actual size of the allocated buffer, the loop writes past the end of the heap buffer. This condition holds for many values of sh_size, as the following short script shows by enumerating all values below 256:

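  #!/usr/bin/env python3
  # Compare the allocation size from (1)-(2) with the number of bytes the
  # copy loop writes, derived from (5), for every sh_size below 256.

  for sh_size in range(256):
      buf_size = (sh_size >> 3) << 4                  # actual allocation size
      loop_counter = (((sh_size-1) >> 3) + 1) >> 2    # loop iterations
      assumed = loop_counter*0x40                     # 0x40 bytes written per iteration

      if assumed > buf_size:
          print(f"sh_size: {hex(sh_size)}, buffer size: {hex(buf_size)}, assumed size: {hex(assumed)}")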

Running the script produces the following (truncated) output:

sh_size: 0x19, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1a, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1b, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1c, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1d, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1e, buffer size: 0x30, assumed size: 0x40
sh_size: 0x1f, buffer size: 0x30, assumed size: 0x40
sh_size: 0x39, buffer size: 0x70, assumed size: 0x80
...
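Extending the enumeration suggests a simple pattern: under the formulas above, the loop overruns the allocation exactly when sh_size % 32 is between 0x19 and 0x1f, and the overrun is always 0x10 bytes, i.e. one xmm store. The brute-force check below confirms this for every sh_size from 1 to 0xffff. It only covers the vectorized loop described above, not any code that may run after it. An overrun of exactly the last 16 bytes of the final 0x40-byte chunk appears consistent with the Valgrind report below, which flags the store at 0x45C7B1 (9.c) and an address 0 bytes past the allocated block. The function name overrun is hypothetical.

```python
def overrun(sh_size: int) -> int:
    """Bytes written past the allocation, per the formulas from (1)-(2) and (5)."""
    buf_size = (sh_size >> 3) << 4
    assumed = ((((sh_size - 1) >> 3) + 1) >> 2) * 0x40
    return max(0, assumed - buf_size)


for sh_size in range(1, 0x10000):
    expected = 0x10 if sh_size % 32 >= 25 else 0
    assert overrun(sh_size) == expected, hex(sh_size)

print(hex(overrun(0x19)), hex(overrun(0x39)), hex(overrun(0x40)))  # 0x10 0x10 0x0
```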

As a result, by choosing a suitable sh_size value for a REL section header in an ELF file, an attacker can trigger a heap buffer overflow with controlled data.
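Every overflowing sh_size in this model leaves a nonzero remainder modulo 8, so it does not describe a whole number of 8-byte Elf32_Rel entries. The sketch below shows a hypothetical pre-parse check that rejects such REL section sizes; the ELF specification describes sh_entsize as the size of each entry in a section holding a table of fixed-size entries, so a well-formed sh_size would be a multiple of it. This is an illustration only, with hypothetical names, and does not describe how NVIDIA's patch addresses the issue.

```python
ELF32_REL_SIZE = 8  # sizeof(Elf32_Rel)


def rel_size_is_sane(sh_size: int, sh_entsize: int) -> bool:
    """Hypothetical check: accept only sizes that hold whole Elf32_Rel entries."""
    return sh_entsize == ELF32_REL_SIZE and sh_size % ELF32_REL_SIZE == 0


def overrun(sh_size: int) -> int:
    """Bytes the loop writes past the allocation, per (1)-(2) and (5)."""
    buf_size = (sh_size >> 3) << 4
    assumed = ((((sh_size - 1) >> 3) + 1) >> 2) * 0x40
    return max(0, assumed - buf_size)


# No sh_size that passes the check makes the modelled loop overrun its buffer.
for sh_size in range(1, 0x10000):
    if rel_size_is_sane(sh_size, ELF32_REL_SIZE):
        assert overrun(sh_size) == 0

print(rel_size_is_sane(0x19, 8), rel_size_is_sane(0x20, 8))  # False True
```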

Crash Information

==152542== Memcheck, a memory error detector
==152542== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al. 
==152542== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==152542== Command: ./nvdisasm-12.8.90 ./ac0bcb07
==152542== Parent PID: 132852
==152542== 
==152542== Invalid write of size 8
==152542==    at 0x45C7B1: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x45CEA5: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x40307A: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x48A21C9: (below main) (libc_start_call_main.h:58)
==152542==  Address 0x4a99368 is 0 bytes after a block of size 184 alloc'd
==152542==    at 0x484680F: malloc (vg_replace_malloc.c:446)
==152542==    by 0x465719: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x40A999: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x45C6AD: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x45CEA5: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x40307A: ??? (in /home/dtatsis/nvdisasm/trizzle-new/0x444961-write8-a572c03c/nvdisasm-12.8.90)
==152542==    by 0x48A21C9: (below main) (libc_start_call_main.h:58)
==152542== 
==152542== 
==152542== HEAP SUMMARY:
==152542==     in use at exit: 24,556 bytes in 154 blocks
==152542==   total heap usage: 228 allocs, 74 frees, 39,566 bytes allocated
==152542== 
==152542== LEAK SUMMARY:
==152542==    definitely lost: 248 bytes in 8 blocks
==152542==    indirectly lost: 8,062 bytes in 123 blocks
==152542==      possibly lost: 16,246 bytes in 23 blocks
==152542==    still reachable: 0 bytes in 0 blocks
==152542==         suppressed: 0 bytes in 0 blocks
==152542== Rerun with --leak-check=full to see details of leaked memory
==152542== 
==152542== For lists of detected and suppressed errors, rerun with: -s
==152542== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)

TIMELINE

2025-05-06 - Vendor Disclosure
2025-09-23 - Vendor Patch Release
2025-09-24 - Public Release

Discovered by Dimitrios Tatsis of Cisco Talos.

Source: https://talosintelligence.com/vulnerability_reports/TALOS-2025-2191