By Seth Jenkins, Project Zero
Introduction

In December 2022, Google’s Threat Analysis Group (TAG) discovered an in-the-wild exploit chain targeting Samsung Android devices. TAG’s blog post covers the targeting and the actor behind the campaign. This is a technical analysis of the final stage of one of the exploit chains, specifically CVE-2023-0266 (a 0-day in the ALSA compatibility layer) and CVE-2023-26083 (a 0-day in the Mali GPU driver), as well as the techniques the attacker used to gain kernel arbitrary read/write access.

Notably, several of the earlier stages of the exploit chain relied on n-day vulnerabilities. CVE-2022-4262, a 0-day vulnerability in Chrome, was exploited in the Samsung browser to achieve RCE. CVE-2022-3038, a Chrome n-day that remained unpatched in the Samsung browser, was used to escape the Samsung browser sandbox. CVE-2022-22706, a Mali n-day, was used to achieve higher-level userland privileges. While that bug had been patched by Arm in January 2022, the patch had not been downstreamed into Samsung devices at the point that the exploit chain was discovered.

We now pick up the thread after the attacker has achieved execution as system_server.

Bug #1: Compatibility Layers Have Bugs Too (CVE-2023-0266)

The exploit continues with a race condition in the kernel Advanced Linux Sound Architecture (ALSA) driver, CVE-2023-0266. 64-bit Android kernels support 32-bit syscall calling conventions in order to maintain compatibility with 32-bit programs and apps. As part of this compatibility layer, the kernel maintains code to translate 32-bit system calls into a format understandable by the rest of the 64-bit kernel code. In many cases, the code backing this compatibility layer simply wraps the 64-bit system calls with no extra logic, but in some cases important behaviors are re-implemented in the compatibility layer’s code. Such duplication increases the potential for bugs, because the compatibility layer can be forgotten while making consequential changes.
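To make the translation step concrete, here is a minimal sketch of what a compat handler has to do before it can reuse the 64-bit code. The struct and function names (native_value, compat_value, compat_to_native, native_to_compat) and the NVALUES size are hypothetical and heavily simplified; they are not the ALSA structures. The sketch relies only on the fact that a C long is 4 bytes for a 32-bit (ILP32) process but 8 bytes in a 64-bit (LP64) kernel. As a result, the same logical ioctl argument has two different memory layouts.

```c
#include <assert.h>
#include <stdint.h>

#define NVALUES 4 /* hypothetical; real ioctl structures are much larger */

/* Layout the 64-bit core code expects (a C long is 8 bytes under LP64). */
struct native_value {
    uint32_t numid;
    int64_t value[NVALUES];
};

/* Layout a 32-bit process actually passes (a C long is 4 bytes under ILP32). */
struct compat_value {
    uint32_t numid;
    int32_t value[NVALUES];
};

/* Compat entry: widen the 32-bit argument so shared 64-bit code can use it. */
static void compat_to_native(const struct compat_value *in, struct native_value *out)
{
    out->numid = in->numid;
    for (int i = 0; i < NVALUES; i++)
        out->value[i] = in->value[i]; /* sign-extends each element */
}

/* Compat exit: narrow results back into the caller's 32-bit layout. */
static void native_to_compat(const struct native_value *in, struct compat_value *out)
{
    out->numid = in->numid;
    for (int i = 0; i < NVALUES; i++)
        out->value[i] = (int32_t)in->value[i];
}
```

Only the translation is duplicated here. Any other behavior that the native wrapper performs around the shared core code must be mirrored in the compat wrapper by hand.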
In 2017, a refactoring of the ALSA driver moved the lock acquisition out of the snd_ctl_elem_{write|read}() functions and further up the call graph for the SNDRV_CTL_IOCTL_ELEM_{READ|WRITE} ioctls. However, this commit only addressed the 64-bit ioctl code. It introduced a race condition into the 32-bit compatibility layer’s SNDRV_CTL_IOCTL_ELEM_{READ|WRITE}32 ioctls. The 32-bit and 64-bit ioctls take different paths until they both call snd_ctl_elem_{write|read}. So when the lock was moved up the 64-bit call chain, it was removed from the 32-bit ioctls entirely.

Here’s the code path for SNDRV_CTL_IOCTL_ELEM_WRITE in 64-bit mode on kernel 5.10.107 post-refactor:

    snd_ctl_ioctl
      snd_ctl_elem_write_user
        [takes controls_rwsem]
        snd_ctl_elem_write      [lock properly held, all good]
        [drops controls_rwsem]

And here is the code path for that same ioctl called in 32-bit mode:

    snd_ctl_ioctl_compat
      snd_ctl_elem_write_user_compat
        ctl_elem_write_user
          snd_ctl_elem_write    [missing lock, not good]

These missing locks allowed the attacker to race snd_ctl_elem_add and snd_ctl_elem_write_user_compat calls, resulting in snd_ctl_elem_write executing with a freed struct snd_kcontrol object. The 32-bit SNDRV_CTL_IOCTL_ELEM_READ ioctl behaved very similarly to the SNDRV_CTL_IOCTL_ELEM_WRITE ioctl, with the same bug leading to a similar primitive.

In March 2021, upstream commit 1fa4445f9adf1 moved the locks back from snd_ctl_elem_write_user into snd_ctl_elem_write, in what was supposed to be an inconsequential refactor. This restored the missing locks for SNDRV_CTL_IOCTL_ELEM_WRITE32 and accidentally fixed that half of the bug. However, this commit was never backported or merged into the Android kernel because its security impact was not identified, which left the bug exploitable; it was exploited in the wild in December 2022. The SNDRV_CTL_IOCTL_ELEM_READ32 call also remained unpatched until January 2023, following the discovery of the in-the-wild exploit.

A New Heap Spray Primitive

Most exploits take advantage of such a classical UAF condition by reclaiming the virtual memory backing the freed object with attacker-controlled data, and this exploit is no exception. Interestingly, the attacker used Mali GPU driver features to perform the reclaim, even though the primary memory corruption bug is agnostic to the GPU used by the device.
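The lock placement problem can be modeled in a few lines of userland C. This is a hypothetical sketch, not kernel code. A plain counter stands in for controls_rwsem, because the question is where the lock is taken rather than how it works. The function names only loosely mirror the call paths above. The sketch contrasts two layouts. In the post-2017 layout, only the native wrapper takes the lock. In the layout after upstream commit 1fa4445f9adf1, the shared callee takes it for every caller.

```c
#include <assert.h>

/* Hypothetical stand-in for struct snd_card; `locked` models controls_rwsem. */
struct model_card {
    int locked;
};

static void lock_card(struct model_card *c)   { c->locked++; }
static void unlock_card(struct model_card *c) { c->locked--; }

/* The work that must only run under the lock; returns -1 if the lock is not held. */
static int do_elem_write(struct model_card *c)
{
    return c->locked > 0 ? 0 : -1;
}

/* Post-2017 layout: the lock lives in the native wrapper only. */
static int write_user_2017(struct model_card *c)
{
    lock_card(c);
    int ret = do_elem_write(c);
    unlock_card(c);
    return ret;
}

static int write_user_compat_2017(struct model_card *c)
{
    return do_elem_write(c); /* the refactor never touched this path */
}

/* Post-2021 layout: the shared callee takes the lock for every caller. */
static int elem_write_2021(struct model_card *c)
{
    lock_card(c);
    int ret = do_elem_write(c);
    unlock_card(c);
    return ret;
}

static int write_user_2021(struct model_card *c)        { return elem_write_2021(c); }
static int write_user_compat_2021(struct model_card *c) { return elem_write_2021(c); }
```

Keeping the lock inside the shared callee means that a new or forgotten entry point cannot skip it.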
By creating many REQ_SOFT_JIT_FREE jobs that are gated behind a BASE_JD_REQ_SOFT_EVENT_WAIT, the attacker can take advantage of the associated kmalloc_array/copy_from_user calls in kbase_jit_free_prepare to create a powerful heap spray. The spray is fully attacker-controlled, variable in size, temporally indefinite and controllably freeable. Heap sprays with these properties are uncommon but not unheard of. Other heap spray strategies do exist, but at least some of them, such as the userfaultfd technique, are mitigated on Android by SELinux policy and sysctl parameters. Others, such as the equivalent technique provided by AppFuse, may still be available. That makes this new technique particularly potent on Android devices, where many of these spray strategies are mitigated.

Bug #2: A Leaky Feature (CVE-2023-26083)

Mali provides a performance tracing facility called "timeline stream", "tlstream" or "tl". This facility was available to unprivileged code. It traced all GPU operations across the whole system, including GPU operations by other processes, and used kernel pointers as object identifiers in the messages sent to userspace. This means that by generating tlstream events referencing objects containing attacker-controlled data, the attackers are able to place 16 bytes of controlled data at a known kernel address. The attackers can also use this capability to defeat KASLR, since these kernel pointers leak information about the kernel address space back to userland. This issue was reported to Arm as CVE-2023-26083 on January 17th, 2023, and is now fixed by preventing unprivileged access to the tlstream facility.

Combining The Primitives

The heap spray described above is used to reclaim the backing store of the improperly freed struct snd_kcontrol used in snd_ctl_elem_write. The tlstream facility then allows attackers to fill that backing store with pointers to attacker-controlled data. Blending these two capabilities allows attackers to forge highly detailed struct snd_kcontrol objects.
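The following hypothetical sketch illustrates why using raw object addresses as trace identifiers amounts to "controlled data at a known address". The types and functions (traced_obj, trace_event, create_traced_obj, payload_address) are invented for illustration and do not follow the Mali tlstream format. The only property carried over from the text is that an object holding 16 bytes of caller-supplied data is reported to the trace reader by its address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object that stores 16 bytes chosen by the caller. */
struct traced_obj {
    uint32_t kind;
    unsigned char user_data[16];
};

/* Hypothetical trace record: the object's address doubles as its ID. */
struct trace_event {
    uint32_t type;
    uint64_t obj_id;
};

static struct traced_obj *create_traced_obj(const unsigned char data[16],
                                            struct trace_event *ev)
{
    struct traced_obj *obj = malloc(sizeof(*obj));
    if (obj == NULL)
        return NULL;
    obj->kind = 1;
    memcpy(obj->user_data, data, sizeof(obj->user_data));
    ev->type = 1;
    ev->obj_id = (uint64_t)(uintptr_t)obj; /* raw pointer exposed to the reader */
    return obj;
}

/* What a trace reader learns: where the caller's 16 bytes now live. */
static uintptr_t payload_address(const struct trace_event *ev)
{
    return (uintptr_t)ev->obj_id + offsetof(struct traced_obj, user_data);
}
```

An opaque identifier, such as a per-session counter, would avoid this disclosure. Arm's actual fix, as noted above, instead restricts tlstream access to privileged code.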
The snd_ctl_elem_write code (along with the struct snd_kcontrol definition) is shown below:

    struct snd_kcontrol {
        struct list_head list;          /* list of controls */
        struct snd_ctl_elem_id id;
        unsigned int count;             /* count of same elements */
        snd_kcontrol_info_t *info;
        snd_kcontrol_get_t *get;
        snd_kcontrol_put_t *put;
        union {
            snd_kcontrol_tlv_rw_t *c;
            const unsigned int *p;
        } tlv;
        unsigned long private_value;
        void *private_data;
        void (*private_free)(struct snd_kcontrol *kcontrol);
        struct snd_kcontrol_volatile vd[];  /* volatile data */
    };
    ...
    static int snd_ctl_elem_write(struct snd_card *card, struct snd_ctl_file *file,
                                  struct snd_ctl_elem_value *control)
    {
        struct snd_kcontrol *kctl;
        struct snd_kcontrol_volatile *vd;
        unsigned int index_offset;
        int result;

        down_write(&card->controls_rwsem);
        kctl = snd_ctl_find_id(card, &control->id);
        if (kctl == NULL) {
            up_write(&card->controls_rwsem);
            return -ENOENT;
        }

        index_offset = snd_ctl_get_ioff(kctl, &control->id);
        vd = &kctl->vd[index_offset];
        if (!(vd->access & SNDRV_CTL_ELEM_ACCESS_WRITE) || kctl->put == NULL ||
            (file && vd->owner && vd->owner != file)) {
            up_write(&card->controls_rwsem);
            return -EPERM;
        }

        snd_ctl_build_ioff(&control->id, kctl, index_offset);
        result = snd_power_ref_and_wait(card);
        /* validate input values */
        ...
        if (!result)
            result = kctl->put(kctl, control);
        ...
        // Drop the locks and return
    }

There are a couple of different ways to take advantage of an attacker-controlled kctl in this function, and the most apparent is the call to kctl->put. Since the attacker has arbitrary control of the kctl->put function pointer, they could have used this call right away to gain control over the program counter. In this case, however, they chose to store the developer-intended snd_ctl_elem_user_put function pointer in the kctl->put member, and used the call to that function to gain arbitrary read/write instead. snd_ctl_elem_user_put is the put handler installed for user-defined controls created through snd_ctl_elem_add. It performs the following operations:

    struct user_element {
        ...
        char *elem_data;                /* element data */
        unsigned long elem_data_size;   /* size of element data in bytes */
        ...
    };

    static int snd_ctl_elem_user_put(struct snd_kcontrol *kcontrol,
                                     struct snd_ctl_elem_value *ucontrol)
    {
        int change;
        struct user_element *ue = kcontrol->private_data;
        unsigned int size = ue->elem_data_size;
        char *dst = ue->elem_data +
                    snd_ctl_get_ioff(kcontrol, &ucontrol->id) * size;

        change = memcmp(&ucontrol->value, dst, size) != 0;
        if (change)
            memcpy(dst, &ucontrol->value, size);
        return change;
    }

The attacker uses the “attacker-controlled data at a known address” primitive provided by the tlstream facility to generate an allocation acting as a user_element struct. The pointer to this struct is later fed into the kctl struct as the private_data field, giving the attacker control over the struct user_element used in this function. The destination for the memcpy call comes directly out of this user_element struct, meaning that the attacker can set the destination to an arbitrary kernel address. Since the ucontrol->value used as the source comes directly from userland by design, this leads directly to an arbitrary write of controlled data to kernel virtual memory.

This write is unreliable because each use of it relies heavily on races and heap sprays in order to hit. The exploit therefore uses this original unreliable write to build a deterministic, highly reliable arbitrary read/write.

A Recap of the Linux Kernel VFS Subsystem

In the Linux kernel virtual filesystem (VFS) architecture, every struct file comes with a struct file_operations member. This member defines a set of function pointers used for various system calls such as read, write, ioctl and mmap. These functions interpret the struct file’s private_data member in a type-specific way. private_data is usually a pointer to one of a variety of different data structures, depending on the specific struct file. Both of these members, the private_data and the fops, are populated when the struct file is created, which happens, for example, within syscalls in the open family. The fops table can be registered as part of a miscdevice, which is used for certain files in the /dev filesystem. For example, /dev/ashmem, an Android-specific shared memory API, registers its table this way:

    static const struct file_operations ashmem_fops = {
        .owner = THIS_MODULE,
        .open = ashmem_open,
        .release = ashmem_release,
        .read_iter = ashmem_read_iter,
        .llseek = ashmem_llseek,
        .mmap = ashmem_mmap,
        .unlocked_ioctl = ashmem_ioctl,
    #ifdef CONFIG_COMPAT
        .compat_ioctl = compat_ashmem_ioctl,
    #endif
    };

    static struct miscdevice ashmem_misc = {
        .minor = MISC_DYNAMIC_MINOR,
        .name = "ashmem",
        .fops = &ashmem_fops,
    };

    static int __init ashmem_init(void)
    {
        int ret = -ENOMEM;
        ...
        ret = misc_register(&ashmem_misc);
        if (unlikely(ret)) {
            pr_err("failed to register misc device!\n");
            goto out_free2;
        }
        ...
    }

Stabilizing The Arbitrary Write

Under normal circumstances, when userland calls open on /dev/ashmem, a struct file is created and a pointer to the ashmem_fops table is populated into it. The fops table itself is read-only. However, the ashmem_misc data structure is not, and it holds the pointer used to populate future struct files during an open of /dev/ashmem. By replacing ashmem_misc.fops with a pointer to a fake file_operations struct, an attacker can control the file_operations used by every file that open("/dev/ashmem") creates from then on.

This requires forging a replacement ashmem_fops file_operations table in kernel memory, so that a later arbitrary write can store a pointer to that table in the ashmem_misc structure. The previously explained Mali tlstream “controlled data at a known kernel address” primitive (CVE-2023-26083) provides precisely this sort of ability. However, the object allocated to be read out via tlstream only holds 16 bytes of attacker-controlled data, which is not enough to forge a complete file_operations table. Instead, the attackers use their initial arbitrary write to construct a fake fops table within the .data section of the kernel. The exploit writes into the init_uts_ns kernel symbol, in particular the part of the associated structure that holds the uname of the kernel. Overwriting data in this structure provides a clear indicator of when the race conditions are won and the arbitrary write succeeds: the uname syscall returns different data than before. Once the fake file_operations table is forged, the attackers use their arbitrary write once more to place a pointer to this table inside the ashmem_misc structure.
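A small userland model shows why overwriting ashmem_misc.fops is enough. Everything here is hypothetical (model_fops, model_file, model_miscdevice, model_open, and the two read handlers). The model assumes, as described above, that opening the device copies the registration's current fops pointer into the new file. Files that are already open keep their old table, and only later opens pick up the replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_file;

/* Hypothetical, cut-down operations table (the real one has many more hooks). */
struct model_fops {
    const char *(*read)(struct model_file *file);
};

struct model_file {
    const struct model_fops *f_op;
    void *private_data;
};

/* The tables are const, but the registration's pointer is ordinary writable data. */
struct model_miscdevice {
    const char *name;
    const struct model_fops *fops;
};

static const char *ashmem_like_read(struct model_file *file) { (void)file; return "ashmem"; }
static const char *configfs_like_read(struct model_file *file) { (void)file; return "configfs"; }

static const struct model_fops real_fops = { .read = ashmem_like_read };
static const struct model_fops fake_fops = { .read = configfs_like_read };

static struct model_miscdevice model_misc = { .name = "ashmem", .fops = &real_fops };

/* Models open(): the new file takes whatever table the registration points at now. */
static void model_open(const struct model_miscdevice *dev, struct model_file *file)
{
    file->f_op = dev->fops;
    file->private_data = NULL;
}
```

A single pointer-sized write to model_misc.fops is enough to redirect every subsequent open. This matches the text's point that the read-only table itself never has to be modified.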
This forged file_operations struct varies from device to device, but on the Samsung S10 it looks like this:

    static const struct file_operations ashmem_fops = {
        .open = ashmem_open,
        .release = ashmem_release,
        .read = configfs_read_file,
        .write = configfs_write_file,
        .llseek = default_llseek,
        .mmap = ashmem_mmap,
        .unlocked_ioctl = ashmem_ioctl,
    #ifdef CONFIG_COMPAT
        .compat_ioctl = compat_ashmem_ioctl,
    #endif
    };

Note that the VFS read/write operations have been changed to point to configfs handlers instead. Combining configfs file ops with ashmem file ops leads to an attacker-induced type confusion on the private_data object in the struct file. Analysis of those handlers reveals simple-to-reach copy_[to/from]_user calls that use a kernel pointer taken from the struct file’s private_data backing store:

    static int fill_write_buffer(struct configfs_buffer *buffer,
                                 const char __user *buf, size_t count)
    {
        ...
        if (count >= SIMPLE_ATTR_SIZE)
            count = SIMPLE_ATTR_SIZE - 1;
        error = copy_from_user(buffer->page, buf, count);
        buffer->needs_read_fill = 1;
        buffer->page[count] = 0;
        return error ? -EFAULT : count;
    }
    ...
    static ssize_t configfs_write_file(struct file *file, const char __user *buf,
                                       size_t count, loff_t *ppos)
    {
        struct configfs_buffer *buffer = file->private_data;
        ssize_t len;

        mutex_lock(&buffer->mutex);
        len = fill_write_buffer(buffer, buf, count);
        if (len > 0)
            len = flush_write_buffer(file->f_path.dentry, buffer, len);
        if (len > 0)
            *ppos += len;
        mutex_unlock(&buffer->mutex);
        return len;
    }

The attacker can subsequently modify the private_data backing store itself using the ashmem ioctl command ASHMEM_SET_NAME, changing the kernel pointer used for the arbitrary read and write primitives.
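The ASHMEM_SET_NAME step can be modeled in userland. This is a hypothetical sketch under stated assumptions. The set-name handler is modeled as copying a fixed prefix followed by the user's NUL-terminated string. The "dev/ashmem/" prefix, NAME_LEN and the field offsets are assumptions, not verified against Samsung's kernel. model_buffer stands in for the type that the configfs handlers expect behind private_data, with its page pointer overlapping the name bytes. Each set-name call stops at the first NUL, so the sketch writes the target bytes from last to first. This is what lets a pointer that contains zero bytes land intact, and it mirrors the backwards loop in the exploit's arbitrary write primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFIX     "dev/ashmem/"        /* assumed fixed name prefix */
#define PREFIX_LEN (sizeof(PREFIX) - 1)
#define NAME_LEN   64                   /* hypothetical user-name length */

/* Hypothetical object behind file->private_data for the ashmem-like handlers. */
struct model_area {
    char name[PREFIX_LEN + NAME_LEN];
};

/* Hypothetical type the configfs-like handlers assume is behind private_data. */
struct model_buffer {
    char header[16];
    char *page;                         /* overlaps model_area.name[16..] */
};

_Static_assert(sizeof(struct model_buffer) <= sizeof(struct model_area),
               "confused view must fit inside the name buffer");

/* Models set-name: prefix + user string, copied only up to its first NUL. */
static void model_set_name(struct model_area *area, const char user[NAME_LEN])
{
    char local[PREFIX_LEN + NAME_LEN];
    memcpy(local, PREFIX, PREFIX_LEN);
    memcpy(local + PREFIX_LEN, user, NAME_LEN);
    local[sizeof(local) - 1] = '\0';
    strcpy(area->name, local);
}

/* Plants bytes that may contain NULs: one call per byte, last byte first. */
static void plant_bytes(struct model_area *area, const unsigned char want[NAME_LEN - 1])
{
    char user[NAME_LEN];
    memset(user, 'C', sizeof(user));
    for (int i = NAME_LEN - 2; i >= 0; i--) {
        user[i] = (char)want[i];
        model_set_name(area, user);
    }
}

/* Reads the area the way the mismatched handlers would: as a model_buffer. */
static char *confused_page(const struct model_area *area)
{
    struct model_buffer view;
    memcpy(&view, area->name, sizeof(view));
    return view.page;
}
```

In the real primitive, the planted value becomes the buffer->page that configfs_write_file hands to copy_from_user. This model stops at showing that the confused read recovers exactly the planted pointer. A single non-reversed set-name call would not be enough, because it stops at the first zero byte.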
For example, the final arbitrary write primitive looks like this in decompiled form:

    int arb_write(unsigned long dst, const void *src, size_t size)
    {
        __int64 page_offset;       // x8
        __int128 v5;               // q0
        __int64 neg_idx;           // x24
        void *data_to_write;       // x21
        char tmp_name_buffer[256]; // [xsp+0h] [xbp-260h] BYREF
        char name_buffer[256];     // [xsp+100h] [xbp-160h] BYREF
        char v13;                  // [xsp+200h] [xbp-60h]
        ...
        memset(tmp_name_buffer, 0, sizeof(tmp_name_buffer));
        page_offset = *(_QWORD *)(*(_QWORD *)(qword_898840 + 24) + 1056LL);
        if ((page_offset & 0x8000000000000000LL) == 0) {
            while (1)
                ;
        }
        // dst is the kernel address we will write to
        *(_QWORD *)&tmp_name_buffer[(int)page_offset] = dst;
        neg_idx = 0;
        memset(name_buffer, 'C', sizeof(name_buffer));
        // They have to do this backwards while loop so they can write
        // nulls into the name buffer
        while (1) {
            name_buffer[neg_idx + 244] = tmp_name_buffer[neg_idx + 255];
            if (ioctl(ashmem_fd, ASHMEM_SET_NAME, name_buffer) < 0)
                break;
            if (--neg_idx == -245) {
                // At this point, the ->page used in configfs_write_file will
                // be set due to the ASHMEM_SET_NAME calls
                if (lseek(ashmem_fd, 0LL, 0) >= 0) {
                    data_to_write = (void *)(mmap_page - size + 4096);
                    memcpy(data_to_write, src, size);
                    // This will EFAULT due to intentional misalignment on the
                    // page so as to ensure copying the right number of bytes
                    write(ashmem_fd, data_to_write, size + 1);
                    return 0;
                }
                return -1;
            }
        }
        return -1;
    }

The arbitrary read primitive is nearly identical to the arbitrary write primitive. The difference is that it calls read(2) instead of write(2) on ashmem_fd, copying data from the kernel into userland.

Conclusion

This exploit chain provides a real-world example of what we believe modern in-the-wild Android exploitation looks like. An early contextual theme from the initial stages of this exploit chain (not described in detail in this post) is the reliance on n-days to bypass the hardest security boundaries.
Reliance on patch backporting by downstream vendors leads to a fragile ecosystem, where a single missed bugfix can lead to high-impact vulnerabilities on end-user devices. The retention of these vulnerabilities in downstream codebases counteracts the effort that the security research and broader development communities invest in discovering bugs and developing patches for widely used software. It would greatly improve the security of those end users if vendors strongly considered efficient methods for faster and more reliable patch propagation to downstream devices.

It is also particularly noteworthy that this attacker created an exploit chain using multiple bugs from kernel GPU drivers. These third-party Android drivers vary in code quality and regularity of maintenance, which represents a notable opportunity for attackers. We also see the risk that the Linux kernel 32-bit compatibility layer presents, particularly when it requires the same patch to be re-implemented in multiple places. This requirement makes patching even more complex and error-prone. Vendors and kernel developers must therefore remain vigilant to ensure that the 32-bit compatibility layer presents as few security issues as possible in the future.

Update October 6th 2023: This post was edited to reflect the fact that CVE-2022-4262 was a Chrome 0-day at the time this chain was discovered, not an n-day as previously stated.