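CVE-2014-3153 is a classic privilege-escalation vulnerability with a very wide impact; at its core it is a use-after-free (UAF) bug in the Linux kernel. The whiz kid Geohot used it to build TowelRoot, a simple, brute-force Android root tool. Hats off to the people who discovered the bug and wrote the exploit.
Futex stands for Fast Userspace muTexes. It was designed to speed up mutual exclusion in glibc: when there is no contention, a futex can be acquired and released entirely in user space with an atomic operation on the word at uaddr, without touching a kernel-side mutex. Only under contention does the thread issue the futex system call, and even then the kernel starts with little more than a check of the value at uaddr, which is far cheaper than a purely kernel-based mutex. In short, the motivation for futex is to check in user space first and skip the trap into the kernel when there is no contention, which greatly improves efficiency under low contention. Linux has supported futex since 2.5.7.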
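To make the fast path concrete, here is a minimal sketch of a plain (non-PI) futex-based mutex, assuming Linux and C11 atomics. The names demo_futex, demo_lock, demo_unlock and demo_kernel_entries are made up for this illustration, and the three-state word (0, 1, 2) is one common textbook design, not what glibc or the PI futexes discussed below actually do (PI futexes store the owner's TID in the word).

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical counter: how many times the demo had to enter the kernel. */
static int demo_kernel_entries;

/* Thin wrapper around the raw futex system call (glibc exports no futex()). */
static long demo_futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    demo_kernel_entries++;
    return syscall(SYS_futex, (uint32_t *)uaddr, op, val, NULL, NULL, 0);
}

/* Word states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
static void demo_lock(_Atomic uint32_t *word)
{
    uint32_t c = 0;

    /* Fast path: uncontended 0 -> 1 transition, no system call at all. */
    if (atomic_compare_exchange_strong(word, &c, 1))
        return;

    /* Slow path: mark the lock contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(word, 2);
    while (c != 0) {
        demo_futex(word, FUTEX_WAIT, 2);   /* sleeps only while *word == 2 */
        c = atomic_exchange(word, 2);
    }
}

static void demo_unlock(_Atomic uint32_t *word)
{
    /* Fast path: nobody was waiting (1 -> 0), no system call. */
    if (atomic_fetch_sub(word, 1) != 1) {
        atomic_store(word, 0);
        demo_futex(word, FUTEX_WAKE, 1);   /* wake one waiter */
    }
}
```

With a single thread, a demo_lock/demo_unlock pair never reaches demo_futex, so demo_kernel_entries stays at 0; the system call is only paid for when another thread already holds the word.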
The relock bug lives in futex_lock_pi().
Let's walk through the futex flow.
Inside the futex system call, the actual work is done by do_futex().
linux/kernel/futex.c

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
              u32 __user *uaddr2, u32 val2, u32 val3)
{
    int cmd = op & FUTEX_CMD_MASK;
    unsigned int flags = 0;

    if (!(op & FUTEX_PRIVATE_FLAG))
        flags |= FLAGS_SHARED;

    if (op & FUTEX_CLOCK_REALTIME) {
        flags |= FLAGS_CLOCKRT;
        if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
            return -ENOSYS;
    }

    switch (cmd) {
    case FUTEX_LOCK_PI:
    case FUTEX_UNLOCK_PI:
    case FUTEX_TRYLOCK_PI:
    case FUTEX_WAIT_REQUEUE_PI:
    case FUTEX_CMP_REQUEUE_PI:
        if (!futex_cmpxchg_enabled)
            return -ENOSYS;
    }

    switch (cmd) {
    case FUTEX_WAIT:
        val3 = FUTEX_BITSET_MATCH_ANY;
    case FUTEX_WAIT_BITSET:
        return futex_wait(uaddr, flags, val, timeout, val3);
    case FUTEX_WAKE:
        val3 = FUTEX_BITSET_MATCH_ANY;
    case FUTEX_WAKE_BITSET:
        return futex_wake(uaddr, flags, val, val3);
    case FUTEX_REQUEUE:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
    case FUTEX_CMP_REQUEUE:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
    case FUTEX_WAKE_OP:
        return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
    case FUTEX_LOCK_PI:
        return futex_lock_pi(uaddr, flags, val, timeout, 0);
    case FUTEX_UNLOCK_PI:
        return futex_unlock_pi(uaddr, flags);
    case FUTEX_TRYLOCK_PI:
        return futex_lock_pi(uaddr, flags, 0, timeout, 1);
    case FUTEX_WAIT_REQUEUE_PI:
        val3 = FUTEX_BITSET_MATCH_ANY;
        return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
                                     uaddr2);
    case FUTEX_CMP_REQUEUE_PI:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
    }
    return -ENOSYS;
}
do_futex() mainly branches on the operation type encoded in op. For example, FUTEX_WAIT runs futex_wait(uaddr, flags, val, timeout, val3) and FUTEX_WAKE runs futex_wake(uaddr, flags, val, val3); these are the most basic futex block and wake operations.
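The decoding of op can be reproduced in user space with the constants from <linux/futex.h>. The sketch below mirrors the first few lines of do_futex(); the struct demo_futex_op and both helper functions are invented for this illustration (the kernel keeps cmd and flags as local variables).

```c
#include <linux/futex.h>
#include <stdbool.h>

/* Hypothetical struct for illustration; the kernel uses plain locals. */
struct demo_futex_op {
    int  cmd;        /* operation with the flag bits stripped */
    bool shared;     /* true when FUTEX_PRIVATE_FLAG is absent */
    bool realtime;   /* true when FUTEX_CLOCK_REALTIME is set */
};

static struct demo_futex_op demo_decode_op(int op)
{
    struct demo_futex_op d;

    d.cmd      = op & FUTEX_CMD_MASK;
    d.shared   = !(op & FUTEX_PRIVATE_FLAG);
    d.realtime = (op & FUTEX_CLOCK_REALTIME) != 0;
    return d;
}

/* Mirrors the first switch in do_futex(): the PI family of commands. */
static bool demo_is_pi_cmd(int cmd)
{
    switch (cmd) {
    case FUTEX_LOCK_PI:
    case FUTEX_UNLOCK_PI:
    case FUTEX_TRYLOCK_PI:
    case FUTEX_WAIT_REQUEUE_PI:
    case FUTEX_CMP_REQUEUE_PI:
        return true;
    default:
        return false;
    }
}
```

For example, demo_decode_op(FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG) yields cmd == FUTEX_WAIT_REQUEUE_PI with shared == false, and demo_is_pi_cmd() reports it as one of the commands that require futex_cmpxchg_enabled.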
Now let's look at futex_lock_pi:
/*
 * Userspace tried a 0 -> TID atomic transition of the futex value
 * and failed. The kernel side here does the whole locking operation:
 * if there are waiters then it will block, it does PI, etc. (Due to
 * races the kernel might see a 0 value of the futex too.)
 */
static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, int detect,
                         ktime_t *time, int trylock)
{
    struct hrtimer_sleeper timeout, *to = NULL;
    struct futex_hash_bucket *hb;
    struct futex_q q = futex_q_init;
    int res, ret;
    ...
    ...
    ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
    if (unlikely(ret)) {
        switch (ret) {
        case 1:
            /* We got the lock. */
            ret = 0;
            goto out_unlock_put_key;
        case -EFAULT:
            goto uaddr_faulted;
        case -EAGAIN:
            /*
             * Task is exiting and we just wait for the
             * exit to complete.
             */
            queue_unlock(&q, hb);
            put_futex_key(&q.key);
            cond_resched();
            goto retry;
        default:
            goto out_unlock_put_key;
        }
    }
As we can see, the core of futex_lock_pi is futex_lock_pi_atomic:
/**
 * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
 * @uaddr:          the pi futex user address
 * @hb:             the pi futex hash bucket
 * @key:            the futex key associated with uaddr and hb
 * @ps:             the pi_state pointer where we store the result of the
 *                  lookup
 * @task:           the task to perform the atomic lock work for. This will
 *                  be "current" except in the case of requeue pi.
 * @set_waiters:    force setting the FUTEX_WAITERS bit (1) or not (0)
 *
 * Returns:
 *  0 - ready to wait
 *  1 - acquired the lock
 * <0 - error
 *
 * The hb->lock and futex_key refs shall be held by the caller.
 */
static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
                                union futex_key *key,
                                struct futex_pi_state **ps,
                                struct task_struct *task, int set_waiters)
{
    ...
    ...
    /*
     * To avoid races, we attempt to take the lock here again
     * (by doing a 0 -> TID atomic cmpxchg), while holding all
     * the locks. It will most likely not succeed.
     */
    newval = vpid;
    if (set_waiters)
        newval |= FUTEX_WAITERS;

    if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
        return -EFAULT;
    ...
    ...
    /*
     * Surprise - we got the lock. Just return to userspace:
     */
    if (unlikely(!curval))
        return 1;
    ...
    ...
}
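From the comment we can see that unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)) compares the value at uaddr with 0 and, if it is 0, stores the thread ID there; the previous value is returned in curval.
The unlikely(!curval) check that follows means: if the old value was 0, the cmpxchg succeeded and the caller has acquired the lock.
This creates a problem: uaddr points to an integer in user space, and we can set it to 0 by hand, so the lock can be acquired again even though it is already held. Because we lock again without calling futex_unlock_pi to release the lock, some cleanup work is skipped, such as waking the threads blocked on the lock and updating pi_state. This problem is called relock, or multiple locking.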
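The relock condition can be illustrated with a small user-space model of that check. This is only a sketch: model_lock_pi_atomic is a made-up name, it uses C11 atomics instead of cmpxchg_futex_value_locked, and it ignores FUTEX_WAITERS, pi_state and everything else the real function does.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * User-space MODEL of the check in futex_lock_pi_atomic(); not kernel code.
 * Returns 1 if the lock was acquired, 0 if the caller would have to wait.
 */
static int model_lock_pi_atomic(_Atomic uint32_t *uaddr, uint32_t tid)
{
    uint32_t curval = 0;

    /* 0 -> TID cmpxchg; on failure curval receives the current owner. */
    atomic_compare_exchange_strong(uaddr, &curval, tid);

    /* "Surprise - we got the lock": the old value was 0. */
    if (curval == 0)
        return 1;
    return 0;
}
```

If thread 100 takes the lock, a call for thread 200 returns 0. Once user space stores 0 into the word, the same call for thread 200 returns 1 although thread 100 never went through the unlock path, which is exactly the relock situation described above.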
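According to the exploit, this vulnerability mainly relies on the futex operations FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
From the do_futex source, FUTEX_WAIT_REQUEUE_PI calls futex_wait_requeue_pi and FUTEX_CMP_REQUEUE_PI calls futex_requeue.
futex_wait_requeue_pi waits on uaddr1 to be woken, blocking via futex_wait_queue_me. The wake-up path moves the threads blocked on uaddr1 over to uaddr2 instead of waking them all at once, which avoids the "thundering herd" problem.
futex_requeue wakes the highest-priority thread on uaddr1 and then moves the remaining waiters blocked on uaddr1 over to uaddr2.
From the source, futex_requeue wakes a futex_wait_requeue_pi thread through two functions: futex_proxy_trylock_atomic and rt_mutex_start_proxy_lock.
If futex_proxy_trylock_atomic acquires the uaddr2 lock, the highest-priority thread blocked on uaddr1 is woken and returns to user space; if it fails, execution continues with the code that follows. This path does not touch the kernel mutex (rt_mutex), so it avoids that overhead.
If rt_mutex_start_proxy_lock acquires the uaddr2 lock, requeue_pi_wake_futex is called to wake the waiting thread, and that function clears the waiter's rt_waiter. If it fails, the thread is blocked on the kernel mutex of uaddr2 and its rt_waiter is added to the rt_mutex waiter list.
I describe the calls with the following pseudocode: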
futex_wait_requeue_pi(uaddr1, uaddr2)  // wait on uaddr1
futex_requeue(uaddr1, uaddr2)          // returns 1: success
futex_requeue(uaddr2, uaddr2)          // returns 0xffffffea: failure, crash
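The first two steps are normal operations: futex_wait_requeue_pi waits on uaddr1 for futex_requeue to wake it; futex_requeue then tries to take the lock on uaddr2 and wakes the thread waiting on uaddr1.
The last step, futex_requeue(uaddr2, uaddr2), is clearly illogical. Had the second wake-up been futex_requeue(uaddr1, uaddr2), futex_requeue would return 0, meaning nothing was woken, and there would be no crash. But futex does not check for this kind of call, i.e. it does not check for uaddr1 == uaddr2, so we can enter futex_requeue a second time and run the wake-up logic again.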
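Two small points can be made concrete here. First, 0xffffffea read as a signed 32-bit value is -22, which is -EINVAL on Linux. Second, the missing argument check is easy to state as code. The sketch below is a user-space model with made-up names (demo_decode_ret, model_requeue_args); my understanding is that the upstream fix rejects uaddr1 == uaddr2 on the requeue-PI path with -EINVAL, but treat that as an assumption rather than a quote of the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Interpret a raw 32-bit syscall return value such as 0xffffffea. */
static int32_t demo_decode_ret(uint32_t raw)
{
    return (int32_t)raw;
}

/*
 * MODEL of an argument check for the requeue-PI path; not kernel code.
 * check_same_addr = 0 models the behaviour described above (no check),
 * check_same_addr = 1 models rejecting identical addresses.
 */
static int model_requeue_args(const uint32_t *uaddr1, const uint32_t *uaddr2,
                              int requeue_pi, int check_same_addr)
{
    if (requeue_pi && check_same_addr && uaddr1 == uaddr2)
        return -EINVAL;
    return 0;   /* arguments accepted, the requeue logic would run */
}
```

With check_same_addr set to 0, model_requeue_args(&b, &b, 1, 0) returns 0 and the requeue logic is entered for the same futex twice; with the check enabled the same call is rejected before any state is touched.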
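Thread 1 calls futex_lock_pi to lock B (A and B are two futex words playing the roles of uaddr1 and uaddr2). There is no contention at this point, so the lock succeeds and the value of B is set to the tid.
Thread 2 is created and enters the futex_wait_requeue_pi system call, which initializes a futex_q and an rt_mutex_waiter on its stack. It then calls futex_wait_queue_me to wait on A, and the futex_q is queued on the hash-bucket chain for A.
Once thread 2 is waiting in the kernel, thread 1 enters the kernel and calls futex_requeue to wake thread 2. It first reaches futex_proxy_trylock_atomic, which fails because B is locked. It then reaches rt_mutex_start_proxy_lock, which also fails to take the lock, so thread 2 is blocked on B's rt_mutex and the rt_waiter from futex_wait_requeue_pi is added to the rt_mutex waiter list. The call chain is rt_mutex_start_proxy_lock -> task_blocks_on_rt_mutex -> plist_add.
Using the relock bug, unlock B from user space by writing 0 to it.
Using the requeue bug, call futex_requeue(B, B) again. This calls futex_proxy_trylock_atomic again, which in turn calls futex_lock_pi_atomic. futex_lock_pi_atomic sees that B is 0 and takes the lock, after which requeue_pi_wake_futex is called to wake thread 2.
q->rt_waiter is set to NULL because the lock has been acquired.
futex_wait_requeue_pi then assumes it never waited on the kernel mutex, i.e. that rt_waiter was never added to the rt_mutex waiter list, so it takes the branch that does not clean up rt_waiter. Thread 2 is woken, but its rt_waiter is never removed from the rt_mutex, and that rt_waiter lives on the stack. After the futex_wait_requeue_pi thread returns, later walks of the waiter list dereference the stale rt_waiter, which results in a use-after-free.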
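The sequence above can be condensed into a tiny state model. It is a deliberately simplified sketch: the structs and functions are invented stand-ins, list membership is reduced to a boolean, and nothing here touches real kernel memory, so it shows the logic error without reproducing the memory corruption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; all names are models. */
struct model_waiter { bool on_rt_mutex_list; };
struct model_q      { struct model_waiter *rt_waiter; };

/* First requeue: the proxy lock fails because B is held, waiter enqueued. */
static void model_start_proxy_lock(struct model_waiter *w)
{
    w->on_rt_mutex_list = true;
}

/* Second requeue (B, B): takes the zeroed lock and wakes the waiter. */
static void model_requeue_pi_wake(struct model_q *q)
{
    q->rt_waiter = NULL;   /* signals "lock acquired, nothing to clean up" */
}

/* Wake-up path of the waiting thread: cleanup only if rt_waiter is set. */
static void model_wait_requeue_pi_exit(struct model_q *q)
{
    if (q->rt_waiter != NULL)
        q->rt_waiter->on_rt_mutex_list = false;   /* normal dequeue */
    /* else: assume the waiter was never enqueued and skip the dequeue */
}

/* Returns true if the waiter is still linked when its owner returns. */
static bool model_run(bool second_requeue)
{
    struct model_waiter w = { false };
    struct model_q q = { &w };

    model_start_proxy_lock(&w);
    if (second_requeue)
        model_requeue_pi_wake(&q);
    model_wait_requeue_pi_exit(&q);
    return w.on_rt_mutex_list;
}
```

model_run(false) returns false because the waiter is dequeued on the normal path, while model_run(true) returns true: the waiter is still marked as linked at the moment its stack frame would go away, which is the dangling entry the exploit relies on.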
Call graph of the vulnerability:
http://kouucocu.lofter.com/post/1cdb8c4b_50f62fe
https://elixir.bootlin.com/linux/v3.4/source/kernel/futex.c#L1967
http://blog.topsec.com.cn/cve2014-3153/
《漏洞战争》, CVE-2014-3153 Android kernel Futex privilege-escalation vulnerability