MTE As Implemented, Part 2: Mitigation Case Studies

By Mark Brand, Project Zero

Background

In 2018, in the v8.5a version of the ARM architecture, ARM proposed a hardware implementation of tagged memory, referred to as MTE (Memory Tagging Extensions).

In Part 1 we discussed testing the technical (and implementation) limitations of MTE on the hardware that we've had access to. This post will now consider the implications of what we know on the effectiveness of MTE-based mitigations in several important products/contexts.

To summarize - there are two key classes of bypass techniques for memory-tagging based mitigations, and these are the following (for some examples, see Part 1):

Known-tag-bypasses - In general, confidentiality of tag values is key to the effectiveness of memory-tagging as a mitigation. A breach of tag confidentiality allows the attacker to directly or indirectly ensure that their invalid memory accesses will be correctly tagged, and are therefore not detectable.
Unknown-tag-bypasses - Implementation limits might mean that there are opportunities for an attacker to still exploit a vulnerability despite performing memory accesses with incorrect tags that could be detected.

There are two main modes for MTE enforcement:

Synchronous (sync-MTE) - tag check failures result in a hardware fault on instruction retirement. This means that the results of invalid reads and the effects of invalid writes should not be architecturally observable.
Asynchronous (async-MTE) - tag check failures do not directly result in a fault. The results of invalid reads and the effects of invalid writes are architecturally observable, and the failure is delivered at some point after the faulting instruction in the form of a per-cpu flag.

Since Spectre, it has been clear that using standard memory-tagging approaches as a "hard probabilistic mitigation"1 is not generally possible. In any context where an attacker can construct a speculative side-channel, known-tag-bypasses are a fundamental weakness that must be accounted for.

Another proposed approach is combining MTE with another software approach to construct a "hard deterministic mitigation"2. The primary example of this would be the *Scan+MTE combinations proposed in Chrome to mitigate use-after-free vulnerabilities by ensuring that a tag is not re-used for an allocation while there are any stale pointers pointing to that allocation.

In this case, is async-MTE sufficient as an effective mitigation? We have demonstrated techniques that allow an unknown-tag-bypass when using async-MTE, so it seems clear that for a "hard" mitigation, (at least) sync-MTE will be necessary. This should not, however, be interpreted as implying that such a "soft" mitigation would not prove a significant inconvenience to attackers - we're going to discuss that in detail below.

How much would MTE hurt attackers?

In order to understand the "additional difficulty" that attackers will face in writing exploits that can bypass MTE based mitigations, we need to consider carefully the context in which the attacker finds themself.

We have made some assumptions here about the target reliability that a high-tier attacker would want as around 95% - this is likely lower than currently expected in most cases, but probably higher than the absolute limit at which an exploit might become too unreliable for their operational needs. We also note that in some contexts an attacker might be able to use even an extremely unreliable exploit without significantly increasing their risk of detection. While we'd expect attackers to desire (and invest in) achieving reliability, it's unlikely that even if we could force an upper bound to that reliability this would be enough to completely prevent exploitation.

However, any such drop in reliability should generally be expected to increase detection of in-the-wild usage of exploits, increasing the risk to attackers accordingly.

An additional note is that most unknown-tag-bypasses would be prevented by the use of sync-MTE, at least in the absence of specific weaknesses in the application which would over time likely be fixed as exploits are observed exploiting those weaknesses.

We consider 4 main contexts here, as we believe these are the most relevant/likely use-cases for a usermode mitigation:

Context	Mode	Bypass techniques
		known-tag-bypass	unknown-tag-bypass
Chrome: Renderer Exploit	async	Trivial ♻️	Likely trivial ♻️
	sync	Trivial ♻️	Bypass techniques should be rare 🛠️
Chrome: IPC Sandbox Escape	async	Likely possible in many cases ♻️	Likely possible in many cases 🐛*
	sync	Likely possible in many cases ♻️	Bypass techniques should be rare 🛠️
Android: Binder Sandbox Escape	async	Difficulty will depend on service	Difficulty will depend on service 🐛*
	sync	Difficulty will depend on service	Bypass techniques should be rare 🛠️
Android: Messaging App Oneshot	async	Likely impossible in most cases	Good enough bugs will be very rare 🐛*
	sync	Likely impossible in most cases	Bypass techniques should be rare 🛠️

The degree of pain for attackers caused by needing to bypass MTE is roughly assessed from low to very high.

♻️: Once developed, generic bypass technique can likely be shared between exploits.

🛠️: Limited supply of bypass techniques that could be fixed, eventually eliminating this bypass.

🐛: Additional constraints imposed by bypass techniques mean that the subset of issues that are exploitable is significantly reduced with MTE.

* Note that it's also potentially possible to design software to make exploitation of these unknown-tag-bypass techniques more restrictive by eg. inserting system calls in specific choke-points. We haven't investigated the practicality or limitations of such an approach at this time, but it is unlikely to be generally applicable especially where third-party code is involved.

Chrome: Javascript -> Compromised Renderer

Spectre-type speculative side-channels can be used to break tag confidentiality, so known-tag-bypasses are a certainty.

For unknown-tag-bypasses, javascript should in most situations be able to avoid system calls for the duration of their exploit, so the exploit needs to complete within the soft time constraint.

It's likely that both known-tag-bypass and unknown-tag-bypass techniques can be made generic and reused across multiple different vulnerabilities.

Chrome: Compromised Renderer -> Mojo IPC Sandbox Escape

It is likely that Spectre-type speculative side-channels can be used to break tag confidentiality, so known-tag-bypasses are a possibility.

For unknown-tag-bypasses, we believe that there are circumstances under which multiple IPC messages can be processed without a system call in between. This does not hold for all situations, and the current public state-of-the-art techniques for Browser process IPC exploitation would perform multiple system calls and would not be suitable.

It's possible that a generic speculative-side-channel attack could be developed that would allow reuse of techniques for known-tag-bypasses against a specific privileged process (likely the Browser process). Any unknown-tag-bypass exploit would likely require additional per-bug cost to develop due to the additional complexity required in avoiding system calls for the duration of the exploit.

Android: App -> Binder IPC Sandbox Escape

It is likely that Spectre-type speculative side-channels can be used to break tag confidentiality, so known-tag-bypasses are a possibility.

For unknown-tag-bypasses we note that there is a mandatory system call ioctl(..., BINDER_WRITE_READ, …) between different IPC messages. This implies that an exploit would need to complete within the execution of the triggering IPC message in addition to meeting the soft time constraint.

As there are a large number of different Binder IPC services hosted in different processes, a known-tag-bypass technique is less likely to be reusable between different vulnerabilities. The exploitation of unknown-tag-bypasses is unlikely to be reusable, and will require specific per-vulnerability development.

An additional note - at present, .data section pointers are not tagged, so the zygote architecture means that pointer-forging between some contexts would be easy for an attacker, which could be a useful technique for exploiting some vulnerabilities leading to second-order memory corruption (eg. type confusion allowing an attacker to treat data as a pointer).

Android: Remote -> Messaging App

It is unlikely that Spectre-type speculative side-channels can be used to break tag confidentiality, so known-tag-bypasses are highly unlikely, unless the application allows alternative side channels (or eg. repeated exploitation attempts until the tag is correct).

For unknown-tag-bypasses, attackers will need a very good one-shot bug, likely in complex file-format parsing. An exploit would still need to meet the soft time constraint.

It's likely that exploitation techniques here will be bespoke and both per-application and per-vulnerability, even if there is significant shared code between different messaging applications.

Part 3 continues this series with a more detailed discussion of the specifics of applying MTE to the kernel, which has some additional nuance (and may still contain some information of interest even if you're only interested in user-space applications).

[1] A hard mitigation that does not provide deterministic protection, but which can be universally bypassed by the attacker "winning" a probabilistic condition, in the case of MTE (with 4 tag bits available, likely with one value reserved) this would probably imply a 1/15 chance of success.