03/05/2024
40 min read
Today, nearly two percent of all TLS 1.3 connections established with Cloudflare are secured with post-quantum cryptography. We expect to see double-digit adoption by the end of 2024. Apple announced in February 2024 that it will secure iMessage with post-quantum cryptography before the end of the year, and Signal chats are already secured. What once was the topic of futuristic tech demos will soon be the new security baseline for the Internet.
A lot has been happening in the field over the last few years, from mundane name changes (ML-KEM is the new name for Kyber), to new proposed algorithms in the signatures onramp, to the catastrophic attack on SIKE. Plenty that has been written merely three years ago now feels quite out of date. Thus, it is high time for an update: in this blog post we’ll take measure of where we are now in early 2024, what to expect for the coming years, and what you can do today.
The quantum threat
First things first: why are we migrating our cryptography? It’s because of quantum computers. These marvelous devices, instead of restricting themselves to zeroes and ones, compute using more of what nature actually affords us: quantum superposition, interference, and entanglement. This allows quantum computers to excel at certain very specific computations, notably simulating nature itself, which will be very helpful in developing new materials.
Quantum computers are not going to replace regular computers, though: they’re actually much worse than regular computers at most tasks. Think of them as graphic cards — specialized devices for specific computations.
Unfortunately, quantum computers also excel at breaking key cryptography that’s in common use today. Thus, we will have to move to post-quantum cryptography: cryptography designed to be resistant against quantum attack. We’ll discuss the exact impact on the different types of cryptography later on. For now quantum computers are rather anemic: they’re simply not good enough today to crack any real-world cryptographic keys.
That doesn’t mean we shouldn’t worry yet: encrypted traffic can be harvested today, and decrypted with a quantum computer in the future.
Quantum numerology
When will they be good enough? Like clockwork, every year there are news stories of new quantum computers with record-breaking number of qubits. This focus on counting qubits is quite misleading. To start, quantum computers are analogue machines, and there is always some noise interfering with the computation.
There are big differences between the different types of technology used to build quantum computers: silicon-based quantum computers seem to scale well, are quick to execute instructions, but have very noisy qubits. This does not mean they’re useless: with quantum error correcting codes one can effectively turn tens of millions of noisy silicon qubits into a few thousand high-fidelity ones, which could be enough to break RSA. Trapped-ion quantum computers, on the other hand, have much less noise, but have been harder to scale. Only a few hundred-thousand trapped-ion qubits could potentially draw the curtain on RSA.
We’re only scratching the surface with the number of qubits and noise. For instance, a quirk of many quantum computers is that only adjacent qubits can interact — something that most estimates do not take into account. On the other hand, for a specific quantum computer, a tailored algorithm can perform much better than a generic one. We can only guess what a future quantum computer will look like, and today’s estimates are most likely off by at least an order of magnitude.
When will quantum computers break real-world cryptography?
So, when do we expect the demise of RSA-2048 which is in common use today? In a 2022 survey, over half the interviewed experts thought it’d be more probable than not that by 2037 such a cryptographically relevant quantum computer would’ve been built.
We can also look at the US government’s timeline for the migration to post-quantum cryptography. The National Security Agency (NSA) aims to finish its migration before 2033, and will start to prefer post-quantum ready vendors for many products in 2025. The US government has a similarly ambitious timeline for the country as a whole: the aim is to be done by 2035.
More anecdotally, at industry conferences on the post-quantum migration, I see particularly high participation of the automotive branch. Not that surprising, considering that the median age of a car on the road is 14 years, a lot of money is on the line, and not all cryptography used in cars can be upgraded easily once on the road.
So when will it arrive? Whether it’s 2034 or 2050, it will be too soon. The immense success of cryptography means it’s all around us now, from dishwasher, to pacemaker, to satellite. Most upgrades will be easy, and fit naturally in the product’s lifecycle, but there will be a long tail of difficult and costly upgrades.
Two migrations
To help prioritize, it is important to understand that there is a big difference in the difficulty, impact, and urgency of the post-quantum migration for the different kinds of cryptography required to create secure connections. In fact, for most organizations there will be two post-quantum migrations: key agreement and signatures / certificates.
Already post-quantum secure: symmetric cryptography
Let’s explain this for the case of creating a secure connection when visiting a website in a browser. The workhorse is a symmetric cipher such as AES-GCM. It’s what you would think of when thinking of cryptography: both parties, in this case the browser and server, have a shared key, and they encrypt / decrypt their messages with the same key. Unless you have that key, you can’t read anything, or modify anything.
The good news is that symmetric ciphers, such as AES-GCM, are already post-quantum secure. There is a common misconception that Grover’s quantum algorithm requires us to double the length of symmetric keys. On closer inspection of the algorithm, it’s clear that it is not practical. The way NIST, the US National Institute for Standards and Technology (who have been spearheading the standardization of post-quantum cryptography) defines their post-quantum security levels is very telling. They define a specific security level by saying the scheme should be as hard to crack using either a classical or quantum computer as an existing symmetric cipher as follows:
Level | Definition, as least as hard to break as … | Example |
---|---|---|
1 | To recover the key of AES-128 by exhaustive search | ML-KEM-512, SLH-DSA-128s |
2 | To find a collision in SHA256 by exhaustive search | ML-DSA-44 |
3 | To recover the key of AES-192 by exhaustive search | ML-KEM-768 |
4 | To find a collision in SHA384 by exhaustive search | |
5 | To recover the key of AES-256 by exhaustive search | ML-KEM-1024, SLH-DSA-256s |
There are good intentions behind suggesting doubling the key lengths of symmetric cryptography. In many use cases, the extra cost is not that high, and it mitigates any theoretical risk completely. Scaling symmetric cryptography is cheap: double the bits is typically far less than half the cost. So on the surface, it is simple advice.
But if we insist on AES-256, it seems only logical to insist on NIST PQC level 5 for the public key cryptography as well. The problem is that public key cryptography does not scale very well. Depending on the scheme, going from level 1 to level 5 typically more than doubles data usage and CPU cost. As we’ll see, deploying post-quantum signatures at level 1 is already painful, and deploying them at level 5 is problematic.
A second reason is that upgrading symmetric cryptography isn’t always easy. If it requires replacing hardware, it can be costly indeed. An organization that cannot migrate all its cryptography in time simply can’t afford to waste its time doubling symmetric key lengths.
First migration: key agreement
Symmetric ciphers are not enough on their own: how do I know which key to use when visiting a website for the first time? The browser can’t just send a random key, as everyone listening in would see that key as well. You’d think it’s impossible, but there is some clever math to solve this, so that the browser and server can agree on a shared key. Such a scheme is called a key agreement mechanism, and is performed in the TLS handshake. Today almost all traffic is secured with X25519, a Diffie–Hellman-style key agreement, but its security is completely broken by Shor’s algorithm on a quantum computer. Thus, any communication secured today with Diffie–Hellman, when stored, can be decrypted in the future by a quantum computer.
This makes it urgent to upgrade key agreement today. As we will see, luckily, post-quantum key agreement is relatively straight-forward to deploy.
Second migration: signatures / certificates
The key agreement allows secure agreement on a key, but there is a big gap: we do not know with whom we agreed on the key. If we only do key agreement, an attacker in the middle can do separate key agreements with the browser and server, and re-encrypt any exchanged messages. To prevent this we need one final ingredient: authentication.
This is achieved using signatures. When visiting a website, say cloudflare.com, the web server presents a certificate signed by a certification authority (CA) that vouches that the public key in that certificate is controlled by cloudflare.com. In turn, the web server signs the handshake and shared key using the private key corresponding to the public key in the certificate. This allows the client to be sure that they’ve done a key agreement with cloudflare.com.
RSA and ECDSA are commonly used traditional signature schemes. Again, Shor’s algorithm makes short work of them, allowing a quantum attacker to forge any signature. That means that a MitM (man-in-the-middle) can break into any connection that uses a signature scheme that is not post-quantum secure. This is of course an active attack: if the attacker isn’t in the middle as the handshake happens, the connection is not affected.
This makes upgrading signature schemes for TLS on the face of it less urgent, as we only need to have everyone migrated by the time the cryptographically-relevant quantum computer arrives. Unfortunately, we will see that migration to post-quantum signatures is much more difficult, and will require more time.
Timeline
Before we dive into the technical challenges of migrating the Internet to post-quantum cryptography, let’s have a look at how we got here, and what to expect in the coming years. Let’s start with how post-quantum cryptography came to be.
Origin of post-quantum cryptography
Physicists Feynman and Manin independently proposed quantum computers around 1980. It took another 14 years before Shor published his algorithm attacking public key cryptography. Most post-quantum cryptography predates Shor’s famous algorithm.
There are various branches of post-quantum cryptography, of which the most prominent are lattice-based, hash-based, multivariate, code-based, and isogeny-based. Except for isogeny-based cryptography, none of these were initially conceived as post-quantum cryptography. In fact, early code-based and hash-based schemes are contemporaries of RSA, being proposed in the 1970s, and comfortably predate the publication of Shor’s algorithm in 1994. Also, the first multivariate scheme from 1988 is comfortably older than Shor’s algorithm. It is a nice coincidence that the most successful branch, lattice-based cryptography, is Shor’s closest contemporary, being proposed in 1996. For comparison, elliptic curve cryptography, which is widely used today, was first proposed in 1985.
In the years after the publication of Shor’s algorithm, cryptographers took measure of the existing cryptography: what’s clearly broken, and what could be post-quantum secure? In 2006, the first annual International Workshop on Post-Quantum Cryptography took place. From that conference, an introductory text was prepared, which holds up rather well as an introduction to the field. A notable caveat is the demise of the Rainbow signature scheme. In that same year, the elliptic-curve key-agreement X25519 was proposed, which now secures the vast majority of all Internet connections.
NIST PQC competition
Ten years later, in 2016, NIST, the US National Institute of Standards and Technology, launched a public competition to standardize post-quantum cryptography. They’re using a similar open format as was used to standardize AES in 2001, and SHA3 in 2012. Anyone can participate by submitting schemes and evaluating the proposals. Cryptographers from all over the world submitted algorithms. To focus attention, the list of submissions were whittled down over three rounds. From the original 82, based on public feedback, eight made it into the final round. From those eight, in 2022, NIST chose to pick four to standardize first: one KEM (for key agreement) and three signature schemes.
Old name | New name | Branch |
---|---|---|
Kyber | ML-KEM (FIPS 203) Module-lattice based Key-Encapsulation Mechanism Standard |
Lattice-based |
Dilithium | ML-DSA (FIPS 204) Module-lattice based Digital Signature Standard |
Lattice-based |
SPHINCS+ | SLH-DSA (FIPS 205) Stateless Hash-Based Digital Signature Standard |
Hash-based |
Falcon | FN-DSA FFT over NTRU lattices Digital Signature Standard |
Lattice-based |
ML-KEM is the only post-quantum key agreement close to standardization now, and despite some occasional difficulty with its larger key sizes, in many cases it allows for a drop-in upgrade.
The situation is rather different with the signatures: it’s quite telling that NIST chose to standardize three already. And there are even more signatures set to be standardized in the future. The reason is that none of the proposed signatures are close to ideal. In short, they all have much larger keys and signatures than we’re used to. From a security standpoint SLH-DSA is the most conservative choice, but also the worst performer. For public key and signature sizes, FN-DSA is the best of the worst, but is difficult to implement safely because of floating-point arithmetic. This leaves ML-DSA as the default pick. More in depth comparisons are included below.
Name changes
Undoubtedly Kyber is the most familiar name, as it’s a preliminary version of Kyber that has already been deployed by Chrome and Cloudflare among others to counter store-now/decrypt-later. We will have to adjust, though. Just like Rijndael is most well-known as AES, and Keccak is SHA3 to most, ML-KEM is set to become the catchy new moniker for Kyber going forward.
Final standards
Although we know NIST will standardize these four, we’re not quite there yet. In August 2023, NIST released three draft standards for the first three with minor changes, and solicited public feedback. FN-DSA is delayed for now, as it’s more difficult to standardize and deploy securely.
For timely adopters, it’s important to be aware that based on the feedback on the first three drafts, there might be a few small tweaks before the final standards are released. These changes will be minor, but the final versions could well be incompatible on the wire with the current draft standards. These changes are mostly immaterial, only requiring a small update, and do not meaningfully affect the brunt of work required for the migration, including organizational engagement, inventory, and testing. Before shipping, there can be good reasons to wait for the final standards: support for preliminary versions is not widespread, and it might be costly to support both the draft and final standards. Still, many organizations have not started work on the post-quantum migration at all, citing the lack of standards — a situation that has been called crypto procrastination.
So, when can we expect the final standards? There is no set timeline, but we expect the first three standards to be out around mid-2024.
Predicting protocol and software support
Having NIST’s final standards is not enough. The next step is to standardize the way the new algorithms are used in higher level protocols. In many cases, such as key agreement in TLS, this is as simple as assigning an identifier to the new algorithms. In other cases, such as DNSSEC, it requires a bit more thought. Many working groups at the IETF have been preparing for years for the arrival of NIST’s final standards, and I expect that many protocol integrations will be available before the end of 2024. For the moment, let’s focus on TLS.
The next step is software support. Not all ecosystems can move at the same speed, but we have seen a lot of preparation already. We expect several major open ecosystems to have post-quantum cryptography and TLS support available early 2025, if not earlier.
Again, for TLS there is a big difference again between key agreement and signatures. For key agreement, the server and client can add and enable support for post-quantum key agreement independently. Once enabled on both sides, TLS negotiation will use post-quantum key agreement. We go into detail on TLS negotiation in this blog post. If your product just uses TLS, your store-now/decrypt-now problem could be solved by a simple software update of the TLS library.
Post-quantum TLS certificates are more of a hassle. Unless you control both ends, you’ll need to install two certificates: one post-quantum certificate for the new clients, and a traditional one for the old clients. If you aren’t using automated issuance of certificates yet, this might be a good reason to check that out. TLS allows the client to signal which signature schemes it supports so that the server can choose to serve a post-quantum certificate only to those clients that support it. Unfortunately, although almost all TLS libraries support setting up multiple certificates, not all servers expose that configuration. If they do, it will still require a configuration change in most cases. (Although undoubtedly caddy will do it for you.)
Talking about post-quantum certificates: it will take some time before Certification Authorities (CAs) can issue them. Their HSMs will first need (hardware) support, which then will need to be audited. Also, the CA/Browser forum needs to approve the use of the new algorithms. Of these, the audits are likely to be the bottleneck, as there will be a lot of submissions after the publication of the NIST standards. It’s unlikely we will see a post-quantum certificate issued by a CA before 2026.
This means that it is not unlikely that come 2026, we are in an interesting in-between time, where almost all Internet traffic is protected by post-quantum key agreement, but not a single public post-quantum certificate is used.
More post-quantum standards
NIST is not quite done standardizing post-quantum cryptography. There are two more post-quantum competitions running: round 4 and the signatures onramp.
Round 4
From the post-quantum competition, NIST is still considering standardizing one or more of the code-based key agreements BIKE, HQC, Classic McEliece in a fourth round. The performance of BIKE and HQC, both in key sizes and computational efficiency, is much worse than ML-KEM. NIST is considering standardizing one as a backup KEM, in case there is a cryptanalytic breakthrough against lattice-based cryptography, such as ML-KEM.
Classic McEliece does not compete with ML-KEM directly as a general purpose KEM. Instead, it’s a specialist: Classic McEliece public keys are very large (268kB), but it has (for a post-quantum KEM) very small ciphertexts (128 bytes). This makes Classic McEliece very attractive for use cases where the public key can be distributed in advance, such as to secure a software update mechanism.
Signatures onramp
In late 2022, after announcing the first four picks, NIST also called a new competition, dubbed the signatures onramp, to find additional signature schemes. The competition has two goals. The first is hedging against cryptanalytic breakthroughs against lattice-based cryptography. NIST would like to standardize a signature that performs better than SLH-DSA, but is not based on lattices. Secondly, they’re looking for a signature scheme that might do well in use cases where the current roster doesn’t do well: we will discuss those at length later on in this post.
In July 2023, NIST posted the 40 submissions they received for a first round of public review. The cryptographic community got to work, and as is quite normal for a first round, at the time of writing (February 2024) have managed to break 10 submissions completely, and weaken a couple of others drastically. Thom Wiggers maintains a useful website comparing the submissions.
There are some very promising submissions. We will touch briefly upon them later on. It is worth mentioning that just like the main post-quantum competition, the selection process will take many years. It is unlikely that any of these onramp signature schemes will be standardized before 2027 — if they’re not broken in the first place.
Before we dive into the nitty-gritty of migrating the Internet to post-quantum cryptography, it’s instructive to look back at some past migrations.
Looking back: migrating to TLS 1.3
One of the big recent migrations on the Internet was the switch from TLS 1.2 to TLS 1.3. Work on the new protocol started around 2014. The goal was ambitious: to start anew, cut a lot of cruft, and have a performant clean transport protocol of the future. After a few years of hard work, the protocol was ready for field tests. In good spirits, in September 2016, we announced that we support TLS 1.3.
The followup blog in December 2017 had a rather different tone: “Why TLS 1.3 isn’t in browsers yet”.
It turned out that revision 11 of TLS 1.3 was completely undeployable in practice, breaking a few percent of all users. The reason? Protocol ossification. TLS was designed with flexibility in mind: the client sends a list of TLS versions it supports, so that the connection can be smoothly upgraded to the newest crypto. That’s the theory, but if you never move the joint, it rusts: for one, it turned out that a lot of server software and middleware simply crashed on just seeing an unknown version. Others would ignore the version number completely, and try to parse the messages as if it was TLS 1.2 anyway. In practice, the version negotiation turned out to be completely broken. So how was this fixed?
In revision 22 of the TLS 1.3 draft, changes were made to make TLS 1.3 look like TLS 1.2 on the wire: in particular TLS 1.3 advertises itself as TLS 1.2 with the normal version negotiation. Also, a lot of unnecessary fields are included in the TLS 1.3 ClientHello just to appease any broken middleboxes that might be peeking in. A server that doesn’t understand TLS 1.3 wouldn’t even see that an attempt was made to negotiate TLS 1.3. Using a sneaky new extension, a second version negotiation mechanism was added. For the details, check out the December 2017 blog post linked above.
Today TLS 1.3 is a huge success, and is used by more than 93% of the connections.
To help prevent ossification in the future, new protocols such as TLS 1.3 and QUIC use GREASE, where clients send unknown identifiers on purpose, including cryptographic algorithm identifiers, to help catch similar bugs, and keep the flexibility.
Migrating the Internet to post-quantum key agreement
Now that we understand what we’re dealing with on a high level, let’s dive into upgrading key agreement on the Internet. First, let’s have a closer look at NIST’s first and so far only post-quantum key agreement: ML-KEM.
ML-KEM was submitted under the name CRYTALS-Kyber. Even though it will be a US standard, its designers work in industry and academia across France, Switzerland, the Netherlands, Belgium, Germany, Canada, and the United States. Let’s have a look at its performance.
ML-KEM versus X25519
Today the vast majority of clients use the traditional key agreement X25519. Let’s compare that to ML-KEM.
Keyshares size(in bytes) | Ops/sec (higher is better) | ||||
---|---|---|---|---|---|
Algorithm | PQ | Client | Server | Client | Server |
ML-KEM-512 | ✅ | 800 | 768 | 45,000 | 70,000 |
ML-KEM-768 | ✅ | 1,184 | 1,088 | 29,000 | 45,000 |
ML-KEM-1024 | ✅ | 1,568 | 1,568 | 20,000 | 30,000 |
X25519 | ❌ | 32 | 32 | 19,000 | 19,000 |
ML-KEM-512, -768 and -1024 aim to be as resistant to (quantum) attack as AES-128, -192 and -256 respectively. Even at the AES-128 level, ML-KEM is much bigger than X25519, requiring 1,568 bytes over the wire, whereas X25519 requires a mere 64 bytes.
On the other hand, even ML-KEM-1024 is typically significantly faster than X25519, although this can vary quite a bit depending on your platform.
ML-KEM-768 and X25519
At Cloudflare, we are not taking advantage of that speed boost just yet. Like many other early adopters, we like to play it safe and deploy a hybrid key-agreement combining X25519 and (a preliminary version of) ML-KEM-768. This combination might surprise you for two reasons.
- Why combine X25519 (“128 bits of security”) with ML-KEM-768 (“192 bits of security”)?
- Why bother with the non post-quantum X25519?
The apparent security level mismatch is a hedge against improvements in cryptanalysis in lattice-based cryptography. There is a lot of trust in the (non post-quantum) security of X25519: matching AES-128 is more than enough. Although we are comfortable in the security of ML-KEM-512 today, over the coming decades cryptanalysis could improve. Thus, we’d like to keep a margin for now.
The inclusion of X25519 has two reasons. First, there is always a remote chance that a breakthrough renders all variants of ML-KEM insecure. In that case, X25519 still provides non post-quantum security, and our post-quantum migration didn’t make things worse.
More important is that we do not only worry about attacks on the algorithm, but also on the implementation. A noteworthy example where we dodged a bullet is that of KyberSlash, a timing attack that affected many implementations of Kyber (an earlier version of ML-KEM), including our own. Luckily KyberSlash does not affect Kyber as it is used in TLS. A similar implementation mistake that would actually affect TLS, is likely to require an active attacker. In that case, the likely aim of the attacker wouldn’t be to decrypt data decades down the line, but steal a cookie or other token, or inject a payload. Including X25519 prevents such an attack.
So how well do ML-KEM-768 and X25519 together perform in practice?
Performance and protocol ossification
Browser experiments
Being well aware of potential compatibility and performance issues, Google started a first experiment with post-quantum cryptography back in 2016, the same year NIST started their competition. This was followed up by a second larger joint experiment by Cloudflare and Google in 2018. We tested two different hybrid post-quantum key agreements: CECPQ2, which is a combination of the lattice-based NTRU-HRSS and X25519, and CECPQ2b, a combination of the isogeny-based SIKE and again X25519. NTRU-HRSS is very similar to ML-KEM in size, but is computationally somewhat more taxing on the client-side. SIKE on the other hand, has very small keys, is computationally very expensive, and was completely broken in 2022. With respect to TLS handshake times, X25519+NTRU-HRSS performed very well, being hard to distinguish by eye from the control connections.
Unfortunately, a small but significant fraction of clients experienced broken connections with NTRU-HRSS. The reason: the size of the NTRU-HRSS keyshares. In the past, when creating a TLS connection, the first message sent by the client, the so-called ClientHello, almost always fit within a single network packet. The TLS specification allows for a larger ClientHello, however no one really made use of that. Thus, protocol ossification strikes again as there are some middleboxes, load-balancers, and other software that tacitly assume the ClientHello always fits in a single packet.
Over the subsequent years, Chrome kept running their PQ experiment at a very low rate, and did a great job reaching out to vendors whose products were incompatible. If it were not for these compatibility issues, we would’ve likely seen Chrome ramp up post-quantum key agreement five years earlier.
Today the situation looks better. At the time of writing, Chrome has enabled post-quantum key-agreement for 10% of all users. That accounts for about 1.8% of all our TLS 1.3 connections, as shown in the figure below. That’s a lot, but we’re not out of the woods yet. There could well be performance and compatibility issues that prevent a further rollout.
Nonetheless, we feel it’s more probable than not that we will see Chrome enable post-quantum key agreement for more users this year.
Other browsers
In January 2024, Firefox landed the code to support post-quantum key agreement in nightly, and it’s likely it will land in Firefox proper later in 2024. For Chrome-derived browsers, such as Edge and Brave, it’s easy to piggyback on the work of Chrome, and we could well see them follow suit when Chrome turns on post-quantum key-agreement by default.
However, browser to server connections aren’t the only connections important to the Internet.
Testing connections to customer origins
In September 2023, we added support for our customers to enable post-quantum key agreement on connections from Cloudflare to their origins. That’s connection (3) in the following diagram. This can be done in two ways: the fast way, and the slow but safer way. In both cases, if the origin does not support it, we fall back to traditional key-agreement. We explain the details of these in the blog post, but in short, in the fast way we send the post-quantum keyshare immediately, and in the slow but safe way we let the origin ask for post-quantum using a HelloRetryRequest message. Chrome, by the way, is deploying post-quantum key agreement the fast way.
At the same time, we started regularly testing our customer origins to see if they would support us offering post-quantum key agreement. We found all origins supported the safe but slow method. The fast method didn’t fare as well, as we found that 0.34% of connections would break. That’s higher than the failure rates seen by browsers.
Unsurprisingly, many failures seem to be caused by the large ClientHello. Interestingly, the majority are caused by servers not correctly implementing HelloRetryRequest. To investigate the cause, we have reached out to customers to ascertain the cause. We’re very grateful to those that have responded, and we’re currently working through the data.
Outlook
As we’ve seen, post-quantum key agreement, despite protocol ossification, is relatively straightforward to deploy. We’re also on a great trajectory, as we might well see double-digit client support for post-quantum key agreement later this year.
Let’s turn to the second, more difficult migration.
Migrating the Internet to post-quantum signatures
Now, we’ll turn our attention to upgrading the signatures used on the Internet.
The zoo of post-quantum signatures
Let’s start by sizing up the post-quantum signatures we have available today at the AES-128 security level: ML-DSA-44, FN-DSA-512, and the two variants of SLH-DSA. As a comparison, we also include the venerable Ed25519 and RSA-2048 in wide use today, as well as a sample of five promising signature schemes from the signatures onramp.
Sizes (bytes) | CPU time (lower is better) | |||||
---|---|---|---|---|---|---|
PQ | Public key | Signature | Signing | Verification | ||
Standardized | Ed25519 | ❌ | 32 | 64 | 1 (baseline) | 1 (baseline) |
RSA-2048 | ❌ | 256 | 256 | 70 | 0.3 | |
NIST drafts | ML-DSA-44 | ✅ | 1,312 | 2,420 | 4.8 | 0.5 |
FN-DSA-512 | ✅ | 897 | 666 | 8 ⚠️ | 0.5 | |
SLH-DSA-128s | ✅ | 32 | 7,856 | 8,000 | 2.8 | |
SLH-DSA-128f | ✅ | 32 | 17,088 | 550 | 7 | |
Sample from signatures onramp | MAYOone | ✅ | 1,168 | 321 | 4.7 | 0.3 |
MAYOtwo | ✅ | 5,488 | 180 | 5 | 0.2 | |
SQISign I | ✅ | 64 | 177 | 60,000 | 500 | |
UOV Is-pkc | ✅ | 66,576 | 96 | 2.5 | 2 | |
HAWK512 | ✅ | 1,024 | 555 | 2 | 1 |
It is immediately clear that none of the post-quantum signature schemes comes even close to being a drop-in replacement for Ed25519 (which is comparable to ECDSA P-256) as most of the signatures are simply much bigger. The exceptions are SQISign, MAYO, and UOV from the onramp, but they’re far from ideal. MAYO and UOV have large public keys, and SQISign requires an immense amount of computation.
When to use SLH-DSA
As mentioned before, today we only have drafts for SLH-DSA and ML-DSA. In every relevant performance metric, ML-DSA beats SLH-DSA handily. (Even the small public keys of SLH-DSA are not any advantage. If you include the ML-DSA public key with its signature, it’s still smaller than an SLH-DSA signature, and in that case you can use the short hash of the ML-DSA public key as a short public key.)
The advantage of SLH-DSA is that there is a lot of trust in its security. To forge an SLH-DSA signature you need to break the underlying hash function quite badly. It is not enough to break the collision resistance of the hash, as has been done with SHA-1 and MD5. In fact, as of February 2024, an SHA-1 based SLH-DSA would still be considered secure. Of course, SLH-DSA does not use SHA-1, and instead uses SHA2 and SHA3, against which not a single practical attack is known.
If you can shoulder the cost, SLH-DSA has the best security guarantee, which might be crucial when dealing with long-lasting signatures, or deployments where upgrades are impossible.
Be careful with FN-DSA
Looking ahead a bit: the best of the worst seems to be FN-DSA-512. FN-DSA-512’s signatures and public key together are only 1,563 bytes, with somewhat reasonable signing time. FN-DSA has an achilles heel though — for acceptable signing performance, it requires fast floating-point arithmetic. Without it, signing is about 20 times slower. But speed is not enough, as the floating-point arithmetic has to run in constant time — without it, the FN-DSA private key can be recovered by timing signature creation. Writing safe FN-DSA implementations has turned out to be quite challenging, which makes FN-DSA dangerous when signatures are generated on the fly, such as in a TLS handshake. It is good to stress that this only affects signing. FN-DSA verification does not require floating-point arithmetic (and during verification there wouldn’t be a private key to leak anyway.)
There are many signatures on the web
The biggest pain-point of migrating the Internet to post-quantum signatures, is that there are a lot of signatures even in a single connection. When you visit this very website for the first time, we send six signatures and two public keys.
The majority of these are for the certificate chain: the CA signs the intermediate certificate, which signs the leaf certificate, which in turn signs the TLS transcript to prove the authenticity of the server. If you’re keeping count: we’re still three signatures short.
Two of these are for SCTs required for certificate transparency. Certificate transparency is a key, but lesser known, part of the Web PKI, the ecosystem that secures browser connections. Its goal is to publicly log every certificate issued, so that misissuances can be detected after the fact. It works by having independent parties run CT logs. Before issuing a certificate, a CA must first submit it to at least two different CT logs. An SCT is a signature of a CT log that acts as a proof, a receipt, that the certificate has been logged.
The final signature is an OCSP staple, which proves that the leaf certificate hasn’t been revoked in the last few days.
Tailoring signature schemes
There are two aspects of how a signature can be used that are worthwhile to highlight: whether the public key is included with the signature, and whether the signature is online or offline.
For the SCTs and the signature of the root on the intermediate, the public key is not transmitted during the handshake. Thus, for those, a signature scheme with smaller signatures but larger public keys, such as MAYO or UOV, would be particularly well-suited. For the other signatures, the public key is included, and it’s more important to minimize the sizes of the combined public key and signature.
The handshake signature is the only signature that is created online — all the other signatures are created ahead of time. The handshake signature is created and verified only once, whereas the other signatures are typically verified many times by different clients. This means that for the handshake signature, it’s advantageous to balance signing and verification time which are both in the hot path, whereas for the other signatures having better verification time at the cost of slower signing is worthwhile. This is one of the advantages RSA still enjoys over elliptic curve signatures today.
Putting together different signature schemes is a fun puzzle, but it also comes with some drawbacks. Using multiple different schemes increases the attack surface because an algorithmic or implementation vulnerability in one compromises the whole. Also, the whole ecosystem needs to implement and optimize multiple algorithms, which is a significant burden.
Putting it together
So, what are some reasonable combinations to try?
With NIST’s current picks
With the draft standards available today, we do not have a lot of options.
If we simply switch to ML-DSA-44 for all signatures, we’re adding 17kB of data that needs to be transmitted from the server to the client during the TLS handshake. Is that a lot? Probably. We will address that later on.
If we wait a bit and replace all but the handshake signature with FN-DSA-512, we’re looking at adding only 8kB. That’s much better, but I have to repeat that it’s difficult to implement FN-DSA-512 signing safely without timing side channels, and there is a good chance we’ll shoot ourselves in the foot if we’re not careful.
Another way to shoot ourselves in the foot today is with stateful hash-based signatures.
Stateful hash-based signatures
Apart from symmetric cryptography, there are already post-quantum signature schemes standardized today: LMS / HRSS and XMSS(MT). Just like SLH-DSA, these are hash-based signature schemes, and thus, algorithmically they’re very conservative.
But they come with a major drawback: you need to remember the state. What is this state? When generating a keypair, you prepare a fixed number of one-time-use slots, and you need to remember which one you’ve used. If you use the same prepared slot twice, then anyone can create a forgery with those two. Managing this state is not impossible, but quite tricky. What if the server was restored from a backup? The state can be distributed over multiple servers, but that changes the usual signature flow quite a bit, and it’s unclear whether regulators will allow this approach, as the state is typically considered part of the private key.
So, how do they perform? It’s hard to give a definite answer. These hash-based signature schemes have a lot of knobs to turn and can be fine-tuned to their use case. You can see for yourself, and play around with the parameters on this website. With standardized variants (with security parameter n=24) for the offline signatures, we can beat ML-DSA-44 in data on the wire, but can’t outperform FN-DSA-512. With security parameter n=16, which has not been standardized, stateful hash-based signatures are competitive with FN-DSA-512, and can even beat it on size. However, n=16 comes with yet another footgun: it allows the signer to create a single signature that validates two different messages — there is no non-repudiation.
All in all, FN-DSA-512 and stateful hash-based signatures tempt us with a similar and clear performance benefit over ML-DSA-44, but are difficult to use safely.
Signatures on the horizon
There are some very promising new signature schemes submitted to the NIST onramp.
UOV (unbalanced oil and vinegar) is an old multivariate scheme with a large public key (66.5kB), but small signatures (96 bytes). If we combine UOV for the root and SCTs with ML-DSA-44 for the others, we’re looking at only 10kB — close to FN-DSA-512.
Over the decades, there have been many attempts to add some structure to UOV public keys, to get a better balance between public key and signature size. Many of these so-called structured multivariate schemes, which includes Rainbow and GeMMS, unfortunately have been broken.
MAYO is the latest proposal for a structured multivariate scheme, designed by the cryptographer that broke Rainbow. As a structured multivariate scheme, its security requires careful scrutiny, but its utility (given it is not broken) is very appealing.
MAYO allows for a fine-grained tradeoff between signature and public key size. For the submission, to keep things simple, the authors proposed two concrete variants: MAYOone with balanced signature (321 bytes) and public key (1.1kB) sizes, and MAYOtwo that has signatures of 180 bytes, while keeping the public key manageable at 5.4kB. Verification times are excellent, while signing times are somewhat slower than ECDSA, but far better than RSA. Combining both variants in the obvious way, we’re only looking at 3.3kB.
Purely looking at sizes, SQISign I is the clear winner, even beating RSA-2048. Unfortunately, the computation required for signing, and crucially verification, are way too high. For niche applications, SQISign might be useful, but for general adoption verification times need to improve significantly, even if that requires a larger signature.
Finally, I would like to mention HAWK512. HAWK is a lattice-based scheme similar to FN-DSA-512, but does not require floating-point arithmetic. This makes HAWK an appealing alternative to FN-DSA. NIST has repeatedly stated that the main purpose of the onramp is to standardize a signature scheme that is not based on lattices — a description HAWK does not fit. We might see some innovations of HAWK be included in the final version of FN-DSA, but it is unclear whether that will solve all of FN-DSA implementation concerns.
There are more promising submissions in the onramp, but those discussed are a fairly representative sample of those interesting to TLS. For instance, SNOVA is similar to MAYO, and TUOV is similar to UOV. Explore the submissions for yourself on Thom’s webpage.
It will take 17kB extra to swap in ML-DSA-44. That’s a lot compared to the typical handshake today, but it’s not a lot compared to the JavaScript and images served on many web pages. The key point is that the change we must make here affects every single TLS connection, whether it’s used for a bloated website, or a time-critical API call. Also, it’s not just about waiting a bit longer. If you have spotty cellular reception, that extra data can make the difference between being able to load a page, and having the connection time out. (As an aside, talking about bloat: many apps perform a surprisingly high number of TLS handshakes.)
Just like with key agreement, performance isn’t our only concern: we also want the connection to succeed in the first place. Back in 2021, we ran an experiment artificially enlarging the certificate chain to simulate larger post-quantum certificates. We give a short summary of the key result below, but for the details, check out the full blog post.
Initially, we wanted to run the experiment on a small sample of regular traffic, in order to get unbiased data. Unfortunately, we found that large certificate chains broke some connections. Thus, to avoid breaking customer connections, we set up the experiment to use background connections launched from our challenge pages. For each participant, we launched two background connections: one with a larger certificate chain (live) and one with a normal chain(control). The graph on the right shows the number of control connections that are missing a corresponding live connection. There are jumps around 10kB and 30kB, suggesting that there are clients or middleboxes that break when certificate chains grow by more than 10kB or 30kB.
This does not mean that the ML-DSA-44-only route is necessarily unviable. Just like with key agreement, browsers can slowly turn on support for post-quantum certificates. As we hit issues with middleboxes, we can work with vendors to fix what is broken. It is crucial here that servers are configured to be able to serve either a small traditional chain, or a larger post-quantum chain.
These issues are problematic for a single-certificate migration strategy. In this approach, the server installs a single traditional certificate that contains a separate post-quantum certificate in a so-called non-critical extension. A client that does not support post-quantum certificates will ignore the extension. In this approach, installing the single certificate will immediately break all clients with compatibility issues, making it a non-starter.
What about performance? We saw the following impact on TLS handshake time.
The jump at around 40kB is caused by an extra round-trip due to a full congestion window. In the 2021 blog post we go into detail on what that is all about. There is an important caveat: at Cloudflare, because we’re close to the client, we use a larger congestion window. With a typical congestion window, the jump would move to around 10kB. Also, the jump would be larger as typical round-trip times are higher.
Thus, when adding 9KB, we're looking at a slowdown of about 15%. Crossing the 10kB boundary, we are likely to incur an extra roundtrip, which could well lead to a slowdown of more than 60%. That completely negates the much touted performance benefit that TLS 1.3 has over TLS 1.2, and it’s too high to be enabled by default.
Is 9kB too much? Enabling post-quantum key agreement wasn’t free either, but enabling post-quantum key agreement was cheaper and actually gets us a tangible security benefit today. However, this thinking is dangerous. If we wait too long before enabling post-quantum certificates by default, we might find ourselves out of time when the quantum computer arrives.
Way forward
Over the coming years, we’ll be working with browsers to test the viability and performance impact of post-quantum authentication in TLS. We expect to add support for post-quantum certificates as soon as they arrive (probably around 2026), but not enable them by default.
At the same time, we’re exploring various ideas to reduce the number of signatures.
Reducing number of signatures
Over the last few years, there have been several proposals to reduce the number of signatures used.
Leaving out intermediate certificates
CAs report the intermediate certificates they use in the CCADB. Most browsers ship with the list of intermediates (of CAs they trust). Using that list, a browser is able to establish a connection with a server that forgot to install the intermediate. If a server can leave out the intermediate, then why bother with it?
There are three competing proposals to leave out the intermediate certificate. The original 2019 proposal is by Martin Thomson, who suggests simply having the browser send a single bit to indicate that it has an up-to-date list of all intermediates. In that case, the server will leave out the intermediates. This will work well in the majority of cases, but could lead to some hard-to-debug issues in corner cases. For one, not all intermediates are listed in the CCADB, and these missing intermediates aren’t even from custom CAs. Another reason is that the browser could be mistaken about whether it’s up-to-date. A more esoteric issue is that the browser could reconstruct a different chain of certificates than the server had in mind.
To address these issues, in 2023, Dennis Jackson put forward a more robust proposal. In this proposal, every year a fixed list of intermediates is compiled from the CCADB. Instead of a single flag, the browser will send the named lists of intermediates it has. The server will not simply leave out matching intermediates, but rather replace them by the sequence number at which they appear in the list. He also did a survey of the most popular websites, and found that just by leaving out the intermediates today, we can save more than 2kB compared to certificate compression for half of them. That’s with today’s certificates: yes, X509 certificates are somewhat bloated.
Finally, there is the more general TLS trust expressions proposal that allows a browser to signal more in a more fine-grained manner which CAs and intermediates it trusts.
It’s likely some form of intermediate suppression will be adopted in the coming years. This will push the cost of a ML-DSA-44-only deployment down to less than 13kB.
KEMTLS
Another approach is to change TLS more rigorously by replacing the signature algorithm in the leaf certificate by a KEM. This is called KEMTLS (or AuthKEM at the IETF). The server proves it controls the leaf certificate, by being able to decrypt a challenge sent by the client. This is not an outlandishly new idea, as older versions of TLS would encrypt a shared key to an RSA certificate.
KEMTLS does add quite a bit of complexity to TLS 1.3, which was purposely designed to simplify TLS 1.2. Adding complexity adds security concerns, but we soften that by extending TLS 1.3 machine-checked security proof to KEMTLS. Nonetheless, adopting KEMTLS will be a significant engineering effort, and its gains should be worthwhile.
If we replace an ML-DSA-44 handshake signature of 2,420 bytes by KEMTLS using ML-KEM-512, we save 852 bytes in the total bytes transmitted by client and server. Looking just at the server, we save 1,620 bytes. If that’s 1.6kB saved on 17kB, it’s not very impressive. Also, KEMTLS is of little benefit if small post-quantum signatures such as MAYOone are available for the handshake.
KEMTLS shines in the case that 1.6kB savings pushes the server within the congestion window, such as when UOV is used for all but the handshake and leaf signature. Another advantage of KEMTLS, especially for embedded devices, is that it could reduce the number of algorithms that need to be implemented: you need a KEM for the key agreement anyway, and that could replace the signature scheme you would’ve only used for the handshake signature.
At the moment, deploying KEMTLS isn’t the lowest hanging fruit, but it could well come into its own, depending on which signature schemes are standardized, and which other protocol changes are made.
Merkle tree certificates
An even more ambitious and involved proposal is Merkle tree certificates (MTC). In this proposal, all signatures except the handshake signature are replaced by a short <800 byte Merkle tree certificate. This sounds too good to be true, and there is indeed a catch. MTC doesn’t work in all situations, and for those you will need to fall back to old-fashioned X509 certificates and certificate transparency. So, what’s assumed?
- No direct certificate issuance. You can’t get a Merkle tree certificate immediately: you will have to ask for one, and then wait for at least a day before you can use it.
- Clients (in MTC parlance relying parties) can only check a Merkle tree certificate if they stay up to date with a transparency service. Browsers have an update-mechanism that can be used for this, but a browser that hasn’t been used in a while might be stale.
MTC should be seen as an optimisation for the vast majority of cases.
Summary
So, how does it actually work? I’ll try to give a short summary — for a longer introduction check out David Benjamin’s IETF presentation, or get your hands dirty by setting up your own MTC CA.
In MTC, CAs issues assertions in a batch in a fixed rhythm. Say once every hour. An example of an assertion is “you can trust P-256 public key ab....23 when connecting to example.com”. Basically an assertion is a certificate without the signature. If a subscriber wants to get a certificate, it sends the assertion to the CA, which vets it, and then queues it for issuance.
On this batch of assertions, the CA computes a Merkle tree. We have an explainer of Merkle trees in our blog post introducing certificate transparency. The short of it is that you can summarize a batch into a single hash by creating a tree hashing pairwise. The root is the summary. The nice thing about Merkle trees is that you can prove that something was in the batch to someone who only has the root, by revealing just a few hashes up the tree, which is called the Merkle tree certificate.
Each assertion is valid for a fixed number of batches — say 336 batches for a validity of two weeks. This is called the validity window. When issuing a batch, the CA not only publishes the assertions, but also a signature on the roots of all batches that are currently valid, called the signed validity window.
After the MTC CA has issued the new batch, the subscriber that asked for the certificate to be issued can pull the Merkle tree certificate from the CA. The subscriber can then install it, next to its X509 certificate, but will have to wait a bit before it’s useful.
Every hour, the transparency services, including those run by browser vendors, pull the new assertions and signed validity window from the CAs they trust. They check whether everything is consistent, including whether the new signed validity window matches with the old one. When satisfied, they republish the batches and signed validity window themselves.
Every hour, browsers download the latest roots from their trusted transparency service. Now, when connecting to a server, the client will essentially advertise which CAs it trusts, and the sequence number of the latest batch for which it has the roots. The server can then send either a new MTC, an older MTC (if the client is a bit stale), or fall back to a X509 certificate.
Outlook
The path for migrating the Internet to post-quantum authentication is much less clear than with key agreement. In the short term, we expect early adoption of post-quantum authentication across the Internet around 2026, but few will turn it on by default. Unless we can get performance much closer to today’s authentication, we expect the vast majority to keep post-quantum authentication disabled, unless motivated by regulation.
Not just TLS, authentication, and key agreement
Despite its length, in this blog post, we have only really touched upon migrating TLS. And even TLS we did not cover completely, as we have not discussed Encrypted ClientHello (we didn’t forget about it). Although important, TLS is not the only protocol key to the security of the Internet. We want to briefly mention a few other challenges, but cannot go into detail. One particular challenge is DNSSEC, which is responsible for securing the resolution of domain names.
Although key agreement and signatures are the most widely used cryptographic primitives, over the last few years we have seen the adoption of more esoteric cryptography to serve more advanced use cases, such as unlinkable tokens with Privacy Pass / PAT, anonymous credentials, and attribute based encryption to name a few. For most of these advanced cryptographic schemes, there is no known practical post-quantum alternative yet.
What you can do today
To finish, let’s review what you can do today. For most organizations the brunt of the work is in the preparation. Where is cryptography used in the first place? What software libraries / what hardware? What are the timelines of your vendors? Do you need to hire expertise? What’s at risk, and how should it be prioritized? Even before you can answer all those, create engagement within the organization. All this work can be started before NIST finishes their standards or software starts shipping with post-quantum cryptography.
You can also start testing right now since the performance characteristics of the final standards will not be meaningfully different from the preliminary ones available today. If it works with the preliminary ones today in your test environment, the final standards will most likely work just fine in production. We’ve collected a list of software and forks that already support preliminary post-quantum key agreement here.
Also on that page, we collected instructions on how to turn on post-quantum key agreement in your browser today. (For Chrome it’s enable-tls13-kyber
in chrome://flags
.)
If you’re a Cloudflare customer, you can check out how to enable post-quantum key agreement to your origin, and our products that are secured against store-now/decrypt-later today.
Good luck with your migration, and if you hit any issues, do reach out: [email protected]
We protect entire corporate networks, help customers build Internet-scale applications efficiently, accelerate any website or Internet application, ward off DDoS attacks, keep hackers at bay, and can help you on your journey to Zero Trust.
Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.
To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.