I recently stumbled upon a strange behavior in my Firefox: I visited an HTTPS-enabled website that I had visited before and saw that my Firefox connected insecurely via HTTP. I found that strange because nowadays, most websites set the HSTS header, which is supposed to force the browser to connect via HTTPS. I checked whether this website set the HSTS header – and it did. This means my Firefox was ignoring/forgetting about the HSTS header right after my visit.
For debugging, I disabled all of my add-ons (no effect), tried a different browser (HSTS worked), and set up a new Firefox profile (HSTS worked). Something was wrong with my Firefox profile. After some further debugging and searching the Internet, I found this blog post. It describes an attack on the HSTS caches of Firefox and Chrome that fills them with bogus entries. While Chrome gets DoS-ed by this attack, Firefox silently stops storing new HSTS entries as soon as 1024 entries are stored in a file called SiteSecurityServiceState.txt, which effectively disables HSTS for every site visited afterwards.
However, I did not get attacked. My typical daily surfing activity filled up my SiteSecurityServiceState.txt to the limit, and no new entry could be added to it. A quick look into the file showed that a lot of entries were occupied by domains of CDNs (for example, apis.google.com, cdn.cookielaw.org, cdn.jsdelivr.net, cdnjs.cloudflare.com). Each of these domains occupied multiple entries, each entry with a different partitionKey. The partitionKey identifies which website caused the connection to this domain. For example, if insinuator.net includes an image from ernw.de, we will get an HSTS cache entry for ernw.de with partitionKey=insinuator.net. If troopers.de also includes an image from ernw.de, we will get an additional entry for ernw.de with partitionKey=troopers.de.
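To make the effect of the partitionKey more tangible, here is a minimal Python sketch. It does not parse the real SiteSecurityServiceState.txt; the list of (domain, partitionKey) pairs and the helper name entries_per_domain are made up for illustration, and only the limit of 1024 entries is taken from the description above.

```python
from collections import Counter

HSTS_ENTRY_LIMIT = 1024  # size limit described above


def entries_per_domain(entries):
    """Count how many cache entries each domain occupies."""
    return Counter(domain for domain, _partition_key in entries)


# Hypothetical observations: (domain that sent the HSTS header,
# website that caused the connection, i.e. the partitionKey).
observed = [
    ("ernw.de", "insinuator.net"),
    ("ernw.de", "troopers.de"),
    ("cdnjs.cloudflare.com", "insinuator.net"),
    ("cdnjs.cloudflare.com", "troopers.de"),
    ("cdnjs.cloudflare.com", "troopers.de"),  # same pair again: no new entry
]

# The cache is modelled as a set of pairs, so a repeated pair collapses.
cache = set(observed)
counts = entries_per_domain(cache)

for domain, count in counts.most_common():
    print(f"{domain}: {count} entries")
print(f"{len(cache)} of {HSTS_ENTRY_LIMIT} entries used")
```

Without partitioning, the two domains would occupy two entries in total; with partitioning, every embedding website adds another one, which is why a handful of popular CDNs can eat up a large share of the cache.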
This partitionKey was introduced with Cache Partitioning in Firefox 85.0 on January 26th, 2021, to prevent user tracking via certain effects of shared caches. Before that version, each domain had only one entry in the cache. Since then, it seems the HSTS cache size has been “exploding”, quickly reaching the size constraint of 1024 entries.
A quick survey within ERNW showed that almost every regular user of Firefox had already reached the size limit. Those who had not yet reached the limit were new colleagues with relatively new profiles. But after just a few months of using these profiles, the cache was already about 80% full.
Because this behavior is unexpected for both users and website admins and disables HSTS for users that use Firefox daily, I opened a bug report in the Firefox bug tracker (Bug 1701192: Size limit of SiteSecurityServiceState.txt and Firefox cache partitioning make HSTS unreliable), which is still unresolved (at the release of this blog post on May 6th, 2021).
Working on the Firefox HSTS cache showed that it contains useful information for forensic analysis and incident response: which domains were contacted and which website caused each connection. Indirectly, it is also possible to compute the time of the last visit to a website. However, to do so, we have to make some assumptions. We can also tell that a domain was contacted even if the browsing history was cleared, as long as the site preferences were not.
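A minimal sketch of one way such an indirect computation could look: an HSTS entry expires max-age seconds after the header was last received, so subtracting max-age from the expiry time yields an estimate of the last visit. The sketch assumes that the expiry time is available in milliseconds since the Unix epoch and that the website has not changed its max-age in the meantime. The function name and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def estimate_last_visit(expiry_ms, max_age_s):
    """Estimate the last visit as expiry time minus max-age.

    Assumptions: expiry_ms is in milliseconds since 1970-01-01 (UTC),
    and max_age_s is the max-age the website sent at that visit.
    """
    expiry = datetime.fromtimestamp(expiry_ms / 1000, tz=timezone.utc)
    return expiry - timedelta(seconds=max_age_s)


# Hypothetical entry: expires on 2022-05-06, max-age of 365 days.
example = estimate_last_visit(expiry_ms=1_651_795_200_000, max_age_s=31_536_000)
print(example.isoformat())  # 2021-05-06T00:00:00+00:00
```

If the website changed its max-age after the visit, the estimate is off by exactly the difference between the old and the new value, which is why the result should be treated as an indication rather than a fact.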
Afterwards, I looked into the Chrome HSTS cache to see what data can be found there. Surprisingly, it is not (easily) possible to find the names of visited domains, because they are stored hashed (using SHA-256). Only if you know or guess the name of a domain can you compute the hash to see whether it has an entry in the cache. However, an advantage of this cache format is that it stores the timestamp of the last visit explicitly, so we do not have to compute it based on assumptions. The Chrome HSTS cache has no equivalent of cache partitioning.
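The guessing approach can be sketched in a few lines of Python: hash every candidate domain and look the result up among the keys found in the cache. The exact input encoding is an assumption of this sketch (DNS wire format with length-prefixed labels, Base64-encoded digest) and should be verified against a real cache file; the function names and sample data are made up for illustration.

```python
import base64
import hashlib


def hash_domain(domain):
    """Hash a domain name for lookup in the cache.

    Assumption: the name is hashed in DNS wire format, i.e. every label
    is prefixed with its length and the name ends with a zero byte.
    """
    labels = domain.lower().rstrip(".").split(".")
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    digest = hashlib.sha256(wire + b"\x00").digest()
    return base64.b64encode(digest).decode("ascii")


def resolve_hashes(cache_keys, candidates):
    """Map each cache key to a candidate domain, or None if nothing matches."""
    lookup = {hash_domain(domain): domain for domain in candidates}
    return {key: lookup.get(key) for key in cache_keys}


# Hypothetical cache keys: one belongs to a candidate, one does not.
known = hash_domain("ernw.de")
unknown = base64.b64encode(hashlib.sha256(b"not a candidate").digest()).decode("ascii")

result = resolve_hashes([known, unknown], ["insinuator.net", "ernw.de", "troopers.de"])
for key, domain in result.items():
    print(key, "->", domain)
```

The quality of the result depends entirely on the candidate list: a domain that is not in the list stays unresolved.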
Another player on the browser market is Safari. So, I got a MacBook and created some datasets. In this cache format (stored as a plist file), we can only find out when a domain was contacted for the first time, because the timestamps are not updated after the first visit. (This might violate RFC 6797, but I did not take a closer look at this behavior.) Additionally, the timestamps are offset by 11323 days relative to the Unix epoch (1970-01-01), which means they count from 2001-01-01.
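Correcting the offset is simple arithmetic, as the following Python sketch shows. It assumes that the stored value is a number of seconds; the constant and function names as well as the sample values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# 11323 days after the Unix epoch, as described above.
SAFARI_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(days=11323)


def safari_to_datetime(seconds):
    """Convert a timestamp from the Safari cache to a UTC datetime.

    Assumption: the value counts seconds since SAFARI_EPOCH.
    """
    return SAFARI_EPOCH + timedelta(seconds=seconds)


print(SAFARI_EPOCH.date())        # 2001-01-01
print(safari_to_datetime(86400))  # 2001-01-02 00:00:00+00:00
```

Expressed in seconds, the offset is 11323 * 86400 = 978307200, which can simply be added to the stored value to get a regular Unix timestamp.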
Furthermore, I took a quick look at the HSTS cache formats of wget, curl and libsoup. They are quite easy to understand and do not need a deep analysis.
As a result of analyzing the different HSTS caches, I created a small Python-based CLI tool for parsing them, including documentation of the different formats. The tool creates CSV output that can be consumed easily by other software. Furthermore, it can use browsing histories and other domain lists as rainbow tables to recover the original domain names of entries in the Chrome HSTS cache, and it can retrieve the actual HSTS header as currently returned by a website.
Code and documentation can be found at https://github.com/ernw/forensic-hsts-analyzer.