How New Headless Chrome & the CDP Signal Are Impacting Bot Detection
2024-06-14 02:14:09 Author: securityboulevard.com

Author’s Note: The detection methods and ideas noted in this piece are based on research performed by Eloi Bahuet, a threat researcher at DataDome. Thank you for sharing your expertise!

Headless Chrome’s latest update has brought it dangerously close to achieving a perfect browser fingerprint. In its early days, Headless Chrome was distinctly separate from the “headful” Chrome, filled with its own quirks and bugs. But now that the new Headless Chrome and Chrome share the same codebase, distinguishing between the two has become a Herculean task, as they have almost exactly the same browser fingerprint.

Once an attacker uses page.setUserAgent() to change their user agent and the --disable-blink-features=AutomationControlled argument to get rid of navigator.webdriver, there are very few inconsistencies left in the fingerprint of Headless Chrome. It’s like trying to find a needle in a haystack, but the needle looks just like the hay!
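To make concrete why those two changes matter, here is a minimal sketch of the two classic checks they defeat. The function name `basicHeadlessSignals` and the navigator-like snapshots are illustrative assumptions, not part of any detection product. The sketch relies only on two facts: the default Headless Chrome user agent contains the token `HeadlessChrome`, and `navigator.webdriver` is `true` in a browser that reports itself as automated.

```javascript
// Hypothetical helper: evaluates the two classic signals on a navigator-like object.
function basicHeadlessSignals(nav) {
  return {
    // Default Headless Chrome user agents contain the token "HeadlessChrome".
    headlessUserAgent: /HeadlessChrome/.test(nav.userAgent || ''),
    // navigator.webdriver is true when the browser reports that it is automated.
    webdriverFlag: nav.webdriver === true,
  };
}

// Illustrative inputs (made-up navigator snapshots, not captured data).
const defaultBot = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.0.0 Safari/537.36',
  webdriver: true,
};
const patchedBot = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
  webdriver: false,
};

console.log(basicHeadlessSignals(defaultBot));
console.log(basicHeadlessSignals(patchedBot));
```

In a page, the same helper could be called as `basicHeadlessSignals(navigator)`. Once the user agent is replaced and the automation flag is disabled, both signals come back `false`, which is why these checks alone no longer separate bots from regular browsers.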

This evolution significantly raises the bar for detection, prompting the need to understand and leverage new techniques such as detecting the Chrome DevTools Protocol (CDP) side effects to stay ahead of sophisticated bot frameworks.

What is CDP?

The Chrome DevTools Protocol (CDP) is a set of APIs and tools that enables developers to interact programmatically with Chromium-based browsers. It allows for debugging, profiling, and inspecting web applications by providing access to the browser’s internals. CDP is also the underlying protocol used by the main bot frameworks (such as Puppeteer, Playwright, and Selenium) to instrument Chromium-based browsers. Thus, being able to detect that a browser is instrumented with CDP is key to detecting most modern bot frameworks.

CDP detection targets the underlying technology used for automation rather than the specific inconsistencies and side effects introduced by a particular bot framework. This makes the signal more generic: it also applies to unknown automation frameworks, including those that try to stay under the radar by offering anti-detect features.

In addition to covering all kinds of bot frameworks, CDP detection is effective for both Chrome and Headless Chrome.

How can we detect browsers automated with CDP?

The detection technique we present below leverages the fact that the automated browser and the automation framework need to serialize data when they communicate over a WebSocket connection.

We want to create a JavaScript (JS) function that enables us to observe a situation where some data is serialized only when a browser is automated using CDP. Thus, we need to:

  1. Provoke the serialization of an object, but only when CDP is being used.
  2. Detect that an object has been serialized.

Provoking Object Serialization

To provoke object serialization we leverage the Runtime.consoleAPICalled event, which relays the information that was logged using one of the window.console methods.

Note: Chromium-based browsers dispatch Runtime events only after the client has sent the Runtime.enable command, which automation frameworks do.

To restrict the serialization only to situations where CDP is being used, we leverage the fact that Chrome buffers the console messages when the DevTools (CDP) are not open.
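As a rough sketch of what travels over the WebSocket, CDP messages are JSON: the client sends commands carrying an `id` and a `method`, and the browser pushes events carrying a `method` and `params`. The snippet below only builds and parses such messages as strings and does not open a connection. The helper names and the trimmed event payload are assumptions for illustration; a real `Runtime.consoleAPICalled` event carries more fields.

```javascript
// Hypothetical helper: serialize a CDP command the way a client would before sending it.
function buildCommand(id, method, params = {}) {
  return JSON.stringify({ id, method, params });
}

// Hypothetical helper: decide whether an incoming message is a console event.
function isConsoleEvent(rawMessage) {
  const message = JSON.parse(rawMessage);
  return message.method === 'Runtime.consoleAPICalled';
}

// The command that makes the browser start dispatching Runtime events.
const enableRuntime = buildCommand(1, 'Runtime.enable');

// A trimmed, made-up example of the event a console.log(error) call could produce.
const sampleEvent = JSON.stringify({
  method: 'Runtime.consoleAPICalled',
  params: {
    type: 'log',
    // Each logged argument arrives as a serialized description of the object.
    args: [{ type: 'object', subtype: 'error', description: 'Error\n    at <anonymous>:1:9' }],
  },
});

console.log(enableRuntime);
console.log(isConsoleEvent(sampleEvent));
```

The point of the sketch is that the logged object has to be turned into a string description before it can be sent, and that step is what the detection relies on.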

Detecting Object Serialization

To detect that an object has been serialized, we could have defined a JS getter on a random JS object and used a console method on it. However, in this situation, Chrome executes the getter instantly and caches its result—without waiting for a serialization.

There is one exception to this rule: the Error object’s stack property. This is a non-standard property, which means that browser engines are free to implement it how they see fit. V8, the engine used by Chrome, handles it as follows:

Unlike Java where the stack trace of an exception is a structured value that allows inspection of the stack state, the stack property in V8 just holds a flat string containing the formatted stack trace. This is for no other reason than compatibility with other browsers. However, this is not hardcoded but only the default behavior and can be overridden by user scripts.

For efficiency stack traces are not formatted when they are captured but on demand, the first time the stack property is accessed.

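In Chrome, the stack property of an Error can be overridden with a getter. Because the value is only read on demand, the getter does not run until something actually needs the stack, such as the serialization of the error for a CDP client.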
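The on-demand behavior can be observed in isolation with a small sketch. The counter and variable names are illustrative assumptions; the only mechanism used is `Object.defineProperty` on an Error instance, as in the article's technique.

```javascript
// Count how many times the overridden stack property is read.
let stackReads = 0;

const err = new Error('probe');
Object.defineProperty(err, 'stack', {
  configurable: true,
  get() {
    stackReads += 1;
    return 'Error: probe\n    at <overridden>';
  },
});

// Defining the getter does not run it.
const readsBeforeAccess = stackReads;

// Reading the property, as a serializer would, runs the getter.
const value = err.stack;
const readsAfterAccess = stackReads;

console.log(readsBeforeAccess, readsAfterAccess);
```

The getter stays silent until the property is read, so any read is evidence that some consumer asked for the stack.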

Creating the Function

Now that both of our conditions are met, we can create a JavaScript function that detects when the stack property of an error is accessed:

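 // Set to true if anything reads the overridden `stack` property.
 let detected = false;
 const e = new Error();
 Object.defineProperty(e, 'stack', {
     get() {
         // Runs only when the error is serialized, e.g. for a CDP client.
         detected = true;
     }
 });
 // Without a CDP client listening, the argument is not serialized
 // and the getter never runs.
 console.log(e);

 // store value of `detected`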

The value of “detected” will be true on Chromium browsers that have a CDP client connected that has sent the Runtime.enable command, and false otherwise.
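For experimentation, the snippet above can be wrapped in a function whose logging call is injectable. This is a sketch under stated assumptions: the name `stackWasReadDuringLog` and the two stand-in loggers are hypothetical, and the stand-ins only simulate the two situations (argument never serialized versus argument serialized); they are not a real CDP session.

```javascript
// Hypothetical wrapper around the detection snippet.
// `log` is injectable so the function can be exercised outside a browser.
function stackWasReadDuringLog(log = console.log) {
  let detected = false;
  const probe = new Error();
  Object.defineProperty(probe, 'stack', {
    get() {
      detected = true;
      return '';
    },
  });
  log(probe);
  return detected;
}

// Stand-in for a console with no CDP client attached: the argument is never serialized.
const silentLog = () => {};

// Stand-in for a console whose argument gets serialized, which reads the stack.
const serializingLog = (value) => { void value.stack; };

console.log(stackWasReadDuringLog(silentLog));
console.log(stackWasReadDuringLog(serializingLog));
```

In a page, calling `stackWasReadDuringLog()` with no argument uses the real console. Outside a browser, the runtime's own console may read `stack` when printing an error, which is why the stand-ins are used here.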

While this technique detects automated Chromium-based bots, it also detects users with DevTools open, which may create false positives. In theory, detecting if the DevTools UI is open could have helped handle these edge cases. However, bot developers noticed that certain anti-bot software vendors added a rule to check whether the DevTools UI was open—so they began to automatically start their bots with args: ['--auto-open-devtools-for-tabs'].

A forum post showing a bot developer automatically opening the DevTools UI


Source: https://securityboulevard.com/2024/06/how-new-headless-chrome-the-cdp-signal-are-impacting-bot-detection/