How DataDome Detects Puppeteer Extra Stealth
2024-7-6 02:30:2 Author: securityboulevard.com

What is Puppeteer Extra Stealth?

Puppeteer Extra Stealth is a plugin that adds advanced stealth features to Puppeteer, making it harder to detect. It does so by masking browser characteristics commonly used to identify bots, for example by modifying JavaScript global properties and mimicking human-like behavior.

With over 6,000 stars on GitHub, it has been one of the most popular tools for creating bots. Today, we are sharing one of DataDome’s long-standing methods for detecting Extra Stealth.

How DataDome Detects Extra Stealth

The Extra Stealth library consists of several modules known as “evasions”, each designed to circumvent a specific detection vector. Today, we will focus on the evasion targeting iframe global window objects, known as iframe.contentWindow.

Creating an iframe initializes a new JavaScript execution context. This is a complex browser process, and when an automated browser performs it, it often leaves detectable inconsistencies.

Some of these leaks are simple, such as properties that were removed from the top-level window still being present on the iframe's window object. Others are more complex, like unexpected dangling objects left behind after an iframe is removed, or the ability to execute JavaScript in the iframe's context before the native API used by Puppeteer can do so. Extra Stealth's iframe.contentWindow evasion was created to mask these leaks and prevent detection.
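
The first kind of leak can be illustrated outside the browser. The following is a Node.js analogy, not browser code and not taken from DataDome or Extra Stealth: node:vm creates a fresh global context, much as a new iframe does, and a built-in deleted from the main global (escape is used here purely as an example) still exists in the new context. An evasion that only patches the top-level window has the same blind spot.

```javascript
// Node.js analogy (not browser code): a freshly created global context
// gets its own copy of the built-ins, just as a new iframe window does.
const vm = require('node:vm');

function builtinSurvivesInNewContext(name) {
  // Save the original property descriptor so it can be restored exactly.
  const descriptor = Object.getOwnPropertyDescriptor(globalThis, name);
  // "Remove" the built-in from the main global, as an evasion might do on the top window.
  delete globalThis[name];
  const goneHere = typeof globalThis[name] === 'undefined';
  // Look up the same name inside a brand-new context.
  const typeInNewContext = vm.runInNewContext(`typeof ${name}`);
  // Restore the original so the rest of the process is unaffected.
  Object.defineProperty(globalThis, name, descriptor);
  return { goneHere, typeInNewContext };
}

console.log(builtinSurvivesInNewContext('escape'));
// { goneHere: true, typeInNewContext: 'function' }
```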

DataDome is not fooled by it. Not only is Puppeteer detected regardless (through other undisclosed means), but we are also able to detect the usage of the Extra Stealth evasion itself.

This client-side detection method executes JavaScript in the browser and sends the results to DataDome for evaluation. The code is only a few lines long:

// Create an iframe
const iframe = document.createElement('iframe');
// Set its "srcdoc" property to any value
iframe.srcdoc = 'datadome';
// Insert the iframe into the DOM
document.body.appendChild(iframe);
// Detect the evasion
const detected = iframe.contentWindow.self.get?.toString();

In a regular browser, the iframe's window object has no get property, so the detected variable is undefined.

However, in browsers automated with Puppeteer Extra Stealth, it holds a very distinctive value:

[Image: the string returned by the check, containing the source code of Extra Stealth's get() trap]

This is part of Extra Stealth’s actual source code, complete with comments from its developers. Due to a flaw in the evasion, our client-side detection components can access and reveal its internal code.

From a detection standpoint, this result is unequivocal: upon detecting this value, DataDome can easily and accurately block the bot.

Additionally, since Extra Stealth’s source code varies between versions, the string value can also identify the specific version of Extra Stealth being used. This can serve as a signature component to assess the scope of the botting operation.
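
The evaluation side of this check can be sketched in a few lines. This is a minimal sketch, assuming the client reports the raw value of detected: any defined value is flagged, and a SHA-256 digest of the leaked source serves as a version signature looked up in a table. The function name evaluateProbe and the signature table are hypothetical; no real signatures are included, and the article does not say how DataDome actually stores or matches them.

```javascript
// Hypothetical server-side evaluation of the value reported by the client-side check.
// evaluateProbe and the signature table are illustrative; no real signatures are included.
const { createHash } = require('node:crypto');

function evaluateProbe(reported, knownSignatures = new Map()) {
  // A regular browser reports nothing: the iframe window has no "get" property.
  if (reported === undefined || reported === null) {
    return { flagged: false };
  }
  // Any reported string means the get trap's source leaked through the handler.
  const signature = createHash('sha256').update(String(reported)).digest('hex');
  return {
    flagged: true,
    signature,
    // Map a known digest to a version label; unseen digests stay "unknown".
    version: knownSignatures.get(signature) ?? 'unknown',
  };
}

// Usage with a made-up leaked string and a made-up version label.
const leaked = 'get(target, key) { /* ... */ }';
const table = new Map([[createHash('sha256').update(leaked).digest('hex'), 'example-version']]);
console.log(evaluateProbe(undefined));             // { flagged: false }
console.log(evaluateProbe(leaked, table).version); // example-version
```

Hashing keeps the lookup key short and fixed-length regardless of how long the leaked source is; a version label only appears once a digest has been recorded for that version.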

Detailed Analysis of the Detection Method

The check above works because of a subtle mistake in Extra Stealth's source code. The iframe.contentWindow evasion:

  • Overrides several browser APIs to install instrumentation that triggers when an iframe is created.
  • When triggered, replaces the iframe's global object with a mock built on the JavaScript Proxy API, which lets code intercept and redefine fundamental operations (such as property reads) on another object.
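
For readers unfamiliar with the Proxy API, here is a minimal, generic illustration (not Extra Stealth code): a handler's get trap intercepts every property read on the proxy, so a mock can redefine some properties, hide others, and forward the rest to the wrapped object with Reflect.get. The object and handler names are arbitrary.

```javascript
// Minimal Proxy example: a "get" trap intercepts every property read on the proxy.
const original = { name: 'real', secret: 42, size: 3 };

const handler = {
  get(target, key, receiver) {
    if (key === 'name') return 'mocked';       // redefine a property
    if (key === 'secret') return undefined;    // hide a property
    return Reflect.get(target, key, receiver); // forward everything else
  },
};

const mock = new Proxy(original, handler);

console.log(mock.name);     // mocked
console.log(mock.secret);   // undefined
console.log(mock.size);     // 3
console.log(original.name); // real (the underlying object is untouched)
```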

The flaw lies in the Proxy’s handler, which defines the interception behavior. Here’s the Extra Stealth source code that we saw earlier, now formatted:

const contentWindowProxy = {
  get(target, key) {
    // Now to the interesting part:
    // We actually make this thing behave like a regular iframe window,
    // by intercepting calls to e.g. `.self` and redirect it to the correct thing. :)
    // That makes it possible for these assertions to be correct:
    // iframe.contentWindow.self === window.top // must be false
    if (key === 'self') {
      return this
    }
    // iframe.contentWindow.frameElement === iframe // must be true
    if (key === 'frameElement') {
      return iframe
    }
    // Intercept iframe.contentWindow[0] to hide the property 0 added by the proxy.
    if (key === '0') {
      return undefined
    }
    return Reflect.get(target, key)
  }
}

// …
const proxy = new Proxy(window, contentWindowProxy)

The contentWindowProxy variable contains the JavaScript Proxy handler. Here's the point of interest:

// Now to the interesting part:
// We actually make this thing behave like a regular iframe window,
// by intercepting calls to e.g. `.self` and redirect it to the correct thing. :)
// That makes it possible for these assertions to be correct:
// iframe.contentWindow.self === window.top // must be false
if (key === 'self') {
  return this
}

The author intended to reproduce the window.self === window self-reference, as their comments note. However, at that point in the execution, the this keyword does not point to the mocked window object but to the proxy handler object itself: the engine invokes a trap as a method of the handler, so this inside get() is the handler.
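
The behavior of this inside a trap can be checked outside a browser. The sketch below copies the shape of the flawed handler; fakeWindow and fakeIframe are hypothetical plain objects standing in for window and the iframe element, which the real evasion works with.

```javascript
// Shape of the flawed handler, reproduced with plain objects.
// fakeWindow and fakeIframe are hypothetical stand-ins for window and the iframe element.
const fakeWindow = { name: 'window stand-in' };
const fakeIframe = { tagName: 'IFRAME' };

const contentWindowProxy = {
  get(target, key) {
    if (key === 'self') {
      // The trap is called as a method of the handler, so this === contentWindowProxy.
      return this;
    }
    if (key === 'frameElement') {
      return fakeIframe;
    }
    return Reflect.get(target, key);
  },
};

const proxy = new Proxy(fakeWindow, contentWindowProxy);

console.log(proxy.self === contentWindowProxy); // true: the handler leaks
console.log(proxy.self === fakeWindow);         // false
console.log(proxy.frameElement === fakeIframe); // true
```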

They should have returned the target parameter from the get() trap to close the window.self loop correctly. By returning this instead, they inadvertently opened a path that exposes the handler object's contents.
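
A minimal sketch of that correction, again with a hypothetical fakeWindow in place of window: once the self branch returns target, the handler is no longer reachable through the proxy, so reading .get on the result finds nothing.

```javascript
// Sketch of the correction: return the target parameter instead of this.
// fakeWindow is a hypothetical stand-in for the window object being wrapped.
const fakeWindow = { name: 'window stand-in' };

const fixedHandler = {
  get(target, key) {
    if (key === 'self') {
      return target; // the wrapped object, not the handler
    }
    return Reflect.get(target, key);
  },
};

const proxy = new Proxy(fakeWindow, fixedHandler);

console.log(proxy.self === fakeWindow); // true
console.log(proxy.self.get);            // undefined: the handler is no longer exposed
```

One design note: returning target hands out the unwrapped object, so proxy.self === proxy is false. Returning the trap's third argument, receiver, would keep the reference pointing at the proxy itself. Either choice closes the path to the handler.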

Reading iframe.contentWindow.self.get therefore passes through the mocked window Proxy into the self branch of the get() trap, which returns the handler. Reading .get on the handler then yields the trap function itself, and calling toString() on it reveals the trap's source code.
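
The full read path can be reproduced in plain JavaScript. In this sketch, probeSelfGet is a hypothetical wrapper around the one-liner from the detection snippet that takes the iframe's window as a parameter; regularWindow and stealthWindow are stand-ins for a normal iframe window and for a proxy with the same flawed self branch.

```javascript
// Mirrors the detection one-liner, with the iframe's window passed in as a parameter
// so it can run outside a browser. probeSelfGet is a hypothetical name.
function probeSelfGet(contentWindow) {
  return contentWindow.self.get?.toString();
}

// A plain object standing in for a regular iframe window: self points back to itself.
const regularWindow = {};
regularWindow.self = regularWindow;

// A stand-in for the evasion's proxy, using the same flawed self branch.
const flawedHandler = {
  get(target, key) {
    if (key === 'self') {
      return this; // returns the handler, not the window
    }
    return Reflect.get(target, key);
  },
};
const stealthWindow = new Proxy({}, flawedHandler);

console.log(probeSelfGet(regularWindow)); // undefined
console.log(probeSelfGet(stealthWindow)); // the trap's source text, starting with "get(target, key) {"
```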

Conclusion

At DataDome, we continually develop similar specialized challenges for detecting various headless browsers and anti-detect automation frameworks. With a single JavaScript execution, these challenges let us detect the subtle side effects such tools introduce, ensuring robust protection against automated traffic.

To learn more about how DataDome can protect your business against the most sophisticated bots and online fraud, try it for free or book a demo today.


Article source: https://securityboulevard.com/2024/07/how-datadome-detects-puppeteer-extra-stealth/