The Art of Bot Detection: How DataDome Uses Picasso for Device Class Fingerprinting

2024-2-9 01:40:21 Author: securityboulevard.com

Catching sophisticated bots requires many kinds of signals, from behavioral signals to proxy detection to client-side fingerprints.

Indeed, because sophisticated bots route traffic through proxies, mimic human behavior, and attempt to forge many fingerprinting attributes, it's important to collect redundant and exhaustive signals, so that a bot that evades one check is still caught by another.

When it comes to client-side browser fingerprinting, DataDome collects signals in three different ways:

In our JavaScript tag, a JS agent that runs in the background on a site's pages.
When we respond to a request with a CAPTCHA.
When we respond to a request with DataDome Device Check.

Across these components, we collect different kinds of browser fingerprinting signals, discussed in other blog posts, including:

Information about the browser, the OS, and the device, such as browser version, number of CPU cores, device memory, type of GPU, etc.
Specially crafted challenges that aim to detect side effects introduced by frameworks built to evade bot detection and by headless/automated browsers.

While some APIs provide information about the OS and the environment the browser is running on, bot developers often modify these values to appear more human. For example, a bot running on a Linux virtual machine may lie about its OS to pretend it's running on a Windows machine. It may also lie not only about the OS string but about other attributes you'd expect to go with a particular OS, such as the type of GPU.
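To see why forged static values are hard to catch, consider a naive cross-attribute check that flags a claimed OS paired with a GPU renderer string that would be surprising for it. The sketch below is illustrative only: the rule table, the function name `looks_inconsistent`, and the renderer strings are hypothetical examples, not DataDome's rules. It catches a bot that changes its OS string but forgets the GPU, yet it is blind to a bot that forges both attributes consistently.

```python
# Hypothetical, illustrative rules: substrings of a WebGL renderer string that
# would be surprising (not impossible) for a device claiming the given OS.
# These are NOT DataDome's rules and are far from exhaustive.
SURPRISING_RENDERERS: dict[str, tuple[str, ...]] = {
    "Windows": ("llvmpipe", "mesa"),
    "macOS": ("llvmpipe", "direct3d"),
}


def looks_inconsistent(claimed_os: str, webgl_renderer: str) -> bool:
    """Return True if the renderer string contradicts the claimed OS.

    Unknown OS values return False: the check has no rule for them.
    """
    markers = SURPRISING_RENDERERS.get(claimed_os, ())
    renderer = webgl_renderer.lower()  # markers are stored lowercase
    return any(marker in renderer for marker in markers)


# A Linux VM claiming Windows but leaking its software rasterizer is flagged...
print(looks_inconsistent("Windows", "llvmpipe (LLVM 15.0.7, 256 bits)"))  # True
# ...but a bot that also forges a plausible Windows renderer string passes.
print(looks_inconsistent(
    "Windows",
    "ANGLE (NVIDIA, NVIDIA GeForce GTX 1050 Direct3D11 vs_5_0 ps_5_0, D3D11)",
))  # False
```

Any check built purely on reported values inherits this weakness: a consistent lie passes.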

To avoid relying on static APIs that return information about the OS, researchers have devised ways to ask a browser to execute a JS challenge whose result helps determine the nature of its environment. When these tests aim to detect virtual machines, they're called red pills (a reference to the movie The Matrix).

In this blog post, we present how DataDome leverages Picasso, an approach originally conceived by Google, in our CAPTCHA and Device Check to detect bots lying about their environment.

What is Picasso?

Picasso is a device class fingerprinting protocol that enables a server to verify whether or not a device is lying about its browser, OS, or its environment in general.

Usually, in browser fingerprinting, a fingerprint is a combination of attributes that is, more or less, unique and stable, and can help identify an individual. In device class fingerprinting, the goal is not to identify a single individual but a class of devices. With Picasso, we aim to identify classes defined by the browser (Chrome, Firefox, Safari) and the OS (Windows, Linux, Mac, iOS, Android).

To do so, Picasso relies on the HTML canvas API and, in particular, on the underlying graphics rendering stack (GPU). The server sends a proof-of-work challenge to an untrusted client whose device class it wants to verify. The client executes the challenge, and its output captures the entropy induced by the device's underlying hardware and graphics software.
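The challenge-response flow can be simulated in a few lines. This is a simplified model under stated assumptions, not Google's or DataDome's implementation: the server issues a random seed and a number of rounds; the client derives drawing parameters from a PRNG seeded with that value, "renders" each round, and folds the result into a running hash; the server compares the response with the one expected for the device class the client claims. A real client would draw canvas primitives and read back pixels; here the hypothetical `render_round` function stands in for that step by mixing the device class into a hash, which models the one property that matters: the same challenge yields the same output on the same device class and different output across classes. Likewise, the hypothetical `expected_response` assumes the server can obtain reference answers for each class, for example from trusted devices of that class.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    seed: int    # seeds the client's PRNG so server and client agree on the drawing
    rounds: int  # more rounds make the response costlier to compute


def make_challenge(rng: random.Random, rounds: int = 50) -> Challenge:
    """Server side: issue a fresh, unpredictable challenge."""
    return Challenge(seed=rng.getrandbits(64), rounds=rounds)


def render_round(device_class: str, params: tuple[int, ...]) -> bytes:
    """Hypothetical stand-in for drawing canvas primitives and reading pixels back.

    Mixing in device_class models class-specific rendering quirks.
    """
    return hashlib.sha256(f"{device_class}|{params}".encode()).digest()


def solve(challenge: Challenge, device_class: str) -> str:
    """Client side: replay the seeded drawing sequence and hash every round."""
    prng = random.Random(challenge.seed)
    running = hashlib.sha256()
    for _ in range(challenge.rounds):
        # Five pseudo-random values modeling a primitive choice plus its coordinates/colors.
        params = tuple(prng.randrange(256) for _ in range(5))
        running.update(render_round(device_class, params))
    return running.hexdigest()


def expected_response(challenge: Challenge, claimed_class: str) -> str:
    """Assumption: reference answers come from trusted devices of each class.

    Simulated here by solving directly as that class.
    """
    return solve(challenge, claimed_class)


def verify(challenge: Challenge, claimed_class: str, response: str) -> bool:
    """Server side: accept only if the response matches the claimed class."""
    return hmac.compare_digest(expected_response(challenge, claimed_class), response)


challenge = make_challenge(random.Random(2024))
honest = solve(challenge, "Chrome/Windows")
bot = solve(challenge, "Chrome/Linux")  # Linux VM claiming to be Chrome on Windows
print(verify(challenge, "Chrome/Windows", honest))  # True
print(verify(challenge, "Chrome/Windows", bot))     # False
```

In this model, the bot's response depends on how its real stack renders, so rewriting the user-agent or OS string changes nothing; to pass, it would have to reproduce the target class's rendering.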

Picasso can identify the OS and browser because pixel rendering differs across devices in ways that are incidental yet stable. These differences stem from each device's inherent features, both physical (graphics hardware) and software (graphics drivers, operating system), and they are what make this type of fingerprinting possible (Figure 1).

In other words, the output of a browser's graphics APIs, such as the HTML5 canvas, depends on several layers: the hardware (GPU), lower-level software (GPU driver, OS rendering), and higher-level software (the browser and library-provided graphics APIs). As a result, the HTML5 canvas output for the exact same set of instructions is highly specific to each OS/browser combination, which allows accurate differentiation between them (Figure 2).

Figure 1: Visualization of the rendering differences between an emulated and real iOS device using the same software stack. Indicated in red are the per-pixel Picasso render differences. (from Google’s 2016 “Picasso: Lightweight Device Class Fingerprinting for Web Clients”)

Figure 2: Visualization of the rendering differences for the same Picasso challenge across various browsers. Indicated in red are the per-pixel differences between each browser pair. (from Google’s 2016 “Picasso: Lightweight Device Class Fingerprinting for Web Clients”)
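The red pixels in Figures 1 and 2 come from comparing two renders of the same challenge pixel by pixel. Below is a minimal sketch of that comparison, assuming both renders were read back as flat RGBA byte buffers in row-major order (the layout the canvas `getImageData()` call returns, four bytes per pixel). The function name `differing_pixels` is hypothetical.

```python
def differing_pixels(a: bytes, b: bytes, width: int, height: int) -> list[tuple[int, int]]:
    """Return (x, y) coordinates of pixels whose RGBA values differ.

    Both buffers are flat, row-major RGBA data, 4 bytes per pixel.
    """
    expected_len = width * height * 4
    if len(a) != expected_len or len(b) != expected_len:
        raise ValueError("buffers must both hold width * height RGBA pixels")
    diffs = []
    for i in range(width * height):
        off = 4 * i
        if a[off:off + 4] != b[off:off + 4]:
            diffs.append((i % width, i // width))  # row-major index -> (x, y)
    return diffs


# Two 2x2 renders that differ only in the red channel of the top-right pixel,
# a one-unit difference of the sort that can appear between rendering stacks.
render_a = bytes([10, 20, 30, 255] * 4)
render_b = bytearray(render_a)
render_b[4] = 11  # byte 4 is the red channel of pixel index 1 -> (x=1, y=0)
print(differing_pixels(render_a, bytes(render_b), width=2, height=2))  # [(1, 0)]
```

If the rendered pixels are hashed into a challenge response, a single differing pixel like this one is enough to change the response.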

Source: https://securityboulevard.com/2024/02/the-art-of-bot-detection-how-datadome-uses-picasso-for-device-class-fingerprinting/