As announced in our VMRay 2025.3 Release highlights blog post, our phishing detections on cloud are now powered by Computer Vision capabilities. This allows VMRay’s threat identifiers (VTIs) to detect brands and page structures based on how they appear to the end user, which makes them more resilient to source code obfuscation and other evasion techniques employed in phishing threats. In this blog post we take a close look at this new feature, how it is used, and its impact on the detection of different phishing threats observed in the wild.
Before we dive into the details of this new feature, it’s important to acknowledge the significance of data privacy when it comes to AI-related technologies. At VMRay we continue our commitment to our customers’ data privacy and security: no customer data is used during the training and testing of the Computer Vision models. When using the VMRay Platform, analysis artifacts produced by users’ submissions are processed by our Computer Vision models hosted on VMRay’s own infrastructure. This process does not involve any third-party services, which ensures that the data is processed securely and transparently.
Phishing attacks primarily target end users, exploiting trust and familiarity more than technical vulnerabilities. This includes emails and websites that mimic legitimate ones, often with near-identical branding and layouts, while hiding the malicious intent beneath apparently trustworthy content. VMRay’s VTIs extract and combine features from the analyzed threat, including the DOM structure of the page itself, in order to detect phishing attempts. However, web development is quite flexible, and there are many different ways to create web pages with the exact same appearance but very different DOM structures. For example, a very simple logon form can have quite different DOM trees even after de-obfuscation, and threat actors continuously create more variants of web pages that in the end look almost identical. Keeping up with new phishing campaigns has therefore proven to be both time and resource consuming. In the example below we can see that a minimal login form can be implemented in different ways: across the three provided examples there are very limited similarities between the DOM of each page.
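To make the point concrete, here is a minimal sketch in Python. The three HTML snippets are hypothetical stand-ins written for this illustration (they are not taken from real phish kits), and the root-to-node tag path comparison is only a simple proxy for DOM similarity, not the feature extraction VMRay’s VTIs actually perform. All three snippets would render a username field, a password field and a "Sign in" control.

```python
from html.parser import HTMLParser
from itertools import combinations


class TagPathCollector(HTMLParser):
    """Collects the root-to-node tag path of every element in a document."""

    # Elements that never have a closing tag, so they are not pushed on the stack.
    VOID = {"input", "br", "img", "hr", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def dom_paths(html):
    """Return the set of tag paths found in an HTML fragment."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 = disjoint, 1.0 = identical)."""
    return len(a & b) / len(a | b)


# Hypothetical variants of the same visual login form.
FORMS = {
    "semantic form": (
        '<form action="/login" method="post">'
        '<input type="text" name="user"><input type="password" name="pass">'
        '<button type="submit">Sign in</button></form>'
    ),
    "table layout": (
        '<div class="box"><table>'
        '<tr><td><input type="text" id="u"></td></tr>'
        '<tr><td><input type="password" id="p"></td></tr>'
        '<tr><td><a href="#" onclick="send()">Sign in</a></td></tr>'
        '</table></div>'
    ),
    "div soup": (
        '<section><div role="textbox" contenteditable="true"></div>'
        '<div><span><input type="password"></span></div>'
        '<div role="button" tabindex="0">Sign in</div></section>'
    ),
}

PATHS = {name: dom_paths(html) for name, html in FORMS.items()}
SIMILARITY = {
    (a, b): jaccard(PATHS[a], PATHS[b]) for a, b in combinations(PATHS, 2)
}

for (a, b), score in SIMILARITY.items():
    print(f"{a} vs {b}: {score:.2f}")
```

With these particular snippets, no tag path is shared between any two variants, so a detector keyed only on DOM structure would have little in common to match on, even though a user would see the same form.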
These kinds of variations (on a much larger scale) are often seen in the wild. For example, the Microsoft logon masquerade has been employed in different phish kits such as Mamba2FA and Sneaky2FA, but with different implementations, making them more evasive. Even though the pages differ code-wise, they are extremely similar to the human eye. The increasing rate of continuously evolving variations required tackling the problem from a different angle.
In order to keep up with the fast changes employed in phishing attacks, VMRay introduced a Computer Vision object detection model to identify web page elements based on the screenshots captured by the VMRay Platform during web analysis. Such elements include buttons, logon forms, and branding images of organizations commonly seen in phishing attempts. The model is trained by our data experts on real, handpicked phishing threats.
High-quality data is the backbone of effective vision model training, as the models learn to interpret their input through the images they’re fed during training. If the screenshots they are trained on do not represent real threats, the model’s predictions become unreliable. In short, better data means a smarter, safer, and more reliable vision system. The vision model is specially trained to recognize:
We’ve specifically chosen to use a vision model rather than the currently popular LLM-based approach because our model is very specific to the task of classifying web objects. It does not require additional LLM capabilities, which would translate to more processing time and increased cost. The combination of the Computer Vision models and VTIs results in both seeing and deeply analyzing the content of the web page efficiently, without noticeable performance regressions. The Computer Vision service was initially rolled out to cloud environments due to the simpler implementation there; in parallel, the on-prem release is being prepared and is planned for the upcoming 2025.4 release.
A core part of our phishing detection is a Heuristic VTI that combines different features extracted by other VTIs analyzing the phishing threat, including but not limited to:
Now it is also empowered with Computer Vision predictions that extract visual features from the page, allowing the Phishing Heuristic VTI to recognize more suspicious logon forms and popular brands used on the web page regardless of how the page was implemented. Combining both visual and machine-level features allows us to bridge the detection gaps caused by obfuscation and by non-standard branding images that make the page appear similar to the original brand without being an exact replica. The result is more resilient phishing detection.
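The idea of combining visual and machine-level features can be sketched as follows. This is purely an illustration and not VMRay’s implementation: the `PageEvidence` fields, the `OFFICIAL_DOMAINS` table and the decision rule are all assumptions made up for this example. It shows how a visual prediction can fill the gap when obfuscation hides the logon form from DOM-based checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageEvidence:
    """Hypothetical feature bundle for one analyzed page."""
    hostname: str
    dom_login_form: bool          # logon form found in the DOM
    vision_login_form: bool       # logon form seen in the screenshot
    vision_brand: Optional[str]   # brand seen in the screenshot, if any


# Hypothetical allow-list of domains on which each brand legitimately appears.
OFFICIAL_DOMAINS = {
    "Microsoft": ("microsoft.com", "live.com", "microsoftonline.com"),
}


def host_belongs_to(hostname: str, domains: tuple) -> bool:
    """True if hostname equals one of the domains or is a subdomain of it."""
    return any(hostname == d or hostname.endswith("." + d) for d in domains)


def looks_like_phishing(page: PageEvidence) -> bool:
    """Flag a page showing a logon form under a brand it does not belong to."""
    login_seen = page.dom_login_form or page.vision_login_form
    if not login_seen or page.vision_brand is None:
        return False
    official = OFFICIAL_DOMAINS.get(page.vision_brand, ())
    return not host_belongs_to(page.hostname, official)


# An obfuscated page: the DOM check finds nothing, the screenshot does.
obfuscated = PageEvidence("login.example.net", False, True, "Microsoft")
genuine = PageEvidence("login.microsoftonline.com", True, True, "Microsoft")
dom_only = PageEvidence("login.example.net", False, False, None)

print(looks_like_phishing(obfuscated))
print(looks_like_phishing(genuine))
print(looks_like_phishing(dom_only))
```

In this toy rule, the third page goes unflagged because neither the DOM nor the screenshot contributes any evidence, which is the kind of gap the visual features are meant to narrow.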
With the integration of Computer Vision, we have observed ~6% more phishing detections on VMRay cloud, covering threats that the heuristic VTIs did not previously detect.
Below are examples of emerging phishing attacks that have been successfully detected since the feature’s release to cloud, thanks to the contribution of Computer Vision features to the Phishing Heuristic VTI.
Figure 5: Logon forms and brands detected by Computer Vision and contributing to the phishing heuristic
Swiss-pass: https://www.vmray.com/analyses/swiss-pass-branded-phishing
AT&T: https://www.vmray.com/analyses/at-and-t-branded-phishing
Outlook: https://www.vmray.com/analyses/outlook-branded-phishing
Phishing threats are evolving rapidly, often mimicking trusted brands like Microsoft, Facebook, and major financial brands, while also employing different techniques to evade detection, including obfuscation, encrypted payloads, and non-standard development methods. In version 2025.3, a Computer Vision object detection model was introduced to the VMRay Platform to empower phishing detection by providing VTIs with visual features extracted from the screenshots captured by the platform during web analysis. Analyzing threats on both the code level and the visual level results in more resilient phishing detection and minimizes detection gaps. The feature will also be available soon to on-prem customers with the upcoming 2025.4 release.