Computer Vision AI VTIs against Phishing

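VMRay 2025.3 introduces computer vision to strengthen phishing detection in cloud environments. By analyzing the visual elements of a web page, such as login forms and brand logos, it can identify phishing attacks even when they use different DOM structures or obfuscation techniques. This makes detection more robust and reduces the chance of missed detections. The feature is currently available in cloud environments, and an extension to on-prem deployments is planned for version 2025.4.

2025-10-8 06:14:21 Author: www.vmray.com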

Overview

As announced in our VMRay 2025.3 release highlights blog post, our phishing detections on cloud are now powered by Computer Vision capabilities. This allows VMRay’s Threat Identifiers (VTIs) to detect brands and page structures based on how they appear to the end user, which makes them more resilient to source code obfuscation and other evasion techniques employed in phishing threats. In this blog post we take a close look at this new feature, how it is used, and its impact on the detection of different phishing threats observed in the wild.

Customer data privacy

Before we dive into the details of this new feature, it’s important to acknowledge the significance of data privacy when it comes to AI-related technologies. At VMRay we continue our commitment to our customers’ data privacy and security: no customer data is used during the training and testing of the Computer Vision models. When using the VMRay Platform, analysis artifacts produced by users’ submissions are processed by our Computer Vision models hosted on VMRay’s own infrastructure. This process does not involve any third-party services, which ensures that the data is processed securely and transparently.

Background: The challenge with phishing detection

Phishing attacks primarily target end users, exploiting trust and familiarity more than technical vulnerabilities. This includes using emails and websites that mimic legitimate ones, often with near-identical branding and layouts, while hiding the malicious intent beneath the apparently trustworthy content. VMRay’s VTIs extract and combine features from the analyzed threat, including the DOM structure of the page itself, in order to detect phishing attempts.

However, because web development is quite flexible, there are many different ways to create web pages with exactly the same appearance but very different DOM structures. For example, a very simple logon form can have quite different DOM trees even after de-obfuscation, and threat actors continuously create more variants of web pages that in the end look almost identical. Keeping up with new phishing campaigns has proven to be both time and resource consuming. In the example below, a minimal login form is implemented in different ways: across the three provided examples, there are very limited similarities between the DOM of each page.
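To make the point about divergent DOM trees concrete, here is a minimal sketch in Python. The three HTML snippets, the `tag_paths` helper, and the choice of Jaccard similarity over root-to-node tag paths are all assumptions made for illustration; they are not VMRay’s actual features or the samples from the original figures.

```python
from html.parser import HTMLParser
from itertools import combinations


class TagPathCollector(HTMLParser):
    """Collects the root-to-node tag path of every element in a document."""

    # Elements that never have a closing tag, so they are not pushed on the stack.
    VOID = {"input", "br", "img", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of tag paths (e.g. 'form/input') found in the HTML."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Three hypothetical login forms that would render almost identically.
FORM_BASED = (
    '<form action="/login"><input type="text" name="user">'
    '<input type="password" name="pass">'
    '<button type="submit">Sign in</button></form>'
)
DIV_BASED = (
    '<div class="box"><div><input type="text" id="u"></div>'
    '<div><input type="password" id="p"></div>'
    '<div role="button" onclick="send()">Sign in</div></div>'
)
TABLE_BASED = (
    '<table><tr><td><input type="text"></td></tr>'
    '<tr><td><input type="password"></td></tr>'
    '<tr><td><a href="#" onclick="send()">Sign in</a></td></tr></table>'
)

pages = {"form": FORM_BASED, "div": DIV_BASED, "table": TABLE_BASED}
for (name_a, html_a), (name_b, html_b) in combinations(pages.items(), 2):
    print(name_a, name_b, jaccard(tag_paths(html_a), tag_paths(html_b)))
```

Under this simple structural measure the three pages share no tag paths at all, even though each one presents a username field, a password field, and a "Sign in" control to the user.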

These kinds of variations (on a much larger scale) are often seen in the wild. For example, Microsoft logon masquerades have been employed in different phishing kits such as Mamba2FA and Sneaky2FA, but with different implementations, which makes them more evasive: even though the pages differ at the code level, they look extremely similar to the human eye. The increasing rate of continuously evolving variations required tackling the problem from a different angle.

Introducing Computer Vision to the VMRay platform

In order to keep up with the fast changes employed in phishing attacks, VMRay introduced a Computer Vision object detection model that identifies web page elements in the screenshots captured by the VMRay platform during web analysis. Such elements include buttons, logon forms, and branding images of organizations commonly seen in phishing attempts. The model is trained by our data experts on real, handpicked phishing threats.

High-quality data is the backbone of effective vision model training, as the models learn to interpret their input through the images they’re fed during training: if the screenshots they are trained on do not represent real threats, the model’s predictions become unreliable. In short, better data means a smarter, safer, and more reliable vision system. The vision model is specifically trained to recognize:

Pages containing logon forms
Pages matching the appearance of logon forms from popular brands
Branding logos of popular brands targeted by phishing threats
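As a rough illustration of how the three kinds of objects listed above might be handed to downstream logic, the sketch below reduces raw object detections to a small feature dictionary. Everything in it is hypothetical: the `Detection` type, the label scheme (`logon_form`, `brand_form:<brand>`, `brand_logo:<brand>`), the 0.5 confidence cut-off, and the `ExampleBank` brand are assumptions for this example and do not describe VMRay’s model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object found in a screenshot (hypothetical output format)."""
    label: str         # e.g. "logon_form", "brand_form:<brand>", "brand_logo:<brand>"
    confidence: float  # model score in [0, 1]
    box: tuple         # (x_min, y_min, x_max, y_max) in screenshot pixels


def visual_features(detections, min_confidence=0.5):
    """Reduce raw detections to the three kinds of signals listed above."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    branded_forms = sorted(
        {d.label.split(":", 1)[1] for d in kept if d.label.startswith("brand_form:")}
    )
    brand_logos = sorted(
        {d.label.split(":", 1)[1] for d in kept if d.label.startswith("brand_logo:")}
    )
    # A brand-specific logon form also counts as a logon form.
    has_form = bool(branded_forms) or any(d.label == "logon_form" for d in kept)
    return {
        "has_logon_form": has_form,
        "branded_logon_forms": branded_forms,
        "brand_logos": brand_logos,
    }


# Made-up detections for a single screenshot.
example = [
    Detection("logon_form", 0.91, (420, 180, 860, 560)),
    Detection("brand_logo:ExampleBank", 0.84, (560, 200, 720, 250)),
    Detection("brand_form:ExampleBank", 0.32, (420, 180, 860, 560)),  # below cut-off
]
features = visual_features(example)
print(features)
```

Because these features come from pixels rather than markup, they would be the same for any of the structurally different implementations of a page that renders the same way.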

We’ve specifically chosen to use a vision model rather than the currently popular LLM approach. Our model is very specific to the task of classifying web objects and does not require additional LLM capabilities, which would translate to more processing time and increased cost. The combination of the Computer Vision models and VTIs results in both seeing and deeply analyzing the content of the web page efficiently, without noticeable performance regressions. The Computer Vision service was initially rolled out to cloud environments due to its simpler implementation there, while the on-prem release is being prepared in parallel and is planned for the upcoming 2025.4 release.

Empowering VTIs (VMRay Threat Identifiers) with Computer Vision

A core part of our phishing detection is a heuristic VTI that combines different features extracted by other VTIs analyzing the phishing threat, including, but not limited to:

URL features such as domains, TLDs, encoded data and obfuscation
Page DOM Structures like input fields, forms and text
Branding abuse to masquerade as trusted services
Infrastructure abuse such as usage of temporary web hosters and file sharing services
Network features including request and response content, protocols and ports

It is now also empowered with Computer Vision predictions that extract visual features from the page, allowing the Phishing Heuristic VTI to recognize more suspicious logon forms and popular brands used on the web page, regardless of how the page was implemented. Combining both visual and machine-level features allows us to bridge the detection gaps caused by obfuscation and by non-standard branding images that make the page appear similar to the original brand without being an exact replica. This results in more resilient phishing detection.
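The idea of combining code-level and visual features can be sketched as a simple additive score. This is only an illustration of the principle: the feature names, point values, and threshold below are invented for the example, and the real Phishing Heuristic VTI is not documented here and may work quite differently.

```python
# Hypothetical points per feature; integer values keep the arithmetic exact.
POINTS = {
    "suspicious_url": 25,      # URL features (domain, TLD, encoded data)
    "dom_logon_form": 25,      # logon form found in the DOM
    "dom_brand_match": 20,     # brand identified from page source
    "temporary_hosting": 15,   # infrastructure abuse
    "visual_logon_form": 25,   # logon form seen in the screenshot
    "visual_brand_match": 20,  # brand seen in the screenshot
}
THRESHOLD = 60  # hypothetical cut-off for flagging a page


def phishing_score(features):
    """Sum the points of every feature that is present, capped at 100."""
    return min(100, sum(p for name, p in POINTS.items() if features.get(name)))


def is_phishing(features):
    return phishing_score(features) >= THRESHOLD


# A heavily obfuscated page: the DOM-based features find nothing useful.
code_level_only = {"suspicious_url": True, "temporary_hosting": True}

# The same page once screenshot-based features are added.
with_vision = dict(code_level_only, visual_logon_form=True, visual_brand_match=True)

print(phishing_score(code_level_only), is_phishing(code_level_only))
print(phishing_score(with_vision), is_phishing(with_vision))
```

In this toy setup the obfuscated page stays below the threshold on code-level features alone and crosses it once the visual features contribute, which is the kind of detection gap described above.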

Figure 3: Computer Vision VTIs provide more features for VMRay phishing heuristics

With the integration of Computer Vision, we have observed ~6% more phishing detections on VMRay cloud, covering threats that the heuristic VTIs did not previously detect.

Figure 4: Detection gaps addressed with computer vision

Below are examples of emerging phishing attacks that have been successfully detected since the feature’s release to cloud, thanks to the contribution of Computer Vision features to the Phishing Heuristic VTI.

Figure 5: Logon forms and brands detected by Computer Vision and contributing to the phishing heuristic

Additional reports:

Swiss-pass: https://www.vmray.com/analyses/swiss-pass-branded-phishing

AT&T: https://www.vmray.com/analyses/at-and-t-branded-phishing

Outlook: https://www.vmray.com/analyses/outlook-branded-phishing

Conclusion

Phishing threats are evolving rapidly, often mimicking trusted brands like Microsoft, Facebook, and major financial brands, while also employing different techniques to evade detection, including obfuscation, encrypted payloads, and non-standard development methods. In version 2025.3, a Computer Vision object detection model was introduced to the VMRay platform to empower phishing detection by providing VTIs with visual features extracted from the screenshots captured by the platform during web analysis. Analyzing threats at both the code level and the visual level results in more resilient phishing detection and minimizes detection gaps. The feature will also be available to on-prem customers soon, with the upcoming 2025.4 release.

Source: https://www.vmray.com/computer-vision-ai-vtis-against-phishing/