Study Notes on Code Insight
2023-4-25 10:4:40 Author: OnionSec

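ChatGPT can generate working code that matches a user's request. When it was released in November 2022, people were already testing how well it could restore obfuscated or virtualized code (binary assembly, WebShells, and similar cases), and it turned out to be very capable. Over the past few months, quite a few public articles have tried to demonstrate this. Applying large language models (LLMs) can be especially helpful for judging whether script samples are malicious or benign, and some people have released demo security tools for reverse engineering, for example 《ChatGPT 对二进制文件全自动分析》 (Fully Automated Analysis of Binary Files with ChatGPT), https://bbs.kanxue.com/thread-276965.htm. Malicious files carry a large amount of information, however, and the compute requirements are high, so VirusTotal took an easy-first, hard-later approach and began with assisted analysis of PowerShell files among script samples. The engineering has now reached a first working form. This reminds me of the use of lateral thinking and systems thinking discussed in 《安全技术运营:方法与实践》 (Security Technology Operations: Methods and Practice). Companies abroad are not only making major breakthroughs in innovation (in the broad sense of technological productivity) but are also faster at putting it into practice. There is a real barrier to entry here, since this work needs both data and compute. Perhaps this moment will mark another watershed.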

The translated article follows, with the English original and the Chinese translation side by side.

Introducing VirusTotal Code Insight: Empowering threat analysis with generative AI

VirusTotal Code Insight 简介:利用生成式 AI 增强威胁分析能力

At the RSA Conference 2023 today, we are excited to unveil VirusTotal Code Insight, a cutting-edge feature that leverages artificial intelligence for code analysis. Powered by Google Cloud Security AI Workbench(https://cloud.google.com/blog/products/identity-security/rsa-google-cloud-security-ai-workbench-generative-ai), Code Insight produces natural language summaries of code snippets with ease. This functionality empowers security experts and analysts by providing them with deeper insights into the purpose and operation of analyzed code, significantly enhancing their capability to detect and mitigate potential threats.

在今天的2023年RSA大会上,我们很高兴地公布了VirusTotal Code Insight,这是一个利用人工智能进行代码分析的尖端功能。在谷歌云安全人工智能工作台(https://cloud.google.com/blog/products/identity-security/rsa-google-cloud-security-ai-workbench-generative-ai)的支持下,Code Insight可以轻松地生成代码片段的自然语言摘要。这一功能使安全专家和分析师能够更深入地了解所分析的代码的目的和操作,大大增强他们检测和缓解潜在威胁的能力。

For quite some time, artificial intelligence (AI) and machine learning (ML) have played a crucial role in anti-malware and cybersecurity, mainly focusing on classification tasks. However, recent advancements in large language models (LLMs) have expanded their capabilities to encompass text generation and summarization.

相当一段时间以来,人工智能(AI)和机器学习(ML)在反恶意软件和网络安全方面发挥了关键作用,主要集中在分类任务上。然而,最近大型语言模型(LLMs)的进步已经将其能力扩展到包括文本生成和总结。

Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. This innovation not only expedites malware analysis but also bolsters a variety of cybersecurity applications. Recognizing the immense potential of this cutting-edge technology, we have incorporated it into the VirusTotal platform, significantly enhancing its capabilities.

令人印象深刻的是,当这些模型在编程语言上进行训练时,它们可以熟练地将代码转化为自然语言的解释。这一创新不仅加速了恶意软件的分析,而且还支持了各种网络安全应用。认识到这一尖端技术的巨大潜力,我们已将其纳入VirusTotal平台,大大增强了其能力。

Code Insight is a new feature based on Sec-PaLM, one of the generative AI(https://cloud.google.com/blog/products/ai-machine-learning/generative-ai-for-businesses-and-governments) models hosted on Google Cloud AI(https://cloud.google.com/solutions/ai-partners). What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to.

Code Insight是一项基于Sec-PaLM的新功能,Sec-PaLM是托管在谷歌云AI(https://cloud.google.com/solutions/ai-partners)上的生成式AI(https://cloud.google.com/blog/products/ai-machine-learning/generative-ai-for-businesses-and-governments)模型之一。该功能的与众不同之处在于它能够从专门从事网络安全和恶意软件的AI合作者的角度生成自然语言摘要。这为安全专家和分析师提供了一个强大的工具,以弄清代码的目的。

At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.

目前,这个新功能被部署在分析上传到VirusTotal的PowerShell文件的一个子集。该系统排除了与之前处理的文件高度相似的文件,以及过大的文件。这种方法可以有效地利用分析资源,确保只有最相关的文件(如PS1文件)受到审查。在未来的日子里,更多的文件格式将被添加到支持的文件列表中,进一步扩大这一功能的范围。
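The selection step described above (skip files that are too large, skip files highly similar to ones already processed) can be sketched roughly as follows. This is only an illustration of the idea, not VirusTotal's actual pipeline: the size cap, the similarity threshold, the character-shingle Jaccard measure, and the `PreFilter` class are all assumptions introduced here.

```python
from dataclasses import dataclass, field

MAX_BYTES = 64 * 1024        # assumed size cap, not VirusTotal's real limit
SIMILARITY_THRESHOLD = 0.9   # assumed cutoff for "highly similar"


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the k-character shingles of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class PreFilter:
    """Hypothetical pre-filter that decides which scripts are worth analyzing."""
    seen: list[set[str]] = field(default_factory=list)

    def should_analyze(self, script: str) -> bool:
        # Rule 1: drop excessively large files.
        if len(script.encode("utf-8")) > MAX_BYTES:
            return False
        # Rule 2: drop files highly similar to one already processed.
        signature = shingles(script)
        if any(jaccard(signature, old) >= SIMILARITY_THRESHOLD for old in self.seen):
            return False
        self.seen.append(signature)
        return True
```

Comparing each new file against every stored signature is linear in the number of files seen, so a system at real scale would presumably rely on something like locality-sensitive hashing instead; the linear scan is kept here only for readability.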

Let's examine a few examples derived from authentic situations to truly appreciate the functionality of this feature.

让我们来看看几个来自真实情况的例子,以真正体会这个功能的作用。

In this first case, we have a file that was detected by only three engines on VirusTotal as “PowerShell/PSW-Agent.U” and “HEUR.Trojan-PSW.Multi.Disco.gen”. Meanwhile, Code Insight provided the following explanation:

在这第一个案例中,我们有一个文件被VirusTotal上的三个引擎检测为 "PowerShell/PSW-Agent.U "和 "HEUR.Trojan-PSW.Multi.Disco.gen"。同时,Code Insight提供了以下解释:

https://www.virustotal.com/gui/file/74662107227a6a28bebb77d5b9ec3890a80e507ee22ed99eb17f35c9d8730bf3/detection

Unveiling false negatives

揭开漏报的面纱

It's important to note that Code Insight conducts its analysis independently, relying solely on the content of the file being processed, without access to antivirus results or any other associated metadata. A good example can be observed in this case of a false negative, where Code Insight’s explanation helps us detect malware that steals users’ credentials and that has not been identified by any antivirus software in VirusTotal:

值得注意的是,Code Insight是独立进行分析的,它只依赖于被处理的文件内容,而不能访问防病毒结果或任何其他相关的元数据。一个很好的例子可以在这个漏报案例中观察到,Code Insight的解释帮助我们检测到隐匿用户证书的恶意软件,而这个恶意软件在VirusTotal中没有被任何反病毒软件识别:

https://www.virustotal.com/gui/file/552efb0dc7e62ded08c98d2e6355df1d27a1317c0a37aabeefd48667b7b1917b/detection

Clearing false positives

清除误报

In this other example, we have a file that is flagged as trojan and malware by 9 antivirus engines, but it's actually a false positive. Here we can see once again how Code Insight can be a valuable ally when managing incidents and analyzing potential malware. In this case, it explains that it's simply a script that installs Postman CLI:

在另一个例子中,我们有一个文件被9个反病毒引擎标记为木马和恶意软件,但它实际上是一个误报。在这里,我们可以再次看到Code Insight在管理事件和分析潜在的恶意软件时如何成为一个有价值的盟友。在这种情况下,它解释说这只是一个安装Postman CLI的脚本:

https://www.virustotal.com/gui/file/b5796d7e4a9efc0b81efdc94b3e42ba6a6ef71d10274e3b812cd5ef4dfb8787b/detection

In this last example, Code Insight demonstrates how it can help improve file categorization in VirusTotal. Code Insight accurately identifies the sample’s file type and fixes the tag that misclassified it as JavaScript.

在这最后一个例子中,Code Insight展示了它如何帮助改善VirusTotal的文件分类。Code Insight准确地识别了样本的文件类型,并修复了将其错误归类为JavaScript的标签。

Although the selected examples illustrate accurate descriptions, the performance of the LLM may vary from case to case and can include errors of judgment. Attackers will very likely develop new evasive strategies, and an ongoing competition between malware and this new approach is expected. That is why it is crucial for a security analyst to oversee these features: they ultimately need to interpret this information together with other contextual information and correlations relevant to the case at hand.

虽然所选的例子说明了准确的描述,但LLM模型的性能可能因个案而异,包括判断错误。攻击者极有可能开发出新的规避策略,预计恶意软件和这种新方法之间会有持续的竞争。这就是为什么安全分析师监督这些功能是至关重要的,因为他们最终需要解释这些信息与其他背景信息和与手头案件相关的关联性相结合。

Nevertheless, the integration of LLMs into the arsenal of code analysis tools is a significant advancement that enables security professionals to gain valuable insights into the structure and behavior of potentially malicious code, improving threat detection and response efficiency.

尽管如此,将LLMs整合到代码分析工具的武器库中是一项重大进展,使安全专业人员能够获得对潜在恶意代码的结构和行为的宝贵见解,提高威胁检测和响应效率。

Code Insight in VirusTotal Intelligence

VirusTotal Intelligence中的Code Insight

This kind of analysis can be carried out by various AI models, each offering varying levels of precision and depth. However, the true value of VirusTotal's Code Insight lies in its capacity to scale this analysis through its platform. This enables not only the examination of individual code samples, but also the aggregation and exploitation of results on a large scale via the VirusTotal Intelligence service. As a result, security teams can swiftly and effectively scrutinize vast quantities of code and identify potential threats, enhancing their efficiency and ultimately fortifying their security stance.

这种分析可以由各种人工智能模型进行,每个模型都提供不同程度的精度和深度。然而,VirusTotal的Code Insight的真正价值在于它通过其平台扩展这种分析的能力。这不仅可以检查单个代码样本,还可以通过VirusTotal情报服务大规模地汇总和利用结果。因此,安全团队可以迅速而有效地检查大量的代码并识别潜在的威胁,提高他们的效率并最终加强他们的安全立场。

Here’s an example of searching for “codeinsight:keylogger”:

下面是一个搜索 "codeinsight:keylogger "的例子:

VirusTotal Intelligence finds several files that, according to the Code Insight report, record keystrokes and write them to a log file. Let’s expand the report of the first one, then we can read a comprehensive analysis explaining this specific keylogger’s behavior:

VirusTotal Intelligence发现了几个文件,根据Code Insight的报告,这些文件会记录键盘操作并将其写入日志文件中。让我们展开第一个的报告,然后我们可以阅读解释这个特定键盘记录器行为的全面分析:

https://www.virustotal.com/gui/file/d6111869a8088e2d1b49a92a30fc3d477373d88a4a2f1a7da4e75ce85dc08ba4/detection
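The article shows this search in the VirusTotal Intelligence web interface. As a rough sketch, the same query could be prepared for the VirusTotal API v3 intelligence search endpoint, which takes a `query` parameter and an `x-apikey` header. Whether the `codeinsight:` modifier is accepted through the API is an assumption here, since the article only demonstrates it in the GUI. `build_search_request` is a hypothetical helper and `YOUR_API_KEY` is a placeholder. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://www.virustotal.com/api/v3"


def build_search_request(query: str, api_key: str, limit: int = 10) -> Request:
    """Build (but do not send) a GET request for a VirusTotal Intelligence search."""
    params = urlencode({"query": query, "limit": limit})
    return Request(
        f"{API_BASE}/intelligence/search?{params}",
        headers={"x-apikey": api_key},
    )


# Assumption: the "codeinsight:" modifier from the article works via the API too.
req = build_search_request("codeinsight:keylogger", "YOUR_API_KEY")
print(req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires a key with access to VirusTotal Intelligence, so that step is left out.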

As we continue to refine and expand the capabilities of VirusTotal Code Insight and other cutting-edge features, we remain dedicated to providing our community with the most advanced and effective tools to stay ahead of evolving cyber threats. We are truly excited about what the future holds and are eager to continue pushing the boundaries of what is possible in the field of cybersecurity. Stay tuned for more updates and developments from the VirusTotal team.

随着我们继续完善和扩大VirusTotal Code Insight的功能和其他尖端功能,我们仍然致力于为我们的社区提供最先进和有效的工具,以保持对不断变化的网络威胁的领先。我们对未来的发展感到非常兴奋,并渴望继续推动网络安全领域的发展。请继续关注VirusTotal团队的更多更新和发展。

References

https://blog.virustotal.com/2023/04/introducing-virustotal-code-insight.html


Source: http://mp.weixin.qq.com/s?__biz=MzUyMTUwMzI3Ng==&mid=2247485122&idx=1&sn=ab463289e3eaa794a2c58b67280b3bbd&chksm=f9db5181ceacd897d592f91bbdf24947249f42260461afbe9d7e60df83a71c3ed3f3ba3ddb17#rd