NDSS 2025 – A Comparative Evaluation Of Large Language Models In Vulnerability Detection
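This study evaluates several large language models (such as LLaMA-2 and CodeLLaMA) on vulnerability detection, primarily on Java code, with additional tests on C/C++ to assess generalization. Gemma and LLaMA-2 perform well at detecting specific vulnerability types, but performance varies with configuration and programming language. Context window size has a marked effect on performance, whereas quantization methods offer limited benefit.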
2026-03-03 20:00:00
Author: securityboulevard.com
Session 14C: Vulnerability Detection
Authors, Creators & Presenters: Jie Lin (University of Central Florida), David Mohaisen (University of Central Florida)
PAPER
From Large to Mammoth: A Comparative Evaluation of Large Language Models in Vulnerability Detection
Large Language Models (LLMs) have demonstrated strong potential in tasks such as code understanding and generation. This study evaluates several advanced LLMs, including LLaMA-2, CodeLLaMA, LLaMA-3, Mistral, Mixtral, Gemma, CodeGemma, Phi-2, Phi-3, and GPT-4, for vulnerability detection, primarily in Java, with additional tests in C/C++ to assess generalization. We transition from basic positive sample detection to a more challenging task involving both positive and negative samples and evaluate the LLMs’ ability to identify specific vulnerability types. Performance is analyzed using runtime and detection accuracy in zero-shot and few-shot settings with custom and generic metrics. Key insights include the strong performance of models like Gemma and LLaMA-2 in identifying vulnerabilities, though this success varies, with some configurations performing no better than random guessing. Performance also fluctuates significantly across programming languages and learning modes (zero- vs. few-shot). We further investigate the impact of model parameters, quantization methods, context window (CW) sizes, and architectural choices on vulnerability detection. While CW consistently enhances performance, benefits from other parameters, such as quantization, are more limited. Overall, our findings underscore the potential of LLMs in automated vulnerability detection, the complex interplay of model parameters, and the current limitations in varied scenarios and configurations.
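To make the "no better than random guessing" comparison concrete, here is a minimal Python sketch of how a detector's yes/no verdicts on a mixed set of vulnerable and non-vulnerable samples could be scored. It uses only generic binary-classification metrics. The paper's custom metrics are not described in this abstract, so nothing here reproduces them, and the names `Scores`, `score`, `beats_chance`, and the `margin` parameter are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score(labels, predictions):
    """Generic binary metrics; True means 'vulnerable'.

    Illustrative only: these are standard metrics, not the paper's custom ones.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # vulnerable, flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # clean, not flagged
    fp = sum(1 for y, p in pairs if not y and p)      # clean, flagged
    fn = sum(1 for y, p in pairs if y and not p)      # vulnerable, missed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(accuracy, precision, recall, f1)


def beats_chance(scores, margin=0.05):
    """True if accuracy exceeds a fair coin flip (0.5) by a hypothetical margin."""
    return scores.accuracy > 0.5 + margin


if __name__ == "__main__":
    # Hypothetical ground truth and model verdicts for eight code samples.
    labels = [True, True, True, False, False, False, True, False]
    verdicts = [True, True, False, False, True, False, True, False]
    result = score(labels, verdicts)
    print(result, beats_chance(result))
```

A fair coin flip has an expected accuracy of 0.5 regardless of class balance, which is why the sketch compares against that value. On an imbalanced dataset, a majority-class baseline would be a stricter reference point.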
ABOUT NDSS
The Network and Distributed System Security Symposium (NDSS) fosters information exchange among researchers and practitioners of network and distributed system security. The target audience includes those interested in practical aspects of network and distributed system security, with a focus on actual system design and implementation. A major goal is to encourage and enable the Internet community to apply, deploy, and advance the state of available security technologies.