FuzzLLM：一种新型通用的模糊测试框架，用于主动发现大语言模型中的越狱漏洞

FuzzLLM：一种新型通用的模糊测试框架，用于主动发现大语言模型中的越狱漏洞
嗯，用户让我总结一下这篇文章的内容，控制在一百个字以内，而且不需要用“文章内容总结”这样的开头。首先，我得仔细阅读摘要，了解文章的主要内容。摘要里提到的是大型语言模型（LLMs）的越狱漏洞，这些漏洞利用精心设计的提示来获取违反服务指南的内容。研究者们开发了一个叫做FuzzLLM的自动化模糊测试框架，用来主动检测这些漏洞。他们用模板来捕捉提示的结构，并隔离关键特征作为约束条件。通过组合不同的基类和调整约束元素，FuzzLLM可以高效地进行测试，减少手动工作量。实验结果显示它在多个LLMs上有效。好的，现在需要把这些信息浓缩到100字以内。要抓住关键点：FuzzLLM的作用、方法、优势和效果。可能的结构是：介绍问题（越狱漏洞），提出解决方案（FuzzLLM），说明方法（模板、组合攻击），强调优势（高效、减少手动工作），最后提到实验结果的有效性。再检查一下字数，确保不超过限制。可能需要删减一些细节，比如“精心设计的提示”可以简化为“特定提示”，或者直接说“利用特定提示”。最终总结应该清晰明了，涵盖主要贡献和方法。 </think> 文章介绍了一种名为FuzzLLM的自动化模糊测试框架，用于检测大型语言模型中的越狱漏洞。通过模板和组合攻击技术，该框架能够高效发现潜在风险，并减少人工干预。实验表明其在多种模型中表现优异。 2025-12-19 06:59:10 Author: arxiv.org(查看原文) 阅读量:3 收藏

View PDF HTML (experimental)

Abstract:Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against individual jailbreak prompts through safety training strategies, this relatively passive approach struggles to handle the broader category of similar jailbreaks. To tackle this issue, we introduce FuzzLLM, an automated fuzzing framework designed to proactively test and discover jailbreak vulnerabilities in LLMs. We utilize templates to capture the structural integrity of a prompt and isolate key features of a jailbreak class as constraints. By integrating different base classes into powerful combo attacks and varying the elements of constraints and prohibited questions, FuzzLLM enables efficient testing with reduced manual effort. Extensive experiments demonstrate FuzzLLM's effectiveness and comprehensiveness in vulnerability discovery across various LLMs.

Submission history

From: Dongyu Yao [view email]
[v1] Mon, 11 Sep 2023 07:15:02 UTC (1,008 KB)
[v2] Sun, 14 Apr 2024 15:39:41 UTC (3,813 KB)

文章来源: https://arxiv.org/abs/2309.05274
如有侵权请联系:admin#unsafe.sh