NDSS 2025 – The Philosopher’s Stone: Trojaning Plugins Of Large Language Models
嗯,用户让我总结一下这篇文章的内容,控制在100字以内,而且不需要特定的开头。我得先仔细看看文章内容。 文章主要讲的是大型语言模型(LLMs)的安全性,特别是开源的LLMs。他们提到使用低秩适配器来优化模型在特定领域的任务,但这些适配器可能被恶意利用,成为后门攻击的工具。他们提出了两种新的攻击方法:POLISHED和FUSION。实验结果显示这些攻击很有效,还能保持适配器的实用性。最后,他们设计了三种防御措施,但都不完全有效。 所以,我需要把这些关键点浓缩到100字以内。要包括攻击方法、实验结果和防御措施的不足。这样用户就能快速了解文章的核心内容了。 </think> 研究揭示开源大型语言模型(LLMs)通过低秩适配器可能被恶意控制的风险,提出POLISHED和FUSION两种新型后门攻击方法,并通过实验验证其有效性及对模型功能的影响。尽管设计了三种防御措施,但均未能完全抵御此类攻击。 2025-11-4 20:0:0 Author: securityboulevard.com(查看原文) 阅读量:22 收藏

SESSION
Session 2A: LLM Security

Cruise Con 2025

Authors, Creators & Presenters: Tian Dong (Shanghai Jiao Tong University), Minhui Xue (CSIRO’s Data61), Guoxing Chen (Shanghai Jiao Tong University), Rayne Holland (CSIRO’s Data61), Yan Meng (Shanghai Jiao Tong University), Shaofeng Li (Southeast University), Zhen Liu (Shanghai Jiao Tong University), Haojin Zhu (Shanghai Jiao Tong University)

PAPER
The Philosopher’s Stone: Trojaning Plugins of Large Language Models Open-source Large Language Models (LLMs) have recently gained popularity because of their comparable performance to proprietary LLMs. To efficiently fulfill domain-specialized tasks, open-source LLMs can be refined, without expensive accelerators, using low-rank adapters. However, it is still unknown whether low-rank adapters can be exploited to control LLMs. To address this gap, we demonstrate that an infected adapter can induce, on specific triggers, an LLM to output content defined by an adversary and to even maliciously use tools. To train a Trojan adapter, we propose two novel attacks, POLISHED and FUSION, that improve over prior approaches. POLISHED uses a superior LLM to align naïvely poisoned data based on our insight that it can better inject poisoning knowledge during training. In contrast, FUSION leverages a novel over-poisoning procedure to transform a benign adapter into a malicious one by magnifying the attention between trigger and target in model weights. In our experiments, we first conduct two case studies to demonstrate that a compromised LLM agent can use malware to control the system (e.g., a LLM-driven robot) or to launch a spear-phishing attack. Then, in terms of targeted misinformation, we show that our attacks provide higher attack effectiveness than the existing baseline and, for the purpose of attracting downloads, preserve or improve the adapter’s utility. Finally, we designed and evaluated three potential defenses. However, none proved entirely effective in safeguarding against our attacks, highlighting the need for more robust defenses supporting a secure LLM supply chain.

Our thanks to the Network and Distributed System Security (NDSS) Symposium for publishing their Creators, Authors and Presenter’s superb NDSS Symposium 2025 Conference content on the organization’s’ YouTube channel.

Permalink

*** This is a Security Bloggers Network syndicated blog from Infosecurity.US authored by Marc Handelman. Read the original post at: https://www.youtube-nocookie.com/embed/IjOzYE5MY9Y?si=4_5c3Qd6Grmqe4ja


文章来源: https://securityboulevard.com/2025/11/ndss-2025-the-philosophers-stone-trojaning-plugins-of-large-language-models/
如有侵权请联系:admin#unsafe.sh