The Promptware Kill Chain
好的,我现在需要帮用户总结这篇文章的内容,控制在100字以内,而且不需要特定的开头。首先,我得仔细阅读文章,抓住主要观点。 文章主要讨论了针对现代生成式AI大语言模型(LLMs)的攻击,称为“promptware”。作者提出了一个七步的“promptware kill chain”,详细描述了攻击的不同阶段,从初始访问到最终目标行动。这些阶段包括初始访问、权限提升、侦察、持久化、命令控制、横向移动和目标行动。 此外,文章指出传统的防御方法如修补漏洞不够,需要更全面的防御策略来应对这些复杂的攻击。这可能包括限制权限提升、约束侦察活动、防止持久化等措施。 总结时,我需要涵盖攻击的分类为promptware,提出的kill chain框架以及防御建议。同时要保持简洁,控制在100字以内。 可能会遇到的问题是如何在有限的字数内准确传达所有关键点。可能需要简化某些步骤或合并信息,确保核心内容不被遗漏。 最终总结应该清晰明了,涵盖攻击分类、框架结构和防御建议。 </think> 文章讨论了针对现代生成式AI大语言模型(LLMs)的新型恶意软件攻击——“promptware”,并提出了一种七步“kill chain”框架来分析和防御这些攻击。该框架包括初始访问、权限提升、侦察、持久化、命令控制、横向移动和目标行动等阶段,并强调需要采取系统性防御策略来应对复杂的AI威胁。 2026-2-16 12:4:1 Author: www.schneier.com(查看原文) 阅读量:1 收藏

The promptware kill chain: initial access, privilege escalation, reconnaissance, persistence, command & control, lateral movement, action on objective

Attacks against modern generative artificial intelligence (AI) large language models (LLMs) pose a real threat. Yet discussions around these attacks and their potential defenses are dangerously myopic. The dominant narrative focuses on “prompt injection,” a set of techniques to embed instructions into inputs to LLM intended to perform malicious activity. This term suggests a simple, singular vulnerability. This framing obscures a more complex and dangerous reality. Attacks on LLM-based systems have evolved into a distinct class of malware execution mechanisms, which we term “promptware.” In a new paper, we, the authors, propose a structured seven-step “promptware kill chain” to provide policymakers and security practitioners with the necessary vocabulary and framework to address the escalating AI threat landscape.

In our model, the promptware kill chain begins with Initial Access. This is where the malicious payload enters the AI system. This can happen directly, where an attacker types a malicious prompt into the LLM application, or, far more insidiously, through “indirect prompt injection.” In the indirect attack, the adversary embeds malicious instructions in content that the LLM retrieves (obtains in inference time), such as a web page, an email, or a shared document. As LLMs become multimodal (capable of processing various input types beyond text), this vector expands even further; malicious instructions can now be hidden inside an image or audio file, waiting to be processed by a vision-language model.

The fundamental issue lies in the architecture of LLMs themselves. Unlike traditional computing systems that strictly separate executable code from user data, LLMs process all input—whether it is a system command, a user’s email, or a retrieved document—as a single, undifferentiated sequence of tokens. There is no architectural boundary to enforce a distinction between trusted instructions and untrusted data. Consequently, a malicious instruction embedded in a seemingly harmless document is processed with the same authority as a system command.

But prompt injection is only the Initial Access step in a sophisticated, multistage operation that mirrors traditional malware campaigns such as Stuxnet or NotPetya.

Once the malicious instructions are inside material incorporated into the AI’s learning, the attack transitions to Privilege Escalation, often referred to as “jailbreaking.” In this phase, the attacker circumvents the safety training and policy guardrails that vendors such as OpenAI or Google have built into their models. Through techniques analogous to social engineering—convincing the model to adopt a persona that ignores rules—to sophisticated adversarial suffixes in the prompt or data, the promptware tricks the model into performing actions it would normally refuse. This is akin to an attacker escalating from a standard user account to administrator privileges in a traditional cyberattack; it unlocks the full capability of the underlying model for malicious use.

Following privilege escalation comes Reconnaissance. Here, the attack manipulates the LLM to reveal information about its assets, connected services, and capabilities. This allows the attack to advance autonomously down the kill chain without alerting the victim. Unlike reconnaissance in classical malware, which is performed typically before the initial access, promptware reconnaissance occurs after the initial access and jailbreaking components have already succeeded. Its effectiveness relies entirely on the victim model’s ability to reason over its context, and inadvertently turns that reasoning to the attacker’s advantage.

Fourth: the Persistence phase. A transient attack that disappears after one interaction with the LLM application is a nuisance; a persistent one compromises the LLM application for good. Through a variety of mechanisms, promptware embeds itself into the long-term memory of an AI agent or poisons the databases the agent relies on. For instance, a worm could infect a user’s email archive so that every time the AI summarizes past emails, the malicious code is re-executed.

The Command-and-Control (C2) stage relies on the established persistence and dynamic fetching of commands by the LLM application in inference time from the internet. While not strictly required to advance the kill chain, this stage enables the promptware to evolve from a static threat with fixed goals and scheme determined at injection time into a controllable trojan whose behavior can be modified by an attacker.

The sixth stage, Lateral Movement, is where the attack spreads from the initial victim to other users, devices, or systems. In the rush to give AI agents access to our emails, calendars, and enterprise platforms, we create highways for malware propagation. In a “self-replicating” attack, an infected email assistant is tricked into forwarding the malicious payload to all contacts, spreading the infection like a computer virus. In other cases, an attack might pivot from a calendar invite to controlling smart home devices or exfiltrating data from a connected web browser. The interconnectedness that makes these agents useful is precisely what makes them vulnerable to a cascading failure.

Finally, the kill chain concludes with Actions on Objective. The goal of promptware is not just to make a chatbot say something offensive; it is often to achieve tangible malicious outcomes through data exfiltration, financial fraud, or even physical world impact. There are examples of AI agents being manipulated into selling cars for a single dollar or transferring cryptocurrency to an attacker’s wallet. Most alarmingly, agents with coding capabilities can be tricked into executing arbitrary code, granting the attacker total control over the AI’s underlying system. The outcome of this stage determines the type of malware executed by promptware, including infostealer, spyware, and cryptostealer, among others.

The kill chain was already demonstrated. For example, in the research “Invitation Is All You Need,” attackers achieved initial access by embedding a malicious prompt in the title of a Google Calendar invitation. The prompt then leveraged an advanced technique known as delayed tool invocation to coerce the LLM into executing the injected instructions. Because the prompt was embedded in a Google Calendar artifact, it persisted in the long-term memory of the user’s workspace. Lateral movement occurred when the prompt instructed the Google Assistant to launch the Zoom application, and the final objective involved covertly livestreaming video of the unsuspecting user who had merely asked about their upcoming meetings. C2 and reconnaissance weren’t demonstrated in this attack.

Similarly, the “Here Comes the AI Worm” research demonstrated another end-to-end realization of the kill chain. In this case, initial access was achieved via a prompt injected into an email sent to the victim. The prompt employed a role-playing technique to compel the LLM to follow the attacker’s instructions. Since the prompt was embedded in an email, it likewise persisted in the long-term memory of the user’s workspace. The injected prompt instructed the LLM to replicate itself and exfiltrate sensitive user data, leading to off-device lateral movement when the email assistant was later asked to draft new emails. These emails, containing sensitive information, were subsequently sent by the user to additional recipients, resulting in the infection of new clients and a sublinear propagation of the attack. C2 and reconnaissance weren’t demonstrated in this attack.

The promptware kill chain gives us a framework for understanding these and similar attacks; the paper characterizes dozens of them. Prompt injection isn’t something we can fix in current LLM technology. Instead, we need an in-depth defensive strategy that assumes initial access will occur and focuses on breaking the chain at subsequent steps, including by limiting privilege escalation, constraining reconnaissance, preventing persistence, disrupting C2, and restricting the actions an agent is permitted to take. By understanding promptware as a complex, multistage malware campaign, we can shift from reactive patching to systematic risk management, securing the critical systems we are so eager to build.

This essay was written with Oleg Brodt, Elad Feldman and Ben Nassi, and originally appeared in Lawfare.

Tags: , ,

Posted on February 16, 2026 at 7:04 AM0 Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.


文章来源: https://www.schneier.com/blog/archives/2026/02/the-promptware-kill-chain.html
如有侵权请联系:admin#unsafe.sh