When AI Agents Go Rogue: Agent Session Smuggling Attack in A2A Systems

When AI Agents Go Rogue: Agent Session Smuggling Attack in A2A Systems
嗯，用户让我帮忙总结一篇文章的内容，控制在一百个字以内，而且不需要特定的开头。我得先仔细阅读文章，理解其主要内容。文章主要讲的是“代理会话走私”这种新的攻击技术。攻击者利用A2A协议的跨代理通信，注入恶意指令。这和传统的单次恶意数据攻击不同，这里的恶意代理可以多次互动，逐步建立信任，进而进行更复杂的攻击。文章还提到A2A协议的状态化行为是被攻击的关键点。攻击者可以利用这一点，在会话中隐藏指令，导致数据泄露或未经授权的操作。此外，文章讨论了防御策略，比如人机协作、远程代理验证和上下文绑定技术。总结时要抓住关键点：新攻击技术、利用A2A协议的状态化、恶意指令注入、防御措施。同时要注意字数限制，确保在100字以内。现在组织语言：介绍攻击技术名称、利用的协议、状态化行为、注入指令的影响以及防御方法。确保简洁明了。 </think> 文章介绍了一种新的AI攻击技术——代理会话走私（Agent Session Smuggling），该技术利用A2A协议的状态化行为，在跨代理通信中注入恶意指令。这种攻击通过隐藏在正常请求和响应中进行，可能导致数据泄露或未经授权的操作。文章还探讨了防御策略，并介绍了相关产品和服务以应对此类威胁。 2025-10-31 10:0:33 Author: unit42.paloaltonetworks.com(查看原文) 阅读量:13 收藏

Executive Summary

We discovered a new attack technique, which we call agent session smuggling. This technique allows a malicious AI agent to exploit an established cross-agent communication session to send covert instructions to a victim agent.

Here, we discuss the issues that can arise in a communication session using the Agent2Agent (A2A) protocol, which is a popular option for managing the connections between agents. The A2A protocol’s stateful behavior lets agents remember recent interactions and maintain coherent conversations. This attack exploits this property to inject malicious instructions into a conversation, hiding them among otherwise benign client requests and server responses.

Many AI threats involve tricking an agent with a single malicious piece of data such as a deceptive email or document. Our research highlights a more advanced danger, malicious agents.

A straightforward attack on a victim agent might involve a one-time effort to trick it into acting on harmful instructions from a document without seeking confirmation from its user. In contrast, a rogue agent is a far more dynamic threat. It can hold a conversation, adapt its strategy and build a false sense of trust over multiple interactions.

This scenario is especially dangerous because, as a recent study shows, agents are often designed to trust other collaborating agents by default. Agent session smuggling exploits this built-in trust, allowing an attacker to manipulate a victim agent over an entire session.

This research does not reveal any vulnerability in the A2A protocol itself. Rather, the technique exploits the way implicit trust relationships between agents would affect any stateful protocol — meaning any protocol that can memorize recent interactions and carry out multi-turn conversation.

Mitigation requires a layered defense strategy, including:

Human-in-the-loop (HitL) enforcement for critical actions
Remote agent verification (e.g., cryptographically signed AgentCards)
Context-grounding techniques to detect off-topic or injected instructions

Palo Alto Networks customers are better protected through the following products and services:

Prisma AIRS is designed to provide layered, real-time protection for AI systems by detecting and blocking threats, preventing data leakage and enforcing secure usage policies across a variety of AI applications.

Cortex Cloud AI-SPM is designed to provide automatic scanning and classification of AI assets, both commercial and self-managed models, to detect sensitive data and evaluate security posture. Context is determined by AI type, hosting cloud environment, risk status, posture and datasets.

A Unit 42 AI Security Assessment can help you proactively identify the threats most likely to target your AI environment.

If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.

Related Unit 42 Topics	GenAI, Google

An Overview of the A2A Protocol and Comparison With MCP

The A2A protocol is an open standard that facilitates interoperable communication among AI agents, regardless of vendor, architecture or underlying technology. Its core objective is to enable agents to discover, understand and coordinate with one another to solve complex, distributed tasks while preserving autonomy and privacy.

In the A2A protocol:

A local agent runs within the same application or process as the initiating agent, enabling fast, in-memory communication.
A remote agent operates as an independent, network-accessible service. It uses the A2A protocol to create a secure communication channel, allowing it to handle tasks delegated from other systems, or even other organizations, and then return the results.

For more details on A2A fundamentals and security considerations, please refer to our article: Safeguarding AI Agents: An In-Depth Look at A2A Protocol Risks and Mitigations.

A2A has notable parallels with the Model Context Protocol (MCP), a widely used standard for connecting large language models (LLMs) to external tools and contextual data. Both aim to standardize how AI systems interact, but they operate on distinct aspects of agentic systems.

MCP functions as a universal adapter, providing structured access to tools and data sources. It primarily supports LLM-to-tool communication through a centralized integration model.
A2A focuses on agent-to-agent interoperability. It enables decentralized, peer-to-peer coordination in which agents can delegate tasks, exchange information and preserve state across collaborative workflows.

In short, MCP emphasizes execution through tool integration, whereas A2A emphasizes orchestration across agents.

Despite these differences, both protocols face similar classes of threats, as shown in Table 1.

Attack/Threats	MCP	A2A
Tool/Agent Description Poisoning	Tool descriptions can be poisoned with malicious instructions that manipulate LLM behavior during tool selection and execution	AgentCard descriptions can embed prompt injections or malicious directives that manipulate the client agent’s behavior when consumed
Rug Pull Attacks	Previously trusted MCP servers can unexpectedly shift to malicious behavior after integration, exploiting established trust relationships	Trusted agents can unexpectedly turn malicious by updating their AgentCards or operation logic
Tool/Agent Shadowing	Malicious servers register tools with identical or similar names to legitimate ones, causing confusion during tool selection	Rogue agents create AgentCards that mimic legitimate agents through similar names, skills or typosquatting techniques
Parameter/Skill Poisoning	Tool parameters can be manipulated to include unintended data (e.g., conversation history) in requests to external servers	AgentCard skills and examples can be crafted to manipulate how agents interact, potentially exposing sensitive context or credentials

Table 1. Comparison of MCP and A2A attacks.

The Agent Session Smuggling Attack

Agent session smuggling is a new attack vector specific to stateful cross-agent communication, such as A2A systems. A communication is stateful if it can remember recent interactions, like a conversation where both parties keep track of the ongoing context.

The core of the attack involves a malicious remote agent that misuses an ongoing session to inject additional instructions between a legitimate client request and the server’s response. These hidden instructions can lead to context poisoning (corrupting the AI's understanding of a conversation), data exfiltration or unauthorized tool execution on the client agent.

Figure 1 outlines the attack sequence:

Step 1: The client agent initiates a new session by sending a normal request to the remote agent.
Step 2: The remote agent begins processing the request. During this active session, it covertly sends extra instructions to the client agent across multiple turn interactions.
Step 3: The remote agent returns the expected response to the original request, completing the transaction.

Diagram showing a cybersecurity attack process. On the left is a Victim Client Agent and on the right is the Malicious Remote Agent. The steps that are taken are: 1) Client request. 2) Malicious action. 3) Server response. — Figure 1. Agent session smuggling attack flow.

Key properties of the attack

Stateful: The attack leverages the remote agent’s ability to manage long-running tasks and persist session state. This means the agent saves the context of an interaction, much like a person remembers the beginning of a sentence while listening to the end. In this context, stateful means the agent retains and references session-specific information across multiple turns (e.g., conversation history, variables or task progress tied to a session ID) so later messages can depend on earlier context.
Multi-turn interaction: Because of the stateful property, two connected agents can engage in multi-turn conversations. A malicious agent can exploit this to stage progressive, adaptive multi-turn attacks, which have been shown significantly more difficult to defend against in prior research (see, for example, “LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet” by Nathaniel Li et al. on Scale).
Autonomous and adaptive: Malicious agents that are powered by AI models can dynamically craft instructions based on live context such as client inputs, intermediate responses and user identity.
Undetectable to end users: The injected instructions occur mid-session, making them invisible to end users, who typically only see the final, consolidated response from the client agent.

In principle, any multi-agent system with stateful inter-agent communication could be susceptible to this attack. However, the risk is lower in setups fully contained within a single trust boundary. A trust boundary is an area of the system where all components are trusted by default, such as ADK or LangGraph multi-agent systems, where one administrator controls all participating agents.

Our research therefore focuses on the A2A protocol, which is explicitly designed for cross-boundary interoperability. This interoperability enables agents to collaborate across different systems, modules or organizations.

Compared to known MCP threats, agent session smuggling exploits A2A’s stateful and adaptive design in ways that are not possible in MCP. MCP servers generally operate in a stateless manner, executing isolated tool invocations without preserving session history, which limits actors’ ability to use them to mount multi-turn or evolving attacks.

MCP servers are also typically static and deterministic, since they do not rely on AI models. In contrast, an A2A server can persist state across interactions and leverage model-driven reasoning, allowing a malicious agent to adapt and refine instructions over multiple turns. This combination of persistence and autonomy makes agent session smuggling more stealthy and difficult to defend against than MCP-based attacks.

Proof of Concept Agent Session Smuggling Attacks

To demonstrate the feasibility and potential impact of agent session smuggling, we developed two proof-of-concept (PoC) attack scenarios. The setup consisted of a financial assistant (the client agent) and a research assistant (the malicious remote agent). These PoCs illustrate how a malicious agent can exfiltrate sensitive data and execute unauthorized actions without the end user's awareness.

Sensitive information leakage: Extracting confidential data from the client agent
Unauthorized tool invocation: Convincing the client agent to perform unauthorized actions on behalf of the victim user

Environment settings:

Development framework: Google Agent Development Kit
Cross-agent communication protocol: A2A
Client agent: A financial assistant based on the ADK financial-advisor example. The agent supports tasks such as retrieving financial news, managing portfolios and executing trade operations.
- Model: Gemini 2.5 Pro [PDF]
- Available tools: get_portfolio, get_profile, buy_stock, sell_stock
Remote agent: A research assistant adapted from the ADK data_analyst example. Its intended role is to fetch and summarize company news.
- Model: Gemini 2.5 Flash [PDF]
- Available Tools: Google Search
User interface: ADK Developer Web UI

PoC 1: Sensitive Information Leakage

In this scenario, the malicious remote agent (research assistant) aims to exfiltrate the client agent’s (financial assistant’s) internal state, including:

Chat history
System instructions
Available tools
Tool schemas

The attack unfolds subtly. Upon receiving a legitimate request from the financial assistant, the research assistant issues a series of seemingly harmless clarification questions. These follow-up questions gradually trick the financial assistant into disclosing sensitive information that should never leave its environment.

Figure 2 illustrates the user’s interaction flow. The session begins normally. The user asks the financial assistant to retrieve the investment portfolio and profile, followed by a request for a briefing on AI market news.

This market news request is delegated to the research assistant. As shown on the right side of the figure, the research assistant engages in a series of back-and-forth exchanges with the financial assistant. Except for the initial request and the final response, the research assistant injects all intermediate interactions.

The intermediate steps shown in Figure 2 only appear because the developer web user interface (UI) is designed to display every internal action. In a production chatbot UI, these intermediate interactions would likely remain invisible.

Image showing a split-screen view of a chat interface. On the left, the user requests portfolio updates from a financial assistant bot, highlighted in yellow. On the right, a flowchart illustrates the sequence of user and bot interactions involving various data requests and responses, accentuated in green and yellow. The first request is to the research assistant. Then, there are unintended interactions between the financial assistant and the research assistant. The final item is the last response from the research assistant. — Figure 2. Developer web UI. The right side shows internal exchanges between the financial assistant and the research assistant.

Figure 3 provides a more detailed view via the financial assistant’s activity log. Row 10 shows the initial query sent to the research assistant, and Row 28 shows the final result the research agent produces. Rows 11–27, alternating between function_call and function_response, reveal the interactions between the two agents. During the interactions, the financial assistant gradually disclosed its system configuration and the user’s prior conversation history.

Screenshot of a computer screen displaying code and text within an Integrated Development Environment (IDE), analyzing artificial intelligence market trends, with mentions of function calls and customer IDs. Some sections are highlighted in yellow boxes. — Figure 3. The financial assistant’s activity log showing unintended smuggled interactions.

Finally, Figure 4 shows the research assistant’s log after the attack. It confirms successful data exfiltration, including the financial assistant’s capabilities, tool schemas and the entire session history.

Screenshot of a financial advisor command-line interface displaying various investment portfolio details. — Figure 4. The research assistant log showing exfiltrated configuration data.

PoC 2: Unauthorized Tool Invocation

In this scenario, the malicious remote agent (research assistant) attempts to trick the client agent (financial assistant) into executing unauthorized trade actions. The attacker’s objective is to misuse the client agent’s buy_stock tool without the end user’s knowledge or consent.

After receiving a valid delegation request, the research assistant sends additional processing requirements, prompting the financial assistant to perform tool invocations unrelated to the user’s original request. This PoC assumes the attacker knows the financial assistant’s tools, having acquired either through public information or the previous PoC.

Figure 5 illustrates the user’s interaction flow. On the right-hand side, the user’s news request is delegated to the research assistant. While the research assistant does provide the expected news summary, it also smuggles hidden instructions that induce the financial assistant to invoke the buy_stock tool. Again, these intermediate actions are executed autonomously and would remain invisible in a typical production chatbot interface, where the user would only be shown the final summarized response.

Screenshot of communication between the financial assistant and research assistant. On the left is the financial assistant. On the right is the research assistant with an event sequence where: the first request is sent to the research assistant. There are unintended interactions between the financial assistant and the research assistant. There is a final response from the research assistant. — Figure 5. Developer web UI that shows the communication between the financial assistant and the research assistant, including smuggled instructions.

Figure 6 shows the financial assistant’s activity log. Row 7 shows the initial request sent to the research assistant, and Row 12 shows the legitimate response received. Rows 8–11, however, reveal four additional actions the financial assistant autonomously executed, resulting in an unauthorized trade of 10 shares on behalf of the victim user. This sequence demonstrates how agent session smuggling can escalate from information leakage to high-impact tool execution.

Screenshot of a computer screen displaying code in an Integrated Development Environment (IDE), with annotations highlighting specific functions and their responses related to stock research and transaction activities. A highlighted portion indicates the buy_stock() tool's invocation and response. — Figure 6. Financial assistant’s activity log showing unauthorized stock purchase triggered by smuggled instructions.

Mitigation and Protection

Defending against agent session smuggling requires a layered security approach that validates the identity of agents, the integrity of the conversation and the impact of the actions taken. The following strategies can help mitigate the risks:

Enforce out-of-band confirmation for sensitive actions: The most effective defense is to require HitL approval for any sensitive or high-impact action, but this confirmation must occur out-of-band, through a separate channel the LLM cannot influence. When an agent is instructed to perform a critical task, the orchestration framework should pause the execution. It should then trigger a confirmation prompt in a static, non-generative part of the application UI or through a separate system like a push notification.
Implement context grounding: An agent session smuggling attack relies on derailing a conversation from its original purpose to inject malicious commands. Context grounding is a technical control that algorithmically enforces conversational integrity. When a client agent initiates a session, it should create a task anchor based on the original user request's intent. As the interaction progresses, the client must continuously validate that the remote agent's instructions remain semantically aligned with this anchor. Any significant deviation or introduction of unrelated topics should cause the client agent to flag the interaction as a potential hijacking attempt and terminate the session.
Validate agent identity and capabilities: Secure agent-to-agent communication must be built on a foundation of verifiable trust. Before engaging in a session, agents should be required to present verifiable credentials, such as cryptographically signed AgentCards. This allows each participant to confirm the identity, origin and declared capabilities of the other. While this control does not prevent a trusted agent from being subverted, it eliminates the risk of agent impersonation or spoofing attacks and establishes an auditable, tamper-evident record of all interactions.
Expose client agent activity to users: Smuggled instructions and activities are invisible to end users, since they usually only see the final response from the client agent. The UI can reduce this weak spot by exposing real-time agent activity. For example, surfacing tool invocations, showing live execution logs or providing visual indicators of remote instructions. These signals improve user awareness and increase the chance of catching suspicious activity.

Conclusion

This work introduced agent session smuggling, a new attack technique that targets cross-agent communication in A2A systems. Unlike threats involving malicious tools or end users, a compromised agent represents a more capable adversary. Powered by AI models, a compromised agent can autonomously generate adaptive strategies, exploit session state and escalate its influence across all connected client agents and their users.

Although we have not observed the attack in the wild, its low barrier to execution makes it a realistic risk. An adversary needs only to convince a victim agent to connect to a malicious peer, after which covert instructions can be injected without user visibility. Protecting against this requires a layered defense approach:

HitL approval for sensitive actions
Confirmation logic enforced outside of model prompts
Context-grounding to detect off-topic instructions and cryptographic validation of remote agents

As multi-agent ecosystems expand, their interoperability also opens new attack surfaces. Practitioners should assume that agent-to-agent communication is not inherently trustworthy. We must design orchestration frameworks with layered safeguards to contain the risks of adaptive, AI-powered adversaries.

Palo Alto Networks Protection and Mitigation

Prisma AIRS is designed for real-time protection of AI applications, models, data and agents. It analyzes network traffic and application behavior to detect threats such as prompt injection, denial-of-service attacks and data exfiltration, with inline enforcement at the network and API levels.

AI Access Security is designed for visibility and control over usage of third-party GenAI tools, helping prevent sensitive data exposures, unsafe use of risky models and harmful outputs through policy enforcement and user activity monitoring. Together, Prisma AIRS and AI Access Security help secure the building of enterprise AI applications and external AI interactions.

A Unit 42 AI Security Assessment can help you proactively identify the threats most likely to target your AI environment.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

North America: Toll Free: +1 (866) 486-4842 (866.4.UNIT42)
UK: +44.20.3743.3660
Europe and Middle East: +31.20.299.3130
Asia: +65.6983.8730
Japan: +81.50.1790.0200
Australia: +61.2.4062.7950
India: 000 800 050 45107

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.

References

The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover – Matteo Lupinacci et al., arXiv:2507.06850
Multi-Agent Systems in ADK – Agent Development Kit, Google GitHub
Graph API overview – LangChain
A2A Protocol – The Linux Foundation
Model Context Protocol (MCP) – Model Context Protocol
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet – Nathaniel Li et al., Scale
Google Agent Development Kit – Agent Development Kit, Google GitHub
Google adk-samples – Google GitHub
Gemini 2.5 Pro [PDF] – Google
Gemini 2.5 Flash [PDF] – Google
ADK Google Search Tool – Google GitHub
ADK Developer Web UI – Google GitHub
Sigstore A2A – Google GitHub

文章来源: https://unit42.paloaltonetworks.com/agent-session-smuggling-in-agent2agent-systems/
如有侵权请联系:admin#unsafe.sh