Securing AI Agents with Information Flow Control (Part II)

Ofir Yakovian

Inside the Planner: How Decisions, Memory, and Labels Can Shape Agent Behavior


This article is part of a three-part series that explains and contextualizes the Microsoft Research paper: Securing AI Agents with Information-Flow Control (written by Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin).

My goal is to translate their theoretical model and guarantees into something security engineers, architects, and researchers can use, without sacrificing rigor.

1. From Agent Loops to Planners

In Part I, we asked a simple but uncomfortable question:

What happens when you give an AI agent the keys to your systems?

We saw how tool-calling agents can be hijacked by prompt injection and abused to leak data or perform unintended actions. We argued that Information-Flow Control (IFC) is a promising way to make such leaks impossible by design.

But there’s a missing piece. Before we can control agents, we need to understand how they actually decide what to do. Where does the decision (e.g., “send this email”, “query this API”, or “write to this datastore”) truly come from?

That decision lives in the planner. This part is about how a planner loops, how it remembers, how it carries labels, and how we can instrument it to enforce security.

2. The Planning Loop

Recall the high-level agent loop (from Part I, Section 2). That loop is intentionally generic, but it’s also too monolithic. For security, we need a clear control surface. That is, a decomposition where we can explicitly point to each decision boundary and say:

“At this exact point, before any tool runs, check the policy.”

The paper achieves this by decomposing the agent into:

  • A planning loop: the fixed scaffolding that drives interaction, and
  • A planner: the strategy that decides what action to take next.

Think of the planning loop as the kernel scheduler, and the planner as the process that makes system calls.

2.1. Action Spaces

The planning loop mediates all interactions with the model, tools, and the user. It is parameterized by a state-passing planner function P.

At each iteration, P consumes the latest message in the conversation and returns one of three actions:

[Figure: Planning Loop — Action Types]

That’s the entire API between “how the agent thinks” and “how the environment executes”.
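
Since the action-type figure is not reproduced here, a minimal Python sketch of this API may help. MakeCall and Finish are the names used later in this article; QueryModel is an assumed name for the third action (asking the model for its next step):

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class MakeCall:
    """Invoke a tool with concrete arguments."""
    tool: str
    args: dict[str, Any]

@dataclass
class QueryModel:
    """Send a prompt to the LLM and await its reply."""
    prompt: str

@dataclass
class Finish:
    """Terminate the loop and return a result to the user."""
    result: Any

# The planner P : message -> Action picks exactly one of these per iteration.
Action = Union[MakeCall, QueryModel, Finish]
```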

2.2. Basic Planner Algorithm

The planner is the actual gatekeeper between model reasoning and real-world side effects. The planning loop works as follows:

[Figure: Planning Loop — Algorithm]

This mechanism is delicate, and the reader should pause here and pay close attention to the following observations:

[Figure: Planning Loop — Key Observations]

You can already see where IFC will hook in:

  • before executing MakeCall, ask “is this call safe under our policy?”, and
  • before returning Finish, ask “are we allowed to reveal this information?”
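
A minimal sketch of this loop with both hook points marked might look as follows. It reuses the action types sketched above; execute_tool, query_model, and policy_allows are hypothetical stand-ins for the paper's formal components:

```python
# A minimal sketch of the planning loop with both IFC hook points.
# All helpers are passed in as parameters and are hypothetical
# stand-ins for the paper's formal components.

def planning_loop(planner, execute_tool, query_model, policy_allows, user_message):
    msg = user_message
    while True:
        action = planner(msg)                      # P consumes the latest message
        if isinstance(action, MakeCall):
            if not policy_allows(action):          # hook #1: check before any tool runs
                raise PermissionError(f"policy blocked tool call: {action.tool}")
            msg = execute_tool(action.tool, action.args)
        elif isinstance(action, QueryModel):
            msg = query_model(action.prompt)       # model output becomes the next message
        elif isinstance(action, Finish):
            if not policy_allows(action):          # hook #2: check before revealing
                raise PermissionError("policy blocked disclosure of result")
            return action.result
```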

3. The Variable-Passing Planner

A planner that only looks at raw messages is too weak for real-world tasks.

Agents need memory: “What did that API return?”, “Which file did I just read?”, “What is the ID of the ticket I created?”. Without the ability to store and reuse this information, the planner cannot assemble non-trivial workflows.

3.1. Adding Planner Memory

The paper introduces a more powerful planner that keeps an internal memory μ. You can think of μ as a map μ: variable_name → value. When a tool returns a result, the planner:

[Figure: Variable-Passing Planner — Memory Use]

This is the variable-passing planner.
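
A minimal sketch of μ, assuming a simple fresh-name scheme (x0, x1, ...) that the paper does not prescribe:

```python
# A minimal sketch of planner memory μ. The naming scheme (x0, x1, ...)
# is an assumption; the point is that the planner holds values by name
# and can hand the *name* to the LLM instead of the raw value.

class PlannerMemory:
    def __init__(self):
        self._mu: dict[str, object] = {}   # μ : variable_name -> value
        self._counter = 0

    def store(self, value) -> str:
        name = f"x{self._counter}"         # mint a fresh variable name
        self._counter += 1
        self._mu[name] = value
        return name                        # the planner refers to the value by name

    def resolve(self, name: str):
        return self._mu[name]              # dereference only when a tool needs it
```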

3.2. Why Variables Matter

Variables are not just for convenience. They are the foundation for IFC. They give us three key capabilities:

1. Composability

A later tool call can say “use x and y from earlier steps” without the model hallucinating those values.

2. Control over what the LLM sees

The planner can decide whether to expose the raw value or keep it as a variable. This is an IFC hook: we may want an LLM that reasons about the existence of a variable without seeing the secret inside it.

3. Clear boundaries for labeling


Each variable can carry an IFC label indicating its origin and the level of trustworthiness.

3.3. Variable-Passing Flow

Conceptually, the variable-passing planner behaves like this:

[Figure: Variable-Passing Planner Flow]

You can think of the variable-passing planner flow as building a small, typed environment for the agent’s current “plan.”
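
As a concrete illustration, here is a minimal sketch of argument resolution in this flow. The "$name" reference convention is my own invention, and PlannerMemory is the memory sketch from Section 3.1:

```python
# A minimal sketch of argument resolution in the variable-passing flow,
# assuming a "$name" convention (an invented syntax) for referencing
# stored results. Reuses PlannerMemory from the sketch in Section 3.1.

def resolve_args(memory: PlannerMemory, args: dict) -> dict:
    resolved = {}
    for key, val in args.items():
        if isinstance(val, str) and val.startswith("$"):
            resolved[key] = memory.resolve(val[1:])  # substitute the stored value
        else:
            resolved[key] = val                      # literals pass through untouched
    return resolved

# Example: the planner can issue
#   MakeCall("send_email", {"to": "$x0", "body": "$x1"})
# and x0, x1 are filled in from μ only at execution time, so the model
# never needs to see (or hallucinate) the raw values.
```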

4. Adding Information-Flow Labels

Up to now, the planner has only been about “control flow”. To reason about security, we need to track what data flows through that control flow.

We assign labels from a set L to all pieces of data in the system. We require that the labels in L form a lattice with a partial order ⊑ and a join operation ⊔ that computes the least upper bound of two labels.

Two dimensions are particularly important for us: confidentiality (who is allowed to read data) and integrity (who is allowed to modify data).

4.1. Confidentiality Lattice

The canonical confidentiality lattice consists of two elements:

[Figure: Confidentiality Lattice — L ⊑ H]

where L denotes public (low-confidentiality) data and H denotes secret (high-confidentiality) data. In this lattice:

L ⊔ L = L, L ⊔ H = H ⊔ L = H, and H ⊔ H = H; data may flow from L to H, but never from H to L.

For example, if data x is readable by users {A, B, C} and data y is readable by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) is labeled with {A, B, C} ⊔ {B, C, D} = {B, C}, the intersection of the two reader sets.

This formalizes a key confidentiality principle: derived data must not be visible to anyone who was not authorized to see all of its inputs.
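
In code, the reader-set view of this lattice is tiny: a label is the set of principals allowed to read, and join is set intersection. A minimal sketch:

```python
# A minimal sketch of the reader-set view of confidentiality: a label is
# the set of principals allowed to read, and join is set *intersection*,
# since derived data may only be shown to readers cleared for every input.

def conf_join(readers_x: frozenset, readers_y: frozenset) -> frozenset:
    return readers_x & readers_y

assert conf_join(frozenset("ABC"), frozenset("BCD")) == frozenset("BC")
```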

4.2. Integrity Lattice

Integrity is modeled dually to confidentiality. The canonical integrity lattice is:

[Figure: Integrity Lattice — T ⊑ U]

where T denotes trusted (high-integrity) data and U denotes untrusted (low-integrity) data. In this lattice:

T ⊔ T = T, T ⊔ U = U ⊔ T = U, and U ⊔ U = U; data may flow from T to U, but never from U to T.

For example, if data x may be written by users {A, B, C}, and data y by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) must assume influence from {A, B, C} ⊔ {B, C, D} = {A, B, C, D}, the union of the two writer sets.

This reflects the integrity threat model: if an untrusted actor may have influenced any input, the result must be treated as potentially influenced by all of them.
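
The dual sketch for integrity: a label is the set of principals who may have influenced the data, and join is set union:

```python
# A minimal sketch of the writer-set view of integrity: a label is the
# set of principals who may have influenced the data, and join is set
# *union*, since derived data inherits every possible influence.

def integ_join(writers_x: frozenset, writers_y: frozenset) -> frozenset:
    return writers_x | writers_y

assert integ_join(frozenset("ABC"), frozenset("BCD")) == frozenset("ABCD")
```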

4.3. Product Lattice: Putting Them Together

The system uses the product of confidentiality and integrity lattices:

[Figure: The (Integrity × Confidentiality) Product Lattice]

You can picture this as a diamond:

[Figure: the product lattice diamond — (T, L) at the bottom, (T, H) and (U, L) in the middle, (U, H) at the top]

Arrows go from bottom to top following “can flow to” rules. For example, trusted public (T, L) can safely flow anywhere, and untrusted secret (U, H) is the most restrictive.

This product lattice is the space in which IFC policies are defined.
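
A minimal sketch of the product lattice, encoding each two-point lattice as 0/1 so that join becomes a pointwise max and “can flow to” a pointwise comparison:

```python
from dataclasses import dataclass
from enum import IntEnum

# A minimal sketch of the product lattice. Encoding each two-point
# lattice as 0/1 makes join a pointwise max and "can flow to" a
# pointwise <= check.

class Conf(IntEnum):
    L = 0   # public
    H = 1   # secret

class Integ(IntEnum):
    T = 0   # trusted
    U = 1   # untrusted

@dataclass(frozen=True)
class Label:
    integ: Integ
    conf: Conf

    def join(self, other: "Label") -> "Label":
        return Label(max(self.integ, other.integ), max(self.conf, other.conf))

    def can_flow_to(self, other: "Label") -> bool:
        return self.integ <= other.integ and self.conf <= other.conf

# (T, L) can flow anywhere; nothing flows out of (U, H) except to itself.
assert Label(Integ.T, Conf.L).can_flow_to(Label(Integ.U, Conf.H))
assert not Label(Integ.U, Conf.H).can_flow_to(Label(Integ.T, Conf.L))
```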

5. Propagating Labels Through the Planner

Now we combine the planning machinery with labels. This is where IFC becomes operational.

The idea is to run the planning loop with taint tracking:

[Figure: Propagating Labels — Taint Tracking]

5.1. Labeling Variables and Actions

Formally, there’s a function τ that assigns labels to variables.

You can think of τ as a map τ: variable → L. Tool results are stored in variables x, and each variable carries a label τ(x) that summarizes the labels of the tool’s arguments and of any datastore locations the tool reads. Actions carry labels too: a tool f has a static label (e.g., trusted/untrusted), and each argument of a tool call has a dynamic label.

The tool result and any datastore variables W(f) it writes to are assigned a label that soundly over-approximates the labels of the action and all datastore variables R(f) it may read from.
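
A minimal sketch of this labeling state; the concrete tool names, labels, and read/write sets below are purely illustrative assumptions, and Label comes from the product-lattice sketch above:

```python
# A minimal sketch of the labeling state. All concrete tool names,
# labels, and read/write sets below are illustrative assumptions, not
# taken from the paper; Label/Integ/Conf come from the sketch above.

tau: dict[str, Label] = {                       # τ : variable -> label
    "mailbox": Label(Integ.U, Conf.H),          # e.g., inbox contents: untrusted, secret
}

TOOL_LABELS: dict[str, Label] = {               # static label per tool f
    "read_inbox":  Label(Integ.U, Conf.H),
    "get_weather": Label(Integ.T, Conf.L),
}

READS  = {"read_inbox": {"mailbox"}, "get_weather": set()}   # R(f)
WRITES = {"read_inbox": set(),       "get_weather": set()}   # W(f)
```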

5.2. Planning Loop with Taint Tracking

The taint-tracking planning loop extends the previous algorithm with:

  1. A policy function that decides if an action is allowed given labels.
  2. A label computation step that computes the label of the tool result and updates variable labels accordingly.

[Figure: Taint Tracking — Algorithm]

JoinLabels computes a single, conservative label for the result of a tool call. It joins (takes the least upper bound of) three sources of influence: the label of the tool itself, the labels of all arguments passed to the tool, and the labels of all datastore variables the tool may read. The resulting label soundly over-approximates everything that could have affected the tool’s output, ensuring that no dependency (whether from inputs, state, or the tool’s own trust assumptions) is lost.

UpdateLabels propagates that result label back into the datastore after the tool executes. For every variable the tool may write, the label map is updated so those variables now carry the result’s label, reflecting that their contents are influenced by the same sources as the tool output. This step preserves label monotonicity across the agent’s execution and prevents later planner decisions from treating derived state as more trusted or less sensitive than it truly is.
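
A minimal sketch of both functions, reusing the Label type and the τ / R(f) / W(f) tables from the earlier sketches; this illustrates the idea rather than reproducing the paper’s algorithm:

```python
from functools import reduce

# A minimal sketch of JoinLabels and UpdateLabels as described above,
# reusing Label, tau, TOOL_LABELS, READS and WRITES from the earlier
# sketches. This is an illustration of the idea, not the paper's code.

def join_labels(tool: str, arg_labels: list) -> Label:
    sources = [TOOL_LABELS[tool]]                 # the tool's own static label
    sources += arg_labels                         # dynamic labels of all arguments
    sources += [tau[v] for v in READS[tool]]      # labels of everything in R(f)
    return reduce(lambda a, b: a.join(b), sources)

def update_labels(tool: str, result_var: str, result_label: Label) -> None:
    tau[result_var] = result_label                # label the tool result itself
    for v in WRITES[tool]:                        # propagate to everything in W(f)
        # join with any existing label so labels only move up the lattice
        tau[v] = tau.get(v, result_label).join(result_label)
```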

In this part, we zoomed in on the planner, the component where control decisions are made, and showed how taint tracking can be embedded directly into planning logic.

In Part III, we take the final step. We move from mechanisms to guarantees. We will show how these labeled planners give rise to concrete security properties: what is prevented, what is allowed, and why. This is where theory meets practice.

Follow to get notified when Part III drops.

