This article is part of a three-part series that explains and contextualizes the Microsoft Research paper: Securing AI Agents with Information-Flow Control (written by Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin).
My goal is to translate their theoretical model and guarantees into something security engineers, architects, and researchers can use, without sacrificing rigor.
In Part I, we asked a simple but uncomfortable question:
What happens when you give an AI agent the keys to your systems?
We saw how tool-calling agents can be hijacked by prompt injection and abused to leak data or perform unintended actions. We argued that Information-Flow Control (IFC) is a promising way to make such leaks impossible by design.
But there’s a missing piece. Before we can control agents, we need to understand how they actually decide what to do. Where does the decision (e.g., “send this email”, “query this API”, or “write to this datastore”) truly come from?
That decision lives in the planner. This part is about how a planner loops, how it remembers, how it carries labels, and how we can instrument it to enforce security.
Recall the high-level agent loop (from Part I, Section 2). That loop is intentionally generic, but it’s also too monolithic. For security, we need a clear control surface. That is, a decomposition where we can explicitly point to each decision boundary and say:
“At this exact point, before any tool runs, check the policy.”
The paper achieves this by decomposing the agent into two pieces: a generic planning loop and the planner function P that parameterizes it.
Think of the planning loop as the kernel scheduler, and the planner as the process that makes system calls.
The planning loop mediates all interactions with the model, tools, and the user. It is parameterized by a state-passing planner function P.
At each iteration, P consumes the latest message in the conversation and returns one of three actions.
As such, every loop iteration is represented as one of three action types:
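Here is a minimal Python sketch of what such action types could look like. MakeCall and Finish are the names used later in this article; QueryModel is an assumed name for the model-interaction action and may not match the paper's notation.

```python
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class QueryModel:      # ask the LLM to continue reasoning (assumed name)
    message: str

@dataclass
class MakeCall:        # invoke a tool with concrete arguments
    tool: str
    args: dict[str, Any]

@dataclass
class Finish:          # return a final result to the user
    result: Any

Action = Union[QueryModel, MakeCall, Finish]
```

The planner P is then just a state-passing function that maps (state, latest message) to (new state, Action).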
That’s the entire API between “how the agent thinks” and “how the environment executes”.
The planner is the actual gatekeeper between model reasoning and real-world side effects. The planning loop repeatedly asks P for the next action and then executes that action against the model, a tool, or the user.
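A stripped-down Python sketch of such a loop follows; the policy object, its allows_call/allows_reveal methods, and the other helper names are illustrative stand-ins, not the paper's formal algorithm.

```python
def planning_loop(P, state, message, policy, tools, model):
    """Sketch: drive the planner P and execute each action it returns."""
    while True:
        state, action = P(state, message)

        if isinstance(action, QueryModel):
            # Interaction with the model: the reply becomes the next message.
            message = model(action.message)

        elif isinstance(action, MakeCall):
            # IFC hook #1: is this call safe under our policy?
            if not policy.allows_call(action):
                raise PermissionError(f"policy blocked call to {action.tool}")
            message = tools[action.tool](**action.args)

        elif isinstance(action, Finish):
            # IFC hook #2: are we allowed to reveal this information?
            if not policy.allows_reveal(action.result):
                raise PermissionError("policy blocked releasing this result")
            return action.result
```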
This mechanism is delicate, and the reader should pause here: every interaction with the model, the tools, and the user funnels through this single loop, which is exactly what makes it a usable control surface.
You can already see where IFC will hook in:
before executing MakeCall, ask “is this call safe under our policy?”, and, before returning Finish, ask “are we allowed to reveal this information?”
A planner that only looks at raw messages is too weak for real-world tasks.
Agents need memory: “What did that API return?”, “Which file did I just read?”, “What is the ID of the ticket I created?”. Without the ability to store and reuse this information, the planner cannot assemble non-trivial workflows.
The paper introduces a more powerful planner that keeps an internal memory μ. You can think of μ as a map μ: variable_name → value. When a tool returns a result, the planner stores it under a fresh variable name in μ and refers to it by that name in later steps, rather than pasting the raw value back into the conversation.
This is the variable-passing planner.
Variables are not just for convenience. They are the foundation for IFC. They give us three key capabilities:
1. Composability
A later tool call can say “use x and y from earlier steps” without the model hallucinating those values.
2. Control over what the LLM sees
The planner can decide whether to expose the raw value or keep it as a variable. This is an IFC hook: we may want an LLM that reasons about the existence of a variable without seeing the secret inside it.
3. Clear boundaries for labeling
Each variable can carry an IFC label indicating its origin and the level of trustworthiness.
Conceptually, the variable-passing planner behaves like this:
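A minimal Python sketch, with invented helper names (the paper's formalization is more abstract), assuming tool arguments reference stored variables with a $ prefix:

```python
from itertools import count

class VariablePassingPlanner:
    """Sketch: a planner memory μ that binds tool results to fresh variables."""

    def __init__(self):
        self.memory = {}      # μ: variable_name -> value
        self._ids = count()

    def store_result(self, value):
        """Bind a tool result to a fresh variable and return its name."""
        name = f"x{next(self._ids)}"
        self.memory[name] = value
        return name

    def resolve(self, args):
        """Substitute '$name' references with stored values before a tool call."""
        return {k: self.memory[v[1:]] if isinstance(v, str) and v.startswith("$") else v
                for k, v in args.items()}

planner = VariablePassingPlanner()
var = planner.store_result({"ticket_id": 1234, "status": "open"})   # model only sees "x0"
call_args = planner.resolve({"id": f"${var}"})                       # raw value restored here
```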
You can think of the variable-passing planner flow as building a small, typed environment for the agent’s current “plan.”
Up to now, the planner has only been about “control flow”. To reason about security, we need to track what data flows through that control flow.
We assign labels from a set L to all pieces of data in the system, and we require that L forms a lattice: it has a partial order ⊑ and a join operation ⊔ that computes the least upper bound of two labels.
Two dimensions are particularly important for us: confidentiality (who is allowed to read data) and integrity (who is allowed to modify data).
The canonical confidentiality lattice consists of two elements, L ⊑ H, where L denotes public (low-confidentiality) data and H denotes secret (high-confidentiality) data. In this lattice, public data may flow into secret contexts, but secret data must never flow to public ones, and joining any label with H yields H.
For example, if data x is readable by users {A, B, C} and data y is readable by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) is labeled with {A, B, C} ⊔ {B, C, D} = {B, C}.
This formalizes a key confidentiality principle: derived data must not be visible to anyone who was not authorized to see all of its inputs.
Integrity is modeled dually to confidentiality. The canonical integrity lattice likewise has two elements, T ⊑ U, where T denotes trusted (high-integrity) data and U denotes untrusted (low-integrity) data. In this lattice, trusted data may flow into untrusted contexts, but untrusted data must never be treated as trusted, and joining any label with U yields U.
For example, if data x may be written by users {A, B, C}, and data y by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) must assume influence from {A, B, C} ⊔ {B, C, D} = {A, B, C, D}.
This reflects the integrity threat model: if an untrusted actor may have influenced any input, the result must be treated as potentially influenced by all of them.
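Both worked examples can be reproduced directly with principal sets. In this sketch (the standard construction, not code from the paper), a confidentiality label is the set of allowed readers and its join is intersection, while an integrity label is the set of possible influencers and its join is union:

```python
# Confidentiality: keep only readers authorized for *both* inputs.
def conf_join(readers_x, readers_y):
    return readers_x & readers_y

# Integrity: accumulate *every* principal that may have influenced an input.
def integ_join(writers_x, writers_y):
    return writers_x | writers_y

print(conf_join({"A", "B", "C"}, {"B", "C", "D"}))    # {'B', 'C'}
print(integ_join({"A", "B", "C"}, {"B", "C", "D"}))   # {'A', 'B', 'C', 'D'} (order may vary)
```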
The system uses the product of the confidentiality and integrity lattices: a label is a pair (integrity, confidentiality), and both the order ⊑ and the join ⊔ are taken componentwise.
You can picture this as a diamond: (T, L) at the bottom, (U, L) and (T, H) in the middle (incomparable with each other), and (U, H) at the top.
Arrows go from bottom to top following “can flow to” rules. For example, trusted public (T, L) can safely flow anywhere, and untrusted secret (U, H) is the most restrictive.
This product lattice is the space in which IFC policies are defined.
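As a quick sanity check, the whole diamond and its "can flow to" relation fit in a few lines. The numeric encoding below (0 for the bottom of each component, 1 for the top) is an illustrative choice:

```python
RANK = {"T": 0, "U": 1, "L": 0, "H": 1}   # integrity: T ⊑ U; confidentiality: L ⊑ H

def leq(a, b):
    """(i1, c1) ⊑ (i2, c2) iff both components are ⊑ (componentwise order)."""
    (i1, c1), (i2, c2) = a, b
    return RANK[i1] <= RANK[i2] and RANK[c1] <= RANK[c2]

def join(a, b):
    """Componentwise least upper bound."""
    (i1, c1), (i2, c2) = a, b
    return (max(i1, i2, key=RANK.get), max(c1, c2, key=RANK.get))

print(leq(("T", "L"), ("U", "H")))    # True: trusted public can flow anywhere
print(leq(("U", "H"), ("T", "L")))    # False: untrusted secret is most restrictive
print(join(("T", "H"), ("U", "L")))   # ('U', 'H')
```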
Now we combine the planning machinery with labels. This is where IFC becomes operational.
The idea is to run the planning loop with taint tracking: every value entering the loop carries a label, and labels are propagated through every tool call and every datastore read and write.
Formally, there’s a function τ that assigns labels to variables.
You can think of τ as a map τ: variable → L. Tool results are stored in variables x, and each variable has a label τ(x) summarizing the labels of the tool’s arguments, and the labels of any datastore locations it reads. Actions also carry labels. In particular, a tool f has a static label (e.g., trusted/untrusted), and each argument to the tool call has a dynamic label.
The tool result and any datastore variables W(f) it writes to are assigned a label that soundly over-approximates the labels of the action and all datastore variables R(f) it may read from.
The taint-tracking planning loop extends the previous algorithm with two label-handling steps: JoinLabels and UpdateLabels.
JoinLabels computes a single, conservative label for the result of a tool call. It joins (takes the least upper bound of) three sources of influence: the label of the tool itself, the labels of all arguments passed to the tool, and the labels of all datastore variables the tool may read. The resulting label soundly over-approximates everything that could have affected the tool’s output, ensuring that no dependency (whether from inputs, state, or the tool’s own trust assumptions) is lost.
UpdateLabels propagates that result label back into the datastore after the tool executes. For every variable the tool may write, the label map is updated so those variables now carry the result’s label, reflecting that their contents are influenced by the same sources as the tool output. This step preserves label monotonicity across the agent’s execution and prevents later planner decisions from treating derived state as more trusted or less sensitive than it truly is.
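Put together, the two steps are small. The sketch below uses a plain dict for τ and invented signatures for the read/write sets R(f) and W(f); the paper's formal definitions are more general:

```python
from functools import reduce

def join_labels(tool_label, arg_labels, read_labels, join):
    """JoinLabels: least upper bound of the tool's own label, the labels of
    all its arguments, and the labels of every datastore variable it may read."""
    return reduce(join, list(arg_labels) + list(read_labels), tool_label)

def update_labels(tau, result_var, writes, result_label):
    """UpdateLabels: the result variable and every datastore variable the tool
    may write now carry the (conservative) result label."""
    for var in [result_var] + list(writes):
        tau[var] = result_label
    return tau

# Usage, reusing the componentwise join from the product-lattice sketch above:
# a trusted tool reads an untrusted-secret datastore variable, so its result
# (and anything it writes) becomes untrusted secret.
tau = {"doc": ("U", "H")}                                      # τ: variable -> label
lbl = join_labels(("T", "L"), [("T", "L")], [tau["doc"]], join)
tau = update_labels(tau, "x1", ["doc"], lbl)
print(lbl)                                                     # ('U', 'H')
```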
In this part, we zoomed in on the planner, the component where control decisions are made, and showed how taint tracking can be embedded directly into planning logic.
In Part III, we take the final step. We move from mechanisms to guarantees. We will show how these labeled planners give rise to concrete security properties: what is prevented, what is allowed, and why. This is where theory meets practice.
Follow to get notified when Part III drops.