So many AI agent orchestration frameworks and “operating systems” being released right now claim to be built with security in mind – proudly listing extensive security features related to everything from cryptography to sandboxes and memory-safe languages.
Then, once you take a peek inside, the sand castle generally crumbles fast.
Here we’ll take a look at one of them: a self-proclaimed Agent Operating System named OpenFang.

As is often the case, it sounds like security was a priority from the ground up: 16 security systems, sandboxed execution, a Merkle audit trail, taint tracking and so on. It all sounds quite promising – as long as you don’t bother to peek under the surface.

When you take a closer look, the “taint tracking” used to enforce an allowlist of commands for agents can be trivially bypassed in at least four different ways: the parser splits commands on |, ;, && and ||, but an allowlisted command followed by & cmd, `cmd`, $(cmd) or a newline and then cmd sails right through. Overall, they’re fighting a losing battle by implementing a command-line-parsing-based sandbox instead of an actual sandbox (seccomp-bpf, for instance), a container or a microVM.
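To make the failure mode concrete, here is a minimal sketch of such a parser-based allowlist. The allowlist contents and the exact split logic are assumptions for illustration, not OpenFang’s actual code – but any checker that only splits on |, ;, && and || shares the same blind spots:

```python
import re

ALLOWLIST = {"ls", "cat", "echo"}  # hypothetical set of allowed commands

def naive_is_allowed(cmdline: str) -> bool:
    # Split only on the separators the parser knows about: ||, &&, | and ;
    parts = re.split(r"\|\||&&|[|;]", cmdline)
    return all(p.split()[0] in ALLOWLIST for p in parts if p.strip())

# Caught: splitting on ';' exposes the second command.
assert not naive_is_allowed("ls; rm -rf /")

# Bypassed: single '&', backticks, $() and bare newlines are never
# split on, so the extra command is never inspected at all.
assert naive_is_allowed("ls & rm -rf /")
assert naive_is_allowed("echo `rm -rf /`")
assert naive_is_allowed("echo $(rm -rf /)")
assert naive_is_allowed("ls\nrm -rf /")
```

Patching in more separators just extends the cat-and-mouse game; the shell grammar (redirections, subshells, aliases, env tricks) is far richer than any denylist of separators, which is why a kernel-enforced boundary is the only robust option.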
As for the AES-256-GCM based auth in the OpenFang P2P protocol: besides the handshake being vulnerable to a replay attack, everything after the handshake runs completely in plaintext!
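The replay problem is independent of the cipher used, so it can be sketched with stdlib HMAC standing in for AES-256-GCM (Python has no GCM in the standard library). Everything below – the key, the fixed "AUTH" message, the function names – is a hypothetical model of the flaw, not OpenFang’s wire format:

```python
import hmac
import hashlib
import os

SHARED_KEY = os.urandom(32)  # hypothetical pre-shared key

def make_auth_token(key: bytes) -> bytes:
    # Client proves knowledge of the key by authenticating a fixed
    # message. (HMAC stands in for AES-256-GCM; the flaw is the same.)
    return hmac.new(key, b"AUTH", hashlib.sha256).digest()

def verify(key: bytes, token: bytes) -> bool:
    return hmac.compare_digest(token, make_auth_token(key))

# A passive attacker records one successful handshake on the wire...
captured = make_auth_token(SHARED_KEY)

# ...and can replay those exact bytes forever: nothing in the
# exchange is fresh, so the server cannot tell old from new.
assert verify(SHARED_KEY, captured)
assert verify(SHARED_KEY, captured)  # replay accepted again

# The standard fix: the server contributes a fresh random nonce that
# the client must bind into the token, making each one single-use.
def make_auth_token_fresh(key: bytes, server_nonce: bytes) -> bytes:
    return hmac.new(key, b"AUTH" + server_nonce, hashlib.sha256).digest()
```

And of course, even a replay-proof handshake buys nothing if the session traffic that follows it is unencrypted.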
Regarding the “WASM sandboxes”: it turns out they aren’t actually used for anything. There isn’t a single WASM agent in the repo, and since the sandboxes require implementing a completely custom API rather than conforming to WASI, it’s unlikely anyone would bother writing a WASM-based agent in the first place – not even the maintainers did.
And as for the “Merkle audit trail”: that trail is stored entirely in memory, so simply restarting the daemon erases every trace of it.
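A hash-chained log is only tamper-evident if it survives the process. Here is a minimal sketch (class and method names are made up for illustration) of why an in-memory chain provides no audit value:

```python
import hashlib

class InMemoryAuditTrail:
    """Hypothetical hash-chained audit log kept only in RAM."""

    def __init__(self):
        self.entries = []
        self.head = b"\x00" * 32  # genesis hash

    def append(self, event: str) -> None:
        # Each entry commits to the previous head, hash-chain style,
        # so tampering with history would break the chain... in theory.
        self.head = hashlib.sha256(self.head + event.encode()).digest()
        self.entries.append((event, self.head))

trail = InMemoryAuditTrail()
trail.append("agent spawned")
trail.append("file exfiltrated")
assert len(trail.entries) == 2

# "Restarting the daemon" just constructs a fresh object: the chain,
# tamper-evidence and all, is gone because nothing was ever persisted.
trail = InMemoryAuditTrail()
assert len(trail.entries) == 0
```

For the chain to mean anything, the entries (or at least periodic head hashes) need to land somewhere an attacker with daemon access can’t silently truncate – append-only storage, a remote log sink, or similar.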
Last but not least, it turns out that even their API-key-based authentication can be trivially bypassed, so anyone with access to the dashboard URL gets remote code execution. That’s the part I demonstrate in the video at the top, so enjoy. ;)
Note that I’m not against the idea of using AI agents. On the contrary, I think that it will become increasingly important for people to leverage the power of AI to accelerate themselves.
Being able to do so in a way that actually limits the risk and blast radius of an attack is a difficult problem, though – which is why it’s one of the things I’m focusing heavily on right now…
To get notified when I start releasing some of the things I do within that space in the future, make sure to register at GRAFIT
