The agentic AI future is upon us, and it poses age-old tradeoffs between security and productivity with higher stakes than ever.
In early 2026, the open-source Clawdbot agent gained massive traction for its ability to act independently on the user’s device while running locally for privacy. The appetite for such a powerful autonomous assistant was clear: the project gained over 85,000 GitHub stars in a single week. But many researchers, including our own, noted security gaps like exposed gateways, plaintext credential storage, excessive permissions and more.
Both the risk and the productivity of AI agents stem from their privilege: the access granted to them to act on our behalf. It’s almost certain that future intrusions will target AI systems.
We predict these attacks will fall into two pathways: targeting the open-source AI ecosystem and targeting an organization’s internal AI agents. Methodologies for securing these resources are nascent and emerging practically in real time, but in this blog, we’ll share what we know so far.
Open source AI systems are new and fast-evolving, and by that very nature they carry more risk. There are no standardized signing or integrity checks for models, and high trust in popular repositories means that supply chain attacks spread widely, rapidly and often before threats are detected.
Yet open source is inevitable for implementing AI. The open source AI ecosystem forms the backbone of the world’s current AI infrastructure. Every major LLM deployment, from Grok to ChatGPT, runs on an open source foundation while proprietary layers handle business-specific execution.
While AI agents hold the potential to act as force multipliers within the business, they hold the same potential for threat actors. A single corrupted model, connector or dependency in the AI supply chain can be used across many teams and workflows, pushing hostile behavior everywhere at once.
In a model file attack, attackers upload malicious AI model files to trusted open source repositories. These files look legitimate, sometimes with official branding, but contain hidden executable code. When a developer loads the model, the malicious payload is executed automatically. Common model file attacks can steal AWS credentials from metadata services, download remote access trojans and exfiltrate data to attacker servers. After that, the model usually functions normally, so users don’t notice the breach.
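This load-time execution risk stems from serialization formats like Python’s pickle, which many legacy model files use: unpickling can invoke arbitrary callables, so simply loading a model file runs whatever the attacker embedded. Below is a minimal illustrative sketch (not from any specific incident) of scanning a pickle stream for code-execution-capable opcodes before loading; safer formats such as safetensors avoid the problem by storing only raw tensor data.

```python
import pickle
import pickletools

# Opcodes that can cause arbitrary code to run during unpickling.
DANGEROUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def scan_pickle(data: bytes) -> list:
    """Return the code-execution-capable opcodes found in a pickle stream."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in DANGEROUS_OPCODES]

# Plain weights/metadata serialize to pure data opcodes...
safe = pickle.dumps({"weights": [0.1, 0.2, 0.3]})

# ...while an object with a crafted __reduce__ embeds a callable
# that the pickle machinery will invoke at load time.
class Payload:
    def __reduce__(self):
        return (print, ("model loaded... and payload executed",))

risky = pickle.dumps(Payload())

print(scan_pickle(safe))   # []
print(scan_pickle(risky))  # includes 'STACK_GLOBAL' and 'REDUCE'
```

An opcode scan like this is a pre-load triage step, not a complete defense; treating untrusted pickle files as untrusted code is the safer default.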
In rug pull attacks, an attacker manipulates the Model Context Protocol (MCP) server that an AI agent connects to, causing it to perform malicious actions. MCP servers add tools for AI agents and give them capabilities. Many of the most useful MCP servers are simply open source code projects maintained by untrusted third parties. If the repository is compromised, an attacker can modify the MCP server to perform malicious actions after an LLM is integrated with it, for example, copying data and sending it to an outside source. End users who simply keep their tools up to date can fall victim to rug pull attacks without ever being aware.
The alternative is to use remote MCP servers whose code is maintained by trusted organizations. Many popular platforms, such as GitHub, maintain their own remote MCP servers. These servers can be connected to and are generally trusted to the extent that an organization trusts the MCP provider. This does not prevent agents from performing malicious actions with the tools they are given via the remote MCP server; it simply reduces the risk of an MCP rug pull attack.
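For locally run tool code, another practical mitigation is to pin the reviewed version to a cryptographic hash so that a silent upstream change fails closed instead of running. Here is a minimal sketch (file names and contents are hypothetical) using SHA-256 pinning:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash the tool's code so a silent change is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_pinned(path: Path, expected: str) -> bool:
    """Refuse to run tool code that no longer matches the reviewed pin."""
    return sha256_of(path) == expected

with tempfile.TemporaryDirectory() as d:
    server = Path(d) / "mcp_server.py"  # hypothetical local MCP server script
    server.write_text("print('reviewed MCP server')\n")
    pin = sha256_of(server)             # record the pin at review time
    ok_before = verify_pinned(server, pin)

    # A "rug pull": upstream quietly ships modified code.
    server.write_text("print('reviewed MCP server')  # plus injected exfiltration\n")
    ok_after = verify_pinned(server, pin)

print(ok_before, ok_after)  # True False
```

The same idea applies at the package level: pinning exact versions and checksums in a lockfile, rather than auto-updating, forces a human review step between upstream changes and the agent that runs them.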
If an AI agent is like a supercharged employee, a compromised AI agent is like a supercharged insider threat. Delegating authority to agents gives them access and privileges that would normally require human action. They can send fraudulent messages, alter approvals and permissions, exfiltrate data, approve incorrect financial actions and more.
Because agents are trusted internally, suspicious behavior is likely to go unnoticed until something breaks.
For predictive models used for business intelligence, manipulation will influence business decisions in ways that may go unnoticed until financial or regulatory harm surfaces. Language model exploitation will likely center on data extraction tactics. A compromised agent will enable multi-step fraud and data harvesting at the speed of an automated system acting as an internal user.
Malicious usage of agents may not be the largest threat surface, however. Due to their nondeterministic behavior, it will not be uncommon for trusted users to unintentionally perform harmful actions via an organization’s agents.
The immense efficiency gains promised by AI agents will raise the risk tolerance of the average enterprise. Organizations face a major question: What is the minimum degree of control that can be placed on agents without seriously undermining their return on investment?
Keep it simple. Identify the simplest security policies possible, implement them and revisit those policies every eight weeks. That’s how fast AI is evolving.
Strictly enforce agent access controls. The more power and permissions an agent has, the more strictly organizations must enforce access controls. Agents with read-only access to resources present a significantly lower threat surface than agents with write permissions. Even if an agent is compromised or manipulated, the boundaries set by the hard-coded permissions will drastically limit the blast radius.
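One way to express those hard-coded boundaries is an allowlist enforced in code outside the model, so a compromised or manipulated agent cannot talk its way past it. A minimal sketch, with hypothetical tool names:

```python
# Hypothetical read-only tool allowlist, enforced by the host, not the model.
READ_ONLY_TOOLS = {"read_file", "search_code", "list_issues"}

def dispatch(tool: str, handler, *args, **kwargs):
    """Hard-coded permission check the agent cannot negotiate around."""
    if tool not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {tool!r} denied: write access not granted")
    return handler(*args, **kwargs)

# Read actions pass through...
print(dispatch("search_code", lambda q: f"results for {q!r}", "api key"))

# ...write actions are rejected before any side effect occurs.
try:
    dispatch("delete_branch", lambda name: None, "main")
except PermissionError as e:
    print(e)
```

The key design point is that the check lives in the dispatch layer: no prompt injection against the model can add a tool to the set, because the set is code, not context.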
Treat agents as potentially rogue employees or contractors. Our research, and the experience of others, has found that AI agents occasionally perform harmful actions simply due to their nondeterministic architecture. Apply architectural limits and ensure every AI agent action goes through checkpoints you can monitor, log and disable if necessary.
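Such a checkpoint can be as simple as a gateway that logs every agent action and exposes a kill switch for incident response. A minimal sketch (class and action names are hypothetical):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-gateway")

class AgentGateway:
    """Hypothetical checkpoint: every agent action is logged and can be cut off."""
    def __init__(self):
        self.enabled = True  # flip to False to disable the agent fleet

    def call(self, action: str, fn, *args, **kwargs):
        if not self.enabled:
            raise RuntimeError(f"action {action!r} blocked: gateway disabled")
        log.info("agent action: %s args=%r", action, args)  # audit trail
        return fn(*args, **kwargs)

gw = AgentGateway()
print(gw.call("summarize", lambda text: text.upper(), "quarterly report"))

gw.enabled = False  # kill switch during an incident
try:
    gw.call("send_email", lambda to: None, "cfo@example.com")
except RuntimeError as e:
    print(e)
```

Routing every action through one chokepoint is what makes the monitor/log/disable requirement tractable: the audit trail and the off switch live in one place.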
Centralized org-specific agents accessible via an API or URL continue to provide time savings, but local and customizable agents such as Claude Cowork and OpenClaw are likely to be the significant drivers of productivity in the near future.
These trends, along with the rapid pace of development, point to the growing importance of the AI supply chain. Models and agents rely on layers of external code, datasets, connectors and APIs. A single compromised link can push hostile behavior into multiple systems at once. As integration accelerates, securing AI will become a core part of modern resilience and will demand the same level of governance and validation applied to any other critical system.
At Unit 42, our elite threat researchers and responders live on the bleeding edge of AI. We’ll help you empower safe AI use and development across your organization. Here’s how we can assist:
To read more about the evolving AI threat landscape, check out the full 2026 Unit 42 Global Incident Response Report, and learn more about how Unit 42 can help you turn risk into resilience.