Unlike conventional IT systems—with bounded entry points, predictable patch cycles, and known vulnerabilities—large language models (LLMs) and next-generation AI agents create an attack surface so broad, dynamic, and interconnected that comprehensively mapping or policing it becomes nearly impossible. Every new integration, plugin, RAG pipeline, or deployment scenario multiplies exposure.
Recent studies reveal that over 80% of production models tested in 2025 still succumb to at least one form of adversarial exploitation.
Anthropic recently discovered a cyber espionage campaign run primarily by an AI system—a watershed moment for cybersecurity. Their investigation revealed that Chinese state-aligned actor GTG-1002 leveraged Anthropic’s Claude Code platform to coordinate large-scale intrusions targeting technology, finance, chemical manufacturing, and government sectors worldwide. The AI autonomously orchestrated between 80% and 90% of the operational lifecycle—covering reconnaissance, exploit code generation, credential harvesting, lateral movement, and data exfiltration—with humans intervening only for key decision points.
Attackers decomposed tasks and distributed them across thousands of instructions fed into multiple Claude instances, masquerading as legitimate security tests and circumventing guardrails. The campaign’s velocity and scale dwarfed what human operators could manage, representing a fundamental leap for automated adversarial capability. Anthropic detected the operation by correlating anomalous session patterns and observing operational persistence achievable only through AI-driven task decomposition at superhuman speeds.
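Anthropic has not published the details of its detection pipeline, but the underlying idea (correlating request volume and session fan-out against human-plausible baselines) can be sketched in a few lines. The thresholds, field names, and `flag_suspicious_accounts` helper below are illustrative assumptions, not Anthropic's actual implementation:

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative thresholds, not Anthropic's real detection parameters.
MAX_HUMAN_REQS_PER_MIN = 30   # sustained rate a human operator could plausibly drive
MIN_PARALLEL_SESSIONS = 5     # session fan-out suggesting automated task decomposition

@dataclass
class Event:
    account: str      # billing/org identity behind the traffic
    session_id: str   # individual model session
    minute: int       # timestamp bucketed to the minute

def flag_suspicious_accounts(events: list[Event]) -> set[str]:
    """Flag accounts whose request rate and session fan-out exceed
    human-plausible bounds within any one-minute window."""
    per_window = defaultdict(lambda: {"requests": 0, "sessions": set()})
    for e in events:
        window = per_window[(e.account, e.minute)]
        window["requests"] += 1
        window["sessions"].add(e.session_id)

    suspicious = set()
    for (account, _minute), stats in per_window.items():
        if (stats["requests"] > MAX_HUMAN_REQS_PER_MIN
                and len(stats["sessions"]) >= MIN_PARALLEL_SESSIONS):
            suspicious.add(account)
    return suspicious
```

A real pipeline would layer many more signals on top of this, such as prompt content, tool-call graphs, and persistence across days, but the core heuristic is the same: sustained activity that no human operator could plausibly generate by hand.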
Though AI-generated attacks sometimes faltered—hallucinating data, forging credentials, or overstating findings—the impact proved significant enough to trigger immediate global warnings and precipitate major investments in new safeguards. Anthropic concluded that this development brings advanced offensive tradecraft within reach of far less sophisticated actors, marking a turning point in the balance between AI’s promise and peril.
Distinguishing between “offensive AI” and familiar paradigms like red teaming is critical. Traditional red teams simulate attacker tactics to test defenses, typically relying on human creativity, gradual exploration, and hands-on exploitation—phishing, network pivoting, physical intrusion, and manual social engineering.
AI-based offensive operations probe and exploit vulnerabilities across entire ecosystems at machine speed, with the goal of exfiltrating critical intelligence and damaging the target. Offensive AI iterates adversarial attacks and novel exploits at a scale human red teams cannot match. Defenses that hold up against traditional techniques often fail outright under continuous, machine-driven attack cycles.
Pattern Labs—now rebranded as Irregular—has become the face of the burgeoning AI offensive testing industry. With major contracts from OpenAI, Anthropic, and Google, and over $80 million in funding, Irregular has pioneered adversarial simulation environments that subject LLMs and AI stacks to extreme operational scenarios.
Their process mimics large enterprise networks, deploying hostile agents and automated attack sequences that mirror and expand on the tactics Anthropic uncovered: probing plugin vulnerabilities, exploiting cross-system trust, and seeking to escalate privileges through novel LLM and agent behaviors. Irregular’s platform feeds these findings into model hardening cycles, catching vulnerabilities conventional red teams would miss, often weeks or months before public deployment.
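Irregular has not disclosed how its platform is built, but the general shape of such a harness can be sketched: generate adversarial inputs, run them against the model-plus-plugin stack under test, record any policy violation, and feed the findings back into the next hardening cycle. In the sketch below, `generate_adversarial_prompts`, `target_stack`, and `violates_policy` are placeholder callables standing in for proprietary components:

```python
import json
from typing import Callable, Iterable

def run_adversarial_cycle(
    generate_adversarial_prompts: Callable[[int], Iterable[str]],  # attack generator
    target_stack: Callable[[str], str],                            # LLM + plugins under test
    violates_policy: Callable[[str, str], bool],                   # oracle: did the output break a rule?
    rounds: int = 3,
    prompts_per_round: int = 100,
) -> list[dict]:
    """Run repeated probe rounds and collect findings for the hardening cycle."""
    findings = []
    for round_idx in range(rounds):
        for prompt in generate_adversarial_prompts(prompts_per_round):
            output = target_stack(prompt)
            if violates_policy(prompt, output):
                findings.append({
                    "round": round_idx,
                    "prompt": prompt,
                    "output": output,
                })
        # In a real pipeline, this round's findings would steer the next
        # round's generator and drive model or guardrail updates.
    return findings

if __name__ == "__main__":
    # Trivial stand-ins so the sketch runs end to end.
    demo = run_adversarial_cycle(
        generate_adversarial_prompts=lambda n: (f"probe-{i}" for i in range(n)),
        target_stack=lambda p: f"echo: {p}",
        violates_policy=lambda p, o: p == "probe-7",   # pretend exactly one probe slips through
    )
    print(json.dumps(demo, indent=2))
```

The important design point is the feedback loop: a production-grade generator mutates its prompts based on prior findings rather than replaying a fixed list, which is what lets machine-driven cycles outpace manual red teaming.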
XX (Twenty) has assumed a parallel, but often more secretive, role—thanks largely to hundreds of millions in Pentagon contracts designed to accelerate national security adoption of “frontier AI.” Twenty says it is “fundamentally reshaping how the U.S. and its allies engage in cyber conflict.”
These contracts leverage XX’s ability to unleash “synthetic adversaries” capable of chaining digital, physical, and social exploits within simulated military and government infrastructures at unprecedented scale. The neural-network-driven agents probe for weaknesses in supply chain links, software-defined radio networks, satellite command, and battlefield communications—evaluating both technical and operational resilience faster than human adversaries ever could.
While little is known about Twenty’s products or methodology, given its hiring plans and its focus on simultaneous attacks on hundreds of targets, Twenty appears to be building the next level of cyberwarfare automation, going far beyond lab simulation or red-teaming of the US military’s IT environments.
Offensive AI operations still face acute constraints: as the Anthropic case showed, autonomous agents hallucinate data, forge credentials, and overstate findings, so sustained campaigns still depend on human operators at critical decision points.
AI offensive software is now both an acute threat and a catalyst for defensive innovation. Anthropic's disclosure of an autonomous LLM-driven espionage campaign underscores the new reality: adversarial AI operates at machine speed and complexity, outpacing slow, human-driven security cycles. The offensive programs run by Irregular, Twenty, and military actors demonstrate what both attackers and defenders can now achieve.
To adapt, organizations must make their own security as dynamic, adaptive, and scalable as the adversaries they face. Only through relentless, AI-augmented defense, rigorous adversarial simulation, and global coordination can enterprises hope to stay secure amid the vast and ever-evolving attack surface of LLMs and agentic AI.