Press enter or click to view image in full size
LLM agents merge low-trust data ingestion, probabilistic planning, and high-impact tool execution into a single runtime path. That collapses traditional control boundaries: untrusted content can influence planning, planning can invoke privileged actions, and side effects can occur before deterministic policy checks are applied. For red teams and AppSec, the attack surface is no longer only APIs and code; it is the instruction supply chain across prompts, retrieval, memory, and tools.[1] This broadly aligns with OWASP LLM risks and MITRE ATLAS adversary behaviors.[2] [3]
Methodology: this guide is derived from public security research, offensive testing literature, framework taxonomies, and reproducible PoC patterns. Claims are labeled using three evidence tiers: Confirmed public incident, Confirmed public research/PoC, and Plausible, not publicly confirmed. Where broad production-scale empirical confirmation is lacking, that gap is explicitly stated rather than inferred.[1] [2] [3]