Checkmarx today published a technique it has uncovered that poisons artificial intelligence (AI) agents models in a way that convinces them to tell end users that certain activities and behaviors are safe when in fact they are high risk.
Darren Meyer, security research advocate at Checkmarx Zero, a research arm of the company, said this “lies-in-the-loop” (LITL) method takes advantage of the human-in-the-loop (HITL) safeguards that require approval whenever there is some type of sensitive action that needs to be executed.
For example, Checkmarx Zero researchers were able to prompt the Claude Code created by Anthropic to run an arbitrary command by communicating a plan for safe and reasonable behavior that could be shared should cybercriminals gain access to a developer’s credentials via a phishing attack. While it’s possible a developer might notice a discrepancy, every developer that researchers tested this prompt injection method with did not discover what amounts to a remote code execution attack (RCE), said Meyer.
Similar attacks could also be applied to any AI agent that relies on HITL interactions to ensure safety and security, he added. The issue is that AI agents and models are naive to the point of being gullible, said Meyer. As such, they can be easily convinced that an unsafe action is the exact opposite, he added.
In the case of Anthropic, the company’s official position is that HITL protection is that responsibility for preventing a LITL attack resides with the end user. However, it’s a little too easy to obfuscate malicious actions by surfacing a massive amount of code in a way that makes it difficult for developers to notice there is a line of code that enables remote execution, noted Meyer.
Add in the fact that many application developers inherently trust code that is being generated using examples of code stored in a Git repository and the chances such an attack will be discovered by a developer are relatively low, he added. In fact, a repository such as GitHub is already teeming with malicious actors waiting for an opportunity such as this to inject code into a software supply chain, said Meyer.
It’s not clear to what degree cybercriminals are using LITL techniques to comprise AI tools but now that researchers have shared this methodology cybersecurity teams will need to advise end users to more carefully review the output being provided by these tools. It’s relatively trivial for credentials to be stolen, the opportunity that AI tools afford cybercriminals to engage in all kinds of malicious activity is significant.
Ultimately, using these tools is yet another service that falls under the heading of the shared responsibility model. The issue, as always, is that while cybersecurity professionals understand the implications of any shared responsibility model most end users still do not. As a result, rather than if it’s now more a question of when and how often there will be cybersecurity incidents involving AI tools.
Recent Articles By Author