LLM10: Unbounded Consumption – FireTail Blog
2025-12-18 01:32:21 | Author: securityboulevard.com

The OWASP Top 10 for LLMs was released this year to help security teams understand and mitigate the rising risks to LLMs. In previous blogs, we've explored risks 1-9, and today we'll finally take a deep dive into LLM10: Unbounded Consumption.

Unbounded Consumption occurs when an LLM allows users to submit an excessive number of prompts, or overly complex, large, or verbose prompts, leading to resource depletion, potential Denial of Service (DoS) attacks, and more.

Inference is the process an AI model uses to generate an output based on its training. When a user feeds an LLM a prompt, the LLM performs inference to respond. Follow-up questions trigger more inference, because each additional interaction builds on the full conversation context: the inference work, and potentially the previously submitted prompts, required for the earlier interactions.
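To see why conversational use compounds the cost, here is a minimal sketch of how per-turn processing grows. The word-based token count and the helper names are illustrative assumptions, not a real tokenizer or API:

```python
# Illustrative sketch: each new user turn is answered against the full
# conversation history, so the prompt the model actually processes grows
# with every interaction, even when each individual prompt is short.

def tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())

def cost_of_turn(history: list[str], new_prompt: str) -> int:
    """Tokens the model must process this turn: all prior turns plus the new prompt."""
    return sum(tokens(t) for t in history) + tokens(new_prompt)

history: list[str] = []
costs = []
for turn in ["What is unbounded consumption?",
             "Give an example.",
             "How do I mitigate it?"]:
    costs.append(cost_of_turn(history, turn))
    history.append(turn)
    history.append("(model reply here)")  # model replies also join the context

print(costs)  # per-turn cost climbs: [4, 10, 18]
```

This is why an attacker does not need enormous prompts to exhaust resources: long-running or deeply nested conversations alone inflate the work per request.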

Rate limiting controls the number of requests an LLM can receive. Without adequate rate limiting, an LLM can become overwhelmed with inference requests and either begin to malfunction or hit a utilization cap and stop responding, leaving part of the LLM application unavailable.
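A common way to implement this is a token-bucket limiter in front of the model endpoint. The sketch below is a generic illustration (the class and parameter names are our own, not from any particular framework):

```python
# Hypothetical per-user rate limiter: each user gets `capacity` requests of
# burst, refilled at `refill_rate` requests per second. Requests that find
# the bucket empty are rejected (e.g. with HTTP 429) instead of reaching
# the LLM.
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: caller should reject the request

# 3 requests of burst, then 1 request every 2 seconds sustained.
bucket = TokenBucket(capacity=3, refill_rate=0.5)
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed, the rest rejected until the bucket refills
```

In production this state would live per user (keyed by API key or account) in a shared store such as Redis, so limits hold across application instances.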

In AI security, we often refer to the "CIA triad," which stands for Confidentiality, Integrity, and Availability. Unbounded Consumption can cause an LLM to fail on the Availability leg of this triad, which in turn can affect the LLM's Confidentiality and Integrity.

Another way in which Unbounded Consumption can negatively impact an LLM is through Denial of Wallet (DoW). Effectively, attackers hit the LLM with request after request, which can run up the bill if rate limiting is not in place. Eventually, these attacks can cause requests to be rejected due to the high volume of abnormal activity, stopping the application from working entirely.
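One defense against DoW is a per-user spend cap alongside the rate limiter. A minimal sketch, assuming a token-denominated budget (the budget figure and function names are made up for illustration):

```python
# Hypothetical Denial of Wallet guard: track estimated consumption per user
# and refuse further requests once a budget cap is hit, so a flood of
# requests cannot run up the bill. Tracking whole tokens (ints) avoids
# floating-point drift; map tokens to dollars with your provider's pricing.

TOKEN_BUDGET_PER_USER = 5000  # tokens per billing window (illustrative)
spend: dict[str, int] = {}

def charge_or_reject(user: str, prompt_tokens: int) -> bool:
    """Admit the request only if it fits within the user's remaining budget."""
    used = spend.get(user, 0)
    if used + prompt_tokens > TOKEN_BUDGET_PER_USER:
        return False  # over budget: reject instead of paying for the call
    spend[user] = used + prompt_tokens
    return True

# An attacker hammering the endpoint gets cut off once the cap is reached.
allowed = sum(charge_or_reject("attacker", 1000) for _ in range(10))
print(allowed)  # only 5 of 10 requests admitted
```

Billing alerts from your cloud or model provider are a useful backstop, but an in-application cap like this rejects abuse before the spend happens.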

Mitigation Methods

Some ways to reduce the risk of Unbounded Consumption include:

  1. Input Validation – ensure that inputs do not exceed reasonable size limits
  2. Rate Limiting – apply user quotas and per-user request limits
  3. Limit Exposure of Logits and Logprobs – restrict the detail in API responses, providing only the information users need
  4. Resource Allocation Management – monitor resource utilization to prevent any single user from exceeding a reasonable limit
  5. Timeouts and Throttling – set time limits and throttle resource-intensive operations to prevent prolonged resource consumption
  6. Sandbox Techniques – restrict the LLM's access to network resources to limit what information it can expose
  7. Monitoring and Logging – continually monitor usage and alert on unusual patterns
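Two of the mitigations above, input validation (item 1) and timeouts (item 5), can be combined in a thin wrapper around the model call. A minimal sketch, where `call_llm` and the limit values are placeholder assumptions for whatever client your application actually uses:

```python
# Sketch of a guarded LLM call: reject oversized prompts before they reach
# the model, and bound how long a single request may run.
import concurrent.futures

MAX_PROMPT_CHARS = 4000   # illustrative size limit
TIMEOUT_SECONDS = 10      # illustrative processing cap

def call_llm(prompt: str) -> str:
    """Placeholder for the real model call."""
    return f"echo: {prompt}"

def guarded_call(prompt: str) -> str:
    # Input validation: refuse overly large prompts outright.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds size limit")
    # Timeout: give up on requests that run too long.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_llm, prompt)
        return future.result(timeout=TIMEOUT_SECONDS)

print(guarded_call("short prompt"))
```

Note that `future.result(timeout=...)` only stops the caller from waiting; truly cancelling in-flight work requires support from the model provider or serving stack, so pair this with server-side limits where available.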

Unbounded Consumption poses a critical risk to LLMs, as it can cause DoS or DoW. However, with proper security measures and training, teams can minimize the risk of Unbounded Consumption in their AI applications. For more information on the rest of the OWASP Top 10 for LLMs, head over to the LLM series on our blog page. And for general information on how to take charge of your own AI security posture, schedule a demo today!

*** This is a Security Bloggers Network syndicated blog from FireTail - AI and API Security Blog authored by FireTail - AI and API Security Blog. Read the original post at: https://www.firetail.ai/blog/llm10-unbounded-consumption


Source: https://securityboulevard.com/2025/12/llm10-unbounded-consumption-firetail-blog/