Cloudflare Unveils a Firewall Designed to Keep LLMs Safe

Cloudflare Unveils a Firewall Designed to Keep LLMs Safe
2024-3-6 00:59:48 Author: securityboulevard.com(查看原文) 阅读量:10 收藏

Cloudflare wants to help organizations wall off their large-language models (LLMs) from cyberthreats and give enterprises an AI framework to ward off risks, many of which are themselves based on the emerging technology.

The cloud connectivity and cybersecurity company this week introduced the Firewall for AI, another layer of protection for LLMs that are foundational to such generative AI tools like OpenAI’s ChatGPT and Google’s Gemini. It also unveiled its Defensive AI capabilities that enterprises can use to protect their APIs and email against AI-based attacks by detecting malicious code and anomalous network traffic and enforcing strict verification policies using a zero-trust approach.

Such proactive measures are needed at a time when the innovation and adoption of generative and predictive AI is ramping quickly and threat groups are using the emerging tools to launch more sophisticated attacks.

Organizations are increasing concerned about security for their LLMs, according to Daniele Molteni, group product manager at Cloudflare.

“Using LLMs as part of Internet-connected applications introduces new vulnerabilities that can be exploited by bad actors,” Molteni wrote in a blog post. “Some of the vulnerabilities affecting traditional web and API applications apply to the LLM world as well, including injections or data exfiltration. However, there is a new set of threats that are now relevant because of the way LLMs work.”

He pointed to a recent example of how bad actors are exploiting a vulnerability that is allowing them to compromise AI models submitted to Hugging Face.

Building on Traditional WAF

Firewall for AI is essentially a bulked up web application firewall that features traditional WAF tools like rate limiting and sensitive data detection but also includes another layer that is still being built, he wrote.

“This new validation analyzes the prompt submitted by the end user to identify attempts to exploit the model to extract data and other abuse attempts,” Molteni wrote. “Leveraging the size of Cloudflare network, Firewall for AI runs as close to the user as possible, allowing us to identify attacks early and protect both end user and models from abuses and attacks.”

There are two key differences between LLMs and typical web apps. One is traditional apps are “deterministic” and include a set of operations that can be secured by controlling the operations accepted by various endpoints.

LLMs are non-deterministic, with interactions based on natural language, making it more difficult to identify requests than simply matching attack signatures. In addition, unless a response is cached, LLMs will provide a different response every time even if the prompt itself is used again.

In addition, in traditional applications, the code is separated from the database and users can only interact with the underlying data. With LLMs, the training data becomes part of the model via the training process, so it’s difficult to control how the data is shared based on a user prompt.

“Some architectural solutions are being explored, such as separating LLMs into different levels and segregating data,” Molteni wrote, adding that “no silver bullet has yet been found.”

Searching for Potential Threats

Firewall for AI will be used in the same way as a traditional WAF, scanning every API request with an LLM prompt for patterns and signatures that could indicate an attack and automatically blocking potential dangers. It will be deployed in front of models on Cloudflare’s Workers AI platform – which enterprise use to leverage Cloudflare’s network to create LLMs at the edge – or models hosted on third-party infrastructure.

Some features, like sensitive data detection, prompt validation, and response validation are still being developed. This will help push back against volumetric attacks like distributed denial-of-service (DDoS) as well as the disclosure of sensitive data and users sending personally identifiable information (PII) to external LLM providers like OpenAI or Anthropic.

Among the steps Cloudflare will take is using a Sensitive Data Detection (SDD) WAF to track how PII is being used. Right now, SDD is a set of managed rules that scan for financial information, such as credit card numbers, though it eventually will enable users to create custom fingerprints. To protect against PII being sent to outside LLMs, Cloudflare will have the SDD to scan a request prompt and integrate the output with the vendor’s AI Gateway to see if sensitive data is included in a request.

“We will start by using the existing SDD rules, and we plan to allow customers to write their own custom signatures,” Molteni wrote, adding that data obfuscation is another important feature. “Once available, the expanded SDD will allow customers to obfuscate certain sensitive data in a prompt before it reaches the model. SDD on the request phase is being developed.”

Firewall for AI also will protect against such threats as prompt injection vulnerabilities, with the AI firewall running various detections to identify prompt injections and similar attacks.

Going on Defense

With Defensive AI, Cloudflare is offering capabilities that can customized to fit organizations’ particular needs, with AI models being trained on user-specific traffic data.

“Fighting AI with AI is now a non-negotiable,” Cloudflare co-founder and CEO Matthew Prince said in a statement. “A personalized approach to protect data and defend against complex threats unique to an organization’s attack surface, at-speed, and scale, is paramount. By understanding ‘normal baselines’ in a customer’s environment and mitigating the threats that will move the needle towards increased resilience, Defensive AI is the crucial edge defenders need to stay ahead of today’s adversaries.”

Cloudflare will put AI Anomaly Detection into its API Gateway, which currently uses the vendor’s Sequence Analytics tool.

“To protect APIs at scale, API Anomaly Detection learns an application’s business logic by analyzing client API request sequences,” Cloudflare wrote. “It then builds a model of what a sequence of expected requests looks like for that application. The resulting traffic model is used to identify attacks that deviate from the expected client behavior.”

The tools within Defensive AI will identify unknown flaws in applications, protect against phishing attacks through features to determine who is sending messages and building models to determine risk, and collect a list of known and good email senders for each customer.

“We can then detect the spoof attempts when the email is sent by someone from an unverified domain, but the domain mentioned in the email itself is a reference/verified domain,” Cloudflare wrote.