Protect AI today launched Guardian, a gateway that enables organizations to enforce security policies designed to prevent malicious code from executing within an artificial intelligence (AI) model.
Guardian is based on ModelScan, an open source tool from Protect AI that scans machine learning models to determine whether they contain unsafe code. Guardian extends that capability into a gateway that organizations can use to thwart, for example, a model serialization attack, which occurs when code is added to the contents of a model during serialization, that is, when the model is saved. Once added to a model, this malicious code can be executed to steal data and credentials or to poison the data used to train the AI model.
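To illustrate how such an attack works in practice, the minimal Python sketch below shows how a pickle-serialized "model" can carry a payload that runs the moment the file is deserialized. The MaliciousModel class and the echo command are purely illustrative and are not drawn from Protect AI's research.

```python
import os
import pickle


class MaliciousModel:
    """Stand-in for a model object; the payload is purely illustrative."""

    def __reduce__(self):
        # pickle calls __reduce__ when serializing the object. Returning
        # (os.system, ("...",)) instructs the deserializer to run that
        # command the moment the "model" file is loaded.
        return (os.system, ("echo payload executed: credentials could be stolen here",))


# The attacker saves the tampered "model" exactly like a legitimate pickle artifact.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# The victim loads what looks like an ordinary model file; the command above
# runs as a side effect of deserialization, before any inference happens.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

Because the payload rides along inside the serialized object, nothing looks amiss until the file is loaded and reconstructed.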
Cybersecurity is one of the more challenging issues surrounding AI, and it isn't getting the attention it deserves. Most data scientists have little cybersecurity training, so the machine learning operations (MLOps) processes relied on to construct AI models often lack any kind of vulnerability scanning.
Protect AI co-founder and CEO Ian Swanson said that even when a scan is run, there is a tendency to rely on the tools made available by an AI model repository such as Hugging Face. Making those tools available is a good first step, but not all scanning tools are created equal, he noted. For example, Protect AI today released a report detailing how its ModelScan tool was used to evaluate more than 400,000 model artifacts hosted on Hugging Face.
Protect AI reported that it found 3,354 models that use functions capable of executing arbitrary code when a model is loaded or run for inference. Of those, 1,347 were not marked unsafe by Hugging Face's own scanning tool. That's an issue because it means malicious actors can upload compromised models that inject code when a model assumed to be secure is loaded or executed, said Swanson.
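To show what detecting such functions can involve, the sketch below statically walks a pickle file's opcode stream using Python's standard pickletools module and flags imports commonly abused in serialization attacks. This is a deliberately simplified illustration, not how ModelScan or Hugging Face's scanner actually works; the SUSPICIOUS list and the scan_pickle helper are hypothetical names chosen for the example.

```python
import pickletools

# Imports that a legitimate model pickle essentially never needs but that
# serialization attacks frequently reference. The list here is illustrative only.
SUSPICIOUS = {
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
    ("subprocess", "Popen"),
    ("builtins", "eval"),
    ("builtins", "exec"),
}


def scan_pickle(path):
    """Return the set of suspicious (module, name) imports referenced by a pickle.

    The scan walks the opcode stream without ever deserializing the file,
    so nothing inside the artifact gets executed.
    """
    findings = set()
    recent_strings = []  # string arguments that may feed a STACK_GLOBAL opcode

    with open(path, "rb") as f:
        for opcode, arg, _pos in pickletools.genops(f):
            if opcode.name == "GLOBAL":
                # Older protocols encode the import as a single "module name" argument.
                module, name = arg.split(" ", 1)
                if (module, name) in SUSPICIOUS:
                    findings.add((module, name))
            elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
                recent_strings.append(arg)
            elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
                # Newer protocols push the module and name as strings first.
                module, name = recent_strings[-2], recent_strings[-1]
                if (module, name) in SUSPICIOUS:
                    findings.add((module, name))
    return findings


if __name__ == "__main__":
    hits = scan_pickle("model.pkl")
    print("UNSAFE" if hits else "clean", sorted(hits))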
In fact, a recent report published by Protect AI identified remote code execution (RCE) vulnerabilities in MLflow, an open source machine learning life cycle management tool, that can be used to compromise AI models.
In addition, many teams are downloading models from public repositories for use in production environments, and many of those models were built using code that has known vulnerabilities, noted Swanson.
If a vulnerability is discovered later, however, the model, unlike other types of software artifacts, can't simply be patched to remediate the issue; the entire model needs to be retrained. In effect, many AI models are insecure by design, said Swanson. Because the cost of retraining an AI model is high, any cybersecurity event involving one is likely to be far more costly to remediate than an incident involving any other type of software artifact.
The degree to which AI models are being compromised is not well known, but it's clear most of them are less secure than many organizations realize. It's only a matter of time before organizations that embrace AI run into these cybersecurity issues. What remains to be seen is how quickly those concerns will be acted on before an inevitable breach occurs.