We may be seeing the first major case of AI-related IP theft, with allegations that DeepSeek trained its models on stolen OpenAI technology. It seemed unlikely that a Chinese startup could build a breakthrough AI app at a fraction of OpenAI's billion-dollar investment, and new claims from OpenAI and Microsoft suggest IP theft as the explanation.
While the details are still sketchy, this should raise alarms for any organization investing in AI systems, a group that now includes most large enterprises in competitive markets. It should also prompt more serious consideration of new security tools to protect the expanding AI attack surface.
Microsoft reports that last fall, security researchers observed DeepSeek-linked actors exfiltrating large amounts of data through OpenAI's API, the primary access point for developers and business customers. OpenAI accuses DeepSeek of using "distillation," a technique in which smaller "student" models are trained on the outputs of larger "teacher" models, such as predictions or probability distributions. This efficient replication method highlights API vulnerabilities and signals a broader need for organizations to secure their expanding AI ecosystems.
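To make the technique concrete, here is a minimal distillation sketch in Python/PyTorch. The teacher and student models, the training step, and the parameter choices are illustrative assumptions, not a description of any specific system; the point is simply that a student can learn from a teacher's output distributions without ever touching the teacher's weights.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation loss: push the student toward the
    teacher's softened probability distribution."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions,
    # scaled by T^2 as is conventional for distillation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (temperature ** 2)

# Hypothetical training step: 'teacher' is any frozen large model,
# 'student' a much smaller one trained only on the teacher's outputs.
def train_step(student, teacher, batch, optimizer):
    with torch.no_grad():
        teacher_logits = teacher(batch)   # outputs only; no teacher weights required
    student_logits = student(batch)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the student only needs the teacher's outputs, not its weights, which is why API access alone can be enough to replicate much of a model's capability.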
Consumption Anomalies Are Preventable
OWASP and MITRE have been leading the development of security frameworks around AI. Last year OWASP released its first Top 10 Risks for LLM Applications list, and in January 2025 the list was updated with an important and timely addition: LLM10:2025 – Unbounded Consumption. If Microsoft's evidence is correct, it seems likely that OpenAI had inadequate security controls around API consumption, a gap that could lead to model theft or replication.
OWASP has documented several techniques that can be used to exploit the Unbounded Consumption risk:
- Model Extraction via API
Attackers may query the model API using carefully crafted inputs and prompt injection techniques to collect sufficient outputs to replicate a partial model or create a shadow model. This not only poses risks of intellectual property theft but also undermines the integrity of the original model.
- Functional Model Replication
Using the target model to generate synthetic training data can allow attackers to fine-tune another foundation model, creating a functional equivalent (a simplified sketch of this approach follows this list). This circumvents traditional query-based extraction methods and poses significant risks to proprietary models and technologies.
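As a hedged illustration of how such replication might look in practice, here is a minimal Python sketch that harvests prompt/response pairs from an OpenAI-compatible API into a fine-tuning dataset. The endpoint URL, API key, model name, and file format are hypothetical placeholders, not any vendor's actual interface; the takeaway is that, from the provider's side, this activity is simply unusually heavy, repetitive API consumption, which is exactly what LLM10:2025 warns about.

```python
import json
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical OpenAI-compatible endpoint
API_KEY = "sk-..."                                        # placeholder credential

def harvest(prompts, out_path="synthetic_training_data.jsonl"):
    """Query the target model at scale and store prompt/response pairs.
    The resulting file could later be used to fine-tune a 'functionally
    equivalent' model, the scenario described in LLM10:2025."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            resp = requests.post(
                API_URL,
                headers=headers,
                json={"model": "target-model",
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=60,
            )
            # Assumes a chat-completions-style response body.
            answer = resp.json()["choices"][0]["message"]["content"]
            # Each line becomes one supervised fine-tuning example.
            f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

The defensive implication is that the attack leaves a footprint: a single key or small pool of keys generating sustained, high-volume, templated queries is precisely the consumption pattern that monitoring should surface.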
How AppSOC can help
The AppSOC AI platform is designed to detect and stop emerging risks to AI systems. It includes a range of controls to maintain the security posture of MLOps platforms, AI models, datasets, and more. The risks it addresses range from simple misconfigurations and inadequate access controls to content anomalies and attacks such as prompt injection and jailbreaking.
AppSOC has followed OWASP's lead and maps every issue we detect to the Top 10 LLM risks. We also follow OWASP and other industry best practices to provide visibility into AI models, datasets, and notebooks, and to harden MLOps environments against mistakes or attacks.
For example, assuming that Microsoft's evidence is correct, here's what OWASP recommends for protection against the risk of Unbounded Consumption (LLM10:2025):
- Comprehensive Logging, Monitoring and Anomaly Detection
Continuously monitor resource usage and implement logging to detect and respond to unusual patterns of resource consumption.
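A minimal sketch of what such consumption monitoring could look like, assuming a hypothetical per-call usage log with an API key, an hour bucket, and a token count. Real deployments would use richer telemetry and more robust statistics, but even a simple per-key baseline with a z-score threshold would flag the kind of bulk extraction described above.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical usage records, one per API call, e.g. pulled from a
# gateway or billing log:
#   {"api_key": "...", "hour": "2025-01-28T14", "tokens": 1234}

def flag_consumption_anomalies(records, z_threshold=4.0, min_history=24):
    """Flag API keys whose latest hourly token consumption spikes far
    above their own historical baseline -- a crude stand-in for the
    anomaly detection OWASP recommends for LLM10:2025."""
    hourly = defaultdict(lambda: defaultdict(int))
    for r in records:
        hourly[r["api_key"]][r["hour"]] += r["tokens"]

    alerts = []
    for key, by_hour in hourly.items():
        series = [by_hour[h] for h in sorted(by_hour)]
        if len(series) < min_history:
            continue  # not enough history to build a baseline
        baseline, latest = series[:-1], series[-1]
        mu = mean(baseline)
        sigma = stdev(baseline) or 1.0  # avoid division by zero on flat usage
        z = (latest - mu) / sigma
        if z > z_threshold:
            alerts.append({"api_key": key, "latest_tokens": latest,
                           "baseline_mean": round(mu, 1), "z_score": round(z, 2)})
    return alerts
```

Alerts from a routine like this would then feed rate limiting, key suspension, or investigation workflows rather than acting as the sole control.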
The AppSOC AI platform provides comprehensive visibility, logging, monitoring, anomaly detection, and much more to protect against known and anticipated threats to AI applications. Please see our website for more details on our protections for OWASP AI risks.