Model stealing (or extraction) attacks attempt to replicate the functionality of a proprietary model without direct access to its parameters or architecture. The attacker systematically queries the model with inputs and records the outputs. Using these input-output pairs, they train a new model (the "stolen" model) that approximates the behavior of the original. The process can be refined by selecting the inputs that are likely to reveal the most about the model's behavior.
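The loop below is a minimal sketch of this idea. The scikit-learn RandomForestClassifier stands in for the proprietary model, and victim_predict for the query interface the attacker actually sees; both are illustrative stand-ins rather than any real service.

```python
# Minimal sketch of a model extraction loop, assuming only black-box access
# to a victim model exposed through `victim_predict` (a stand-in for a real API).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Stand-in for the proprietary model the attacker can only query.
X_private, y_private = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X_private, y_private)

def victim_predict(x):
    """The only access the attacker has: submit inputs, receive labels."""
    return victim.predict(x)

# Attacker side: systematically query with chosen inputs and record the outputs.
X_query = np.random.RandomState(1).uniform(-3, 3, size=(5000, 20))
y_stolen = victim_predict(X_query)

# Train a surrogate ("stolen") model on the collected input-output pairs.
surrogate = LogisticRegression(max_iter=1000).fit(X_query, y_stolen)
print("agreement with victim:", (surrogate.predict(X_query) == y_stolen).mean())
```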
Example: Consider a cloud-based image recognition service that charges per query. An attacker could use a diverse set of images to query the service, collecting the labels and confidence scores provided by the model. Then, the attacker trains a new model on this dataset. The result is a cloned model that mimics the original model's functionality, allowing the attacker to bypass the need to pay for the cloud service.
This scenario illustrates a model stealing (extraction) attack, in which an attacker replicates the functionality of a proprietary AI model without access to its architecture or training data. Such attacks matter most when the model is a source of competitive advantage or revenue, as with a cloud-based image recognition service. The attack proceeds as follows:
An attacker selects a cloud-based image recognition service, which bills per query. This service uses an AI model to analyze images and provide labels and confidence scores.
The attacker gathers a varied image set, covering a range of subjects, scenes, and objects. This variety is vital for matching the scope of the original model.
The attacker submits these images to the cloud service, recording the AI model's labels and confidence scores for each image, thus creating a new dataset pairing each image with its results.
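A sketch of what this collection step might look like is shown below; the endpoint URL, API key, and JSON response fields (label, confidence) are hypothetical placeholders rather than any specific vendor's API.

```python
# Illustrative sketch of the data-collection step, assuming a hypothetical
# REST endpoint that returns a JSON body like {"label": "cat", "confidence": 0.93}.
# The URL, credential, and response shape are placeholders.
import json
from pathlib import Path

import requests

API_URL = "https://api.example.com/v1/classify"  # hypothetical endpoint
API_KEY = "attacker-api-key"                     # hypothetical credential

def query_service(image_path: Path) -> dict:
    """Submit one image and return the service's label and confidence score."""
    with image_path.open("rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"image": f},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()

records = []
for image_path in sorted(Path("query_images").glob("*.jpg")):
    result = query_service(image_path)
    # Pair each image with the victim model's output to build the stolen dataset.
    records.append({
        "image": str(image_path),
        "label": result["label"],
        "confidence": result["confidence"],
    })

Path("stolen_dataset.jsonl").write_text(
    "\n".join(json.dumps(r) for r in records), encoding="utf-8"
)
```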
The attacker then trains a new AI model using this dataset, aiming to imitate the original model by predicting labels and confidence scores. This involves adjusting the cloned model to align its predictions with the original model's outputs.
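One common way to perform this alignment is a distillation-style objective that minimizes the divergence between the clone's predicted distribution and the recorded confidence scores. The PyTorch sketch below illustrates that idea; the tensor names, shapes, and the choice of a ResNet-18 clone are assumptions made for illustration only.

```python
# Sketch of the cloning step, assuming the harvested outputs have been turned
# into tensors: `images` (N, 3, 224, 224) and `soft_targets` (N, num_classes),
# where each row holds the original service's confidence scores.
import torch
import torch.nn.functional as F
from torchvision import models

num_classes = 1000
clone = models.resnet18(num_classes=num_classes)
optimizer = torch.optim.Adam(clone.parameters(), lr=1e-4)

# Dummy stand-ins for the harvested dataset.
images = torch.randn(16, 3, 224, 224)
soft_targets = torch.softmax(torch.randn(16, num_classes), dim=1)

for epoch in range(3):
    optimizer.zero_grad()
    logits = clone(images)
    # Align the clone's predicted distribution with the recorded confidence
    # scores (a distillation-style KL-divergence objective).
    loss = F.kl_div(F.log_softmax(logits, dim=1), soft_targets, reduction="batchmean")
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```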
Once the clone is trained, the attacker uses it independently for image recognition, avoiding the need to pay for the cloud service. The cloned model could even be offered as a rival service.