In our ever-evolving digital landscape, protecting vast amounts of data from looming cyber threats is a round-the-clock job. As more companies navigate the realm of big data, they face the considerable challenge of keeping that data secure at all times. Enter the sentinel of our times: anomaly detection. This technology continuously monitors your data and is ready to raise an alert at the first indication of a possible security threat. In this article, we explore how anomaly detection can help companies strengthen their security and make their data repositories far harder to compromise.
Anomaly detection is a technique used to identify unusual patterns or behaviors that do not conform to expected behavior. Such data points are often referred to as outliers. Anomalies can indicate critical incidents such as fraud, defects, errors, or significant changes in patterns, and any of these could have a serious impact if it goes undetected. Anomaly detection is also sometimes called suspicious activity detection or abnormal behavior detection. It is used in a variety of fields, including fraud detection, intrusion detection for cybersecurity, fault detection in safety-critical systems, health monitoring, and the detection of ecological disturbances. For instance, in the financial sector, anomaly detection can help identify suspicious transactions that might signal money laundering or fraud. Similarly, in healthcare, it can flag unusual patterns in patient data that might indicate billing errors or fraud. In cybersecurity, anomaly detection can alert a company when its data has been accessed in an unusual manner or from an unusual location. Whatever your sector, anomaly detection is a valuable safeguard for your infrastructure.
From a machine learning perspective, anomaly detection involves training an algorithm to learn what normal data looks like and then using that algorithm to identify anomalies or outliers, meaning data points that differ significantly from the norm. There are several ways to approach this. In supervised learning, the model is trained on examples labeled as either normal or anomalous. In unsupervised learning, it is trained on unlabeled data. In semi-supervised learning, it is trained on examples where only the normal ones are labeled.
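The "learn what normal looks like, then flag what differs" idea can be illustrated with one of the simplest possible baselines: a z-score test on a single metric. The sketch below rests on several assumptions. The metric (files downloaded per day), the sample history, and the 3-standard-deviation threshold are all hypothetical, and a z-score test is a stand-in for the far richer models used in practice.

```python
import statistics


def fit_baseline(normal_values):
    """Learn what "normal" looks like: the mean and sample standard deviation."""
    return statistics.fmean(normal_values), statistics.stdev(normal_values)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations away from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        # History never varied, so any different value counts as unusual.
        return value != mean
    z_score = (value - mean) / stdev
    return abs(z_score) > threshold


# Hypothetical history: number of files one user downloads per day.
history = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
baseline = fit_baseline(history)  # mean 13.5, standard deviation about 1.58

for todays_count in (14, 17, 250):
    print(todays_count, is_anomaly(todays_count, baseline))
# 14 False
# 17 False
# 250 True
```

A day with 17 downloads is higher than usual but still within three standard deviations, so it is not flagged. A day with 250 downloads is flagged.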
Supervised learning is a type of machine learning in which the model is given labeled input data together with the desired outputs. The model learns the relationship between inputs and outputs and can then predict the output for new, unseen data. A good analogy is a child learning the names of animals. Parents show the child a picture of an animal and say, “this is a dog”. After many repetitions with different animals, the child starts to recognize and name animals on their own. In the same way, once an anomaly detection model has been trained on enough labeled examples, it can classify a new event as normal or unusual.
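The following minimal sketch shows supervised anomaly detection using a nearest-centroid classifier, chosen here only because it is short. It is not the method any particular product uses. Training averages the labeled examples of each class. Prediction then assigns a new event to whichever class average is closest. The features (failed logins, gigabytes downloaded) and all data points are hypothetical.

```python
import math
from collections import defaultdict


def train_nearest_centroid(examples):
    """Supervised training: average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose class centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical labeled events: (failed_logins, gigabytes_downloaded) -> label
training_data = [
    ((0, 0.2), "normal"),
    ((1, 0.5), "normal"),
    ((0, 0.1), "normal"),
    ((1, 0.3), "normal"),
    ((7, 4.0), "anomalous"),
    ((9, 6.5), "anomalous"),
    ((6, 5.0), "anomalous"),
]

centroids = train_nearest_centroid(training_data)
print(predict(centroids, (0, 0.4)))  # normal
print(predict(centroids, (8, 3.0)))  # anomalous
```

The features are left unscaled to keep the example short. A real model would normally standardize them so that no single feature dominates the distance.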
Unsupervised learning is another type of machine learning in which the model receives only input data, with no output labels. The model is expected to discover hidden structure in the data on its own. In a child’s learning process, this resembles playing with building blocks. No one tells the child what to build. Instead, through experimenting and playing, they learn how to balance blocks and which combinations look good, and so they discover patterns and structures themselves. For anomaly detection, unsupervised learning uses techniques such as clustering, density estimation, or autoencoders to find instances that do not fit the structure the model has discovered.
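One simple density-based technique is the k-nearest-neighbor distance score. Each point is scored by its distance to its k-th closest neighbor. Points in dense regions receive small scores, and isolated points receive large ones. No labels are involved at any stage. The sketch below is an illustration only: the observations and the choice of k = 2 are hypothetical, and this is just one of many unsupervised methods.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor (no labels used)."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


# Hypothetical unlabeled observations: (failed_logins, gigabytes_downloaded)
observations = [
    (0, 0.2), (1, 0.5), (0, 0.1), (1, 0.3), (0, 0.4), (2, 0.6),
    (9, 6.5),  # nothing tells the algorithm that this point is special
]

scores = knn_outlier_scores(observations, k=2)
most_unusual = max(range(len(observations)), key=lambda i: scores[i])
print(observations[most_unusual], round(scores[most_unusual], 2))
```

The isolated observation gets by far the largest score. That separation comes entirely from the shape of the data, which is the hidden structure that unsupervised learning relies on.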
Semi-supervised learning falls between supervised and unsupervised learning: the model is trained on a combination of labeled and unlabeled data. In semi-supervised anomaly detection specifically, the model is trained on a dataset in which only normal instances are labeled, and it learns to treat instances that deviate significantly from them as anomalous. This approach is useful when acquiring a fully labeled dataset is expensive or impractical. Returning to our child’s learning context, think of semi-supervised learning as learning to ride a bike. At first, parents guide the child and show them how to pedal and balance (supervised learning). Eventually the child practices alone, figuring out how to balance, speed up, and slow down, with occasional guidance or intervention from parents when necessary (unsupervised learning). The child learns through a combination of supervised and unsupervised methods, which is much like semi-supervised learning.
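As a sketch of the semi-supervised setup described above, the model below is trained only on events labeled as normal. It flags any new event that lies farther from every normal example than a threshold. The threshold rule is an assumption made for illustration. It takes the largest nearest-neighbor distance among the normal examples themselves and multiplies it by a hypothetical `margin`. The data is also hypothetical.

```python
import math


def fit_normal_model(normal_points, margin=1.5):
    """Semi-supervised training: only examples labeled "normal" are used.

    The threshold is the largest nearest-neighbor distance among the normal
    examples (each compared against the others), multiplied by `margin`.
    Requires at least two normal examples.
    """
    loo_distances = [
        min(math.dist(p, q) for j, q in enumerate(normal_points) if j != i)
        for i, p in enumerate(normal_points)
    ]
    return list(normal_points), max(loo_distances) * margin


def is_anomalous(model, point):
    """Flag a point that is farther from every normal example than the threshold."""
    normal_points, threshold = model
    return min(math.dist(point, q) for q in normal_points) > threshold


# Hypothetical normal-only training data: (failed_logins, gigabytes_downloaded)
normal_events = [(0, 0.2), (1, 0.5), (0, 0.1), (1, 0.3), (0, 0.4), (2, 0.6)]
model = fit_normal_model(normal_events)

for event in [(1, 0.4), (3, 0.7), (6, 3.0)]:
    print(event, is_anomalous(model, event))
```

Unlike the supervised example, this model never sees an anomalous example during training. It only learns the region that normal events occupy.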
Many companies do not have the resources to build an anomaly detection model in-house, so they buy a tool or engage another company to monitor their data instead. One of the simpler ways to add anomaly detection to your systems is to choose a provider that can both store your data and perform this detection.
Although cybersecurity audits may not explicitly mention anomaly detection, many of them evaluate systems for their ability to identify and respond to unusual activity that could indicate a security threat. ISO/IEC 27001, part of the ISO/IEC 27000 series, requires organizations to establish, implement, and maintain an information security risk management process, which would typically include some form of anomaly detection. The NIST Cybersecurity Framework (CSF) places anomaly detection in the Anomalies and Events category (DE.AE) of its Detect function, which specifies that “anomalous activity is detected in a timely manner and the potential impact of events is understood.” The Security criterion of a SOC 2 audit addresses protecting systems against unauthorized access, and anomaly detection can help meet it. PCI DSS requires organizations to regularly monitor and test networks, which often involves using anomaly detection systems to identify potential intrusions or suspicious activity. HIPAA does not mention anomaly detection directly. However, it requires covered entities to implement technical policies and procedures for electronic information systems that allow access only to those persons or software programs that have been granted access rights. By incorporating anomaly detection mechanisms, companies can support regulatory compliance while mitigating risks and improving their overall security posture.
SAP’s primary tool for anomaly detection is SAP Data Custodian, through its Transparency and Control Service. SAP Data Custodian can detect anomalies across various cloud resources, covering both SAP resources and supported third-party providers. It uses the machine learning methods described above to build a baseline understanding of users’ behavior from observed patterns. An anomaly is flagged and brought to the attention of the responsible parties through SAP Data Custodian’s alerting system. One example of such an anomaly is data being accessed from an unusual or suspicious country. This deeper understanding of where and how data was accessed can be used to create new policies and tasks, which further helps a company protect its data from unauthorized access. Ultimately, with SAP Data Custodian, companies can seamlessly and proactively address potential threats, maintain compliance, and gain a competitive edge.
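To make the idea of a behavioral baseline concrete, here is a generic frequency-based sketch. It builds a per-user profile of the countries each user accesses data from. It then flags an access when that country is rare or new for the user. This is not SAP Data Custodian's API or algorithm. The log format, the function names, and the `min_share` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_baseline(access_log):
    """Count how often each user has accessed data from each country."""
    baseline = defaultdict(Counter)
    for user, country in access_log:
        baseline[user][country] += 1
    return baseline


def is_unusual_access(baseline, user, country, min_share=0.05):
    """Flag access from a country that is rare (or new) in the user's history."""
    history = baseline.get(user)
    if not history:
        # No baseline for this user yet: surface the access for human review.
        return True
    total = sum(history.values())
    return history[country] / total < min_share


# Hypothetical access log of (user, country code) pairs.
access_log = [("alice", "DE")] * 40 + [("alice", "FR")] * 5 + [("bob", "US")] * 30
baseline = build_baseline(access_log)

print(is_unusual_access(baseline, "alice", "FR"))  # False: 5 of 45 accesses
print(is_unusual_access(baseline, "alice", "BR"))  # True: never seen before
print(is_unusual_access(baseline, "carol", "DE"))  # True: no baseline yet
```

Treating users with no history as unusual is a deliberate design choice in this sketch. It trades extra alerts for the assurance that new accounts are not silently trusted.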