Fairness in Machine Learning – A New Feature in SAP HANA PAL
Published 2023-12-08 on blogs.sap.com

In this blog post, we are excited to introduce Fairness in Machine Learning (FAIR ML) – a new feature developed to promote fairness in machine learning within the SAP HANA Predictive Analysis Library (PAL) in 2023 QRC04.

Fairness is a critical subject in human society. The pursuit of fairness is essential in many social areas involving resource allocation, such as college admission, job candidate selection, and personal credit evaluation. When human decisions are involved in these areas, it is important that decision-makers act with a sense of fairness and avoid giving undue advantage to any particular group. Fortunately, an increasing number of decision-makers are reflecting this sentiment of fairness, which is an encouraging trend.

As our society progresses, its complexity is also increasing. As a result, many automated decision-making systems powered by machine learning and AI have been developed. These systems initially served as helpers or aiding components, yet there is a clear trend that they are beginning to play more significant roles. For instance, in recent years many HR departments have implemented systems for automated filtering of job applicants. In the US, a software product called PredPol (short for “Predictive Policing”) estimates potential crime rates across different regions based on historical data, and a tool named COMPAS is employed by US courts to assess the probability of a defendant becoming a repeat offender.

However, one of the major concerns with these automated decision-making systems is their lack of fairness awareness. Unlike many human decision-makers, the majority of these systems are built without a conscious ethos of fairness, or rely on limited and usually inadequate fairness-aware mechanisms. For example, a study shows that PredPol could potentially lead to over-policing of Black communities, while COMPAS tends to misclassify Black defendants as having a high risk of reoffending more often than defendants from other racial groups. Notably, as claimed by their creators, sensitive variables such as race were never explicitly incorporated when building these two systems. This shows that bias is not always easily removable through data preprocessing methods (like feature selection), since it may hide deeply within seemingly unrelated features that are indispensable for model training and act as proxies for sensitive variables (e.g. crime history in COMPAS).

For SAP, AI fairness is one of the three pillars of the SAP Global AI Ethics Policy and a fundamental principle of AI ethics that aims to prevent discrimination by AI systems against specific demographic groups based on protected attributes like gender, ethnicity, or religion. Given that many of our products, including SAP S/4HANA, utilize machine learning algorithms for predictive analysis, it is of utmost importance that our algorithms are fairness-aware when needed.

FAIR ML within the SAP HANA Predictive Analysis Library (PAL) strives to mitigate the unfairness resulting from potential biases within datasets related to features such as gender, race, age, or other protected classes. FAIR ML can incorporate various machine learning models or technologies, which adds to its versatility. Currently, binary classification and regression tasks are supported with Hybrid Gradient Boosting (HGBT) as the sub-model.

In the following sections, let’s take the publicly accessible “Give me some credit” dataset as a case study to explore the fairness-related harms introduced by a machine learning model, and then see how to mitigate such bias with the new FAIR ML feature.

Dataset Description

The “Give me some credit” dataset comprises 150,000 rows and 12 columns of sample data, where the target variable ‘SeriousDlqin2yrs’ indicates whether a borrower experiences financial distress in the next two years. This is a typical binary classification application where the output is ‘1’ (will experience financial distress) or ‘0’ (will not experience financial distress).

The remaining 11 columns consist of an ID column and various financial data of a borrower, such as age, monthly income, debt ratio, and the number of real estate loans. Using this data, we can construct a predictive model to calculate an individual’s likelihood of encountering a financial crisis in the next two years.
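For readers who want to follow along, the raw data can be read into pandas and a connection to SAP HANA can be established with the Python machine learning client (hana-ml). This is only a sketch: the file name, host, port, and credentials below are placeholders.

import pandas as pd
from hana_ml.dataframe import ConnectionContext

# placeholder connection details -- replace with your own system and credentials
conn = ConnectionContext(address='<hana-host>', port=30015, user='<user>', password='<password>')

# the Kaggle "Give me some credit" training file (commonly named cs-training.csv)
raw_df = pd.read_csv('cs-training.csv')
print(raw_df.shape)   # expected: (150000, 12), including the unnamed index column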

Data Pre-processing

For the purpose of this study, data preprocessing for our example involved simplifying the ‘age’ column into two employment status groups (‘working’ and ‘retired’) by introducing a new proxy feature called ‘working_status’ based on an age threshold of 60. Ages over 60 are classified as ‘retired,’ and those under or equal to 60 are categorized as ‘working’. Following this, the ‘age’ column was removed. This case study will concentrate on the sensitive variable ‘working_status’.

Continuing further, our underlying assumption is that if a bank anticipates a financial crisis impacting the borrower within the next two years, it will opt to deny a loan to mitigate potential default. To align with this, we rename our target variable “SeriousDlqin2yrs” to “default”, facilitating a shift in our problem statement. Now, our focus is on predicting whether a borrower is more likely to default. To represent this, we still use ‘1’ and ‘0’, where ‘1’ signifies potential default and ‘0’ indicates the borrower will not default.
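Below is a minimal pandas sketch of the proxy-feature derivation and the target renaming, continuing from the raw_df loaded above; renaming the unnamed index column to ‘ID’ is an assumption about how the key column is obtained.

import numpy as np

# rename the key and target columns (the unnamed index column of cs-training.csv serves as the ID)
df = raw_df.rename(columns={'Unnamed: 0': 'ID', 'SeriousDlqin2yrs': 'default'})

# derive the proxy feature 'working_status' from 'age', then drop the original column
df['working_status'] = np.where(df['age'] > 60, 'retired', 'working')
df = df.drop(columns=['age'])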

Additional data pre-processing steps include the following (a code sketch follows the list):

  • Handling missing values: missing values in the ‘MonthlyIncome’ column are filled with the mean, and missing values in the ‘NumberOfDependents’ column are filled with the median.
  • Outlier handling: outliers detected in the columns ‘NumberOfTime30-59DaysPastDueNotWorse’, ‘NumberOfTimes90DaysLate’, and ‘NumberOfTime60-89DaysPastDueNotWorse’ are replaced with the median.
  • Dataset split: the dataset is partitioned with 80% (120,000 data points) used as the training dataset for model training and the remaining 20% (30,000 data points) used as the test dataset for model validation.
  • Imbalanced dataset handling: the ratio of label 0 to label 1 is approximately 14:1, meaning most data instances belong to borrowers who will not experience financial distress in the next two years. To balance the training data, a balanced dataset with a 1:1 label ratio is sampled from the original data set.
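The sketch below continues the preprocessing in pandas and then uploads the resulting train and test sets to SAP HANA as the hana_ml DataFrames df_train and df_test used in the following sections. The outlier rule (treating values above 90 in the past-due columns as sentinels), the downsampling strategy for balancing, and the table names are assumptions for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split
from hana_ml.dataframe import create_dataframe_from_pandas

# 1. missing values: mean for MonthlyIncome, median for NumberOfDependents
df['MonthlyIncome'] = df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean())
df['NumberOfDependents'] = df['NumberOfDependents'].fillna(df['NumberOfDependents'].median())

# 2. outliers: replace suspicious sentinel values in the past-due columns with the column median
for col in ['NumberOfTime30-59DaysPastDueNotWorse',
            'NumberOfTimes90DaysLate',
            'NumberOfTime60-89DaysPastDueNotWorse']:
    df.loc[df[col] > 90, col] = df[col].median()

# 3. 80/20 stratified split into training and test data
train_pd, test_pd = train_test_split(df, test_size=0.2, stratify=df['default'], random_state=1)

# 4. balance the training data to a 1:1 label ratio by downsampling the majority class
pos = train_pd[train_pd['default'] == 1]
neg = train_pd[train_pd['default'] == 0].sample(n=len(pos), random_state=1)
train_balanced = pd.concat([pos, neg])

# upload to SAP HANA as hana_ml DataFrames
df_train = create_dataframe_from_pandas(conn, train_balanced, 'GMSC_TRAIN', force=True)
df_test = create_dataframe_from_pandas(conn, test_pd, 'GMSC_TEST', force=True)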

Training an unmitigated classifier

In this section, we will train a fairness-unaware Hybrid Gradient Boosting (HGBT) classifier on the training data, with hyperparameter tuning via grid search.

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.algorithms.pal.model_selection import GridSearchCV

# train the model with grid search
hgc = UnifiedClassification('HybridGradientBoostingTree')
gscv = GridSearchCV(estimator=hgc, 
                    param_grid={'learning_rate': [0.1, 0.4, 0.7],
                                'n_estimators': [50, 100],
                                'max_depth': [5, 10, 15],
                                'split_threshold': [0.1, 0.4, 0.7]},
                    train_control=dict(fold_num=5,
                                       resampling_method='cv',
                                       random_state=1,
                                       ref_metric=['auc']),
                    scoring='error_rate')
gscv.fit(data=df_train, 
         key='ID',
         label='default',
         partition_method='stratified',
         partition_random_state=1,
         stratified_column='default',
         build_report=True)

# predict
pred_result = gscv.predict(data=df_test.deselect('default'), key='ID')
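Since build_report=True is passed to fit, the report of the fitted estimator can be rendered with the UnifiedReport visualizer; a minimal sketch (the actual display depends on your notebook environment):

from hana_ml.visualizers.unified_report import UnifiedReport

# build and display the model report of the fitted HGBT estimator
UnifiedReport(hgc).build().display()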

The model report is shown below.


Fig 1. Model Report of a HGBT model

Upon evaluation, the metrics in Fig. 1, such as precision, recall, and F1 score for the two classes, the Area Under the Curve (AUC), and accuracy, appear satisfactory. We will proceed to assess the model using the imbalanced test dataset. In this scenario, our primary sensitive variable is ‘working_status’. Our goal is to determine whether our classification model can fairly predict defaults, irrespective of the borrower’s working status – whether they are retired or currently employed.

When discussing fairness in AI systems, the first step is to understand the potential harms that the system may cause and how to identify them. In our example, if the decision to grant a loan is based on predicting default, a harmful situation occurs when the model incorrectly predicts that a borrower who will not default, will default. This can lead to the bank denying the loan based on model bias, despite the borrower being low-risk. Consequently, the model may cause harm by misallocating resources.

To assess this situation, we can observe the metric called the False Positive Rate (FPR). The FPR is calculated by dividing the number of false positives (incorrectly predicted defaults) by the total number of real negative cases (borrowers who will not default). It represents the proportion of false alarms generated by the model. A high FPR indicates that the model produces many false alarms, meaning it predicts that many borrowers will default on their loans when they would actually pay on time.
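To make this concrete, below is a small sketch of how per-group accuracy and FPR, like those shown in the charts below, could be computed on the client side. The group_metrics helper is our own illustration, not part of PAL, and it assumes the predicted class label is returned in a column named SCORE.

import pandas as pd

def group_metrics(pred_hana_df, truth_hana_df, label='default',
                  sensitive='working_status', positive='1'):
    # pull predictions and ground truth to the client and join them on the key
    pred = pred_hana_df.collect()[['ID', 'SCORE']]
    truth = truth_hana_df.collect()[['ID', label, sensitive]]
    merged = truth.merge(pred, on='ID')

    rows = []
    for grp, g in merged.groupby(sensitive):
        y_true = g[label].astype(str)
        y_pred = g['SCORE'].astype(str)
        accuracy = (y_true == y_pred).mean()
        negatives = g[y_true != positive]              # borrowers who truly do not default
        fpr = (negatives['SCORE'].astype(str) == positive).mean()
        rows.append({'group': grp, 'accuracy': accuracy, 'FPR': fpr})
    return pd.DataFrame(rows)

# per-group accuracy and FPR of the unmitigated model
print(group_metrics(pred_result, df_test))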

The charts below present the accuracy and FPR for different employment status groups in predicting the testing data.


Fig 2. Accuracy and FPR of an Unmitigated HGBT Model


Table 1. Accuracy and FPR of Two Groups

In the accuracy chart in Fig. 2, we can observe a higher value for the retired group (0.91) compared to the working group (0.74). However, there is a notable disparity in the False Positive Rate (FPR) between the two groups: approximately 7.8% for the retired group versus approximately 26% for the working group. Hence, the model tends to inaccurately predict that working individuals will default. Consequently, the system could unfairly allocate fewer loans to working individuals and over-allocate loans to retired individuals.

Unfairness Mitigation in ML Models

In this section, we employ FAIR ML to mitigate the unfairness in the HGBT classifier. The key concept is to achieve a randomized classifier with the lowest error while satisfying the desired constraints. For binary classification, various constraint options are available such as: “demographic_parity”, “equalized_odds”, “true_positive_rate_parity”, “false_positive_rate_parity”, and “error_rate_parity”. For regression tasks, an option “bounded_group_loss” is supported. The selection of constraints depends on the inherent bias of the problem.
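For orientation, “false_positive_rate_parity” can be read informally as requiring (approximately) equal false positive rates across the groups defined by the sensitive variable; in notation (our informal reading, not the formal PAL definition):

$$ \mathrm{FPR}_g = \Pr\left(\hat{y} = 1 \mid y = 0,\ \mathrm{group} = g\right), \qquad \mathrm{FPR}_{\mathrm{working}} \approx \mathrm{FPR}_{\mathrm{retired}} $$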

In our case study, the disparity in FPR results in fairness-related harms. Hence, we will attempt to reduce this disparity with the constraint option “false_positive_rate_parity”.

from hana_ml.algorithms.pal.fair_ml import FairMLClassification

hgbt_fm = FairMLClassification(fair_constraint='false_positive_rate_parity')
hgbt_fm.fit(data=df_train, key='ID', label='default', 
            fair_sensitive_variable='working_status', 
            fair_positive_label='1',
            categorical_variable=['working_status', 'default'])

pred_res_fm = hgbt_fm.predict(data=df_test.deselect('default'), key='ID')
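Using the same illustrative group_metrics helper from the previous section (again assuming the predicted label is exposed in a SCORE column of the prediction output), the before-and-after comparison shown in the figures below can be produced along these lines:

# per-group accuracy and FPR before and after mitigation
print(group_metrics(pred_result, df_test))    # unmitigated HGBT model
print(group_metrics(pred_res_fm, df_test))    # fairness-aware FAIR ML model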

The figures below show the comparison results of accuracy and FPR between the original HGBT model and the mitigated HGBT model for two groups.


Fig 3. Accuracy of Original and Mitigated Models


Fig 4. FPR of Original and Mitigated Models


Table 2. Accuracy and FPR of Two Groups of a Mitigated Model

As evident from the above results in Fig.3 and Fig.4, the fairness-aware model significantly lowers the FPR disparity between the two groups. There is an increase in the false positive rate for the retired group, coupled with a decrease for the working group. However, this correction introduces a trade-off that results in the accuracy of the retired group falling from 91% to 80%.

Summary

In this blog post, we first discussed the importance of fairness in AI, particularly in machine-learning decision-making systems. Even though automated systems have increased in complexity and significance, many lack robust fairness-aware mechanisms and inadvertently allow biases to persist. We then presented a case study with the “Give me some credit” dataset, in which unfairness associated with a sensitive attribute was detected and mitigated using FAIR ML in SAP HANA PAL. The example underscores the need for fairness mitigation in machine learning models to decrease disparities, avoid harm, and ensure fairness in decision-making. Predictive models must not only be accurate and efficient, but also equitable to all demographic segments, underlining the essential role of fairness in machine learning.

Other Useful Links:

hana-ml on Pypi.

We also provide an R API for SAP HANA PAL called hana.ml.r; please refer to the documentation for more information.

For other blog posts on hana-ml:

  1. A Multivariate Time Series Modeling and Forecasting Guide with Python Machine Learning Client for SAP HANA
  2. Outlier Detection using Statistical Tests in Python Machine Learning Client for SAP HANA
  3. Outlier Detection by Clustering using Python Machine Learning Client for SAP HANA
  4. Anomaly Detection in Time-Series using Seasonal Decomposition in Python Machine Learning Client for SAP HANA
  5. Outlier Detection with One-class Classification using Python Machine Learning Client for SAP HANA
  6. Learning from Labeled Anomalies for Efficient Anomaly Detection using Python Machine Learning Client for SAP HANA
  7. Python Machine Learning Client for SAP HANA
  8. Import multiple excel files into a single SAP HANA table
  9. COPD study, explanation and interpretability with Python machine learning client for SAP HANA

Source: https://blogs.sap.com/2023/12/08/fairness-in-machine-learning-a-new-feature-in-sap-hana-pal/