Fairness Aware Classification

This repository contains tools to address fairness issues in classification problems.

Authors: Kirill Myasoedov, Simona Nitti, Bekarys Nurtay (bekiichone), Ksenia Osipova, and Gabriel Rozzonelli.

Content

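The package includes the following submodules, each used in the examples below:

  • datasets: preprocessed toy datasets
  • utils: helpers for building sensitive masks
  • metrics: fairness-oriented scores such as dfpr_score and dfnr_score
  • classifiers: scikit-learn-style classifiers such as AdaptiveWeightsClassifier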

Installation

Dependencies

The provided modules require the following packages:

  • numpy==1.19.5
  • pandas==1.1.5
  • scikit-learn==0.24.1

Clone this repository

git clone https://github.com/rozzong/Fairness-Aware-Classification.git

Examples

Load a toy dataset

The datasets module provides several popular, already preprocessed datasets for imbalanced classification problems that raise fairness issues.

from sklearn.model_selection import train_test_split
from fairness_aware_classification.datasets import COMPASDataset

# Load the data
data = COMPASDataset()

# Split the data
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    data.X,
    data.y,
    data.sensitive
)

In addition to the usual samples and targets, some classifiers require as input a mask indicating which samples are sensitive. This mask can be retrieved by accessing data.sensitive.
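To make the role of the mask concrete, here is a minimal sketch on synthetic data. It assumes the mask is a boolean array with one entry per sample (the exact type of data.sensitive is not stated here, so treat this as an assumption) and shows that train_test_split keeps the samples, targets, and mask row-aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a dataset: 8 samples, 2 features.
# Column 0 plays the role of a binary sensitive attribute.
X = np.array([[0, 5], [1, 3], [0, 7], [1, 1], [0, 2], [1, 9], [0, 4], [1, 6]])
y = np.array([0, 1, 0, 1, 1, 0, 0, 1])

# Assumed convention: True marks a sensitive sample.
sensitive = X[:, 0] == 0

# All three arrays are shuffled and split with the same row permutation.
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    X, y, sensitive, test_size=0.25, random_state=0
)

# Each part of the mask still describes the rows of the matching X part.
assert np.array_equal(s_train, X_train[:, 0] == 0)
assert np.array_equal(s_test, X_test[:, 0] == 0)
```

Passing the mask through the same train_test_split call as X and y avoids having to track row indices by hand.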

Load a custom dataset

For custom datasets, the utils module provides functions to generate sensitive masks.

import pandas as pd
from fairness_aware_classification.utils import sensitive_mask_from_features

# Load the data
df = pd.read_csv("my_dataset.csv")

# Set the target and do some feature selection
y = df.pop("target")
X = df.drop(["useless_feature_1"], axis=1)

# Compute the sensitive samples mask
sensitive_features = ["gender"]
sensitive_values = [0]
sensitive = sensitive_mask_from_features(X, sensitive_features, sensitive_values)
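The kind of computation involved can be pictured as follows. This is a hypothetical re-implementation for illustration only: the helper name make_sensitive_mask, and the rule that a sample is sensitive when every listed feature equals its listed value, are assumptions and not the documented behavior of sensitive_mask_from_features.

```python
import pandas as pd

def make_sensitive_mask(X, features, values):
    """Hypothetical helper: flag rows where each feature equals its value."""
    mask = pd.Series(True, index=X.index)
    for feature, value in zip(features, values):
        # A row stays flagged only if it matches every (feature, value) pair.
        mask &= X[feature] == value
    return mask

# Toy data frame standing in for a custom dataset.
X = pd.DataFrame({"gender": [0, 1, 0, 1], "age": [23, 35, 41, 52]})
mask = make_sensitive_mask(X, ["gender"], [0])
print(mask.tolist())  # [True, False, True, False]
```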

Run a classifier

The classifiers in this package follow the scikit-learn estimator style (fit, then predict). The functions in metrics can help define fairness-oriented objective functions.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairness_aware_classification.metrics import dfpr_score, dfnr_score
from fairness_aware_classification.classifiers import AdaptiveWeightsClassifier

# The criterion function `objective` should be customized
# depending on the data. It should be maximized.
def objective(y_true, y_pred, sensitive):
    acc = accuracy_score(y_true, y_pred)
    dfpr = dfpr_score(y_true, y_pred, sensitive)
    dfnr = dfnr_score(y_true, y_pred, sensitive)
    
    return 2 * acc - abs(dfpr) - abs(dfnr)

base_clf = LogisticRegression(solver="liblinear")
awc = AdaptiveWeightsClassifier(base_clf, objective)
awc.fit(X_train, y_train, s_train)
y_pred = awc.predict(X_test)

Each provided toy dataset exposes a suggested objective function as data.objective.
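For intuition about the fairness terms in such an objective, they can be sketched in plain NumPy. The sketch assumes that dfpr_score and dfnr_score measure the difference in false positive rate and false negative rate between non-sensitive and sensitive samples. The function names below, the sign convention, and the boolean-mask input are assumptions, not the package's actual implementation.

```python
import numpy as np

def _rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels.

    Assumes the group contains at least one negative and one positive sample.
    """
    fpr = np.mean(y_pred[y_true == 0] == 1)
    fnr = np.mean(y_pred[y_true == 1] == 0)
    return fpr, fnr

def dfpr_sketch(y_true, y_pred, sensitive):
    """Assumed convention: FPR(non-sensitive) - FPR(sensitive)."""
    fpr_s, _ = _rates(y_true[sensitive], y_pred[sensitive])
    fpr_n, _ = _rates(y_true[~sensitive], y_pred[~sensitive])
    return fpr_n - fpr_s

def dfnr_sketch(y_true, y_pred, sensitive):
    """Assumed convention: FNR(non-sensitive) - FNR(sensitive)."""
    _, fnr_s = _rates(y_true[sensitive], y_pred[sensitive])
    _, fnr_n = _rates(y_true[~sensitive], y_pred[~sensitive])
    return fnr_n - fnr_s

# Toy predictions: errors occur only in the sensitive group (first four rows).
y_true = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1])
sensitive = np.array([True, True, True, True, False, False, False, False])

acc = np.mean(y_true == y_pred)
dfpr = dfpr_sketch(y_true, y_pred, sensitive)
dfnr = dfnr_sketch(y_true, y_pred, sensitive)

# Same shape as the objective above: reward accuracy, penalize rate gaps.
score = 2 * acc - abs(dfpr) - abs(dfnr)
```

Because the gaps enter through their absolute values, the sign convention does not change the objective; it only matters when the scores are inspected on their own.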

Results

In main.ipynb, the implemented classifiers are compared with a plain AdaBoost classifier as a baseline. The results of these runs on the four provided datasets are presented below.

[Figures: results on the Adult Census Income, Bank Marketing, COMPAS, and KDD Census Income datasets]