Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline

Nancy J. Delong

Machine learning (ML) can take over many of the manual tasks that we do, such as recruitment, offering financial products and services, and so on. However, ML algorithms can sometimes develop biases based on age, gender, race, and other attributes. This unfair discrimination is a major limitation of using machine learning algorithms for critical tasks, and it also affects their acceptance and adoption. In many cases, the discrimination is caused by the training data.


Machine learning – artistic interpretation. Image credit: chenspec via Pixabay, free licence

The unfairness caused by data transformations in the data preprocessing stage has been studied by Sumon Biswas and Hridesh Rajan in their research paper titled “Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline”, which forms the basis of the following text.

Importance of this research

This research paper discusses the fairness impact of data preprocessing stages in the ML pipeline. Identifying the root cause of unfair discrimination in ML algorithms is the first step towards eliminating these biases. Making these algorithms bias-free would increase their acceptance and make our world faster, more efficient, and free of discrimination.

The aim

In the words of the researchers,

(1) We created a fairness benchmark of ML pipelines with multiple stages. The benchmark, code and results are shared in our replication package in a GitHub repository, which can be leveraged in further research on building fair ML pipelines. (2) We introduced the concept of causality in the ML pipeline and leveraged existing metrics to measure the fairness of preprocessing stages in the ML pipeline. (3) Unfairness patterns have been identified for a number of stages. (4) We identified alternative data transformers which can mitigate bias in the pipeline. (5) Finally, we showed the composition of stage-specific fairness into global fairness, which is used to choose an appropriate downstream transformer that mitigates bias.

The research method

As the authors describe,

In this paper, we introduced the causal method of fairness to reason about the fairness impact of data preprocessing stages in the ML pipeline. We leveraged existing metrics to define the fairness measures of the stages. Then we conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers are causing the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes into the global fairness of the pipeline. We used the fairness composition to choose an appropriate downstream transformer that mitigates unfairness in the machine learning pipeline.
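The last step, choosing a downstream transformer that mitigates unfairness, can be illustrated with a minimal sketch. It is a plain exhaustive search under made-up assumptions, not the paper's composition analysis: the rows, the upstream `clip_scores` stage, the two candidate transformers and the threshold classifier are all hypothetical, and statistical parity difference (SPD) stands in for the fairness metric.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Stage = Callable[[List[Row]], List[Row]]


def predict(rows: List[Row], threshold: float = 0.5) -> List[Row]:
    # Hypothetical classifier: favourable outcome (1) when the score exceeds the threshold.
    return [{**r, "pred": 1.0 if r["score"] > threshold else 0.0} for r in rows]


def spd(rows: List[Row]) -> float:
    # Statistical parity difference: P(pred=1 | group 0) - P(pred=1 | group 1).
    def rate(group: float) -> float:
        members = [r for r in rows if r["group"] == group]
        return sum(r["pred"] for r in members) / len(members) if members else 0.0
    return rate(0.0) - rate(1.0)


def choose_downstream(rows: List[Row], upstream: List[Stage],
                      candidates: Dict[str, Stage]) -> str:
    """Try each candidate after the fixed upstream stages; keep the one closest to parity."""
    def pipeline_bias(candidate: Stage) -> float:
        data = rows
        for stage in upstream:
            data = stage(data)
        return abs(spd(predict(candidate(data))))
    return min(candidates, key=lambda name: pipeline_bias(candidates[name]))


def clip_scores(rs: List[Row]) -> List[Row]:
    # Hypothetical upstream stage: cap scores at 0.9.
    return [{**r, "score": min(r["score"], 0.9)} for r in rs]


def drop_low_scores(rs: List[Row]) -> List[Row]:
    # Hypothetical candidate 1: filter out rows with low scores.
    return [r for r in rs if r["score"] >= 0.35]


def floor_low_scores(rs: List[Row]) -> List[Row]:
    # Hypothetical candidate 2: keep every row, raise low scores to 0.35.
    return [{**r, "score": max(r["score"], 0.35)} for r in rs]


# Made-up data: "group" 0 = unprivileged, 1 = privileged.
rows: List[Row] = [
    {"group": 0.0, "score": 0.6}, {"group": 0.0, "score": 0.4}, {"group": 0.0, "score": 0.45},
    {"group": 1.0, "score": 0.7}, {"group": 1.0, "score": 0.2}, {"group": 1.0, "score": 0.3},
]

choice = choose_downstream(
    rows, [clip_scores],
    {"drop_low_scores": drop_low_scores, "floor_low_scores": floor_low_scores},
)
print(choice)
```

With this toy data, filtering out low scores removes only privileged-group rows with unfavourable outcomes and pushes |SPD| to about 0.67, while flooring the scores keeps every row and stays at parity, so `floor_low_scores` is chosen. A real pipeline would also weigh predictive performance, which this sketch ignores.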

How to measure the fairness of a preprocessing stage in machine learning?

  • Let us have an ML pipeline P that contains stages S1, S2, S3, …, Sn.
  • The fairness of stage Sk can be measured as follows:
    • Formulate another ML pipeline P′ that contains stages S1, S2, …, Sk-1, Sk+1, …, Sn, i.e., a pipeline identical to P except that stage Sk is left out.
    • Observe the predictions of this pipeline P′.
    • By observing how the predictions change between P and P′ and checking whether the change favours any group, we can estimate the fairness of stage Sk.
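The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' benchmark code: the rows, the two stages and the threshold classifier are made up, statistical parity difference (SPD) stands in for the existing fairness metrics the paper leverages, and `stage_effect` assumes the fairness of stage Sk is SPD(P) minus SPD(P′).

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Stage = Callable[[List[Row]], List[Row]]


def run_pipeline(rows: List[Row], stages: List[Stage], threshold: float = 0.5) -> List[Row]:
    """Apply each preprocessing stage in order, then a toy threshold classifier."""
    for stage in stages:
        rows = stage(rows)
    # Hypothetical classifier: favourable outcome (1) when the score exceeds the threshold.
    return [{**r, "pred": 1.0 if r["score"] > threshold else 0.0} for r in rows]


def statistical_parity_difference(rows: List[Row]) -> float:
    """P(pred=1 | unprivileged, group 0) - P(pred=1 | privileged, group 1)."""
    def rate(group: float) -> float:
        members = [r for r in rows if r["group"] == group]
        return sum(r["pred"] for r in members) / len(members) if members else 0.0
    return rate(0.0) - rate(1.0)


def stage_effect(rows: List[Row], stages: List[Stage], k: int) -> float:
    """Assumed stage measure: SPD of P minus SPD of P' (P without stage k)."""
    p = run_pipeline(rows, stages)
    p_prime = run_pipeline(rows, stages[:k] + stages[k + 1:])
    return statistical_parity_difference(p) - statistical_parity_difference(p_prime)


def drop_low_scores(rs: List[Row]) -> List[Row]:
    # Hypothetical filtering stage.
    return [r for r in rs if r["score"] >= 0.35]


def clip_scores(rs: List[Row]) -> List[Row]:
    # Hypothetical transformation stage: cap scores at 0.9.
    return [{**r, "score": min(r["score"], 0.9)} for r in rs]


# Made-up data: "group" 0 = unprivileged, 1 = privileged.
rows: List[Row] = [
    {"group": 0.0, "score": 0.6}, {"group": 0.0, "score": 0.4}, {"group": 0.0, "score": 0.45},
    {"group": 1.0, "score": 0.7}, {"group": 1.0, "score": 0.2}, {"group": 1.0, "score": 0.3},
]

stages: List[Stage] = [drop_low_scores, clip_scores]
effect = stage_effect(rows, stages, k=0)  # fairness effect of the filtering stage
print(round(effect, 3))
```

Here leaving out the filtering stage brings the pipeline back to parity, so its effect is about -0.667 (negative means the stage shifted favourable predictions towards the privileged group), while the clipping stage (k=1) has no effect. Because a filtering stage drops rows, P and P′ are scored on different rows, so the sketch compares group-level rates rather than individual predictions.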

Image credit: arXiv:2106.06054 [cs.LG]

Research results

The researchers made the following observations:

  • Data filtering and missing value removal change the data distribution and introduce bias in the ML pipeline.
  • New feature generation or feature transformation can have a significant impact on fairness.
  • Encoding techniques should be chosen carefully based on the classifier.
  • The variability in fairness of preprocessing stages depends on the dataset size and the overall prediction rate of the pipelines.
  • The unfairness of a preprocessing stage can be dominated by the dataset or the classifier used in the pipeline.
  • Among all the transformers, applying the sampling technique exhibits the most unfairness.
  • Selecting a subset of features often increases unfairness.
  • In most pipelines, feature standardization and non-linear transformers are fair transformers.
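The first finding, that missing value removal changes the data distribution, can be seen on a toy dataset. A minimal sketch, assuming made-up rows with a protected `group` attribute, a binary `label` and an `income` feature that is sometimes missing; `group_stats` and `drop_missing` are hypothetical helpers, not code from the paper:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def group_stats(rows: List[Row]) -> Dict[Any, Dict[str, float]]:
    """Per protected group: share of all rows and rate of positive labels."""
    stats: Dict[Any, Dict[str, float]] = {}
    for g in sorted({r["group"] for r in rows}):
        members = [r for r in rows if r["group"] == g]
        stats[g] = {
            "share": len(members) / len(rows),
            "positive_rate": sum(r["label"] for r in members) / len(members),
        }
    return stats


def drop_missing(rows: List[Row]) -> List[Row]:
    """Missing-value removal: discard every row that has a None field."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Made-up data: incomes are missing only for some group-0 rows with positive labels.
rows: List[Row] = [
    {"group": 0, "label": 1, "income": None},
    {"group": 0, "label": 1, "income": None},
    {"group": 0, "label": 0, "income": 30.0},
    {"group": 0, "label": 1, "income": 40.0},
    {"group": 1, "label": 1, "income": 50.0},
    {"group": 1, "label": 0, "income": 60.0},
    {"group": 1, "label": 1, "income": 70.0},
    {"group": 1, "label": 0, "income": 80.0},
]

before = group_stats(rows)
after = group_stats(drop_missing(rows))
print(before)
print(after)
```

Before filtering, group 0 makes up half of the rows with a positive-label rate of 0.75; after incomplete rows are dropped it shrinks to about a third of the data with a rate of 0.5, while group 1's rate stays at 0.5. Any model trained downstream therefore sees a different picture of group 0 than the raw data showed.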


The research paper examined how the data preprocessing stage affects the fairness of classification tasks. It shows that many data preprocessing stages introduce bias into the predictions of ML algorithms. The researchers have demonstrated that fairer ML algorithms could be built by ensuring the fairness of these individual data transformers.

Future work

The pipeline benchmark, code, and results are publicly available for further use. In the future, automated tools could be developed to detect bias in ML pipeline stages and correct it.

Source: Sumon Biswas and Hridesh Rajan, “Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline”
