Ethical AI Fairness in Python — Core Concepts

Understanding bias types, fairness metrics, and mitigation strategies for machine learning models — from disparate impact analysis to equalized odds and calibration

Where bias comes from

Bias in AI systems usually doesn’t originate in the learning algorithm itself. It’s inherited from the data and from design choices such as which features, labels, and objectives are used.

Historical bias: The training data reflects past decisions that were themselves biased. A hiring dataset from a company that historically favored certain demographics will train a model that continues that pattern.

Representation bias: The training data doesn’t represent the population the model will serve. A facial recognition system trained mostly on lighter-skinned faces performs worse on darker-skinned faces — not because the algorithm is biased, but because it had less data to learn from.
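A quick way to spot representation bias is to compare each group's share of the training data with its share of the population the model will serve. The sketch below assumes you already know (or can estimate) the population shares; the function name `representation_gap`, the group labels, and all numbers are hypothetical.

```python
from collections import Counter

def representation_gap(training_groups, population_share):
    """Training-set share minus expected population share, per group."""
    counts = Counter(training_groups)
    n = len(training_groups)
    return {g: counts.get(g, 0) / n - share for g, share in population_share.items()}

# Hypothetical data: the group label of each training example
training_groups = ["A"] * 8 + ["B"] * 2
# Assumed share of each group in the population the model will serve
population_share = {"A": 0.5, "B": 0.5}

gaps = representation_gap(training_groups, population_share)
for group, gap in gaps.items():
    # A negative gap means the group is underrepresented in training
    print(f"{group}: {gap:+.2f}")
```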

Measurement bias: The features or labels used as proxies for what you’re trying to measure are themselves distorted, and the distortion differs across groups. Using zip code as a predictor of creditworthiness is a common example: because of historical housing segregation, zip code can also act as a proxy for race.

Aggregation bias: A single model is used for populations that behave differently. A medical model trained on adult data may make poor predictions for children.

Label bias: The labels in the training data are biased. If historical loan default labels are influenced by discriminatory lending practices (fewer opportunities for certain groups), the model learns to replicate that discrimination.

Fairness metrics

There are multiple mathematical definitions of fairness, and they measure different things:

Demographic parity (statistical parity): The model should make positive predictions (for example, approvals) at equal rates across groups. If 60% of Group A gets approved, roughly 60% of Group B should too. This metric looks only at predictions, so it ignores any differences in actual qualification rates between groups.
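The approval-rate comparison described above can be sketched in a few lines of plain Python. The function name `selection_rates` and the sample predictions are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Share of positive predictions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical data: 1 = approved, 0 = denied
predictions = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = selection_rates(predictions, groups)
# Demographic parity difference: 0 means identical approval rates
parity_gap = abs(rates["A"] - rates["B"])
print(rates, round(parity_gap, 2))
```

How small the gap must be to count as "equal" is a policy choice that the metric itself does not settle.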

Equalized odds: The model should have equal true positive rates and equal false positive rates across groups. If Group A’s qualified applicants are approved 80% of the time, Group B’s qualified applicants should also be approved 80% of the time, and unqualified applicants in both groups should be wrongly approved at the same rate. Because it conditions on the actual outcome, this metric accounts for qualification differences.
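As a minimal sketch, equalized odds can be checked by computing the true positive rate and false positive rate separately for each group. The helper `group_rates` and the data are hypothetical, and the code assumes every group contains at least one actual positive and one actual negative.

```python
def group_rates(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)}."""
    counts = {}
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if t == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        tpr = c["tp"] / (c["tp"] + c["fn"])  # qualified applicants approved
        fpr = c["fp"] / (c["fp"] + c["tn"])  # unqualified applicants approved
        rates[g] = (tpr, fpr)
    return rates

# Hypothetical labels (1 = qualified) and predictions (1 = approved)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
groups = ["A"] * 8 + ["B"] * 8

rates = group_rates(y_true, y_pred, groups)
tpr_gap = abs(rates["A"][0] - rates["B"][0])
fpr_gap = abs(rates["A"][1] - rates["B"][1])
print(rates, tpr_gap, fpr_gap)
```

Equalized odds holds only when both gaps are close to zero; a model can match on one rate and still differ on the other.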

Predictive parity: Among those the model predicts as positive, the actual positive rate should be equal across groups. If the model approves 100 people from each group, a similar proportion should actually repay their loans.
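Predictive parity compares precision across groups: of the people the model approved, how many actually turned out positive. A rough sketch, with a hypothetical helper `precision_by_group` and made-up data:

```python
def precision_by_group(y_true, y_pred, groups):
    """Among predicted positives in each group, the share that were actually positive."""
    predicted_pos = {}
    true_pos = {}
    for t, p, g in zip(y_true, y_pred, groups):
        if p == 1:
            predicted_pos[g] = predicted_pos.get(g, 0) + 1
            true_pos[g] = true_pos.get(g, 0) + t
    return {g: true_pos[g] / predicted_pos[g] for g in predicted_pos}

# Hypothetical data: y_true 1 = repaid, y_pred 1 = approved
y_true = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

precision = precision_by_group(y_true, y_pred, groups)
print(precision)
```

Groups that receive no positive predictions are simply absent from the result, since precision is undefined for them.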

Individual fairness: Similar individuals should receive similar predictions. This doesn’t compare groups — it says that two people with similar qualifications should get similar outcomes regardless of which group they belong to.
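One common way to make "similar individuals, similar predictions" testable is a Lipschitz-style check: the gap between two people's scores should not exceed some constant times the distance between their task-relevant features. The sketch below assumes Euclidean distance and a constant of 1.0; both are illustrative choices, and picking a defensible similarity measure is the hard part in practice. The function name `unfair_pairs` and the data are hypothetical.

```python
from itertools import combinations
import math

def unfair_pairs(features, scores, lipschitz=1.0):
    """Index pairs whose score gap exceeds lipschitz * feature distance."""
    violations = []
    for i, j in combinations(range(len(scores)), 2):
        distance = math.dist(features[i], features[j])
        if abs(scores[i] - scores[j]) > lipschitz * distance:
            violations.append((i, j))
    return violations

# Hypothetical applicants: (normalized income, normalized credit history)
features = [(0.80, 0.70), (0.81, 0.70), (0.20, 0.30)]
# Hypothetical model scores for the same applicants
scores = [0.90, 0.60, 0.25]

# Applicants 0 and 1 are nearly identical but receive very different scores
violations = unfair_pairs(features, scores)
print(violations)
```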

Calibration: The model’s predicted probabilities should match observed outcomes equally well across groups. If the model says a set of applicants each has a 70% chance of repaying a loan, roughly 70% of those applicants should actually repay, regardless of group membership.
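A simple way to inspect calibration by group is to bucket predictions by score and compare the mean predicted score with the observed positive rate in each bucket. This is a sketch under assumed names (`calibration_by_group`, two equal-width bins) and made-up data, not a standard API.

```python
def calibration_by_group(y_true, scores, groups, n_bins=2):
    """Return {(group, bin_index): (mean_score, observed_positive_rate, count)}."""
    buckets = {}
    for t, s, g in zip(y_true, scores, groups):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        buckets.setdefault((g, b), []).append((s, t))
    table = {}
    for key, items in buckets.items():
        n = len(items)
        mean_score = sum(s for s, _ in items) / n
        observed = sum(t for _, t in items) / n
        table[key] = (mean_score, observed, n)
    return table

# Hypothetical data: everyone is scored 0.7, but repayment differs by group
y_true = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 6
scores = [0.7] * 20
groups = ["A"] * 10 + ["B"] * 10

table = calibration_by_group(y_true, scores, groups)
for (group, bin_index), (mean_score, observed, n) in sorted(table.items()):
    print(group, bin_index, round(mean_score, 2), round(observed, 2), n)
```

In this made-up data the score of 0.7 is well calibrated for group A (70% repay) and overconfident for group B (40% repay).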

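A critical insight: these metrics can be mathematically incompatible. When groups differ in how often the outcome actually occurs, a model that satisfies demographic parity will generally violate equalized odds, and vice versa; calibration and equalized odds likewise cannot both hold except in special cases such as a perfect predictor. Choosing which fairness metric to optimize is a values decision, not a purely technical one.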
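The incompatibility is easy to see with a toy example. Below, two hypothetical groups have different qualification rates. A perfect classifier satisfies equalized odds but not demographic parity; forcing equal approval rates restores parity but breaks equalized odds. All names and numbers are made up for illustration.

```python
def summarize(y_true, y_pred):
    """Selection rate, true positive rate, and false positive rate per group."""
    summary = {}
    for g in y_true:
        pairs = list(zip(y_true[g], y_pred[g]))
        n_pos = sum(t for t, _ in pairs)
        n_neg = len(pairs) - n_pos
        summary[g] = {
            "selection": sum(p for _, p in pairs) / len(pairs),
            "tpr": sum(p for t, p in pairs if t == 1) / n_pos,
            "fpr": sum(p for t, p in pairs if t == 0) / n_neg,
        }
    return summary

# Hypothetical base rates: 6 of 10 in group A and 3 of 10 in group B are qualified
y_true = {"A": [1] * 6 + [0] * 4, "B": [1] * 3 + [0] * 7}

# Classifier 1: predicts every label correctly.
# Equal TPR and FPR across groups, but approval rates of 0.6 vs 0.3.
perfect = summarize(y_true, y_true)

# Classifier 2: approves 6 of 10 in both groups to force equal approval rates.
# Approval rates now match, but group B's false positive rate rises to 3/7.
forced_parity = summarize(y_true, {"A": [1] * 6 + [0] * 4, "B": [1] * 6 + [0] * 4})

print(perfect)
print(forced_parity)
```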

Mitigation strategies

Bias mitigation happens at three stages:

Pre-processing (fix the data): Reweight or resample the training data to reduce bias. Remove or transform features that serve as proxies for protected attributes. Generate synthetic examples to balance underrepresented groups.
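One well-known reweighting scheme (often called reweighing) gives each training example the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The sketch below is a plain-Python illustration under that assumption; the function name `reweighing_weights` and the data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight per (group, label) cell: expected count under independence / actual count."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

# Hypothetical data: group A has 4 positives out of 6, group B has 1 out of 4
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
# One weight per training example, usable as sample weights during training
sample_weights = [weights[(g, y)] for g, y in zip(groups, labels)]
print(weights)
```

With these weights, the weighted positive rate is the same in both groups, while underrepresented cells (here, positive examples in group B) count for more.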

In-processing (fix the training): Add fairness constraints to the model’s objective function. The model optimizes for both accuracy and fairness simultaneously. This trades some overall accuracy for more equitable outcomes.
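To illustrate what a fairness constraint in the objective might look like, the sketch below evaluates a logistic-regression log loss plus a penalty on the gap in mean predicted score between groups. It only computes the objective; an optimizer would then minimize it. The penalty form, the name `fair_loss`, and the strength parameter `lam` are assumptions chosen for illustration, and many other formulations exist.

```python
import math

def fair_loss(weights, bias, features, labels, groups, lam=1.0):
    """Log loss plus lam * (gap in mean predicted score between groups) ** 2."""
    scores = []
    for x in features:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    eps = 1e-12  # avoids log(0)
    log_loss = -sum(
        y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
        for y, s in zip(labels, scores)
    ) / len(labels)
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    means = [sum(v) / len(v) for v in by_group.values()]
    gap = max(means) - min(means)
    return log_loss + lam * gap ** 2

# Hypothetical one-feature dataset
features = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
labels = [1, 1, 0, 0]
groups = ["A", "A", "B", "B"]

accuracy_only = fair_loss([1.0], 0.0, features, labels, groups, lam=0.0)
with_penalty = fair_loss([1.0], 0.0, features, labels, groups, lam=1.0)
print(round(accuracy_only, 3), round(with_penalty, 3))
```

Raising `lam` pushes the optimizer toward smaller score gaps between groups at the cost of a higher log loss, which is the tradeoff described above.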

Post-processing (fix the predictions): Adjust the model’s decision thresholds per group to equalize a chosen fairness metric. A model might use a threshold of 0.5 for Group A and 0.45 for Group B to achieve equalized odds. This is the least invasive approach because the trained model itself is left unchanged, but it can feel uncomfortable because it explicitly treats groups differently.
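A rough sketch of per-group threshold selection follows. For simplicity it matches only the true positive rate across groups (sometimes called equal opportunity); full equalized odds would also need false positive rates to match, which generally requires a more involved procedure. The helper names and scores are hypothetical.

```python
import math

def threshold_for_tpr(y_true, scores, target_tpr):
    """Highest threshold at which at least target_tpr of actual positives score >= it."""
    positive_scores = sorted(
        (s for t, s in zip(y_true, scores) if t == 1), reverse=True
    )
    needed = max(1, math.ceil(target_tpr * len(positive_scores)))
    return positive_scores[needed - 1]

def tpr_at(y_true, scores, threshold):
    """True positive rate when predicting positive for scores >= threshold."""
    hits = [s >= threshold for t, s in zip(y_true, scores) if t == 1]
    return sum(hits) / len(hits)

# Hypothetical (labels, scores) per group
data = {
    "A": ([1, 1, 1, 1, 0, 0], [0.90, 0.80, 0.50, 0.30, 0.60, 0.20]),
    "B": ([1, 1, 1, 1, 0, 0], [0.70, 0.55, 0.45, 0.20, 0.50, 0.10]),
}

thresholds = {g: threshold_for_tpr(y, s, target_tpr=0.75) for g, (y, s) in data.items()}
achieved = {g: tpr_at(y, s, thresholds[g]) for g, (y, s) in data.items()}
print(thresholds, achieved)
```

With this made-up data, the thresholds come out to 0.5 for group A and 0.45 for group B, and both groups reach a true positive rate of 0.75.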

Protected attributes and proxy variables

Laws typically prohibit discrimination based on race, gender, age, religion, disability, and national origin. These are protected attributes. But even if you remove these features from your model, the model may learn to use proxy variables — zip code as a proxy for race, name patterns as a proxy for ethnicity, height as a proxy for gender.
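One rough way to screen for proxy variables is to ask how well a feature alone predicts the protected attribute, compared with always guessing the most common group. The sketch below does this for a categorical feature; the function names, zip codes, and group labels are all hypothetical, and a high score is a prompt for closer review rather than proof of a problem.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Accuracy of guessing the protected attribute from each feature value's majority group."""
    by_value = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_value[f][p] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(feature_values)

def majority_baseline(protected_values):
    """Accuracy of always guessing the most common group."""
    return Counter(protected_values).most_common(1)[0][1] / len(protected_values)

# Hypothetical records: zip code and protected group of each applicant
zip_codes = ["10001"] * 5 + ["20002"] * 5
protected = ["X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "X"]

strength = proxy_strength(zip_codes, protected)
baseline = majority_baseline(protected)
# A strength well above the baseline suggests the feature leaks group membership
print(strength, baseline)
```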

Detecting proxy effects requires measuring model outcomes across protected groups even when the model doesn’t directly use protected attributes. This is called disparate impact analysis: regardless of what features the model uses, do the outcomes disproportionately affect certain groups?
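A common summary statistic for disparate impact analysis is the ratio of each group's positive-outcome rate to that of a reference group. The sketch below flags ratios under 0.8, following the widely cited "four-fifths" rule of thumb from US employment guidance; that cutoff, the function name, and the data are assumptions for illustration, and the appropriate standard depends on your legal and domain context.

```python
def disparate_impact_ratio(y_pred, groups, reference):
    """Each group's positive-prediction rate divided by the reference group's rate."""
    totals, positives = {}, {}
    for p, g in zip(y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + p
    ref_rate = positives[reference] / totals[reference]
    return {g: (positives[g] / totals[g]) / ref_rate for g in totals}

# Hypothetical predictions: 5 of 10 approved in group A, 3 of 10 in group B.
# Group labels are used only for auditing here, not as model inputs.
y_pred = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7
groups = ["A"] * 10 + ["B"] * 10

FOUR_FIFTHS = 0.8  # assumed cutoff, a rule of thumb rather than a universal standard
ratios = disparate_impact_ratio(y_pred, groups, reference="A")
flagged = [g for g, r in ratios.items() if r < FOUR_FIFTHS]
print(ratios, flagged)
```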

The accuracy-fairness tradeoff

Making a model fairer often reduces overall accuracy. If a model’s most predictive feature is also a proxy for race, removing or downweighting it hurts predictions. This tradeoff is real but often smaller than expected: in many cases the accuracy drop is on the order of 1-3% while the fairness improvement is substantial, though the exact cost depends on the dataset and the mitigation method.

The tradeoff also depends on how you measure accuracy. A model that’s 90% accurate overall but 95% accurate for one group and 75% for another is arguably less useful than a model that’s 87% accurate for everyone, because the overall figure hides how poorly the second group is served.
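The arithmetic behind this comparison can be made explicit. The text does not give group sizes, so the sketch below assumes a 75%/25% split, which is the split under which 95% and 75% per-group accuracy average to 90% overall. The function and variable names are hypothetical.

```python
def overall_accuracy(group_accuracy, group_share):
    """Population-weighted average of per-group accuracy."""
    return sum(group_accuracy[g] * group_share[g] for g in group_accuracy)

# Assumed group shares; not stated in the text above
shares = {"A": 0.75, "B": 0.25}

model_1 = {"A": 0.95, "B": 0.75}  # higher overall, uneven across groups
model_2 = {"A": 0.87, "B": 0.87}  # slightly lower overall, even across groups

for name, acc in [("model_1", model_1), ("model_2", model_2)]:
    overall = overall_accuracy(acc, shares)
    worst = min(acc.values())
    print(name, round(overall, 2), worst)
```

Reporting worst-group accuracy alongside the overall figure makes this kind of gap visible at a glance.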

Common misconception: removing protected attributes eliminates bias

Deleting gender or race columns from training data doesn’t remove bias. The model can still pick up the same signal from the remaining features that correlate with protected attributes. This approach is called fairness through unawareness, and it’s widely recognized as insufficient. True fairness requires measuring outcomes across groups, not just hiding group labels.

The one thing to remember: AI fairness requires choosing from mathematically incompatible definitions of fairness, measuring outcomes across protected groups even when the model doesn’t use those attributes directly, and accepting a measured tradeoff between overall accuracy and equitable treatment.
