Ethical AI Fairness in Python — Deep Dive
Measuring fairness with Fairlearn
Fairlearn is an open-source toolkit, started at Microsoft and now maintained as a community project, for assessing and improving AI fairness. It provides metrics, visualizations, and mitigation algorithms.
# pip install fairlearn scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
    selection_rate,
    true_positive_rate,
    false_positive_rate,
)

# Simulated lending data
np.random.seed(42)
n = 5000
X = np.random.randn(n, 10)
sensitive_feature = np.random.choice(["group_a", "group_b"], n, p=[0.6, 0.4])

# Inject bias: group_b has a lower approval rate in the historical labels
y = (X[:, 0] + X[:, 1] + (sensitive_feature == "group_a") * 0.5 > 0).astype(int)
# Note: X is drawn independently of the group, so the model has no proxy for
# group membership. Expect the selection-rate gap to stay small while the
# error-rate gaps (TPR/FPR) are larger.

X_train, X_test, y_train, y_test, sf_train, sf_test = train_test_split(
    X, y, sensitive_feature, test_size=0.3, random_state=42
)

# Train a model (without using the sensitive feature directly)
model = GradientBoostingClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compute fairness metrics, broken down by group
metric_frame = MetricFrame(
    metrics={
        "selection_rate": selection_rate,
        "true_positive_rate": true_positive_rate,
        "false_positive_rate": false_positive_rate,
        "accuracy": lambda y_true, y_pred: (y_true == y_pred).mean(),
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sf_test,
)

print("Metrics by group:")
print(metric_frame.by_group.to_string())
print(f"\nDemographic parity difference: "
      f"{demographic_parity_difference(y_test, y_pred, sensitive_features=sf_test):.4f}")
print(f"Equalized odds difference: "
      f"{equalized_odds_difference(y_test, y_pred, sensitive_features=sf_test):.4f}")
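A demographic parity difference of 0 means the groups are selected at identical rates; values above roughly 0.1 are a common rule-of-thumb cause for concern, not a legal threshold. The four-fifths (80%) rule, from the US EEOC's Uniform Guidelines on Employee Selection Procedures, flags potential disparate impact when the selection rate for a disadvantaged group is less than 80% of the rate for the most favored group.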
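To make the two quantities above concrete, here is a minimal NumPy-only sketch that computes the demographic parity difference and the disparate impact ratio by hand. The helper name `parity_summary` and the toy arrays are illustrative assumptions, not part of Fairlearn.

```python
import numpy as np


def parity_summary(y_pred, groups):
    """Return per-group selection rates, their largest gap, and the min/max ratio."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rates = {str(g): float(y_pred[groups == g].mean()) for g in np.unique(groups)}
    lo, hi = min(rates.values()), max(rates.values())
    return {
        "selection_rates": rates,
        "dp_difference": hi - lo,  # 0 means identical selection rates
        "di_ratio": lo / hi if hi > 0 else float("nan"),  # four-fifths rule wants >= 0.8
    }


# Hypothetical toy predictions (1 = approved), five applicants per group
toy_pred = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])
toy_group = np.array(["a"] * 5 + ["b"] * 5)

summary = parity_summary(toy_pred, toy_group)
print(summary)
```

In this toy case group "a" is selected at a rate of 0.8 and group "b" at 0.2, so the difference is large and the ratio falls well below the 0.8 cut-off.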
Bias mitigation with Fairlearn
Post-processing: ThresholdOptimizer
The simplest mitigation adjusts decision thresholds per group:
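from fairlearn.postprocessing import ThresholdOptimizer

# Optimize thresholds to satisfy equalized odds
mitigated = ThresholdOptimizer(
    estimator=model,
    constraints="equalized_odds",
    objective="balanced_accuracy_score",
    prefit=True,  # reuse the already-fitted model instead of refitting it
)
mitigated.fit(X_train, y_train, sensitive_features=sf_train)
# Predictions can be randomized, so fix random_state for reproducible output
y_pred_fair = mitigated.predict(X_test, sensitive_features=sf_test, random_state=42)

# Compare before and after. The constraint targets equalized odds, so report
# that metric alongside demographic parity.
print("Before mitigation:")
print(f" Demographic parity diff: "
      f"{demographic_parity_difference(y_test, y_pred, sensitive_features=sf_test):.4f}")
print(f" Equalized odds diff: "
      f"{equalized_odds_difference(y_test, y_pred, sensitive_features=sf_test):.4f}")
print(f" Accuracy: {(y_test == y_pred).mean():.4f}")
print("After mitigation:")
print(f" Demographic parity diff: "
      f"{demographic_parity_difference(y_test, y_pred_fair, sensitive_features=sf_test):.4f}")
print(f" Equalized odds diff: "
      f"{equalized_odds_difference(y_test, y_pred_fair, sensitive_features=sf_test):.4f}")
print(f" Accuracy: {(y_test == y_pred_fair).mean():.4f}")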
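Conceptually, threshold post-processing leaves the model's scores untouched and only changes the cut-off applied to each group. The sketch below illustrates that idea with hand-picked thresholds. It is not how `ThresholdOptimizer` selects them (the library searches for thresholds, possibly randomized, that satisfy the chosen constraint); `apply_group_thresholds` and all numbers are hypothetical.

```python
import numpy as np


def apply_group_thresholds(scores, groups, thresholds):
    """Turn scores into 0/1 decisions using a separate cut-off per group."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    cutoffs = np.array([thresholds[g] for g in groups])
    return (scores >= cutoffs).astype(int)


# Hypothetical model scores for three applicants in each group
scores = np.array([0.9, 0.6, 0.4, 0.7, 0.45, 0.2])
groups = np.array(["a", "a", "a", "b", "b", "b"])

# One shared threshold: group "a" is selected twice, group "b" once
single = apply_group_thresholds(scores, groups, {"a": 0.5, "b": 0.5})

# A lower cut-off for group "b" equalizes the number selected per group
per_group = apply_group_thresholds(scores, groups, {"a": 0.5, "b": 0.4})

print(single.tolist(), per_group.tolist())
```

Because the thresholds depend on group membership, this family of methods needs the sensitive feature at prediction time, which is why `predict` above takes `sensitive_features`.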
In-processing: ExponentiatedGradient
To build the fairness constraint into training itself, rather than adjusting predictions afterwards:
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

constraint = DemographicParity()
base_estimator = LogisticRegression(max_iter=1000)
mitigated_model = ExponentiatedGradient(
    estimator=base_estimator,
    constraints=constraint,
    max_iter=50,
)
mitigated_model.fit(X_train, y_train, sensitive_features=sf_train)
# The fitted predictor is randomized, so fix random_state for reproducible output
y_pred_constrained = mitigated_model.predict(X_test, random_state=42)

# This model was trained with fairness constraints built in
metric_frame_constrained = MetricFrame(
    metrics={"selection_rate": selection_rate, "accuracy": lambda y, p: (y == p).mean()},
    y_true=y_test,
    y_pred=y_pred_constrained,
    sensitive_features=sf_test,
)
print(metric_frame_constrained.by_group.to_string())
IBM AIF360 for comprehensive bias analysis
AIF360 provides a broader set of fairness metrics and pre-processing algorithms:
# pip install aif360
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
import pandas as pd

# Create AIF360 dataset (group_a is encoded as 1, the privileged group)
df = pd.DataFrame(X_test, columns=[f"feat_{i}" for i in range(10)])
df["label"] = y_test
df["group"] = (sf_test == "group_a").astype(int)
dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)

# Measure bias in the labels themselves, before any model is involved
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{"group": 0}],
    privileged_groups=[{"group": 1}],
)
print(f"Disparate impact: {metric.disparate_impact():.4f}")
print(f"Statistical parity difference: {metric.statistical_parity_difference():.4f}")
print(f"Consistency: {metric.consistency()[0]:.4f}")

# Pre-processing: Reweighing
reweigher = Reweighing(
    unprivileged_groups=[{"group": 0}],
    privileged_groups=[{"group": 1}],
)

# Create training dataset in AIF360 format
df_train = pd.DataFrame(X_train, columns=[f"feat_{i}" for i in range(10)])
df_train["label"] = y_train
df_train["group"] = (sf_train == "group_a").astype(int)
train_dataset = BinaryLabelDataset(
    df=df_train,
    label_names=["label"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)
reweighed = reweigher.fit_transform(train_dataset)
# Use reweighed.instance_weights as sample_weight when refitting, e.g.
# model.fit(X_train, y_train, sample_weight=reweighed.instance_weights)
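For intuition about what those instance weights contain, the sketch below computes reweighing weights by hand using the Kamiran and Calders formula, w(g, y) = P(g) · P(y) / P(g, y), which is the method AIF360's `Reweighing` is based on. The function name `reweighing_weights` and the toy arrays are assumptions for illustration; treat this as a sketch of the idea rather than a reproduction of the library's internals.

```python
import numpy as np


def reweighing_weights(group, label):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    group = np.asarray(group)
    label = np.asarray(label)
    weights = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            if cell.any():
                # Frequency expected if group and label were independent
                expected = (group == g).mean() * (label == y).mean()
                weights[cell] = expected / cell.mean()
    return weights


# Hypothetical toy data: group 1 gets the favorable label far more often
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
label = np.array([1, 1, 1, 0, 1, 0, 0, 0])
w = reweighing_weights(group, label)


def weighted_rate(g):
    """Weighted share of favorable labels within group g."""
    mask = group == g
    return (w[mask] * label[mask]).sum() / w[mask].sum()


print(w.round(3), weighted_rate(1), weighted_rate(0))
```

Under-represented combinations (here, favorable labels in group 0) receive weights above 1, so the weighted favorable-label rate becomes the same in both groups.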
Intersectional fairness analysis
Bias can compound at intersections of protected attributes. A model might be fair for women overall and fair for Black people overall, but unfair for Black women specifically.
def intersectional_analysis(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    groups: dict[str, np.ndarray],
) -> pd.DataFrame:
    """Analyze fairness at intersections of multiple attributes."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, **groups})

    # Create intersectional groups, e.g. "F × B"
    group_cols = list(groups.keys())
    df["intersection"] = df[group_cols].apply(
        lambda row: " × ".join(str(v) for v in row), axis=1
    )

    results = []
    for name, group_df in df.groupby("intersection"):
        tp = ((group_df["y_pred"] == 1) & (group_df["y_true"] == 1)).sum()
        fp = ((group_df["y_pred"] == 1) & (group_df["y_true"] == 0)).sum()
        fn = ((group_df["y_pred"] == 0) & (group_df["y_true"] == 1)).sum()
        tn = ((group_df["y_pred"] == 0) & (group_df["y_true"] == 0)).sum()
        n = len(group_df)
        sr = (group_df["y_pred"] == 1).mean()
        # Rates are undefined (NaN) when a group has no positives or no negatives
        tpr = tp / (tp + fn) if (tp + fn) > 0 else np.nan
        fpr = fp / (fp + tn) if (fp + tn) > 0 else np.nan
        results.append({
            "group": name,
            "n": n,
            "selection_rate": round(sr, 4),
            "true_positive_rate": round(tpr, 4),
            "false_positive_rate": round(fpr, 4),
        })
    # Lowest selection rate first
    return pd.DataFrame(results).sort_values("selection_rate")


# Example (the attributes are random placeholders, so any gaps are sampling noise)
analysis = intersectional_analysis(
    y_test, y_pred,
    groups={
        "gender": np.random.choice(["M", "F"], len(y_test)),
        "race": np.random.choice(["A", "B", "C"], len(y_test)),
    },
)
print(analysis.to_string(index=False))
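The claim that a model can look fair on each attribute separately yet be unfair at their intersection is easy to verify with a constructed example. The sketch below uses only the standard library; the records, the `make` helper, and the attribute values are hypothetical and chosen so that the effect is exact.

```python
from collections import defaultdict


def selection_rates(records, keys):
    """Selection rate for each combination of the given attribute keys."""
    selected, total = defaultdict(int), defaultdict(int)
    for rec in records:
        combo = tuple(rec[k] for k in keys)
        total[combo] += 1
        selected[combo] += rec["pred"]
    return {combo: selected[combo] / total[combo] for combo in total}


def make(gender, race, n_selected, n_total=10):
    """Build n_total records for one subgroup, of which n_selected are selected."""
    return [
        {"gender": gender, "race": race, "pred": int(i < n_selected)}
        for i in range(n_total)
    ]


# Hypothetical predictions: two subgroups favored, two disfavored
records = (
    make("F", "A", 8) + make("F", "B", 2) + make("M", "A", 2) + make("M", "B", 8)
)

by_gender = selection_rates(records, ["gender"])
by_race = selection_rates(records, ["race"])
by_both = selection_rates(records, ["gender", "race"])

print(by_gender)  # identical rates for F and M
print(by_race)    # identical rates for A and B
print(by_both)    # rates differ sharply across intersections
```

Every single-attribute view shows a selection rate of 0.5, while the intersections range from 0.2 to 0.8, which is why the analysis function above groups on the combination of attributes.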
SHAP for bias explanation
Understanding why a model is biased is as important as detecting it:
# pip install shap
import shap

explainer = shap.TreeExplainer(model)
# For this binary gradient-boosting model, shap_values is expected to be a
# 2-D array of shape (n_samples, n_features)
shap_values = explainer.shap_values(X_test)

# Compare feature importance (mean absolute SHAP value) between groups
group_a_mask = sf_test == "group_a"
group_b_mask = sf_test == "group_b"
mean_shap_a = np.abs(shap_values[group_a_mask]).mean(axis=0)
mean_shap_b = np.abs(shap_values[group_b_mask]).mean(axis=0)

feature_names = [f"feat_{i}" for i in range(10)]
for i, name in enumerate(feature_names):
    diff = abs(mean_shap_a[i] - mean_shap_b[i])
    if diff > 0.01:  # arbitrary reporting cut-off, not a statistical test
        print(f" {name}: Group A importance={mean_shap_a[i]:.4f}, "
              f"Group B={mean_shap_b[i]:.4f}, diff={diff:.4f}")
Features with large importance differences between groups are potential sources of bias. If feat_2 matters much more for Group B’s predictions, investigate whether it’s acting as a proxy for group membership.
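One simple way to follow up on a suspected proxy is to check how strongly each feature correlates with group membership. The sketch below does this with plain NumPy on synthetic data; `proxy_scores`, the injected shift, and the array names are assumptions for illustration, and a linear correlation will miss non-linear or multi-feature proxies.

```python
import numpy as np


def proxy_scores(X, group_indicator):
    """Absolute Pearson correlation of each column of X with a 0/1 group indicator.

    Assumes no column is constant and both groups are present.
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(group_indicator, dtype=float)
    X_centered = X - X.mean(axis=0)
    g_centered = g - g.mean()
    denom = np.sqrt((X_centered ** 2).sum(axis=0) * (g_centered ** 2).sum())
    return np.abs(X_centered.T @ g_centered) / denom


rng = np.random.default_rng(0)
g = rng.integers(0, 2, size=2000)
X_demo = rng.normal(size=(2000, 3))
X_demo[:, 2] += 1.5 * g  # hypothetical proxy: column 2 shifts with group membership

scores = proxy_scores(X_demo, g)
for i, s in enumerate(scores):
    print(f"feat_{i}: |corr with group| = {s:.3f}")
```

A feature that scores high here can leak group information into the model even when the sensitive attribute itself is excluded from training.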
Automated fairness testing in CI/CD
Integrate fairness checks into your model deployment pipeline:
import pytest
from dataclasses import dataclass
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
    selection_rate,
)

# trained_model, test_data, sensitive_features and group_attributes are assumed
# to be pytest fixtures defined elsewhere (e.g. in conftest.py), and
# intersectional_analysis is assumed to be importable from your own code.


@dataclass
class FairnessThresholds:
    max_demographic_parity_diff: float = 0.1
    max_equalized_odds_diff: float = 0.1
    min_disparate_impact: float = 0.8  # 80% rule
    max_selection_rate_ratio: float = 1.25  # reciprocal of 0.8; not checked below


def test_model_fairness(trained_model, test_data, sensitive_features):
    """Fail CI if model exceeds fairness thresholds."""
    thresholds = FairnessThresholds()
    y_pred = trained_model.predict(test_data.X)
    y_true = test_data.y

    dp_diff = demographic_parity_difference(
        y_true, y_pred, sensitive_features=sensitive_features
    )
    assert dp_diff <= thresholds.max_demographic_parity_diff, (
        f"Demographic parity difference {dp_diff:.4f} exceeds "
        f"threshold {thresholds.max_demographic_parity_diff}"
    )

    eo_diff = equalized_odds_difference(
        y_true, y_pred, sensitive_features=sensitive_features
    )
    assert eo_diff <= thresholds.max_equalized_odds_diff, (
        f"Equalized odds difference {eo_diff:.4f} exceeds "
        f"threshold {thresholds.max_equalized_odds_diff}"
    )

    # Check selection rates per group (80% rule)
    mf = MetricFrame(
        metrics={"selection_rate": selection_rate},
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features,
    )
    rates = mf.by_group["selection_rate"]
    ratio = rates.min() / rates.max() if rates.max() > 0 else 0
    assert ratio >= thresholds.min_disparate_impact, (
        f"Disparate impact ratio {ratio:.4f} below "
        f"threshold {thresholds.min_disparate_impact}"
    )


def test_intersectional_fairness(trained_model, test_data, group_attributes):
    """Check fairness at intersections of multiple attributes."""
    y_pred = trained_model.predict(test_data.X)
    analysis = intersectional_analysis(y_true=test_data.y, y_pred=y_pred, groups=group_attributes)
    sr_range = analysis["selection_rate"].max() - analysis["selection_rate"].min()
    # analysis is sorted by selection rate, so row 0 is the least-selected group
    assert sr_range < 0.15, (
        f"Intersectional selection rate range {sr_range:.4f} too wide. "
        f"Worst group: {analysis.iloc[0]['group']}"
    )
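Sequential asserts stop at the first failure, so a CI log may hide other violated thresholds. One possible variation, sketched below with only the standard library, collects every violation before failing. The function name `fairness_violations`, the `limits` layout, and the metric values are hypothetical.

```python
def fairness_violations(metrics, limits):
    """Return one message per metric that falls outside its limit.

    `limits` maps a metric name to ("max" or "min", bound).
    """
    problems = []
    for name, (kind, bound) in limits.items():
        value = metrics[name]
        if kind == "max" and value > bound:
            problems.append(f"{name}={value:.4f} exceeds max {bound}")
        elif kind == "min" and value < bound:
            problems.append(f"{name}={value:.4f} below min {bound}")
    return problems


limits = {
    "demographic_parity_diff": ("max", 0.1),
    "equalized_odds_diff": ("max", 0.1),
    "disparate_impact": ("min", 0.8),
}

# Hypothetical metric values, for illustration only
observed = {
    "demographic_parity_diff": 0.04,
    "equalized_odds_diff": 0.18,
    "disparate_impact": 0.72,
}

problems = fairness_violations(observed, limits)
print("\n".join(problems))
```

Inside a test, a single `assert not problems, "\n".join(problems)` would then report all failing metrics at once.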
Model cards for transparency
Document fairness properties alongside model metadata:
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    model_name: str
    version: str
    task: str
    training_data_description: str

    # Fairness section
    evaluated_groups: list[str] = field(default_factory=list)
    fairness_metrics: dict[str, float] = field(default_factory=dict)
    known_biases: list[str] = field(default_factory=list)
    mitigation_applied: list[str] = field(default_factory=list)
    fairness_limitations: list[str] = field(default_factory=list)

    # Performance
    overall_accuracy: float = 0.0
    per_group_accuracy: dict[str, float] = field(default_factory=dict)

    def to_markdown(self) -> str:
        lines = [
            f"# Model Card: {self.model_name} v{self.version}",
            f"\n## Task\n{self.task}",
            f"\n## Training Data\n{self.training_data_description}",
            f"\n## Overall Performance\n- Accuracy: {self.overall_accuracy:.4f}",
            "\n## Fairness Evaluation",
            f"- Groups evaluated: {', '.join(self.evaluated_groups)}",
        ]
        for metric, value in self.fairness_metrics.items():
            lines.append(f"- {metric}: {value:.4f}")
        if self.known_biases:
            lines.append("\n## Known Biases")
            for bias in self.known_biases:
                lines.append(f"- {bias}")
        if self.mitigation_applied:
            lines.append("\n## Mitigation Applied")
            for mitigation in self.mitigation_applied:
                lines.append(f"- {mitigation}")
        if self.fairness_limitations:
            lines.append("\n## Limitations")
            for limitation in self.fairness_limitations:
                lines.append(f"- {limitation}")
        return "\n".join(lines)
Tradeoffs
Which fairness metric to optimize? Several fairness criteria cannot all hold at once. Impossibility results (Kleinberg et al.; Chouldechova) show that when base rates differ between groups, even just two, an imperfect classifier cannot be calibrated within each group and also equalize false positive and false negative rates. The choice depends on the application domain and stakeholder values. Lending might prioritize equalized odds; hiring might prioritize demographic parity.
Group fairness vs. individual fairness: Group-level metrics can hide individual unfairness (someone in a “fair” group gets an unfair outcome). Individual fairness requires defining “similarity” between people, which is subjective and domain-specific.
Transparency vs. gaming: Publishing model cards and fairness metrics helps accountability but could allow adversaries to game the system. Responsible-AI guidance generally favors transparency, on the view that the benefits of accountability outweigh the gaming risk.
Static vs. dynamic fairness: Fairness measured at deployment degrades as population distributions shift. Continuous monitoring with automated alerts is necessary, not just pre-deployment testing.
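The tension between metrics in the first tradeoff can be seen with a few lines of arithmetic. The sketch below fixes the same true and false positive rates for two groups (so equalized odds holds exactly) and shows what follows when their base rates differ. The rates and group names are hypothetical, and both helper functions are illustrative names.

```python
def positive_predictive_value(tpr, fpr, base_rate):
    """P(y = 1 | prediction = 1), by Bayes' rule."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def selection_rate_from(tpr, fpr, base_rate):
    """Share of the group that receives a positive prediction."""
    return tpr * base_rate + fpr * (1 - base_rate)


# Same error rates for both groups, i.e. equalized odds is satisfied
tpr, fpr = 0.8, 0.2
base_rates = {"group_a": 0.5, "group_b": 0.2}  # hypothetical base rates

results = {
    name: {
        "ppv": positive_predictive_value(tpr, fpr, p),
        "selection_rate": selection_rate_from(tpr, fpr, p),
    }
    for name, p in base_rates.items()
}
for name, r in results.items():
    print(f"{name}: PPV={r['ppv']:.2f}, selection rate={r['selection_rate']:.2f}")
```

With identical error rates, a positive prediction is correct about 80% of the time for group_a but only 50% of the time for group_b, and the selection rates differ as well (0.50 versus 0.32), so neither predictive parity nor demographic parity holds.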
The one thing to remember: Ethical AI fairness in Python requires measuring multiple fairness metrics across protected groups (including intersections), choosing which metric to optimize based on domain values, applying mitigation at the data, training, or prediction level, and integrating automated fairness tests into the CI/CD pipeline.