AI Ethics — Core Concepts

The Multiple Meanings of Fairness

One of the most important and counterintuitive results in algorithmic fairness: several widely used mathematical definitions of fairness are mutually incompatible. You cannot satisfy all of them at once, except in special cases such as equal base rates across groups or a perfect predictor.

Demographic parity: The positive outcome rate should be equal across groups. If 30% of white applicants are hired, 30% of Black applicants should be hired too.

Equal opportunity: The true positive rate should be equal across groups. If a qualified white applicant has a 90% chance of being hired, a qualified Black applicant should also have a 90% chance.

Calibration: Across all people who receive a score of 0.8 (80% probability of a positive outcome), 80% should actually have a positive outcome — regardless of group membership.
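To make the three definitions concrete, here is a minimal sketch that computes a per-group version of each from binary hiring decisions. The function names and the toy data are illustrative assumptions, not taken from any real system. Because the decisions here are yes/no rather than scores, positive predictive value (the share of hired people who were actually qualified) stands in for calibration.

```python
from collections import defaultdict

def group_metrics(records):
    """Per-group metrics from (group, y_true, y_pred) triples of 0/1 labels.

    Assumes every group has at least one actual positive and one predicted
    positive; otherwise the divisions below would fail.
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "actual_pos": 0, "true_pos": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += y_pred
        c["actual_pos"] += y_true
        c["true_pos"] += y_true * y_pred
    return {
        group: {
            "selection_rate": c["pred_pos"] / c["n"],  # compared for demographic parity
            "tpr": c["true_pos"] / c["actual_pos"],    # compared for equal opportunity
            "ppv": c["true_pos"] / c["pred_pos"],      # binary stand-in for calibration
        }
        for group, c in counts.items()
    }

def make_group(name, tp, fn, fp, tn):
    """Build toy records from confusion-matrix counts (hypothetical data)."""
    return ([(name, 1, 1)] * tp + [(name, 1, 0)] * fn
            + [(name, 0, 1)] * fp + [(name, 0, 0)] * tn)

# Group A has 6 qualified applicants out of 10; group B has 4 out of 10.
records = make_group("A", tp=4, fn=2, fp=1, tn=3) + make_group("B", tp=4, fn=0, fp=1, tn=5)
metrics = group_metrics(records)
for group, m in sorted(metrics.items()):
    print(group, {name: round(value, 3) for name, value in m.items()})
```

In this toy data both groups are hired at the same rate (0.5) and with the same precision (0.8), yet qualified applicants in group A are hired only two-thirds of the time while every qualified applicant in group B is hired. Demographic parity holds and equal opportunity does not.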

In 2016, journalists at ProPublica showed that COMPAS, a recidivism prediction tool used in US courts, had different false positive rates for Black and white defendants. Black defendants who didn’t reoffend were labeled high-risk nearly twice as often as white defendants who didn’t reoffend (about 45% versus 23%).

Northpointe, the COMPAS developer, responded that their system was calibrated: for any given risk score, the actual recidivism rate was the same across racial groups. They were both right. Chouldechova (2017) proved that when base rates differ across groups, a classifier cannot simultaneously equalize predictive value (the calibration-style property Northpointe cited), false positive rates, and false negative rates. Kleinberg, Mullainathan, and Raghavan (2016) reached a similar conclusion for calibrated risk scores. In the COMPAS data, recorded re-arrest rates differed across groups, so no tool could have satisfied both criteria at once. Someone has to choose which fairness criterion to prioritize.
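The incompatibility follows from simple confusion-matrix arithmetic. For base rate p, positive predictive value PPV, and true positive rate TPR, the false positive rate is forced to be FPR = p/(1-p) × (1-PPV)/PPV × TPR. The sketch below plugs in hypothetical numbers (they are not the COMPAS figures) to show that holding PPV and TPR equal across two groups with different base rates leaves the false positive rates unequal.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate implied by base rate, PPV, and TPR.

    Derivation: TP = tpr * p * N, FP = TP * (1 - ppv) / ppv,
    and FPR = FP / ((1 - p) * N).
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

# Hypothetical groups: same PPV and TPR, different base rates.
fpr_low_base = implied_fpr(base_rate=0.3, ppv=0.6, tpr=0.7)
fpr_high_base = implied_fpr(base_rate=0.5, ppv=0.6, tpr=0.7)

print(f"FPR at base rate 0.3: {fpr_low_base:.3f}")
print(f"FPR at base rate 0.5: {fpr_high_base:.3f}")
```

With these inputs the group with the higher base rate ends up with a false positive rate of about 0.47 against 0.20 for the other group, even though the tool is equally "accurate" for both by the PPV and TPR measures.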

There is no technical solution to this tension. It’s a values question — a choice about which kind of fairness matters most — dressed up in mathematics.

Sources of Algorithmic Bias

Bias enters AI systems at multiple points in the pipeline:

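Historical bias: Training data reflects past human decisions that were themselves biased. Amazon’s experimental hiring tool, scrapped by 2018, learned from a decade of résumés submitted mostly by men and penalized résumés that mentioned women’s activities. Other well-known failures, which also overlap with representation bias, include Google Photos labeling Black people as “gorillas” (2015) and commercial gender classifiers with error rates of up to 34.7% for darker-skinned women versus under 1% for lighter-skinned men (Buolamwini & Gebru, 2018, “Gender Shades”).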

Representation bias: Certain groups are underrepresented in training data. Medical AI trained primarily on data from research hospitals (disproportionately serving affluent populations) may perform worse for underrepresented groups.

Measurement bias: The proxy metric used for training doesn’t equally represent the outcome of interest for all groups. Clinical risk scores often use healthcare utilization as a proxy for health needs — but lower-income patients use healthcare less due to access barriers, causing systematic underestimation of their actual health needs.
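A toy example shows how the proxy does the damage. Everything below is a hypothetical illustration: two groups have identical health needs, but one group's utilization is halved by an assumed access barrier, and patients are flagged for extra care based on utilization alone.

```python
def flag_for_care(patients, threshold):
    """Flag patients whose utilization (the proxy, not the true need) meets the threshold."""
    return [p for p in patients if p["utilization"] >= threshold]

# Identical need distribution in both groups (arbitrary units).
needs = [2, 4, 6, 8, 10]

# Assumed access factors: low-access patients use half as much care per unit of need.
patients = (
    [{"group": "high_access", "need": n, "utilization": n * 1.0} for n in needs]
    + [{"group": "low_access", "need": n, "utilization": n * 0.5} for n in needs]
)

flagged = flag_for_care(patients, threshold=6)
flagged_by_group = {
    group: sum(1 for p in flagged if p["group"] == group)
    for group in ("high_access", "low_access")
}
print(flagged_by_group)
```

Three high-access patients are flagged and no low-access patients are, although the two groups have exactly the same needs. A model trained to predict utilization would learn the same gap.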

Feedback loops: Deploying a biased model can create new biased training data. Predictive policing systems that send more police to certain neighborhoods generate more arrests in those neighborhoods, reinforcing the original bias.
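The loop can be sketched with a deliberately simple, deterministic simulation. The allocation rule (patrols proportional to recorded arrests) and all numbers are assumptions chosen for illustration, not a model of any real system. Both neighborhoods have the same true crime rate, but the historical record starts out skewed.

```python
def simulate_patrol_feedback(initial_arrests, true_crime_rates, total_patrols=100, rounds=10):
    """Return each neighborhood's share of recorded arrests after every round.

    Each round, patrols are split in proportion to cumulative recorded arrests,
    and new arrests equal patrols times the true crime rate (expected value).
    """
    recorded = list(initial_arrests)
    history = []
    for _ in range(rounds):
        total = sum(recorded)
        patrols = [total_patrols * r / total for r in recorded]
        new_arrests = [p * rate for p, rate in zip(patrols, true_crime_rates)]
        recorded = [r + a for r, a in zip(recorded, new_arrests)]
        history.append([r / sum(recorded) for r in recorded])
    return history

# Same true crime rate in both places, but a 60/40 skew in the historical record.
shares = simulate_patrol_feedback(initial_arrests=[60, 40], true_crime_rates=[0.1, 0.1])
print([round(s, 3) for s in shares[-1]])
```

Under these assumptions the 60/40 split never moves toward 50/50: the data keeps confirming the initial skew because arrests are only recorded where patrols are sent. Allocation rules that concentrate patrols more aggressively can make the skew grow rather than merely persist.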

Explainability: The Accuracy-Interpretability Tradeoff

There’s a persistent tension between model accuracy and interpretability. The most accurate models (deep neural networks with billions of parameters) are the hardest to explain. The most interpretable models (decision trees, linear regression) are often less accurate.

Methods for explaining black-box models:

LIME (Local Interpretable Model-agnostic Explanations): Create an interpretable local approximation around a specific prediction. Perturb the input slightly and observe how the prediction changes. Fit a simple model (linear, sparse) to these local perturbations. The simple model’s features are the “explanation.”
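The perturb, weight, and fit steps can be sketched in a few lines of NumPy. This is a simplified illustration of the idea, not the API of the `lime` package: it uses Gaussian perturbations, an exponential proximity kernel, and plain weighted least squares, and it skips the sparsity step. The black-box function and all parameter values are hypothetical.

```python
import numpy as np

def lime_style_explanation(predict_fn, x, num_samples=500, scale=0.1, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around x and return its slopes."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the input with small Gaussian noise.
    perturbed = x + rng.normal(0.0, scale, size=(num_samples, x.shape[0]))
    # 2. Query the black box on the perturbed points.
    preds = predict_fn(perturbed)
    # 3. Weight each sample by its closeness to x.
    distances = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # 4. Weighted least squares on [intercept, offsets from x].
    design = np.hstack([np.ones((num_samples, 1)), perturbed - x])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], preds * sqrt_w, rcond=None)
    return coef[1:]  # one local slope per feature

def black_box(X):
    """Hypothetical model: nonlinear in feature 0, linear in feature 1, ignores feature 2."""
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

slopes = lime_style_explanation(black_box, np.array([1.0, 2.0, 0.0]))
print(np.round(slopes, 2))
```

Around the point (1, 2, 0) the fitted slopes should come out close to 2, 3, and 0, matching the local gradient of the black box. The explanation is only valid near that point; at a different input the slope for feature 0 would differ.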

SHAP (SHapley Additive exPlanations): Uses game theory (Shapley values) to assign each feature a fair contribution to the prediction. Shapley values are the only attribution method satisfying all of: efficiency, symmetry, dummy, and additivity. SHAP is consistent (when a model depends more on a feature, that feature’s SHAP value doesn’t decrease) and can aggregate across many instances for global explanations.
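For a model with only a few features, Shapley values can be computed exactly by enumerating every coalition of features. The sketch below does this for a hypothetical three-feature model, treating a feature that is "absent" from a coalition as set to a baseline of zero. That baseline is an assumption made for simplicity; SHAP implementations typically average over background data and use approximations, since exact enumeration scales as 2^n.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a value function defined on sets of feature indices."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding feature i.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi

# Hypothetical model and instance to explain.
def model(x0, x1, x2):
    return 2 * x0 + x1 * x2

instance = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)

def value_fn(coalition):
    """Model output with features outside the coalition replaced by the baseline."""
    inputs = [instance[i] if i in coalition else baseline[i] for i in range(3)]
    return model(*inputs)

phi = shapley_values(value_fn, 3)
print([round(v, 3) for v in phi])
```

The result is 2 for the additive feature and 3 each for the two interacting features, which share their joint contribution of 6 equally (symmetry). The values sum to 8, the gap between the model's output on the instance and on the baseline (efficiency).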

Attention visualization: For transformer models, visualizing attention weights sometimes reveals which parts of the input were attended to. However, Jain & Wallace (2019) showed attention doesn’t reliably indicate importance — high attention weight doesn’t imply high causal influence on the output.

Mechanistic explanations: For critical decisions, circuit-level analysis (identifying which neurons and weights contribute to a specific output) is more rigorous but very expensive.

Accountability Structures

When an AI system harms someone, current legal frameworks are poorly equipped to assign accountability. Key questions:

  • Is the AI company liable for a model that behaves as documented?
  • Is the deploying organization liable for choosing to use a model they knew was imperfect?
  • Can developers claim intermediary-style immunity (by analogy to Section 230 or to common-carrier arguments) to shield themselves from liability?

Current regulatory approaches:

EU AI Act (in force since August 2024, with obligations phasing in through 2026 and 2027): Classifies AI by risk level. High-risk AI (hiring, credit, healthcare, law enforcement) must undergo conformity assessments, maintain technical documentation, implement human oversight mechanisms, and carry out post-market monitoring. Prohibited uses include social scoring and, with narrow exceptions, real-time remote biometric identification in public spaces by law enforcement.

US approach: More fragmented. The 2023 Executive Order on Safe, Secure, and Trustworthy AI directed NIST to develop AI safety guidelines and standards, federal agencies to designate Chief AI Officers, and developers of the most powerful models to report safety test results to the government. The order was rescinded in January 2025. Sector-specific regulation (FDA for medical AI, EEOC for hiring AI) is developing.

Algorithmic Accountability Act: Proposed US legislation (not yet passed as of 2024) that would require companies to conduct impact assessments for automated systems affecting consequential decisions.

The Labor and Environmental Dimensions

AI ethics extends beyond fairness and explainability to include:

Data labor: AI systems are trained on data created by humans: books, art, code, images. The creators of this data rarely consented to its use and are rarely compensated for it. Legal battles over copyright (e.g., Getty Images vs. Stability AI, NYT vs. OpenAI) are testing whether training data use constitutes infringement.

Content moderation labor: The humans who review and label disturbing content to train content moderation systems and RLHF models often work for low wages with inadequate psychological support. TIME’s 2023 investigation documented workers in Kenya, employed through the outsourcing firm Sama on an OpenAI contract, being paid less than $2 per hour to review traumatic content.

Environmental impact: Training large AI models has a significant carbon footprint. Training GPT-3 was estimated to emit roughly 500 tonnes of CO2 equivalent. For widely deployed models, cumulative inference probably has a larger impact than training. The energy consumption of large AI data centers is increasingly scrutinized.

One thing to remember: AI ethics isn’t separate from AI development — the choices made about training data, optimization objectives, deployment contexts, and evaluation metrics are all ethical choices, whether or not they’re framed that way.
