Model Explainability with SHAP in Python — Core Concepts

Use SHAP values to explain individual predictions and global feature importance for any machine learning model.

What Is SHAP?

SHAP (SHapley Additive exPlanations) is a method for explaining individual predictions by assigning each feature a contribution value. It is rooted in cooperative game theory — specifically Shapley values, a concept from the 1950s that fairly distributes a “payout” among players based on their contributions.

In ML terms: the “payout” is the prediction, and the “players” are the input features.

How Shapley Values Work

Imagine a model predicts house prices using three features: square footage, number of bedrooms, and neighborhood. For a specific house predicted at $450,000 (while the average prediction is $300,000), SHAP asks: how much did each feature contribute to the $150,000 difference from average?

SHAP calculates this by considering every possible combination (coalition) of the other features, measuring how much the prediction changes when a given feature is added to each one, and taking a weighted average of those changes. The result might be:

Square footage: +$80,000
Neighborhood: +$55,000
Bedrooms: +$15,000

These contributions sum to the $150,000 difference from the average. This additive property is one of SHAP’s mathematical guarantees.
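To make the averaging concrete, here is a minimal brute-force sketch of the Shapley formula for the three-feature house example. The `toy_value` function and all of its dollar amounts (including the 20,000 interaction bonus) are hypothetical, chosen only so the output reproduces the figures above. In practice the value of a coalition is estimated from the model and background data rather than hard-coded.

```python
from itertools import combinations
from math import factorial

FEATURES = ["sqft", "bedrooms", "neighborhood"]

def toy_value(coalition):
    """Hypothetical prediction when only the features in `coalition` are known."""
    value = 300_000  # average prediction (no features known)
    if "sqft" in coalition:
        value += 70_000
    if "neighborhood" in coalition:
        value += 45_000
    if "bedrooms" in coalition:
        value += 15_000
    if "sqft" in coalition and "neighborhood" in coalition:
        value += 20_000  # assumed interaction: a large house in a good area
    return value

def exact_shapley(features, value_fn):
    """Weighted average of each feature's marginal contribution over all coalitions."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                coalition = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi

shapley = exact_shapley(FEATURES, toy_value)
for name, contribution in shapley.items():
    print(f"{name}: {contribution:+,.0f}")

# Additivity: contributions sum to (this prediction - average prediction).
gap = toy_value(set(FEATURES)) - toy_value(set())
print(f"sum of contributions: {sum(shapley.values()):,.0f} (gap: {gap:,})")
```

The loop visits every subset of the other features, so the cost grows exponentially with the number of features. That is why practical explainers rely on model-specific shortcuts or sampling.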

Local vs Global Explanations

Local Explanations

Explain a single prediction. “Why was this specific customer flagged as high churn risk?”

Global Explanations

Aggregate SHAP values across many predictions to understand overall feature importance. “Which features matter most across all customers?”

Both come from the same SHAP values — local explanations are per-prediction, global explanations are statistics over many predictions.
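A small sketch of that relationship, using a hand-written matrix of SHAP values. The feature names and numbers are made up for illustration and do not come from a real churn model. One row gives a local explanation. The mean absolute value of each column is one common global importance statistic.

```python
# Hypothetical SHAP values: one row per customer, one column per feature.
feature_names = ["tenure_months", "monthly_charges", "support_tickets"]
shap_matrix = [
    [-0.20, 0.05, 0.30],
    [0.10, -0.02, -0.15],
    [-0.35, 0.08, 0.40],
    [0.25, -0.01, -0.10],
]

# Local explanation: the SHAP values for a single customer (row 0).
local = dict(zip(feature_names, shap_matrix[0]))
print("customer 0:", local)

# Global importance: mean absolute SHAP value per feature across all customers.
n_rows = len(shap_matrix)
global_importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_rows
    for j, name in enumerate(feature_names)
}

# Rank features from most to least important.
ranking = sorted(global_importance, key=global_importance.get, reverse=True)
print("global ranking:", ranking)
```

The absolute value matters here: a feature that pushes some predictions up and others down would otherwise average out to near zero and look unimportant.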

SHAP Explainer Types

Explainer	Best For	Speed
TreeExplainer	Tree-based models (XGBoost, LightGBM, Random Forest)	Fast
LinearExplainer	Linear models (logistic regression, linear SVM)	Fast
DeepExplainer	Deep learning models (PyTorch, TensorFlow)	Medium
KernelExplainer	Any model (model-agnostic)	Slow

TreeExplainer is often the default choice because tree-based models are widely used for tabular ML tasks, and it exploits the tree structure to compute SHAP values in polynomial time rather than the exponential time of a brute-force Shapley computation.

Reading a SHAP Plot

The most common visualization is the beeswarm plot. Each dot is the SHAP value of one feature for one prediction. Features are listed top to bottom by importance (mean absolute SHAP value). Dots to the right of zero mean the feature pushed the prediction higher; dots to the left mean it pushed the prediction lower. Color indicates the feature’s actual value (red = high, blue = low).

This immediately reveals patterns like “high values of feature X always push predictions up” or “feature Y has high importance but mixed direction.”

When Explanations Matter

Regulated industries: the EU AI Act and US fair lending laws require explanations for certain automated decisions
Debugging: SHAP can reveal when a model relies on spurious correlations or proxy variables (like using zip code as a proxy for race)
Stakeholder trust: business stakeholders adopt models faster when they understand the reasoning
Feature engineering: features with consistently near-zero SHAP values are candidates for removal, though the model should be retrained and re-evaluated to confirm nothing is lost

Common Misconception

SHAP values explain how a specific model makes decisions — they do not explain causal relationships in the real world. If a model uses “ice cream sales” to predict “drowning incidents,” SHAP will correctly show that ice cream sales is an important feature. It will not tell you that both are caused by hot weather. SHAP explains the model, not reality.
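The ice cream example can be simulated in a few lines. In this sketch the data-generating numbers are invented: temperature drives both quantities, and ice cream sales have no causal effect on drownings. The model is a one-feature least-squares line, for which the SHAP value reduces to the prediction minus the average prediction.

```python
import random

random.seed(0)

# Simulated world: temperature causes both ice cream sales and drownings.
temps = [random.uniform(10, 35) for _ in range(500)]
ice_cream = [20 * t + random.gauss(0, 30) for t in temps]
drownings = [0.3 * t + random.gauss(0, 1) for t in temps]

def mean(values):
    return sum(values) / len(values)

# Fit drownings ~ ice_cream by ordinary least squares (temperature is hidden).
mean_x, mean_y = mean(ice_cream), mean(drownings)
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(ice_cream, drownings)
) / sum((x - mean_x) ** 2 for x in ice_cream)
intercept = mean_y - slope * mean_x

def predict(x):
    return intercept + slope * x

# With a single feature in a linear model, the SHAP value is
# slope * (x - mean(x)), i.e. prediction minus average prediction.
hot_day_sales = max(ice_cream)
shap_ice_cream = slope * (hot_day_sales - mean_x)

print(f"slope: {slope:.4f}")
print(f"SHAP value of ice cream sales on the busiest day: {shap_ice_cream:+.2f}")
```

The attribution is positive and sizeable even though, by construction, ice cream sales never influence drownings. It is a faithful description of what the model does, not of what causes what.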

One thing to remember: SHAP gives every feature a fair, mathematically grounded credit score for each prediction — making model decisions transparent and auditable.
