Model Explainability with SHAP in Python — Core Concepts
What Is SHAP?
SHAP (SHapley Additive exPlanations) is a method for explaining individual predictions by assigning each feature a contribution value. It is rooted in cooperative game theory — specifically Shapley values, a concept from the 1950s that fairly distributes a “payout” among players based on their contributions.
In ML terms: the “payout” is the prediction, and the “players” are the input features.
How Shapley Values Work
Imagine a model predicts house prices using three features: square footage, number of bedrooms, and neighborhood. For a specific house predicted at $450,000 (while the average prediction is $300,000), SHAP asks: how much did each feature contribute to the $150,000 difference from average?
SHAP calculates this by considering every possible subset of the other features, measuring how much the prediction changes when the feature is added to each subset, and averaging those marginal contributions with Shapley weights. The result might be:
- Square footage: +$80,000
- Neighborhood: +$55,000
- Bedrooms: +$15,000
These contributions sum to the $150,000 difference from the average. This additive property is one of SHAP’s mathematical guarantees.
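To make "every possible subset" concrete, here is a minimal brute-force sketch in plain Python. It assumes a hypothetical value function, `toy_value` (in $1,000s), that returns the expected prediction when only some features are known. Its numbers, including an interaction between square footage and neighborhood, are made up so that the result reproduces the split above. Real SHAP explainers estimate this value function from the model and background data rather than from a hand-written table.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

FEATURES = ["sqft", "neighborhood", "bedrooms"]


def toy_value(coalition):
    """Hypothetical value function (in $1,000s): the expected prediction when
    only the features in `coalition` are known."""
    v = 300  # average prediction, i.e. no features known
    if "sqft" in coalition:
        v += 70
    if "neighborhood" in coalition:
        v += 45
    if "bedrooms" in coalition:
        v += 15
    if "sqft" in coalition and "neighborhood" in coalition:
        v += 20  # interaction: only counts when both features are present
    return v


def exact_shapley(features, value):
    """Brute-force Shapley values: weighted average of each feature's marginal
    contribution over every subset of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = Fraction(0)
        for k in range(n):  # subset sizes 0 .. n-1
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


phi = exact_shapley(FEATURES, toy_value)
for name, contribution in phi.items():
    print(f"{name}: +${int(contribution)}k")
# sqft: +$80k
# neighborhood: +$55k
# bedrooms: +$15k

# Additivity: base value + contributions == full prediction (300 + 150 == 450)
assert toy_value(frozenset()) + sum(phi.values()) == toy_value(frozenset(FEATURES))
```

The $20k interaction is split evenly between the two features that create it, which is how square footage ends up at 70 + 10 = 80. The inner loop visits all 2^(n-1) subsets of the other features, so this brute-force approach becomes impractical as the feature count grows.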
Local vs Global Explanations
Local Explanations
Explain a single prediction. “Why was this specific customer flagged as high churn risk?”
Global Explanations
Aggregate SHAP values across many predictions to understand overall feature importance. “Which features matter most across all customers?”
Both come from the same SHAP values — local explanations are per-prediction, global explanations are statistics over many predictions.
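A small sketch of that relationship, assuming a hypothetical matrix of SHAP values for a churn model (one row per customer, one column per feature; the feature names and numbers are invented). The local view reads one row. The global view averages the absolute value of each column, a common way to summarize overall importance.

```python
# Hypothetical SHAP values for a churn model: one row per customer,
# one column per feature (positive = pushes churn risk up).
FEATURES = ["tenure_months", "support_tickets", "monthly_charges"]
SHAP_MATRIX = [
    [-0.30, 0.45, 0.10],
    [-0.05, 0.20, -0.15],
    [0.25, -0.10, 0.05],
    [-0.40, 0.60, 0.20],
]


def local_explanation(row_index):
    """Local view: each feature's contribution to one customer's prediction."""
    return dict(zip(FEATURES, SHAP_MATRIX[row_index]))


def global_importance():
    """Global view: mean absolute SHAP value per feature, most important first."""
    n_rows = len(SHAP_MATRIX)
    importance = {
        name: sum(abs(row[j]) for row in SHAP_MATRIX) / n_rows
        for j, name in enumerate(FEATURES)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


print(local_explanation(3))  # why customer 3 was flagged
print(global_importance())   # which features matter across all customers
```

Taking the absolute value before averaging keeps a feature that pushes some customers up and others down from looking unimportant just because its contributions cancel out.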
SHAP Explainer Types
| Explainer | Best For | Speed |
|---|---|---|
| TreeExplainer | Tree-based models (XGBoost, LightGBM, Random Forest) | Fast |
| LinearExplainer | Linear models (logistic regression, linear SVM) | Fast |
| DeepExplainer | Deep learning models (PyTorch, TensorFlow) | Medium |
| KernelExplainer | Any model (model-agnostic) | Slow |
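TreeExplainer is especially popular because tree-based models are a common choice for tabular ML tasks, and its Tree SHAP algorithm computes exact Shapley values for tree ensembles in polynomial time, rather than the exponential time a brute-force computation over all feature subsets requires.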
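Linear models show why model-specific explainers can be fast. The SHAP paper gives a closed form for a linear model f(x) = b + sum_i w_i * x_i under the assumption that features are independent: phi_i = w_i * (x_i - E[x_i]). The sketch below implements that formula in plain Python for a hypothetical house-price model with made-up weights and background data. It is not the shap library's `LinearExplainer` code, just the same idea without any subset enumeration.

```python
import math

# Hypothetical linear house-price model (prices in $1,000s).
WEIGHTS = {"sqft": 0.15, "bedrooms": 10.0, "neighborhood_score": 20.0}
INTERCEPT = 50.0

# Hypothetical background data used to estimate the feature means E[x_i].
BACKGROUND = [
    {"sqft": 1500, "bedrooms": 3, "neighborhood_score": 5},
    {"sqft": 2000, "bedrooms": 4, "neighborhood_score": 6},
    {"sqft": 1200, "bedrooms": 2, "neighborhood_score": 4},
]


def predict(x):
    return INTERCEPT + sum(w * x[f] for f, w in WEIGHTS.items())


def base_value(data):
    """Average prediction over the background data."""
    return sum(predict(row) for row in data) / len(data)


def linear_shap(x, data):
    """Closed-form SHAP for a linear model with independent features:
    phi_i = w_i * (x_i - mean_i). No subset enumeration needed."""
    means = {f: sum(row[f] for row in data) / len(data) for f in WEIGHTS}
    return {f: w * (x[f] - means[f]) for f, w in WEIGHTS.items()}


house = {"sqft": 2400, "bedrooms": 4, "neighborhood_score": 7}
phi = linear_shap(house, BACKGROUND)
print(phi)  # roughly sqft 125, bedrooms 10, neighborhood_score 40

# Additivity holds: base value + contributions == prediction for this house
assert math.isclose(base_value(BACKGROUND) + sum(phi.values()), predict(house))
```

Each contribution costs one multiplication, so the work grows linearly with the number of features instead of exponentially.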
Reading a SHAP Plot
The most common visualization is the beeswarm plot. Each dot is one prediction's SHAP value for one feature, so every feature gets its own row of dots. Features are listed top to bottom by importance (by default, mean absolute SHAP value). Dots to the right mean the feature pushed that prediction higher; dots to the left mean it pushed it lower. Color indicates the feature's actual value (red = high, blue = low).
This immediately reveals patterns like “high values of feature X always push predictions up” or “feature Y has high importance but mixed direction.”
When Explanations Matter
- Regulated industries — the EU AI Act and US fair lending laws require explanations for certain automated decisions
- Debugging — SHAP reveals when a model relies on spurious correlations (like using zip code as a proxy for race)
- Stakeholder trust — business stakeholders adopt models faster when they understand the reasoning
- Feature engineering — features whose SHAP values are consistently near zero are candidates for removal, though it is worth confirming by retraining and re-evaluating without them
Common Misconception
SHAP values explain how a specific model makes decisions — they do not explain causal relationships in the real world. If a model uses “ice cream sales” to predict “drowning incidents,” SHAP will correctly show that ice cream sales is an important feature. It will not tell you that both are caused by hot weather. SHAP explains the model, not reality.
One thing to remember: SHAP gives every feature a fair, mathematically grounded credit score for each prediction — making model decisions transparent and auditable.