ML Interpretability
A model that predicts a patient will die but cannot explain why is dangerous. A loan denial model that cannot justify its decision may violate regulations. Interpretability is not a luxury — it is a requirement for trust, debugging, regulatory compliance, and scientific discovery. This page covers the mathematics and practice of explaining any ML model's predictions.
Why Interpretability Matters
| Stakeholder | Need | Example |
|---|---|---|
| Data scientist | Debug model, find data issues | "Why does the model predict 0 for this obvious positive?" |
| Business | Trust, actionable insights | "What drives customer churn?" |
| Regulators | Right to explanation (GDPR Article 22) | "Why was this loan denied?" |
| End users | Understanding and trust | "Why was this email marked as spam?" |
| Researchers | Scientific discovery | "What genes are associated with this disease?" |
Taxonomy
| Local (per prediction) | Global (entire model) | |
|---|---|---|
| Model-specific | Decision tree path, linear coefficients | Feature importances (tree-based) |
| Model-agnostic | LIME, SHAP values | SHAP summary, PDP, permutation importance |
SHAP: SHapley Additive exPlanations
The Shapley Value (Game Theory)
From cooperative game theory: given a game with
For player
Worked Example — Shapley Value for 3 Features
Model predicts house price. 3 features: Size, Bedrooms, Location.E[f(X)] = $300k (base value), f(x) = $450k (prediction for this house).Need to attribute the $150k difference among 3 features.
Value function v(S) for all coalitions (from model): v({}) = 300k (base) v({Size}) = 380k v({Bed}) = 320k v({Loc}) = 340k v({Size,Bed}) = 400k v({Size,Loc}) = 420k v({Bed,Loc}) = 360k v({Size,Bed,Loc}) = 450k
Compute phi_Size (Shapley value for Size): S={}: weight = 0!2!/3! = 12/6 = 1/3. Marginal = v({Size})-v({}) = 380-300 = 80 S={Bed}: weight = 1!*1!/3! = 1/6. Marginal = v({Size,Bed})-v({Bed}) = 400-320 = 80 S={Loc}: weight = 1!*1!/3! = 1/6. Marginal = v({Size,Loc})-v({Loc}) = 420-340 = 80 S={Bed,Loc}: weight = 2!*0!/3! = 2/6 = 1/3. Marginal = v({all})-v({Bed,Loc}) = 450-360 = 90
phi_Size = (1/3)(80) + (1/6)(80) + (1/6)(80) + (1/3)(90) = 26.67 + 13.33 + 13.33 + 30.0 = 83.33k
Similarly: phi_Bed = 26.67k, phi_Loc = 40.0k
Verify efficiency: 83.33 + 26.67 + 40.0 = 150.0k = f(x) - E[f(X)]
Interpret: "Size contributes $83.33k to this house being above average, Location adds $40k, and Bedrooms add $26.67k. The contributions sum exactly to the $150k difference from the base prediction — this is the efficiency property of Shapley values."
where:
= set of all players = any subset not containing = value of coalition = marginal contribution of to coalition
The sum is over all
SHAP for ML
In ML, "players" are features and the "game value" is the model prediction. For a single prediction
where
The value function uses conditional expectations:
Properties (Why Shapley Is Unique)
| Property | Meaning |
|---|---|
| Efficiency | |
| Symmetry | Equal contributors get equal attribution |
| Dummy | Irrelevant features get |
| Additivity | For ensemble: |
Naive SHAP Computation
import numpy as np
from itertools import combinations
def shapley_values_exact(model, x, X_background, feature_names=None):
"""Compute exact Shapley values (exponential — for illustration only)."""
n_features = len(x)
shapley_vals = np.zeros(n_features)
n = len(X_background)
for i in range(n_features):
# Sum over all subsets S not containing i
other_features = [j for j in range(n_features) if j != i]
total = 0.0
count = 0
for size in range(len(other_features) + 1):
for S in combinations(other_features, size):
S = list(S)
weight = (np.math.factorial(len(S)) *
np.math.factorial(n_features - len(S) - 1) /
np.math.factorial(n_features))
# v(S ∪ {i}): features in S ∪ {i} take values from x,
# others are marginalized (averaged over background)
features_in = S + [i]
features_out = [j for j in range(n_features)
if j not in features_in]
# v(S ∪ {i})
X_eval_with = X_background.copy()
for j in features_in:
X_eval_with[:, j] = x[j]
v_with = model.predict(X_eval_with).mean()
# v(S)
X_eval_without = X_background.copy()
for j in S:
X_eval_without[:, j] = x[j]
v_without = model.predict(X_eval_without).mean()
shapley_vals[i] += weight * (v_with - v_without)
return shapley_vals
# Demo on small example (exact computation — slow)
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_iris
iris = load_iris()
X_small = iris.data[:50, :3] # 3 features only
y_small = iris.target[:50].astype(float)
rf_small = RandomForestRegressor(n_estimators=10, random_state=42)
rf_small.fit(X_small, y_small)
# Compute Shapley values for first sample
x_explain = X_small[0]
background = X_small[10:30] # 20 background samples
shap_vals = shapley_values_exact(rf_small, x_explain, background)
print("Exact Shapley values:", shap_vals)
print(f"Base value (E[f]): {rf_small.predict(background).mean():.4f}")
print(f"Prediction f(x): {rf_small.predict(x_explain.reshape(1, -1))[0]:.4f}")
print(f"Sum check: {rf_small.predict(background).mean() + shap_vals.sum():.4f}")TreeSHAP (Fast, Exact for Trees)
TreeSHAP (Lundberg et al., 2020) computes exact Shapley values in
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt
# ---- Load California Housing ----
housing = fetch_california_housing()
X, y = housing.data, housing.target
feature_names = housing.feature_names
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train model
gbm = GradientBoostingRegressor(
n_estimators=200, max_depth=5, learning_rate=0.1, random_state=42
)
gbm.fit(X_train, y_train)
print(f"Test R²: {gbm.score(X_test, y_test):.4f}")
# ---- TreeSHAP ----
explainer = shap.TreeExplainer(gbm)
shap_values = explainer.shap_values(X_test)
print(f"\nSHAP values shape: {shap_values.shape}")
print(f"Base value: {explainer.expected_value:.4f}")
# Verify efficiency property
sample_idx = 0
pred = gbm.predict(X_test[sample_idx:sample_idx+1])[0]
shap_sum = explainer.expected_value + shap_values[sample_idx].sum()
print(f"\nEfficiency check (sample 0):")
print(f" Prediction: {pred:.4f}")
print(f" Base + sum(SHAP): {shap_sum:.4f}")
print(f" Match: {np.isclose(pred, shap_sum)}")SHAP Visualizations
# 1. Summary plot (global importance + direction)
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_test, feature_names=feature_names,
show=False)
plt.tight_layout()
plt.savefig('shap_summary.png', dpi=150, bbox_inches='tight')
plt.show()
# 2. Bar plot (mean absolute SHAP = global importance)
plt.figure(figsize=(10, 5))
shap.summary_plot(shap_values, X_test, feature_names=feature_names,
plot_type='bar', show=False)
plt.tight_layout()
plt.savefig('shap_bar.png', dpi=150, bbox_inches='tight')
plt.show()
# 3. Waterfall plot (single prediction explanation)
shap.plots.waterfall(shap.Explanation(
values=shap_values[0],
base_values=explainer.expected_value,
data=X_test[0],
feature_names=feature_names
), show=False)
plt.tight_layout()
plt.savefig('shap_waterfall.png', dpi=150, bbox_inches='tight')
plt.show()
# 4. Dependence plot (feature effect + interaction)
shap.dependence_plot('MedInc', shap_values, X_test,
feature_names=feature_names, show=False)
plt.savefig('shap_dependence.png', dpi=150, bbox_inches='tight')
plt.show()
# 5. Force plot (single prediction)
shap.force_plot(
explainer.expected_value, shap_values[0],
X_test[0], feature_names=feature_names,
matplotlib=True, show=False
)
plt.tight_layout()
plt.savefig('shap_force.png', dpi=150, bbox_inches='tight')
plt.show()KernelSHAP (Model-Agnostic)
For non-tree models, KernelSHAP approximates Shapley values using a weighted linear regression:
where
# KernelSHAP for a neural network or any model
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
# Train a neural network
nn = make_pipeline(
StandardScaler(),
MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
)
nn.fit(X_train, y_train)
print(f"NN Test R²: {nn.score(X_test, y_test):.4f}")
# KernelSHAP (slower than TreeSHAP)
background = shap.sample(X_train, 100) # subsample for speed
kernel_explainer = shap.KernelExplainer(nn.predict, background)
kernel_shap_values = kernel_explainer.shap_values(X_test[:50])
plt.figure(figsize=(10, 6))
shap.summary_plot(kernel_shap_values, X_test[:50],
feature_names=feature_names, show=False)
plt.title('KernelSHAP — Neural Network')
plt.tight_layout()
plt.savefig('kernel_shap.png', dpi=150, bbox_inches='tight')
plt.show()LIME: Local Interpretable Model-Agnostic Explanations
How LIME Works
- Perturb the instance: generate neighbors by randomly modifying features
- Predict on perturbed instances using the black-box model
- Weight perturbed instances by proximity to original (using a kernel)
- Fit a simple interpretable model (linear regression, decision tree) on weighted instances
- Read the interpretable model's coefficients as explanations
where
import lime
import lime.lime_tabular
# Create LIME explainer
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
X_train,
feature_names=feature_names,
mode='regression',
random_state=42
)
# Explain a single prediction
idx = 0
exp = lime_explainer.explain_instance(
X_test[idx],
gbm.predict,
num_features=8,
num_samples=5000
)
print(f"Prediction: {gbm.predict(X_test[idx:idx+1])[0]:.4f}")
print(f"\nLIME Explanation (feature contributions):")
for feature, weight in exp.as_list():
print(f" {feature}: {weight:+.4f}")
# Save visualization
fig = exp.as_pyplot_figure()
fig.set_size_inches(10, 6)
plt.tight_layout()
plt.savefig('lime_explanation.png', dpi=150, bbox_inches='tight')
plt.show()LIME vs SHAP
| Aspect | LIME | SHAP |
|---|---|---|
| Theory | Local surrogate model | Cooperative game theory |
| Consistency | Can give different results each run | Deterministic (TreeSHAP) |
| Additivity | Explanations may not sum to prediction | Always sums to prediction |
| Speed | Fast per instance | TreeSHAP fast; KernelSHAP slow |
| Global insights | Not directly (aggregate local) | Summary plots, dependence |
Partial Dependence Plots (PDP)
PDP shows the marginal effect of a feature on predictions, averaging out all other features:
where
from sklearn.inspection import PartialDependenceDisplay
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
features_to_plot = list(range(8))
PartialDependenceDisplay.from_estimator(
gbm, X_test, features_to_plot,
feature_names=feature_names,
ax=axes, grid_resolution=50
)
plt.suptitle('Partial Dependence Plots — California Housing', fontsize=14)
plt.tight_layout()
plt.savefig('pdp_all_features.png', dpi=150, bbox_inches='tight')
plt.show()2D Partial Dependence
fig, ax = plt.subplots(figsize=(10, 8))
PartialDependenceDisplay.from_estimator(
gbm, X_test, [(0, 5)], # MedInc vs AveOccup
feature_names=feature_names,
ax=ax, kind='average'
)
plt.title('2D PDP: Median Income vs Average Occupancy')
plt.tight_layout()
plt.savefig('pdp_2d.png', dpi=150, bbox_inches='tight')
plt.show()Individual Conditional Expectation (ICE)
ICE shows the effect of a feature for each individual sample, revealing heterogeneous effects that PDP (the average) hides.
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# PDP only
PartialDependenceDisplay.from_estimator(
gbm, X_test[:200], [0],
feature_names=feature_names,
ax=axes[0], kind='average'
)
axes[0].set_title('PDP Only — MedInc')
# ICE + PDP
PartialDependenceDisplay.from_estimator(
gbm, X_test[:200], [0],
feature_names=feature_names,
ax=axes[1], kind='both',
ice_lines_kw={'color': 'blue', 'alpha': 0.05},
pd_line_kw={'color': 'red', 'linewidth': 3}
)
axes[1].set_title('ICE + PDP — MedInc')
plt.tight_layout()
plt.savefig('ice_vs_pdp.png', dpi=150, bbox_inches='tight')
plt.show()PDP Limitations
PDP assumes features are independent. If features are correlated (e.g., rooms and square footage), PDP may show predictions for impossible feature combinations. In such cases, prefer SHAP dependence plots or use Accumulated Local Effects (ALE) plots.
Putting It All Together: California Housing
# ---- Comprehensive interpretability report ----
print("=" * 60)
print("CALIFORNIA HOUSING — INTERPRETABILITY REPORT")
print("=" * 60)
# 1. Global feature importance (permutation)
from sklearn.inspection import permutation_importance
perm_imp = permutation_importance(gbm, X_test, y_test,
n_repeats=30, random_state=42)
print("\n1. PERMUTATION IMPORTANCE (global):")
for idx in np.argsort(perm_imp.importances_mean)[::-1]:
print(f" {feature_names[idx]:>12}: "
f"{perm_imp.importances_mean[idx]:.4f} +/- "
f"{perm_imp.importances_std[idx]:.4f}")
# 2. SHAP global importance
print("\n2. MEAN |SHAP| (global):")
mean_shap = np.abs(shap_values).mean(axis=0)
for idx in np.argsort(mean_shap)[::-1]:
print(f" {feature_names[idx]:>12}: {mean_shap[idx]:.4f}")
# 3. Individual prediction explanation
print("\n3. INDIVIDUAL PREDICTION (sample 0):")
print(f" Prediction: ${gbm.predict(X_test[0:1])[0] * 100000:.0f}")
print(f" Base value: ${explainer.expected_value * 100000:.0f}")
print(f" Feature contributions:")
for idx in np.argsort(np.abs(shap_values[0]))[::-1]:
val = shap_values[0][idx]
sign = "+" if val > 0 else ""
print(f" {feature_names[idx]:>12} = {X_test[0][idx]:>8.2f} "
f"→ {sign}{val * 100000:.0f}")Method Selection Guide
Key Takeaways
| Concept | Remember |
|---|---|
| Shapley values are the unique fair attribution | Efficiency: contributions sum to prediction |
| TreeSHAP is fast and exact for trees | Polynomial time via tree structure exploitation |
| KernelSHAP works for any model | Approximation via weighted linear regression |
| LIME fits a local interpretable surrogate | Fast but can be inconsistent across runs |
| PDP shows marginal feature effects | Assumes feature independence (can be misleading) |
| ICE reveals heterogeneous effects | Individual curves that PDP averages over |
| Always use multiple methods | No single method tells the complete story |
| SHAP summary plot = best global overview | Shows importance AND direction AND distribution |