
Unlocking Interpretable Machine Learning: From Linear Regression to EBM

This article surveys intrinsic interpretable machine‑learning models—from classic regression, additive models, and decision trees to modern approaches like Explainable Boosting Machines, GAMINet, RuleFit, and Falling Rule Lists—explaining their principles, parameter estimation, interpretability, advantages, and limitations.

Model Perspective

Intrinsic interpretable machine‑learning models are those that are inherently understandable, originally represented by statistical regression models, additive models, and decision trees. Although these models offer strong interpretability, their predictive accuracy is often lower than that of ensemble methods and neural networks, especially on large‑scale data.

To address the accuracy gap, researchers have proposed enhanced methods: in 2012, Yin Lou, Rich Caruana, and Johannes Gehrke introduced the Explainable Boosting Machine (EBM) by integrating gradient‑boosted trees into additive models; in 2020, Zebin Yang and Aijun Zhang presented GAMINet, replacing smooth non‑parametric functions with neural networks while adding interaction terms; Friedman and Popescu’s 2008 RuleFit extracts interaction rules from tree models as new features for regression; and Wang and Rudin’s 2015 Falling Rule Lists combine association analysis with Bayesian optimization to produce ordered rule‑based predictions. These approaches retain interpretability while substantially improving accuracy.

Traditional Statistical Models

Traditional intrinsic interpretable models include regression, additive models, and decision trees. Their limited accuracy motivated extensions using neural networks or gradient‑boosted trees, preserving interpretability while boosting performance. This section introduces linear regression, generalized linear models, generalized additive models, and decision trees.

Linear Regression

Linear Regression models the quantitative relationship between multiple independent variables and a response variable. The model can be expressed as y = Xβ + ε , where β is the coefficient vector, ε is a vector of i.i.d. error terms, and X includes a column of ones for the intercept.

Parameter estimation is performed via ordinary least squares, minimizing the sum of squared residuals. The solution is obtained by differentiating the loss with respect to β and setting the gradient to zero, yielding β̂ = (XᵀX)⁻¹Xᵀy .
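The closed-form solution can be checked numerically. The sketch below uses synthetic data (the variable names and values are illustrative, not the article's UNDP dataset) and solves the normal equations XᵀXβ̂ = Xᵀy directly:

```python
import numpy as np

# Synthetic data for illustration only (hypothetical, not the UNDP case study)
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant term + 2 predictors
beta_true = np.array([2.0, 0.5, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations X'X beta = X'y (equivalent to (X'X)^{-1} X'y,
# but numerically more stable than forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```

With small noise, the estimates recover the true coefficients closely; in practice one would use a library routine such as `statsmodels.OLS`, as shown later in this section.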

Interpretability is illustrated with a case study using UNDP data: per‑capita GDP, adult literacy rate, and child vaccination rate predict average life expectancy. The fitted model shows positive effects of all three features, quantifying the increase in life expectancy per unit change of each predictor.

The Python implementation using statsmodels is shown below:

<code>import pandas as pd
import statsmodels.api as sm

# Load the UNDP life-expectancy data; the first column serves as the row index
data = pd.read_excel('data/life_expectancy.xlsx', index_col=0)
data.columns = ['Country', 'Life_expectancy', 'GDP_per_capita',
                'Adult_literacy_rate', 'Vaccination_rates_for_children']

# Response: life expectancy; predictors: the three socioeconomic features
y = data['Life_expectancy']
x = sm.add_constant(data[['GDP_per_capita', 'Adult_literacy_rate',
                          'Vaccination_rates_for_children']])

model = sm.OLS(y, x)
result = model.fit()
print(result.summary())
</code>

Advantages of linear regression: simple concept, fast execution, and strong interpretability for decision analysis. Limitations: assumes linear relationships and normally distributed residuals, and generally yields lower predictive accuracy compared with more complex machine‑learning models.

Generalized Linear Model

Model Definition

Generalized Linear Models (GLMs) extend linear regression to response variables that follow exponential‑family distributions by introducing a link function g(·) . The model is defined as g(μ) = Xβ , where μ is the expected value of the response. For binary outcomes, using a logit link yields logistic regression.

Parameter Estimation

Parameters are estimated via maximum likelihood estimation (MLE). For logistic regression, the likelihood of observing the data is L(β) = ∏ p_i^{y_i}(1-p_i)^{1-y_i} , with p_i = 1 / (1 + e^{-X_iβ}) . The log‑likelihood is maximized using iterative methods such as gradient descent or Newton‑Raphson.
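A minimal Newton–Raphson sketch for logistic regression on synthetic data, using the log-likelihood gradient Xᵀ(y − p) and Hessian −XᵀWX implied by the formulas above (the data and variable names are illustrative assumptions, not from the article):

```python
import numpy as np

# Synthetic binary data for illustration (hypothetical)
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + 1 predictor
beta_true = np.array([-1.0, 2.0])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# Newton-Raphson iterations for the log-likelihood
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p_hat * (1.0 - p_hat)        # diagonal of the weight matrix
    grad = X.T @ (y - p_hat)         # gradient of the log-likelihood
    hess = X.T @ (X * W[:, None])    # observed Fisher information (negative Hessian)
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # approaches beta_true as n grows
```

This is the same iteratively reweighted least squares scheme that library implementations (e.g. `statsmodels.Logit`) use under the hood.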

Model Interpretability

A logistic regression example predicts housing purchase decisions based on household disposable income. The fitted model indicates a positive effect of income on purchase probability, quantifying the odds ratio for each additional ten‑thousand units of income.
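To make the odds-ratio reading concrete: if the fitted income coefficient were, hypothetically, 0.8 (this value is an assumption for illustration, not the article's estimate), each additional ten-thousand units of income would multiply the odds of purchase by exp(0.8):

```python
import math

# Hypothetical coefficient, for illustration only
beta_income = 0.8
odds_ratio = math.exp(beta_income)
print(round(odds_ratio, 2))  # each extra ten-thousand units multiplies the purchase odds by ~2.23
```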

Advantages and Limitations

GLMs are simple to implement, applicable to a wider range of data distributions than ordinary linear regression, and retain strong interpretability useful for decision analysis. However, they require the response to belong to the exponential family and may still exhibit lower predictive accuracy than advanced machine‑learning algorithms.

Reference: Shao Ping, Yang Jianying, Su Sida. "Interpretable Machine Learning: Models, Methods, and Practice".

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
