LASSO Regression Explained: Theory, Case Studies, and Python Code
This article introduces the mathematical foundations of ordinary least squares, ridge, and LASSO regression, explains why LASSO requires coordinate descent, presents two real-world case studies with data, and provides complete Python code for fitting, visualizing, and interpreting LASSO models.
Mathematical Principles Overview
Ordinary Least Squares (OLS) in multiple regression fits the parameter vector that minimizes the residual sum of squares. Ridge regression adds an L2 penalty to the loss function, which stabilizes the solution when the design matrix is ill-conditioned or X^T X is non-invertible, while LASSO regression adds an L1 penalty, which encourages sparsity in the coefficient vector.
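Written out, with $y$ the response vector, $X$ the design matrix, $\beta$ the coefficient vector, and $\lambda \ge 0$ the penalty parameter, the three objectives are:

```latex
\hat{\beta}_{\text{OLS}}   = \arg\min_{\beta}\ \|y - X\beta\|_2^2, \qquad
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2, \qquad
\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\sum_{j=1}^{p}|\beta_j|.
```

Note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1$, so its `alpha` corresponds to $\lambda$ only up to that $\frac{1}{2n}$ scaling.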
The L1 penalty is the sum of absolute values of the regression coefficients, which makes the objective nondifferentiable wherever a coefficient equals zero. Consequently, methods that rely on gradients, including the closed-form OLS solution, gradient descent, Newton's method, and quasi-Newton methods, do not apply directly to LASSO. Coordinate descent, which cyclically minimizes the objective over one coefficient at a time while holding the others fixed, solves LASSO efficiently because each one-dimensional subproblem has a closed-form solution (soft thresholding).
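To make the idea concrete, here is a minimal sketch of cyclic coordinate descent with the soft-thresholding update. This is not scikit-learn's implementation; `lasso_cd` and `soft_threshold` are illustrative names, and convergence checks are omitted for brevity:

```python
import numpy as np

def soft_threshold(rho, lam):
    """Closed-form minimizer of the one-dimensional LASSO subproblem."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except x_j's.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-threshold, then rescale by the feature's mean square.
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b
```

Each inner update is exact for its coordinate, which is why no step size is needed; the nondifferentiability at zero is handled by the soft-thresholding operator rather than by a gradient.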
Before estimating the LASSO coefficients, the penalty parameter must be chosen. As with ridge regression, one option is to plot the coefficient paths over a grid of penalty values and pick a value where the paths have stabilized; alternatively, cross-validation selects the penalty that minimizes prediction error.
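The cross-validation route can be sketched on synthetic data (the data, seed, and alpha grid here are illustrative, not from the case studies below):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: five standardized predictors, but only the
# first two actually drive the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(50)

# LassoCV fits the regularization path on each fold and picks the
# alpha with the lowest mean cross-validated error.
mdcv = LassoCV(alphas=np.logspace(-4, 0, 100), cv=5).fit(X, y)
print("alpha chosen by CV:", mdcv.alpha_)
print("coefficients at that alpha:", mdcv.coef_)
```

The chosen `alpha_` can then be passed to a plain `Lasso` fit, which is the pattern the case-study code below follows.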
Case 1
The table below (Malinvaud, 1966) contains French economic data: total imports (dependent variable) and three explanatory variables—total domestic output, stock, and total consumption (all in 10⁹ francs).
The task is to obtain the LASSO regression equation for this dataset.
The resulting plot shows how the LASSO coefficients evolve along the regularization path; coefficients that shrink to zero early are candidates for removal, which guides the choice of penalty. The standardized LASSO regression equation is then back-transformed to the original scale, and the model's goodness-of-fit is reported, demonstrating how conveniently LASSO performs variable selection.
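The back-transformation follows directly from the z-score standardization: if $\beta_j^{*}$ are the coefficients fitted on standardized data, and $\bar{x}_j, s_{x_j}, \bar{y}, s_y$ the sample means and standard deviations, then

```latex
b_j = \frac{s_y}{s_{x_j}}\,\beta_j^{*}, \qquad
b_0 = \bar{y} - \sum_{j=1}^{p} b_j\,\bar{x}_j
    = \bar{y} - s_y \sum_{j=1}^{p} \frac{\beta_j^{*}\,\bar{x}_j}{s_{x_j}},
```

which is exactly what the `params` line in the code computes.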
<code>import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, LassoCV
from scipy.stats import zscore
plt.rc('font', size=16)
plt.rc('text', usetex=True)  # comment out this line if LaTeX is not installed
a = np.loadtxt("data/economic.txt")
n = a.shape[1] - 1 # number of predictors
aa = zscore(a) # standardize data
x = aa[:, :n]
y = aa[:, n]
b = []
kk = np.logspace(-4, 0, 100) # range of alpha values
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)
st = ['s-r', '*-k', 'p-b']
for i in range(3):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend(['$x_1$', '$x_2$', '$x_3$'], fontsize=15)
plt.show()
mdcv = LassoCV(alphas=np.logspace(-4, 0, 100)).fit(x, y)
print("Optimal alpha=", mdcv.alpha_)
md0 = Lasso(alpha=0.21).fit(x, y)  # penalty chosen from the path plot and CV results above
cs0 = md0.coef_
print("Standardized coefficients:", cs0)
mu = np.mean(a, axis=0)
s = np.std(a, axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>
Case 2
When modeling Chinese private car ownership, four factors are considered: per‑capita disposable income of urban households, total urban population, annual automobile production, and total highway length. The dependent variable is the number of private cars (in ten‑thousands).
Using ordinary least squares yields a regression equation with four explanatory variables, but some coefficients are insignificant or negative, making the model unreasonable. Therefore, LASSO regression is applied to obtain a more parsimonious model.
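The same phenomenon can be illustrated on synthetic data (not the car data; the seed and coefficients here are made up for illustration): predictors that contribute nothing to the response still receive small, arbitrarily signed OLS coefficients, while LASSO sets them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic illustration: x3 and x4 are pure noise,
# yet OLS still assigns them small nonzero coefficients.
rng = np.random.default_rng(3)
n = 100
X = rng.standard_normal((n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

ols = LinearRegression().fit(X, y)
las = Lasso(alpha=0.1).fit(X, y)
print("OLS coefficients:  ", ols.coef_)   # all four are generically nonzero
print("LASSO coefficients:", las.coef_)   # the irrelevant two are exactly zero
```

The L1 penalty is what produces the exact zeros: whenever a predictor's correlation with the residual falls below the penalty level, its coordinate update soft-thresholds to zero.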
<code>import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from scipy.stats import zscore
# plt.rc('text', usetex=True) # comment if LaTeX not installed
a = np.loadtxt("data/car.txt") # 9 rows, 5 columns
n = a.shape[1] - 1
x = a[:, :n]
X = sm.add_constant(x)
md = sm.OLS(a[:, n], X).fit()
print(md.summary())
aa = zscore(a)
x = aa[:, :n]
y = aa[:, n]
b = []
kk = np.logspace(-4, 0, 100)
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)
st = ['s-r', '*-k', 'p-b', '^-y']
for i in range(n):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend(['$x_1$', '$x_2$', '$x_3$', '$x_4$'], fontsize=15)
plt.show()
md0 = Lasso(alpha=0.05).fit(x, y)  # penalty chosen from the path plot
cs0 = md0.coef_
print("Standardized coefficients:", cs0)
mu = a.mean(axis=0)
s = a.std(axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>
References
Si Shoukui, Sun Xijing. Python数学实验与建模 (Python Mathematical Experiments and Modeling).
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".