LASSO Regression Explained: Theory, Case Studies, and Python Code
This article introduces the mathematical foundations of ordinary least squares, ridge, and LASSO regression, explains why LASSO requires coordinate descent, presents two real-world case studies with data, and provides complete Python code for fitting, visualizing, and interpreting LASSO models.
Mathematical Principles Overview
Ordinary Least Squares (OLS) in multiple regression fits the parameter vector that minimizes the residual sum of squares. Ridge regression adds an L2 penalty to the loss function, which stabilizes the solution when the design matrix is ill-conditioned or X^T X is non-invertible, while LASSO regression adds an L1 penalty, which encourages sparsity in the coefficient vector.
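Written out, with $y$ the response vector, $X$ the design matrix, $\beta$ the coefficient vector, and $\lambda \ge 0$ the penalty parameter, the three objectives are:

```latex
\hat{\beta}_{\text{OLS}}   = \arg\min_{\beta}\ \|y - X\beta\|_2^2, \qquad
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2, \qquad
\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\sum_{j=1}^{p}|\beta_j|.
```

Note that scikit-learn's `Lasso` minimizes $\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1$, so its `alpha` corresponds to $\lambda$ only up to that $\frac{1}{2n}$ scaling.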
The L1 penalty is the sum of absolute values of the regression coefficients, which makes the objective nondifferentiable wherever a coefficient equals zero. Consequently, methods that rely on gradients, including the closed-form OLS solution, gradient descent, Newton's method, and quasi-Newton methods, do not apply directly to LASSO. Coordinate descent, which cyclically minimizes the objective over one coefficient at a time while holding the others fixed, solves LASSO efficiently because each one-dimensional subproblem has a closed-form solution (soft thresholding).
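To make the idea concrete, here is a minimal sketch of cyclic coordinate descent with the soft-thresholding update. This is not scikit-learn's implementation; `lasso_cd` and `soft_threshold` are illustrative names, and convergence checks are omitted for brevity:

```python
import numpy as np

def soft_threshold(rho, lam):
    """Closed-form minimizer of the one-dimensional LASSO subproblem."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except x_j's.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-threshold, then rescale by the feature's mean square.
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b
```

Each inner update is exact for its coordinate, which is why no step size is needed; the nondifferentiability at zero is handled by the soft-thresholding operator rather than by a gradient.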
Before estimating the LASSO coefficients, the penalty parameter must be chosen. As with ridge regression, one option is to plot the coefficient paths over a grid of penalty values and pick a value where the paths have stabilized; alternatively, cross-validation selects the penalty that minimizes prediction error.
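The cross-validation route can be sketched on synthetic data (the data, seed, and alpha grid here are illustrative, not from the case studies below):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: five standardized predictors, but only the
# first two actually drive the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(50)

# LassoCV fits the regularization path on each fold and picks the
# alpha with the lowest mean cross-validated error.
mdcv = LassoCV(alphas=np.logspace(-4, 0, 100), cv=5).fit(X, y)
print("alpha chosen by CV:", mdcv.alpha_)
print("coefficients at that alpha:", mdcv.coef_)
```

The chosen `alpha_` can then be passed to a plain `Lasso` fit, which is the pattern the case-study code below follows.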
Case 1
The table below (Malinvaud, 1966) contains French economic data: total imports (dependent variable) and three explanatory variables—total domestic output, stock, and total consumption (all in 10⁹ francs).
The task is to obtain the LASSO regression equation for this dataset.
The resulting plot shows how the LASSO coefficients evolve along the regularization path; coefficients that shrink to zero early are candidates for removal, which guides the choice of penalty. The standardized LASSO regression equation is then back-transformed to the original scale, and the model's goodness-of-fit is reported, demonstrating how conveniently LASSO performs variable selection.
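The back-transformation follows directly from the z-score standardization: if $\beta_j^{*}$ are the coefficients fitted on standardized data, and $\bar{x}_j, s_{x_j}, \bar{y}, s_y$ the sample means and standard deviations, then

```latex
b_j = \frac{s_y}{s_{x_j}}\,\beta_j^{*}, \qquad
b_0 = \bar{y} - \sum_{j=1}^{p} b_j\,\bar{x}_j
    = \bar{y} - s_y \sum_{j=1}^{p} \frac{\beta_j^{*}\,\bar{x}_j}{s_{x_j}},
```

which is exactly what the `params` line in the code computes.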
<code>import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, LassoCV
from scipy.stats import zscore
plt.rc('font', size=16)
plt.rc('text', usetex=True)  # comment out this line if LaTeX is not installed
a = np.loadtxt("data/economic.txt")
n = a.shape[1] - 1 # number of predictors
aa = zscore(a) # standardize data
x = aa[:, :n]
y = aa[:, n]
b = []
kk = np.logspace(-4, 0, 100) # range of alpha values
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)
st = ['s-r', '*-k', 'p-b']
for i in range(3):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend(['$x_1$', '$x_2$', '$x_3$'], fontsize=15)
plt.show()
mdcv = LassoCV(alphas=np.logspace(-4, 0, 100)).fit(x, y)
print("Optimal alpha=", mdcv.alpha_)
md0 = Lasso(alpha=0.21).fit(x, y)  # penalty chosen from the path plot and CV results above
cs0 = md0.coef_
print("Standardized coefficients:", cs0)
mu = np.mean(a, axis=0)
s = np.std(a, axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>
Case 2
When modeling Chinese private car ownership, four factors are considered: per‑capita disposable income of urban households, total urban population, annual automobile production, and total highway length. The dependent variable is the number of private cars (in ten‑thousands).
Using ordinary least squares yields a regression equation with four explanatory variables, but some coefficients are insignificant or negative, making the model unreasonable. Therefore, LASSO regression is applied to obtain a more parsimonious model.
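The same phenomenon can be illustrated on synthetic data (not the car data; the seed and coefficients here are made up for illustration): predictors that contribute nothing to the response still receive small, arbitrarily signed OLS coefficients, while LASSO sets them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic illustration: x3 and x4 are pure noise,
# yet OLS still assigns them small nonzero coefficients.
rng = np.random.default_rng(3)
n = 100
X = rng.standard_normal((n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

ols = LinearRegression().fit(X, y)
las = Lasso(alpha=0.1).fit(X, y)
print("OLS coefficients:  ", ols.coef_)   # all four are generically nonzero
print("LASSO coefficients:", las.coef_)   # the irrelevant two are exactly zero
```

The L1 penalty is what produces the exact zeros: whenever a predictor's correlation with the residual falls below the penalty level, its coordinate update soft-thresholds to zero.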
<code>import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from scipy.stats import zscore
# plt.rc('text', usetex=True) # comment if LaTeX not installed
a = np.loadtxt("data/car.txt") # 9 rows, 5 columns
n = a.shape[1] - 1
x = a[:, :n]
X = sm.add_constant(x)
md = sm.OLS(a[:, n], X).fit()
print(md.summary())
aa = zscore(a)
x = aa[:, :n]
y = aa[:, n]
b = []
kk = np.logspace(-4, 0, 100)
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)
st = ['s-r', '*-k', 'p-b', '^-y']
for i in range(n):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend(['$x_1$', '$x_2$', '$x_3$', '$x_4$'], fontsize=15)
plt.show()
md0 = Lasso(alpha=0.05).fit(x, y)  # penalty chosen from the path plot
cs0 = md0.coef_
print("Standardized coefficients:", cs0)
mu = a.mean(axis=0)
s = a.std(axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>
References
Si Shoukui, Sun Xijing. Python数学实验与建模 (Python Mathematical Experiments and Modeling).
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".