
Understanding Ridge Regression: Definitions, Properties, and Parameter Selection

This article explains ridge regression by defining the estimator, outlining its key properties, discussing methods for choosing the ridge parameter, and demonstrating its application to economic data with Python code and visualizations.

Ridge Estimation Definition and Properties

Ridge estimation arises naturally when multicollinearity exists: by adding a multiple of the identity matrix to \(X^TX\), the near-singularity caused by multicollinearity is alleviated, making the estimator more stable than ordinary least squares.

Definition 1. For a given matrix \(X\) satisfying condition (12.16), the estimator \(\hat\beta_{ridge}\) defined by \(\hat\beta_{ridge}= (X^TX + \lambda I)^{-1}X^Ty\) is called the ridge estimator. The resulting regression equation is the ridge regression equation, where \(\lambda\) is the ridge parameter. When \(\lambda = 0\), the estimator reduces to the ordinary least squares estimator.
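The closed-form definition can be verified numerically. The sketch below (synthetic data and the seed are illustrative assumptions, not from the article) computes \(\hat\beta_{ridge}\) directly from the formula and compares it with scikit-learn's `Ridge`, which solves the same penalized problem when the intercept is disabled:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)          # synthetic data for illustration
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 0.5
# closed form: (X^T X + lambda I)^{-1} X^T y
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# sklearn's Ridge with fit_intercept=False minimizes the same objective
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(beta_closed, beta_sklearn))
```

Setting \(\lambda = 0\) in the closed form recovers the ordinary least squares solution, as stated in the definition.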

Two important properties of ridge estimation are introduced:

Property 1. Ridge estimation is biased, i.e., \(E[\hat\beta_{ridge}] \neq \beta\).
Property 2. Ridge estimation is a shrinkage estimator, i.e., it shrinks the coefficients toward zero, so \(\|\hat\beta_{ridge}\| \le \|\hat\beta_{OLS}\|\).
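The shrinkage property can be seen empirically: as \(\lambda\) grows, the norm of the coefficient vector decreases. A minimal synthetic check (dataset, seed, and \(\lambda\) grid are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)          # synthetic data for illustration
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 3.0]) + rng.normal(size=40)

# the coefficient norm shrinks as lambda increases
norms = [np.linalg.norm(Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_)
         for lam in [0.01, 1.0, 10.0, 100.0]]
print(norms)
```

The printed norms form a decreasing sequence, illustrating Property 2; bias (Property 1) is the price paid for this shrinkage.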

Choosing the Ridge Parameter

Two intuitive methods are presented. (1) Ridge trace method: plot the coefficient paths as \(\lambda\) varies and select the smallest \(\lambda\) at which the coefficients have stabilized while the residual sum of squares has not increased excessively.

(2) Mean squared error method: the mean squared error of the ridge estimator is a function of \(\lambda\) and attains a minimum at some value. Compute and plot the MSE, then choose the \(\lambda\) at its minimum.
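In a simulation where the true \(\beta\) and noise variance are known, the MSE curve can be computed exactly as variance plus squared bias: with \(W = (X^TX+\lambda I)^{-1}X^TX\), \(\mathrm{MSE}(\lambda) = \sigma^2\,\mathrm{tr}\!\big(W(X^TX+\lambda I)^{-1}\big) + \|(W-I)\beta\|^2\). The sketch below (the collinear design, true \(\beta\), and \(\sigma^2\) are illustrative assumptions) locates the minimizing \(\lambda\):

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical collinear design: third column nearly duplicates the first
Z = rng.normal(size=(30, 2))
X = np.column_stack([Z, Z[:, 0] + 0.01 * rng.normal(size=30)])
beta = np.array([1.0, 1.0, 1.0])  # assumed true coefficients
sigma2 = 1.0                      # assumed noise variance

XtX, I3 = X.T @ X, np.eye(3)

def ridge_mse(lam):
    """Theoretical MSE of the ridge estimator: variance + squared bias."""
    A_inv = np.linalg.inv(XtX + lam * I3)
    W = A_inv @ XtX               # shrinkage matrix, E[beta_hat] = W @ beta
    var = sigma2 * np.trace(W @ A_inv)
    bias2 = np.sum(((W - I3) @ beta) ** 2)
    return var + bias2

lams = np.logspace(-4, 2, 200)
mses = np.array([ridge_mse(lam) for lam in lams])
print("lambda at minimum MSE:", lams[np.argmin(mses)])
```

Because the design is nearly collinear, the MSE at \(\lambda \approx 0\) is very large and the minimum occurs at a strictly positive \(\lambda\), which is what this selection method exploits.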

Application of Ridge Regression

The table below (from Malinvaud, 1966) contains French economic data: the dependent variable is total imports, and the three explanatory variables are total domestic output, stock, and total consumption (all in billions of francs).

The ridge regression equation is to be estimated.

Solution

The ridge trace plot (Figure 12.1) suggests a suitable \(\lambda\). The standardized ridge regression equation is obtained and then transformed back to the original scale.
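The back-transformation from the standardized equation to the original scale can be written explicitly. Writing the standardized equation as \(\hat y^* = b_1^* x_1^* + \dots + b_p^* x_p^*\), with \(\bar x_j, \bar y\) the sample means and \(s_j, s_y\) the sample standard deviations, the original-scale coefficients are \(\hat\beta_j = \frac{s_y}{s_j} b_j^*\) for \(j = 1, \dots, p\), and the intercept is \(\hat\beta_0 = \bar y - \sum_{j=1}^{p} \hat\beta_j \bar x_j\). These are the formulas implemented in the `params` computation of the program.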

Code

<code>import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, RidgeCV
from scipy.stats import zscore
# plt.rc('text', usetex=True)  # LaTeX not installed, comment out

a = np.loadtxt("data/economic.txt")
n = a.shape[1] - 1  # number of independent variables
aa = zscore(a)  # standardize all columns
x = aa[:, :n]  # independent variable matrix
y = aa[:, n]   # dependent variable vector
b = []  # store regression coefficients for each lambda
kk = np.logspace(-4, 0, 100)  # range of lambda values
for k in kk:
    md = Ridge(alpha=k).fit(x, y)
    b.append(md.coef_)
st = ['s-r', '*-k', 'p-b']  # plot style strings
for i in range(n):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend([r'$x_1$', r'$x_2$', r'$x_3$'], fontsize=15)
plt.xlabel(r'$\lambda$')  # ridge parameter axis
plt.show()

mdcv = RidgeCV(alphas=np.logspace(-4, 0, 100)).fit(x, y)
print("Optimal alpha =", mdcv.alpha_)
# md0 = Ridge(mdcv.alpha_).fit(x, y)  # CV-optimal model (not used, see below)
md0 = Ridge(0.4).fit(x, y)  # fit with the subjectively chosen lambda
cs0 = md0.coef_  # standardized coefficients b1, b2, b3
print("Standardized coefficients:", cs0)
mu = np.mean(a, axis=0)
s = np.std(a, axis=0, ddof=1)  # column means and standard deviations
# transform back to the original scale: [intercept, slope coefficients]
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original-scale regression coefficients:", params)
print("Goodness of fit:", md0.score(x, y))
</code>

The program uses RidgeCV to determine the cross-validation-optimal \(\lambda\), but at that value one of the fitted regression coefficients is negative, so the final \(\lambda\) (here 0.4) is chosen subjectively instead.

References

Si Shougui, Sun Xijing. Python Mathematics Experiments and Modeling.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
