Understanding Ridge Regression: Definitions, Properties, and Parameter Selection
This article explains ridge regression by defining the estimator, outlining its key properties, discussing methods for choosing the ridge parameter, and demonstrating its application to economic data with Python code and visualizations.
Ridge Estimation Definition and Properties
Ridge estimation arises naturally when multicollinearity exists; by adding a regularization matrix to the design matrix, the likelihood of near-singular solutions is reduced, making the estimator more stable than ordinary least squares.
Definition 1. For a given matrix \(X\) satisfying condition (12.16), the estimator \(\hat\beta_{ridge}\) defined by \(\hat\beta_{ridge}= (X^TX + \lambda I)^{-1}X^Ty\) is called the ridge estimator. The resulting regression equation is the ridge regression equation, where \(\lambda\) is the ridge parameter. When \(\lambda = 0\), the estimator reduces to the ordinary least squares estimator.
The following important properties of ridge estimation are introduced:
Property 1. Ridge estimation is biased, i.e., \(E[\hat\beta_{ridge}] \neq \beta\).
Property 2. Ridge estimation is a shrinkage estimator, i.e., it reduces the magnitude of the coefficients.
Choosing the Ridge Parameter
Two intuitive methods are presented. (1) Ridge trace method : observe the ridge trace curves and select the point where the coefficients become stable while the residual sum of squares does not increase excessively.
(2) Mean squared error method : the mean squared error of the ridge estimator is a function of \(\lambda\) and attains a minimum at some value. Compute and plot the MSE; choose the \(\lambda\) at its minimum.
Application of Ridge Regression
The table below (from Malinvand, 1966) contains French economic data: the dependent variable is total imports, and the three explanatory variables are total domestic output, stock, and total consumption (all in billions of francs).
The ridge regression equation is to be estimated.
Solution
The ridge trace plot (Figure 12.1) suggests a suitable \(\lambda\). The standardized ridge regression equation is obtained and then transformed back to the original scale.
Code
<code>import numpy as np; import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, RidgeCV
from scipy.stats import zscore
#plt.rc('text', usetex=True) # LaTeX not installed, comment out
a = np.loadtxt("data/economic.txt")
n = a.shape[1] - 1 # number of independent variables
aa = zscore(a) # data standardization
x = aa[:, :n]; y = aa[:, n] # independent and dependent matrices
b = [] # store regression coefficients
kk = np.logspace(-4, 0, 100) # range of lambda values
for k in kk:
md = Ridge(alpha=k).fit(x, y)
b.append(md.coef_)
st = ['s-r', '*-k', 'p-b'] # plot style strings
for i in range(3):
plt.plot(kk, np.array(b)[:, i], st[i])
plt.legend([r'$x_1$', r'$x_2$', r'$x_3$'], fontsize=15)
plt.show()
mdcv = RidgeCV(alphas=np.logspace(-4, 0, 100)).fit(x, y)
print("Optimal alpha=", mdcv.alpha_)
#md0 = Ridge(mdcv.alpha_).fit(x, y) # build and fit model
md0 = Ridge(0.4).fit(x, y) # build and fit model
cs0 = md0.coef_ # standardized coefficients b1, b2, b3
print("Standardized coefficients:", cs0)
mu = np.mean(a, axis=0)
s = np.std(a, axis=0, ddof=1) # compute mean and std
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original data regression coefficients:", params)
print("Goodness of fit:", md0.score(x, y))
</code>The program uses RidgeCV to determine the optimal \(\lambda\), but the resulting coefficient for the dependent variable is negative, so the final \(\lambda\) is chosen subjectively.
References
Si Shougui, Sun Xijing. Python Mathematics Experiments and Modeling .
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.