Ridge Regression with scikit-learn: Theory, Implementation, and Example
This article introduces Ridge regression, explains its theory and regularization role, discusses overfitting and bias‑variance trade‑offs, presents scikit‑learn parameters, and provides a complete Python example from data loading to model training, evaluation, and optimal alpha selection.
The article provides a comprehensive guide to Ridge regression (also known as L2 regularization) within the context of supervised learning, covering its motivation, mathematical formulation, and practical usage with the scikit‑learn library.
It begins with an overview of overfitting and underfitting: high variance leads to overfitting, while high bias leads to underfitting. Common causes of overfitting include insufficient or noisy data, overly complex models, and inappropriate assumptions.
Typical mitigation strategies are then presented, including early stopping, validation sets, cross‑validation, and regularization (L1/L2). The article then introduces the scikit‑learn implementation of Ridge regression, summarizing key parameters such as alpha (regularization strength) and max_iter (maximum iterations).
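The cross-validation strategy mentioned above can be sketched in a few lines; the synthetic data, coefficients, and fold count here are illustrative assumptions, not part of the article's example:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.randn(100) * 0.1

# 5-fold cross-validation estimates out-of-sample R^2,
# exposing overfitting before the model is committed to
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring='r2')
print(scores.mean())
```

A large gap between training score and the cross-validated score is the usual symptom of overfitting that this check is meant to catch.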
Example code demonstrates the full workflow:
#coding=utf-8
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split  # cross_validation was removed in modern scikit-learn
from sklearn.linear_model import Ridge, RidgeCV
Data loading and preparation:
data = pd.read_csv('../Folds5x2_pp.csv')  # forward slashes work on all platforms
X = data[['AT', 'V', 'AP', 'RH']]
Y = data[['PE']]
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, random_state=1)
Model training and prediction:
ridge = Ridge(alpha=1)
ridge.fit(X_TRAIN, Y_TRAIN)
Y_PRED = ridge.predict(X_TEST)
print(ridge.coef_)
print(ridge.intercept_)
Finding the optimal regularization strength with RidgeCV:
ridgecv = RidgeCV(alphas=[0.01,0.1,0.5,1,3,5,7,10,20,100])
ridgecv.fit(X_TRAIN, Y_TRAIN)
print(ridgecv.alpha_)
Visualization of predicted versus measured values:
fig, ax = plt.subplots()
ax.scatter(Y_TEST, Y_PRED)
ax.plot([Y_TEST.min(), Y_TEST.max()], [Y_TEST.min(), Y_TEST.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
The theoretical section revisits the linear regression loss function and shows how adding the L2 penalty modifies the normal equation, leading to the closed‑form solution \(\theta = (X^TX + \alpha I)^{-1}X^Ty\). It explains that increasing \(\alpha\) shrinks the coefficients toward zero, reducing variance at the cost of increased bias.
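The closed‑form solution can be checked numerically against scikit‑learn; this is a sketch on synthetic data (the data and alpha are assumptions), with `fit_intercept=False` so the comparison is exact, since scikit‑learn does not penalize the intercept:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = rng.randn(50)
alpha = 1.0

# Closed form: theta = (X^T X + alpha I)^{-1} X^T y
theta = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

# The cholesky solver computes exactly this system
ridge = Ridge(alpha=alpha, fit_intercept=False, solver='cholesky')
ridge.fit(X, y)
print(np.allclose(theta, ridge.coef_))
```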
In conclusion, Ridge regression mitigates the overfitting problem of ordinary least squares by introducing a regularization term, offering a balance between bias and variance that can be tuned via the \(\alpha\) parameter.
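The shrinkage that drives this bias-variance trade‑off can be observed directly: as \(\alpha\) grows, the L2 norm of the fitted coefficient vector decreases monotonically. A minimal sketch, assuming synthetic data and an illustrative alpha grid:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.randn(100) * 0.1

# Larger alpha -> stronger L2 penalty -> smaller coefficient norm
norms = []
for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))

assert norms[0] > norms[1] > norms[2]  # norm shrinks as alpha increases
```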
Qunar Tech Salon