
Ridge Regression with scikit-learn: Theory, Implementation, and Example

This article introduces Ridge regression, explains its theory and regularization role, discusses overfitting and bias‑variance trade‑offs, presents scikit‑learn parameters, and provides a complete Python example from data loading to model training, evaluation, and optimal alpha selection.

Qunar Tech Salon

The article provides a comprehensive guide to Ridge regression (also known as L2 regularization) within the context of supervised learning, covering its motivation, mathematical formulation, and practical usage with the scikit‑learn library.

It begins with an overview of overfitting and underfitting, describing how high variance leads to overfitting while high bias leads to underfitting, and lists common causes of overfitting such as insufficient or noisy data, overly complex models, and inappropriate assumptions.

Typical mitigation strategies are then presented, including early stopping, validation sets, cross‑validation, and regularization (L1/L2). The article then introduces the scikit‑learn implementation of Ridge regression, summarizing key parameters such as alpha (regularization strength) and max_iter (maximum iterations).
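As a quick illustration of the L2 penalty at work, the following sketch (with synthetic data standing in for the article's dataset, an assumption here) shows that increasing alpha shrinks the coefficient norm:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (an assumption; not the article's CSV)
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = X @ np.array([3.0, -2.0, 1.0, 0.5, -1.5]) + rng.normal(scale=0.1, size=100)

# The L2 penalty shrinks coefficients: larger alpha, smaller norm
norms = {a: np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in (0.01, 1, 100)}
print(norms)
```

The shrinking norm is the variance-reduction mechanism the article describes: stronger regularization trades coefficient magnitude (and some bias) for stability.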

Example code demonstrates the full workflow:

#coding=utf-8
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.linear_model import Ridge, RidgeCV

Data loading and preparation:

data = pd.read_csv('..\\Folds5x2_pp.csv')
X = data[['AT', 'V', 'AP', 'RH']]
Y = data[['PE']]
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, random_state=1)

Model training and prediction:

ridge = Ridge(alpha=1)
ridge.fit(X_TRAIN, Y_TRAIN)
Y_PRED = ridge.predict(X_TEST)
print(ridge.coef_)
print(ridge.intercept_)
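The original listing prints the coefficients but does not score the model; a sketch of the evaluation step with mean squared error and R² (synthetic data replaces the CSV here, an assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the power-plant data: 4 features, 1 target
rng = np.random.RandomState(1)
X = rng.rand(500, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
ridge = Ridge(alpha=1).fit(X_train, y_train)
y_pred = ridge.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```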

Finding the optimal regularization strength with RidgeCV:

ridgecv = RidgeCV(alphas=[0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100])
ridgecv.fit(X_TRAIN, Y_TRAIN)
print(ridgecv.alpha_)
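A fitted RidgeCV is itself a usable estimator refit at the selected alpha, so no separate Ridge refit is needed; a minimal sketch (the synthetic data and the cv=5 split are assumptions):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (an assumption; not the article's dataset)
rng = np.random.RandomState(1)
X = rng.rand(200, 4)
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=200)

alphas = [0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100]
ridgecv = RidgeCV(alphas=alphas, cv=5).fit(X, y)

# The selected alpha is one of the candidates; the model predicts directly
print(ridgecv.alpha_)
print(ridgecv.predict(X[:3]))
```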

Visualization of predicted versus measured values:

fig, ax = plt.subplots()
ax.scatter(Y_TEST, Y_PRED)
ax.plot([Y_TEST.min(), Y_TEST.max()], [Y_TEST.min(), Y_TEST.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()

The theoretical section revisits the linear regression loss function and shows how adding the L2 penalty modifies the normal equation, leading to the closed‑form solution \(\theta = (X^TX + \alpha I)^{-1}X^Ty\). It explains that increasing \(\alpha\) shrinks the coefficients toward zero, reducing variance at the cost of increased bias.
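The closed-form solution can be verified numerically against scikit-learn; a sketch assuming fit_intercept=False so the objective matches the normal-equation form exactly (the synthetic data is an assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption; not the article's dataset)
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.05, size=100)

alpha = 1.0
# Closed form: theta = (X^T X + alpha I)^{-1} X^T y
theta = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# With fit_intercept=False, scikit-learn minimizes the same objective
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(np.allclose(theta, ridge.coef_, atol=1e-6))
```

With the default fit_intercept=True the comparison would not be exact, since scikit-learn then centers the data and fits the intercept outside the penalty.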

In conclusion, Ridge regression mitigates the overfitting problem of ordinary least squares by introducing a regularization term, offering a balance between bias and variance that can be tuned via the \(\alpha\) parameter.

Tags: machine learning, python, regression, regularization, scikit-learn, ridge regression
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
