Ridge Regression with scikit-learn: Theory, Implementation, and Example
This article introduces Ridge regression, explains its theory and regularization role, discusses overfitting and bias‑variance trade‑offs, presents scikit‑learn parameters, and provides a complete Python example from data loading to model training, evaluation, and optimal alpha selection.
The article provides a comprehensive guide to Ridge regression (also known as L2 regularization) within the context of supervised learning, covering its motivation, mathematical formulation, and practical usage with the scikit‑learn library.
It begins with an overview of overfitting and underfitting: high variance leads to overfitting, while high bias leads to underfitting. Common causes of overfitting include insufficient or noisy data, overly complex models, and inappropriate assumptions.
Typical mitigation strategies are then presented, including early stopping, validation sets, cross‑validation, and regularization (L1/L2). The article then introduces the scikit‑learn implementation of Ridge regression, summarizing key parameters such as alpha (regularization strength) and max_iter (maximum iterations).
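The cross-validation strategy mentioned above can be sketched in a few lines; the synthetic data, coefficients, and fold count here are illustrative assumptions, not part of the article's example:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.randn(100) * 0.1

# 5-fold cross-validation estimates out-of-sample R^2,
# exposing overfitting before the model is committed to
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring='r2')
print(scores.mean())
```

A large gap between training score and the cross-validated score is the usual symptom of overfitting that this check is meant to catch.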
Example code demonstrates the full workflow:
#coding=utf-8
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split  # cross_validation was removed in modern scikit-learn
from sklearn.linear_model import Ridge, RidgeCV
Data loading and preparation:
data = pd.read_csv('../Folds5x2_pp.csv')  # forward slashes work on all platforms
X = data[['AT', 'V', 'AP', 'RH']]
Y = data[['PE']]
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, random_state=1)
Model training and prediction:
ridge = Ridge(alpha=1)
ridge.fit(X_TRAIN, Y_TRAIN)
Y_PRED = ridge.predict(X_TEST)
print(ridge.coef_)
print(ridge.intercept_)
Finding the optimal regularization strength with RidgeCV:
ridgecv = RidgeCV(alphas=[0.01,0.1,0.5,1,3,5,7,10,20,100])
ridgecv.fit(X_TRAIN, Y_TRAIN)
print(ridgecv.alpha_)
Visualization of predicted versus measured values:
fig, ax = plt.subplots()
ax.scatter(Y_TEST, Y_PRED)
ax.plot([Y_TEST.min(), Y_TEST.max()], [Y_TEST.min(), Y_TEST.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
The theoretical section revisits the linear regression loss function and shows how adding the L2 penalty modifies the normal equation, leading to the closed‑form solution \(\theta = (X^TX + \alpha I)^{-1}X^Ty\). It explains that increasing \(\alpha\) shrinks the coefficients toward zero, reducing variance at the cost of increased bias.
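The closed‑form solution can be checked numerically against scikit‑learn; this is a sketch on synthetic data (the data and alpha are assumptions), with `fit_intercept=False` so the comparison is exact, since scikit‑learn does not penalize the intercept:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = rng.randn(50)
alpha = 1.0

# Closed form: theta = (X^T X + alpha I)^{-1} X^T y
theta = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

# The cholesky solver computes exactly this system
ridge = Ridge(alpha=alpha, fit_intercept=False, solver='cholesky')
ridge.fit(X, y)
print(np.allclose(theta, ridge.coef_))
```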
In conclusion, Ridge regression mitigates the overfitting problem of ordinary least squares by introducing a regularization term, offering a balance between bias and variance that can be tuned via the \(\alpha\) parameter.
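The shrinkage that drives this bias-variance trade‑off can be observed directly: as \(\alpha\) grows, the L2 norm of the fitted coefficient vector decreases monotonically. A minimal sketch, assuming synthetic data and an illustrative alpha grid:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.randn(100) * 0.1

# Larger alpha -> stronger L2 penalty -> smaller coefficient norm
norms = []
for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))

assert norms[0] > norms[1] > norms[2]  # norm shrinks as alpha increases
```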
Qunar Tech Salon