
Evaluating Linear Regression Model Performance with K-Fold Cross-Validation in Python

This tutorial shows how to evaluate a linear regression model's performance with K-fold cross-validation in Python: loading and preparing data, computing MSE and R² metrics, visualizing predictions with matplotlib, and interpreting the results.

Test Development Learning Exchange

Objective: Learn to evaluate a model's performance using cross-validation.

Learning Content: Cross-validation techniques and regression evaluation metrics such as Mean Squared Error (MSE) and R². (AUC-ROC, sometimes listed alongside these, applies to classification rather than regression, so it is not used here.)
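Before using scikit-learn's helpers, it can be useful to see what the two metrics actually measure. The following is a small illustrative sketch (the values are made up for the example, not taken from the tutorial's dataset) computing MSE and R² by hand with NumPy:

```python
import numpy as np

# Toy true/predicted values, purely for illustrating the formulas.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# MSE: the average of the squared residuals. Lower is better; 0 is a perfect fit.
mse = np.mean((y_true - y_pred) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares), i.e. the fraction
# of the variance in y that the predictions explain. 1.0 is a perfect fit.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")
print(f"R2:  {r2:.3f}")
```

These are exactly the quantities that `mean_squared_error` and `r2_score` compute below.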

Code Example:

Import libraries:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
import matplotlib.pyplot as plt

Load the California housing dataset (the Boston housing dataset behind the original `load_boston` helper was removed in scikit-learn 1.2, so its standard replacement is used here):

# Load example dataset (California housing; downloaded on first use)
housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['PRICE'] = housing.target  # median house value, in units of $100,000
print(f"Sample dataset:\n{df.head()}")

Prepare data and split:

# Split data into training and test sets
X = df.drop('PRICE', axis=1)
y = df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training features:\n{X_train.head()}")
print(f"Test features:\n{X_test.head()}")
print(f"Training labels:\n{y_train.head()}")
print(f"Test labels:\n{y_test.head()}")

Perform K‑fold cross‑validation:

# Evaluate Linear Regression with K‑fold CV
model = LinearRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# MSE
mse_scores = cross_val_score(model, X, y, cv=kf, scoring='neg_mean_squared_error')
mse_scores = -mse_scores  # convert to positive
print(f"Cross-validation MSE scores: {mse_scores}")
print(f"Mean cross-validation MSE: {mse_scores.mean():.2f}")
# R2
r2_scores = cross_val_score(model, X, y, cv=kf, scoring='r2')
print(f"Cross-validation R2 scores: {r2_scores}")
print(f"Mean cross-validation R2: {r2_scores.mean():.2f}")
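As an aside, scikit-learn's `cross_validate` can score several metrics in a single pass, replacing the two separate `cross_val_score` calls above. A minimal sketch (using the bundled diabetes dataset so the snippet is self-contained and needs no download):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# cross_validate accepts a list of scorers and returns one score array per
# metric, keyed as 'test_<scorer name>'.
X, y = load_diabetes(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
results = cross_validate(LinearRegression(), X, y, cv=kf,
                         scoring=['neg_mean_squared_error', 'r2'])
mean_mse = -results['test_neg_mean_squared_error'].mean()
mean_r2 = results['test_r2'].mean()
print(f"Mean CV MSE: {mean_mse:.2f}")
print(f"Mean CV R2:  {mean_r2:.2f}")
```

Both approaches fit the model once per fold; `cross_validate` simply avoids doing the folds twice when you want more than one metric.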

Train the model and evaluate on the test set:

# Train and predict
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test-set MSE: {mse:.2f}")
print(f"Test-set R2: {r2:.2f}")

Visualize predictions:

# Plot results
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--k')
plt.xlabel('Actual price')
plt.ylabel('Predicted price')
plt.title('Linear regression predictions')
plt.show()

Practice: Run the script above to apply K-fold cross-validation to a linear regression model, then compare the per-fold MSE and R² values with their averages.
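If you want to see where those per-fold numbers come from, the loop below sketches what `cross_val_score` does internally: `KFold.split` yields train/test index arrays for each fold, and the model is refit on each training split (again using the bundled diabetes dataset so the sketch runs on its own):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_mse = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # Refit from scratch on this fold's training split only.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    mse = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
    fold_mse.append(mse)
    print(f"Fold {fold}: MSE = {mse:.2f}")

print(f"Average MSE: {np.mean(fold_mse):.2f}")
```

Each sample lands in the test split of exactly one fold, which is why averaging the fold scores gives a less optimistic estimate than a single lucky train/test split.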

Summary: After completing this exercise you should be able to use cross-validation to assess model performance and compute common metrics such as MSE and R², giving you a sense of how the model behaves across different data subsets.
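One practical refinement worth knowing once preprocessing enters the picture: if you scale or otherwise transform features, the transformer should be fit inside each fold, not on the full dataset, or the CV scores leak test information. A hedged sketch using scikit-learn's `Pipeline` (again on the bundled diabetes dataset):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The pipeline is cloned and refit per fold, so StandardScaler only ever
# sees that fold's training split — the evaluation stays leak-free.
X, y = load_diabetes(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LinearRegression())
kf = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = cross_val_score(pipe, X, y, cv=kf, scoring='r2')
print(f"Mean R2 with scaling inside the folds: {r2_scores.mean():.2f}")
```

For plain linear regression scaling barely changes the scores, but the pattern matters for regularized models and distance-based estimators.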

Tags: machine learning, Python, model evaluation, MSE, linear regression, cross-validation, R2