Evaluating Linear Regression Model Performance with K-Fold Cross-Validation in Python
This tutorial shows how to evaluate a linear regression model's performance with K‑fold cross‑validation in Python: loading and preparing data, computing the MSE and R² metrics, visualizing predictions with matplotlib, and interpreting the results.
Objective: Learn to evaluate a model's performance using cross‑validation.
Learning Content: Cross‑validation techniques and evaluation metrics such as Mean Squared Error (MSE), R² (and optionally AUC‑ROC).
Code Example:
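Before relying on library calls, it helps to see what MSE and R² actually compute: MSE is the mean of squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers and checks the NumPy arithmetic against scikit-learn's implementations:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, purely for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"Manual  MSE={mse_manual:.4f}, R2={r2_manual:.4f}")
print(f"sklearn MSE={mean_squared_error(y_true, y_pred):.4f}, "
      f"R2={r2_score(y_true, y_pred):.4f}")
```

Both pairs of numbers should agree, which is a useful sanity check before moving on to cross-validated versions of the same metrics.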
Import libraries:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt
Load the dataset (note: `load_boston` was removed in scikit-learn 1.2, so the California housing dataset is used here instead):
# Load example dataset (California housing; load_boston was removed in scikit-learn 1.2)
housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['PRICE'] = housing.target
print(f"Sample dataset:\n{df.head()}")
Prepare data and split:
# Split data into training and test sets
X = df.drop('PRICE', axis=1)
y = df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training features:\n{X_train.head()}")
print(f"Test features:\n{X_test.head()}")
print(f"Training labels:\n{y_train.head()}")
print(f"Test labels:\n{y_test.head()}")
Perform K‑fold cross‑validation:
# Evaluate Linear Regression with K‑fold CV
model = LinearRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# MSE
mse_scores = cross_val_score(model, X, y, cv=kf, scoring='neg_mean_squared_error')
mse_scores = -mse_scores # convert to positive
print(f"Cross-validation MSE scores: {mse_scores}")
print(f"Mean cross-validation MSE: {mse_scores.mean():.2f}")
# R2
r2_scores = cross_val_score(model, X, y, cv=kf, scoring='r2')
print(f"Cross-validation R2 scores: {r2_scores}")
print(f"Mean cross-validation R2: {r2_scores.mean():.2f}")
Train the model and evaluate on the test set:
# Train and predict
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE on the test set: {mse:.2f}")
print(f"R2 on the test set: {r2:.2f}")
Visualize predictions:
# Plot results
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--k')
plt.xlabel('Actual price')
plt.ylabel('Predicted price')
plt.title('Linear regression predictions')
plt.show()
Practice: Run the above script to apply K‑fold cross‑validation to a linear regression model, and observe the MSE and R² values for each fold as well as the averaged scores.
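As an extension of the practice above, scikit-learn's `cross_validate` can evaluate several metrics in a single pass instead of calling `cross_val_score` once per metric. A minimal sketch on synthetic data (so it runs without downloading a dataset):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# One call scores both metrics on every fold
results = cross_validate(
    LinearRegression(), X, y, cv=kf,
    scoring={'mse': 'neg_mean_squared_error', 'r2': 'r2'},
)

print(f"Mean MSE: {-results['test_mse'].mean():.2f}")
print(f"Mean R2: {results['test_r2'].mean():.2f}")
```

The returned dictionary holds one array per metric (`test_mse`, `test_r2`) with one entry per fold, plus fit and score timings, which is convenient when comparing several models on the same splits.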
Summary: By completing this exercise you should be able to use cross‑validation to assess model performance and compute common metrics such as MSE and R², helping you understand how the model behaves on different data subsets.
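To make those "different data subsets" concrete, the short sketch below prints how `KFold` partitions ten toy samples: each sample appears in exactly one held-out test fold across the five splits.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(-1, 1)  # ten toy samples
kf = KFold(n_splits=5, shuffle=True, random_state=42)

all_test_idx = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X_toy)):
    print(f"Fold {fold}: train={train_idx.tolist()}, test={test_idx.tolist()}")
    all_test_idx.extend(test_idx.tolist())

# Every sample is held out exactly once across the 5 folds
print(sorted(all_test_idx))
```

This is why the per-fold MSE and R² scores vary: each fold evaluates the model on a different fifth of the data while training on the remaining four fifths.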
Test Development Learning Exchange