Evaluating Linear Regression Model Performance with K-Fold Cross-Validation in Python
This tutorial shows how to evaluate a linear regression model with K‑fold cross‑validation in Python: loading and preparing the data, computing the MSE and R² metrics, visualizing predictions with matplotlib, and interpreting the results.
Objective : Learn to evaluate a model's performance using cross‑validation.
Learning Content : Cross‑validation techniques and evaluation metrics such as Mean Squared Error (MSE) and R². (AUC‑ROC is a classification metric, so it does not apply to the regression model used here.)
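To make the two metrics concrete, here is a minimal sketch that computes MSE and R² directly from their definitions and compares them with scikit-learn's `mean_squared_error` and `r2_score`. The toy arrays `y_true` and `y_hat` are made up for illustration and are not taken from the housing data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values chosen for illustration only (not from the tutorial's dataset).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# MSE: the mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE manual={mse_manual:.3f}, sklearn={mean_squared_error(y_true, y_hat):.3f}")
print(f"R2  manual={r2_manual:.3f}, sklearn={r2_score(y_true, y_hat):.3f}")
```

A lower MSE means smaller errors, in squared units of the target. R² is at most 1 and can be negative when the model does worse than always predicting the mean.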
Code Example :
Import libraries:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Note: load_boston was removed in scikit-learn 1.2, so this import requires an older version
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
Load the Boston housing dataset:
# Load example dataset (Boston housing)
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['PRICE'] = boston.target
print(f"Sample dataset:\n{df.head()}")
Prepare data and split:
# Split data into training and test sets
X = df.drop('PRICE', axis=1)
y = df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training features:\n{X_train.head()}")
print(f"Test features:\n{X_test.head()}")
print(f"Training labels:\n{y_train.head()}")
print(f"Test labels:\n{y_test.head()}")
Perform K‑fold cross‑validation:
# Evaluate Linear Regression with K‑fold CV
model = LinearRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# MSE
mse_scores = cross_val_score(model, X, y, cv=kf, scoring='neg_mean_squared_error')
mse_scores = -mse_scores # convert to positive
print(f"Cross-validation MSE scores: {mse_scores}")
print(f"Mean cross-validation MSE: {mse_scores.mean():.2f}")
# R2
r2_scores = cross_val_score(model, X, y, cv=kf, scoring='r2')
print(f"Cross-validation R2 scores: {r2_scores}")
print(f"Mean cross-validation R2: {r2_scores.mean():.2f}")
Train the model and evaluate on the test set:
# Train and predict
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test-set MSE: {mse:.2f}")
print(f"Test-set R2: {r2:.2f}")
Visualize predictions:
# Plot results
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--k')
plt.xlabel('Actual price')
plt.ylabel('Predicted price')
plt.title('Linear regression predictions')
plt.show()
Practice : Run the above script to apply K‑fold cross‑validation to a linear regression model, then observe the MSE and R² values for each fold and the averaged scores.
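If `load_boston` is unavailable in your scikit-learn installation, the same exercise can be tried on another regression dataset. The sketch below is one possible variant, not the original script: it swaps in the bundled `load_diabetes` dataset (an assumption made here so the example runs offline) and uses `cross_validate` to collect both metrics in a single pass.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Assumption: load_diabetes stands in for the Boston data used in the tutorial.
X, y = load_diabetes(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# A dict of scorers returns one "test_<name>" array per metric.
results = cross_validate(
    LinearRegression(),
    X,
    y,
    cv=kf,
    scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
)

fold_mse = -results["test_mse"]  # flip sign: scikit-learn reports negated MSE
fold_r2 = results["test_r2"]

for i, (m, r) in enumerate(zip(fold_mse, fold_r2), start=1):
    print(f"Fold {i}: MSE={m:.2f}, R2={r:.3f}")

print(f"Mean MSE: {fold_mse.mean():.2f} (std {fold_mse.std():.2f})")
print(f"Mean R2:  {fold_r2.mean():.3f} (std {fold_r2.std():.3f})")
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the scores vary from fold to fold.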
Summary : By completing this exercise you should be able to use cross‑validation to assess model performance and compute common metrics such as MSE and R², helping you understand how the model behaves on different data subsets.
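To see what happens on each data subset, the following sketch writes the K‑fold loop out by hand and checks it against `cross_val_score`. The synthetic data, its coefficients, and the noise level are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (illustrative assumption, not the housing dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

manual_mse = []
for train_idx, val_idx in kf.split(X):
    # Fit on the training folds, score on the single held-out fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    manual_mse.append(mean_squared_error(y[val_idx], pred))
manual_mse = np.array(manual_mse)

# The helper should reproduce the same per-fold values with the same splitter.
helper_mse = -cross_val_score(
    LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error"
)

print("Manual per-fold MSE:", np.round(manual_mse, 4))
print("Matches cross_val_score:", np.allclose(manual_mse, helper_mse))
```

Each sample lands in the held-out fold exactly once, so every score comes from data the model did not see during fitting.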