Linear Regression Theory and Python Implementation with Iris and Boston Datasets
This article explains the fundamentals of linear regression, including regression formulas, loss functions, and error metrics, and provides complete Python code using scikit‑learn to perform both simple and multiple linear regression on the Iris and Boston housing datasets, along with model evaluation and visualization.
Theoretical Foundations
Linear regression models the relationship between variables with a linear equation. Simple regression uses f(x) = kx + b, where k is the weight (slope) and b the intercept; multiple regression extends this to one weight per feature, f(x) = w₁x₁ + w₂x₂ + … + wₙxₙ + b. The loss function is typically the mean squared error (MSE). Model performance is assessed with MSE, RMSE (the square root of MSE), MAE (mean absolute error), and the coefficient of determination (R²); lower error values and an R² closer to 1 indicate a better fit.
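The four metrics can also be written out directly, which makes their definitions concrete. The sketch below is illustrative only: the values in `y_true` and `y_pred` are made up rather than taken from either dataset, and the helper name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)            # mean squared error
    rmse = np.sqrt(mse)                      # root mean squared error
    mae = np.mean(np.abs(residuals))         # mean absolute error
    ss_res = np.sum(residuals ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Made-up values for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the MSE is 0.125, the MAE is 0.25, and R² is 0.9. The scikit-learn functions `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities.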
Code Implementation – Simple Linear Regression on the Iris Dataset
# Import the Iris dataset loader
from sklearn.datasets import load_iris
# Import the helper that splits data into training and test sets
from sklearn.model_selection import train_test_split
# Import the linear regression class
from sklearn.linear_model import LinearRegression
import numpy as np
iris = load_iris()
# The third column is petal length; the fourth column is petal width
x, y = iris.data[:, 2].reshape(-1, 1), iris.data[:, 3]
lr = LinearRegression()
# Split into training and test sets (25% held out for testing)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# Train the model
lr.fit(x_train, y_train)
# Predict on the test set
y_hat = lr.predict(x_test)
# Compare the first five true test targets with their predictions
print(y_test[:5])
print(y_hat[:5])
# Evaluate the model
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print('MSE:', mean_squared_error(y_test, y_hat))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_hat)))
print('MAE:', mean_absolute_error(y_test, y_hat))
print('R² (metric):', r2_score(y_test, y_hat))
print('R² (model):', lr.score(x_test, y_test))
# Visualization
from matplotlib import pyplot as plt
plt.rcParams['font.size'] = 15
plt.figure(figsize=(20, 8))
plt.scatter(x_train, y_train, color='green', marker='o', label='Training set')
plt.scatter(x_test, y_test, color='orange', marker='o', label='Test set')
plt.plot(x, lr.predict(x), 'r-')
plt.legend()
plt.xlabel('Petal length (cm)')
plt.ylabel('Petal width (cm)')
plt.show()
The resulting plot shows the fitted regression line together with the training and test points, illustrating the relationship between petal length and petal width.
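For a single feature, the ordinary least-squares slope and intercept have a closed form, which helps show what the fitted line represents. A minimal sketch under stated assumptions: the data is synthetic (not Iris), and `fit_simple_line` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def fit_simple_line(x, y):
    """Closed-form ordinary least squares for one feature: returns (k, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    k = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b = y_mean - k * x_mean
    return k, b

# Synthetic points lying exactly on y = 2x + 1 (illustration only)
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.0, 5.0, 7.0, 9.0]
k, b = fit_simple_line(x_demo, y_demo)
print(k, b)
```

Applied to `x_train[:, 0]` and `y_train` from the Iris split, the same formula should agree with `lr.coef_[0]` and `lr.intercept_` up to floating-point error, since `LinearRegression` fits by ordinary least squares.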
Discussion of Simple Regression
While the Iris example demonstrates a clear linear relationship between two features, real‑world problems often involve many influencing factors, necessitating multiple linear regression.
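With several features, the model has one weight per feature plus an intercept, and all of them can be estimated in a single least-squares solve. The sketch below uses NumPy's `np.linalg.lstsq` on a small synthetic dataset; the data and variable names are assumptions for illustration and do not come from the article.

```python
import numpy as np

# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (illustration only)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the intercept is estimated as an extra weight
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem: minimize ||A @ params - y||^2
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = params[0], params[1:]
print(intercept, weights)
```

Because the synthetic targets contain no noise, the recovered intercept and weights match the generating values 1, 2, and -3 up to floating-point error.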
Code Implementation – Multiple Linear Regression on the Boston Housing Dataset
import pandas as pd
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this import requires an older release
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
boston = load_boston()
# Feature matrix and target vector
x, y = boston.data, boston.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.15, random_state=0)
lr = LinearRegression()
lr.fit(x_train, y_train)
# Print the weight of each feature and the intercept
print(lr.coef_)
print(lr.intercept_)
y_hat = lr.predict(x_test)
# Evaluate (same metrics as above)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print('MSE:', mean_squared_error(y_test, y_hat))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_hat)))
print('MAE:', mean_absolute_error(y_test, y_hat))
print('R²:', r2_score(y_test, y_hat))
The output lists one coefficient for each of the 13 Boston housing features, each giving the change in predicted price per unit increase in that feature with the others held fixed, along with the intercept and the same error metrics for model evaluation.
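Coefficient magnitudes depend on the units of each feature, so comparing them directly can mislead. The sketch below illustrates this on synthetic data (not the Boston dataset); the helper `ols_weights` and all variable names are hypothetical, and standardizing features before fitting is one common convention rather than something the original article does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(0.0, 1.0, n)      # feature on a small scale
x2 = rng.normal(0.0, 100.0, n)    # feature on a large scale
# Both features contribute similarly to y despite very different weights
y = 3.0 * x1 + 0.03 * x2 + rng.normal(0.0, 0.1, n)

def ols_weights(X, y):
    """Least-squares weights (intercept excluded) via np.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params[1:]

X = np.column_stack([x1, x2])
raw = ols_weights(X, y)

# Standardize each column to zero mean and unit variance, then refit
Z = (X - X.mean(axis=0)) / X.std(axis=0)
standardized = ols_weights(Z, y)

print('raw weights:', raw)
print('standardized weights:', standardized)
```

The raw weights differ by roughly two orders of magnitude, while the standardized weights are of similar size, reflecting that both features move the target by a comparable amount per standard deviation.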
Conclusion
Both examples illustrate how linear regression can be applied to simple and multiple feature scenarios, how to interpret model coefficients, and how to assess performance using standard regression metrics.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
