How to Solve Multiple Linear Regression with sklearn and statsmodels in Python

This article demonstrates how to perform multiple linear regression using scikit‑learn's LinearRegression and the statsmodels library, covering data preparation, model fitting, code examples, and interpretation of statistical results for a cement heat‑release case study.

Model Perspective

Solving Multiple Linear Regression with sklearn.linear_model

Using the LinearRegression class from sklearn.linear_model, you can solve multiple linear regression problems. Its built‑in evaluation, the score method, returns only the coefficient of determination R², so other statistical tests (for example the overall F test or t tests on individual coefficients) have to be programmed separately. The model is built and fitted with LinearRegression().fit(X, y), where X is the matrix of explanatory variables (without an intercept column, because the intercept is fitted by default) and y is the response vector.

Example

Problem

The heat released during cement setting is related to two main chemical components. Given a set of measurements, determine a linear regression model.

Data (observation number, x1, x2, y):

No.  x1  x2      y
  1   7  26   78.5
  2   1  29   74.3
  3  11  56  104.3
  4  11  31   87.6
  5   7  52   95.9
  6  11  55  109.2
  7   3  71  102.7
  8   1  31   72.5
  9   2  54   93.1
 10  21  47  115.9
 11   1  40   83.8
 12  11  66  113.3
 13  10  68  109.4

Computation

Code
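
The listing for this step did not survive in the text, so the following is a minimal sketch of the fit described above rather than the original code. It assumes the data are typed inline (the names data, X, y, model and r2 are illustrative) and uses only LinearRegression().fit and score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Cement data from the table above; columns are x1, x2, y
data = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])
X, y = data[:, :2], data[:, 2]

# Fit y = b0 + b1*x1 + b2*x2; the intercept b0 is estimated by default
model = LinearRegression().fit(X, y)

# score() returns the coefficient of determination R^2 on the given data
r2 = model.score(X, y)

print("intercept:", model.intercept_)
print("coefficients (x1, x2):", model.coef_)
print("R^2:", r2)
```

The fitted coefficients are available as model.intercept_ and model.coef_, and model.predict(X) returns the fitted values.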

The fitted model has a coefficient of determination R² of roughly 0.98, so x1 and x2 together explain most of the variation in y and the fit is good.
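
Because score reports only R², further statistics have to be computed by hand when working with sklearn. The sketch below shows one way this might be done for the adjusted R² and the overall F test. It assumes scipy is available for the F distribution, and all variable names are illustrative.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Cement data; columns are x1, x2, y
data = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])
X, y = data[:, :2], data[:, 2]
n, p = X.shape  # n observations, p explanatory variables

model = LinearRegression().fit(X, y)
resid = y - model.predict(X)

sse = np.sum(resid ** 2)            # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
ssr = sst - sse                     # regression sum of squares

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F test of H0: all slope coefficients are zero
f_stat = (ssr / p) / (sse / (n - p - 1))
f_pvalue = stats.f.sf(f_stat, p, n - p - 1)

print("R^2:", r2, "adjusted R^2:", adj_r2)
print("F:", f_stat, "p-value:", f_pvalue)
```

A small p-value for the F test indicates that the regression as a whole is significant. These are the same quantities that statsmodels reports automatically in its summary.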

Using statsmodels library

statsmodels offers two ways to fit regression models: a formula‑based approach and an array‑based approach.

Formula‑based

import statsmodels.api as sm
sm.formula.ols(formula, data=df).fit()

where formula is a string such as 'y ~ x1 + x2' and df is a DataFrame or dictionary containing the data.

Array‑based

import statsmodels.api as sm
sm.OLS(y, X).fit()

where y is the response vector and X is the design matrix with a constant column added (for example with sm.add_constant), since OLS does not add an intercept on its own.

Code

Formula example

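import numpy as np
import statsmodels.api as sm

# Load the data; columns are x1, x2, y
a = np.loadtxt("data/cement.txt")
d = {'x1': a[:, 0], 'x2': a[:, 1], 'y': a[:, 2]}
# Fit y on x1 and x2; the formula interface adds the intercept automatically
md = sm.formula.ols('y ~ x1 + x2', data=d).fit()
print(md.summary())
# Fitted values at the observed x1, x2
ypred = md.predict({'x1': a[:, 0], 'x2': a[:, 1]})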
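
If the file data/cement.txt is not at hand, the same formula‑based fit can be sketched with the data typed into a pandas DataFrame. This is an illustrative variant, not the book's listing: the DataFrame df, the new observation new_obs (x1 = 8, x2 = 50) and the other names are assumptions made for the example.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Cement data entered inline instead of being read from a file
df = pd.DataFrame({
    "x1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "x2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "y": [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
          72.5, 93.1, 115.9, 83.8, 113.3, 109.4],
})

# The formula names DataFrame columns; the intercept is included automatically
md = smf.ols("y ~ x1 + x2", data=df).fit()
print(md.params)  # indexed by 'Intercept', 'x1', 'x2'

# Predict the response for a hypothetical new observation
new_obs = pd.DataFrame({"x1": [8], "x2": [50]})
pred = md.predict(new_obs)
print(pred)
```

With the formula interface the parameters are labelled by name, which makes the output easier to read than the positional array returned by the array‑based interface.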

Array example

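import numpy as np
import statsmodels.api as sm

# Load the data; columns are x1, x2, y
a = np.loadtxt("data/cement.txt")
# Prepend a column of ones so the model includes an intercept
X = sm.add_constant(a[:, :2])
md = sm.OLS(a[:, 2], X).fit()
print(md.params)      # intercept first, then the coefficients of x1 and x2
print(md.summary2())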

Reference: Si Shoukui and Sun Xijing, Python Mathematical Experiments and Modeling (司守奎,孙玺菁, Python数学实验与建模)

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
