
How to Solve Multiple Linear Regression with sklearn and statsmodels in Python

This guide demonstrates how to perform multiple linear regression in Python using sklearn's LinearRegression and the statsmodels library. It covers data preparation, model fitting, coefficient extraction, prediction, and detailed statistical diagnostics, using an example dataset on the heat released by setting cement.

Model Perspective

Sep 7, 2022


Solving with sklearn.linear_model.LinearRegression

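Call LinearRegression().fit(X, y), where X is the n × m matrix of independent variables and y is the vector of n observations of the dependent variable. X must not include a column of ones: the intercept is estimated separately because fit_intercept=True by default.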
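A minimal sketch of the call pattern, assuming synthetic noiseless data generated from y = 2 + 3x1 - x2 (these numbers are made up for illustration and are not from the source). Because there is no noise, the fitted intercept and slopes should recover the generating values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noiseless data from y = 2 + 3*x1 - x2 (illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 2))  # 20 observations of 2 predictors, no column of ones
y = 2 + 3 * X[:, 0] - X[:, 1]

model = LinearRegression()  # fit_intercept=True is the default
model.fit(X, y)

print(model.intercept_)  # approximately 2
print(model.coef_)       # approximately [3, -1]
```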

Example

Problem

The heat y released while cement sets depends mainly on the contents x1 and x2 of two chemical components. Using the 13 observations below, determine the linear regression model y = β0 + β1x1 + β2x2.
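For reference, the model and the ordinary least-squares (OLS) estimate that both LinearRegression and statsmodels compute can be written in standard notation (the vector notation below is introduced here, not taken from the source). X denotes the n × 3 augmented matrix whose columns are a column of ones, x1, and x2; the closed form holds when XᵀX is invertible.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
\qquad
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta} \lVert \mathbf{y} - X\boldsymbol\beta \rVert^2
  = (X^\top X)^{-1} X^\top \mathbf{y}
```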

Sample data (13 observations):

No.  x1  x2      y
  1   7  26   78.5
  2   1  29   74.3
  3  11  56  104.3
  4  11  31   87.6
  5   7  52   95.9
  6  11  55  109.2
  7   3  71  102.7
  8   1  31   72.5
  9   2  54   93.1
 10  21  47  115.9
 11   1  40   83.8
 12  11  66  113.3
 13  10  68  109.4
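The statsmodels examples later in this article read the data from data/cement.txt. One way to produce that file from the table is sketched below. It assumes the file holds three whitespace-separated columns x1, x2, y with the observation number dropped; this layout is inferred from how the examples index the loaded array, not stated in the source.

```python
import os
import numpy as np

# Columns: x1, x2, y (observation number omitted), copied from the table above
cement = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])

os.makedirs("data", exist_ok=True)
np.savetxt("data/cement.txt", cement, fmt="%g")  # one observation per line

a = np.loadtxt("data/cement.txt")  # round-trip check
print(a.shape)  # (13, 3)
```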

Computation

Code

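In sklearn, the fitted model's constant term is stored in model.intercept_ and the slopes for x1 and x2 in model.coef_, while model.score(X, y) returns the coefficient of determination R². An R² close to 1 means the two components explain most of the variation in the heat released, so the model fits well.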
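The original code for this step is not reproduced here. A minimal sketch of what it likely does, with the table's values entered inline so the block runs without a data file, is shown below. The prediction input [10, 50] is a hypothetical new observation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Cement data from the table above: columns x1, x2, y
a = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])
X, y = a[:, :2], a[:, 2]  # X excludes the column of ones

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # coefficient of determination

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)  # slopes for x1 and x2
print("R^2:", r2)

# Predicted heat for a hypothetical new observation x1 = 10, x2 = 50
print(model.predict(np.array([[10, 50]])))
```

On these 13 observations the fit should come out at approximately ŷ = 52.58 + 1.468x1 + 0.662x2, with R² ≈ 0.979.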

Using the statsmodels library

statsmodels provides two interfaces for fitting ordinary least squares (OLS) models: a formula-based interface and an array-based interface.

Formula interface

import statsmodels.api as sm
sm.formula.ols(formula, data=df).fit()

Here formula is a string such as 'y~x1+x2', and df is a pandas DataFrame (or a dictionary) containing the variables named in the formula. An intercept term is included automatically.
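A minimal sketch of the formula interface with a pandas DataFrame built from the table. It uses the statsmodels.formula.api module (conventionally imported as smf), which provides the same ols function as sm.formula.ols. Note the automatically added "Intercept" entry in the fitted parameters.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Cement data from the table above: columns x1, x2, y
a = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])
df = pd.DataFrame(a, columns=["x1", "x2", "y"])

# The formula adds an "Intercept" term automatically
res = smf.ols("y ~ x1 + x2", data=df).fit()
print(res.params)    # Series indexed by Intercept, x1, x2
print(res.rsquared)  # coefficient of determination
```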

Array interface

import statsmodels.api as sm
sm.OLS(y, X).fit()

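Here y is the dependent variable vector and X is the matrix of independent variables augmented with a column of ones (for example with sm.add_constant). Unlike the formula interface, sm.OLS does not add an intercept by itself.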

Code

Formula‑based example:

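import numpy as np
import statsmodels.api as sm
# Assumed layout of data/cement.txt: 13 rows, whitespace-separated columns x1, x2, y
a = np.loadtxt("data/cement.txt")
d = {'x1': a[:,0], 'x2': a[:,1], 'y': a[:,2]}
md = sm.formula.ols('y~x1+x2', d).fit()   # intercept added automatically
print(md.summary())                       # coefficients, R², F test, t tests
# Fitted values at the observed x1, x2
ypred = md.predict({'x1': a[:,0], 'x2': a[:,1]})
print(ypred)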
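Besides the printed summary(), the results object exposes each diagnostic as an attribute, which is handy when the values are needed in further computation. A minimal sketch follows; the new observation x1 = 10, x2 = 50 is hypothetical, and the data are entered inline as a DataFrame rather than read from the file.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Cement data from the table above: columns x1, x2, y
a = np.array([
    [7, 26, 78.5], [1, 29, 74.3], [11, 56, 104.3], [11, 31, 87.6],
    [7, 52, 95.9], [11, 55, 109.2], [3, 71, 102.7], [1, 31, 72.5],
    [2, 54, 93.1], [21, 47, 115.9], [1, 40, 83.8], [11, 66, 113.3],
    [10, 68, 109.4],
])
df = pd.DataFrame(a, columns=["x1", "x2", "y"])
md = smf.ols("y ~ x1 + x2", data=df).fit()

print("R^2:", md.rsquared, "adjusted R^2:", md.rsquared_adj)
print("F statistic:", md.fvalue, "p-value:", md.f_pvalue)  # overall significance
print(md.pvalues)               # t-test p-value for each coefficient
ci = md.conf_int(alpha=0.05)    # 95% confidence intervals, one row per coefficient
print(ci)
print(md.resid)                 # residuals y - yhat

# Prediction for a hypothetical new observation
new = pd.DataFrame({"x1": [10.0], "x2": [50.0]})
pred = md.predict(new)
print(pred)
```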

Array‑based example:

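import numpy as np
import statsmodels.api as sm
a = np.loadtxt("data/cement.txt")   # columns: x1, x2, y
X = sm.add_constant(a[:,:2])        # prepend a column of ones (augmented matrix)
md = sm.OLS(a[:,2], X).fit()
print(md.params)                    # [const, x1, x2]
print(md.summary2())                # alternative summary layout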

Reference

Si Shoukui and Sun Xijing, Python数学实验与建模 (Python Mathematical Experiments and Modeling).


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
