
How Simple Linear Regression Uncovers Hidden Relationships in Data

This article explains the theory and practice of simple linear regression, covering deterministic vs. stochastic relationships, the least‑squares estimation of coefficients, goodness‑of‑fit measures such as R², hypothesis testing for linearity, and a real‑world case linking wine consumption to heart‑disease mortality.

Model Perspective

Simple Linear Regression

Relationships between variables can be deterministic (expressible by a function) or stochastic (described only by statistical regularities). Regression analysis provides a statistical method to model stochastic relationships, yielding an empirical formula and using probability theory to assess its validity.

General Form of a Simple Linear Regression Model

When observed data roughly align along a straight line, the relationship between the independent variable x and the dependent variable y can be approximated as linear, though points rarely lie exactly on a line due to random factors. The model is written as

y = β₀ + β₁x + ε, where β₀ and β₁ are unknown constants (the intercept and the slope) and ε is a random error term, conventionally assumed to follow a normal distribution with mean zero and constant variance σ².

Least‑Squares Estimation of Parameters β₀ and β₁

To estimate the regression coefficients, the least‑squares method minimizes the sum of squared differences between observed values and those predicted by the line. Setting the partial derivatives of this sum with respect to β₀ and β₁ to zero yields the normal equations, whose solution gives the estimates

β̂₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² and β̂₀ = ȳ − β̂₁x̄ .
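The closed-form estimates above translate directly into a few lines of NumPy. The following is a minimal sketch, not the article's own code: the function name `least_squares_fit` and the five data points are hypothetical and chosen only for illustration.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x using the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)            # sum of (x_i - x_bar)^2
    sxy = np.sum((x - x_bar) * (y - y_bar))   # sum of (x_i - x_bar)(y_i - y_bar)
    b1 = sxy / sxx                            # slope estimate
    b0 = y_bar - b1 * x_bar                   # intercept estimate
    return b0, b1

# Hypothetical illustrative data (NOT the wine data set from the case study)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0_demo, b1_demo = least_squares_fit(x_demo, y_demo)
print(f"intercept = {b0_demo:.3f}, slope = {b1_demo:.3f}")
```

Centering x and y before summing keeps the arithmetic close to the formulas in the text, which makes the sketch easy to check by hand.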

Correlation Test and Coefficient of Determination (Goodness of Fit)

The purpose of fitting a regression model is to explain the variation in y using a linear function of x. The proportion of total variation explained by the model is measured by the coefficient of determination R² = SSR / SST, where SSR = Σ(ŷᵢ−ȳ)² is the regression sum of squares, SST = Σ(yᵢ−ȳ)² is the total sum of squares, and ŷᵢ is the value predicted by the fitted line.

R² close to 1 indicates that the regression explains most of the variation, meaning a good fit.

R² close to 0 indicates a poor fit, with most variation unexplained by the model.

R² also reflects the linear correlation between the variables: in simple linear regression, R² equals the square of the sample correlation coefficient r, so a larger |r| means a larger R².
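To make the definition concrete, here is a small sketch that computes R² as SSR / SST and compares it with the squared correlation coefficient. The helper name `r_squared` and the data are assumptions made for illustration only.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination R^2 = SSR / SST for a fitted straight line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # coefficients, highest degree first
    y_hat = intercept + slope * x            # fitted values
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
    return ssr / sst

# Hypothetical illustrative data (not from the article)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_demo = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.9])

r2_demo = r_squared(x_demo, y_demo)
r_demo = np.corrcoef(x_demo, y_demo)[0, 1]   # sample correlation coefficient

print(f"R^2 = {r2_demo:.4f}, r^2 = {r_demo ** 2:.4f}")
```

The two printed values should agree up to floating-point rounding, which is a quick sanity check on any hand-rolled R² computation.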

Significance Test of the Regression Equation

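To check whether the linear relationship between x and y is statistically significant, we test the null hypothesis H₀: β₁ = 0 against the alternative H₁: β₁ ≠ 0. The test uses the F‑statistic F = SSR / (SSE / (n − 2)), where SSE = SST − SSR is the residual sum of squares and n is the number of observations; under H₀ it follows an F distribution with (1, n − 2) degrees of freedom. We compare the calculated value with the critical value at a chosen significance level (e.g., 0.01 or 0.05). If the statistic exceeds the critical value, H₀ is rejected and the linear relationship is deemed significant. Note that rejecting H₀ shows that the slope is nonzero; it does not by itself prove that the true relationship is exactly linear.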
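The test can be sketched with `scipy.stats.f`, which provides the critical value (`ppf`) and the p-value (`sf`) of the F distribution. The data below are hypothetical, and the statistic is written as F = SSR / (SSE / (n − 2)) with (1, n − 2) degrees of freedom, the usual form for simple linear regression; treat this as an illustration rather than the article's own computation.

```python
import numpy as np
from scipy import stats

# Hypothetical illustrative data (not the wine data set)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y_demo = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7])
n = len(x_demo)

slope, intercept = np.polyfit(x_demo, y_demo, 1)
y_hat = intercept + slope * x_demo

ssr = np.sum((y_hat - y_demo.mean()) ** 2)   # regression sum of squares
sse = np.sum((y_demo - y_hat) ** 2)          # residual sum of squares

f_stat = ssr / (sse / (n - 2))               # F statistic with (1, n - 2) d.o.f.
alpha = 0.05                                 # chosen significance level
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)    # critical value
p_value = stats.f.sf(f_stat, 1, n - 2)       # P(F >= f_stat) under H0

print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, p = {p_value:.3g}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```

In simple linear regression this F test is equivalent to the two-sided t test on the slope, so the p-value should match the one reported by `scipy.stats.linregress`.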

Case Study

Moderate wine consumption may reduce heart disease risk. The table below shows, for 19 countries, the average annual alcohol intake from wine (liters) and the corresponding heart‑disease mortality rate (deaths per 100,000 people).

A scatter plot of the 19 points shows an approximately linear trend, which justifies fitting a simple linear regression model and estimating its coefficients.

In Python, numpy.polyfit or scipy.optimize.curve_fit returns the fitted coefficients; the resulting regression equation can then be used to predict heart‑disease mortality for a given alcohol intake.
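As a rough sketch of both routes, the snippet below fits the same straight line with `numpy.polyfit` and with `scipy.optimize.curve_fit`, then predicts at a new x value. The arrays are placeholder values, not the 19-country data, and the model function `line` and the value `x_new` are names introduced here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, b0, b1):
    """Straight-line model y = b0 + b1 * x (hypothetical helper for curve_fit)."""
    return b0 + b1 * x

# Placeholder data for illustration only (NOT the wine / heart-disease table)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_demo = np.array([9.8, 8.1, 6.3, 4.2, 2.4, 0.1])

# Route 1: polyfit returns coefficients from the highest degree down
slope_pf, intercept_pf = np.polyfit(x_demo, y_demo, deg=1)

# Route 2: curve_fit returns the optimal parameters and their covariance matrix
popt, pcov = curve_fit(line, x_demo, y_demo)
intercept_cf, slope_cf = popt

# Predict the response at a new x value with the fitted equation
x_new = 3.5
y_pred = intercept_pf + slope_pf * x_new

print(f"polyfit:   y = {intercept_pf:.3f} + {slope_pf:.3f} x")
print(f"curve_fit: y = {intercept_cf:.3f} + {slope_cf:.3f} x")
print(f"prediction at x = {x_new}: {y_pred:.3f}")
```

For a straight line `polyfit` is the simpler choice; `curve_fit` is a general nonlinear least-squares routine and is mainly useful when the same code may later need a different model function.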

Libraries such as statsmodels compute additional diagnostics (for example standard errors, R², and the F‑statistic) and offer both formula‑based and array‑based interfaces for regression analysis.

References

Shi Shou‑kui, Sun Xi‑jing. Python Mathematics Experiments and Modeling.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
