
Why Linear Regression Is Surprisingly Powerful for Causal Inference

This article explains how linear regression can be used to estimate average causal effects, handle bias, and draw valid conclusions from both randomized experiments and observational data, while illustrating the theory with concrete examples and visualizations.


All You Need Is Regression

When dealing with causal inference we consider two potential outcomes for each individual: the result if they do not receive the intervention and the result if they do. Setting the treatment variable to 0 or 1 forces one of these outcomes to materialize, leaving the other unobservable and making the individual treatment effect unknowable.

Therefore we focus on estimating the average treatment effect (ATE), a simpler task that asks whether, on average, the intervention is effective. We accept that some people respond better than others, but we cannot identify who they are; we ask only whether the average effect is positive.
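In potential-outcome notation, the quantity above can be written as:

```latex
% Potential outcomes Y(0), Y(1); the individual effect Y(1) - Y(0) is never
% observed for the same person. The average treatment effect is
ATE = E[Y(1) - Y(0)]
% and under random assignment of the treatment T it is identified by
% the difference in means between treated and untreated groups:
ATE = E[Y \mid T = 1] - E[Y \mid T = 0]
```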

If the average effect is positive, we conclude that the intervention has a beneficial impact overall, even though a few individuals may react negatively.

When other factors influence both treatment and outcome, the simple difference in means is a biased estimate of the effect. Randomized controlled trials (RCTs) eliminate this bias by ensuring that the treated and untreated groups are statistically identical.

We can use linear regression as the main tool for causal inference. Specifically, we estimate a model where the treatment indicator (0 for face‑to‑face teaching, 1 for online) predicts the outcome. The regression coefficient on the treatment variable is the ATE, and the intercept gives the mean outcome for the control group.

Framing the comparison as a regression buys more than the point estimate: the standard OLS machinery supplies confidence intervals and p-values for the ATE at no extra cost, since the coefficient on the treatment variable equals the difference in means and the intercept equals the control-group mean.
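A minimal sketch of this equivalence, with simulated data standing in for a real experiment (the treatment effect of -4 and all variable names are illustrative, not from the article's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated randomized experiment: T is assigned by a coin flip,
# so the regression coefficient on T is an unbiased ATE estimate.
n = 1000
T = rng.integers(0, 2, size=n)           # 0 = face-to-face, 1 = online
y = 70 - 4.0 * T + rng.normal(0, 10, n)  # true ATE set to -4 for illustration

# OLS with an intercept: y ~ b0 + b1 * T
X = np.column_stack([np.ones(n), T])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The slope reproduces the difference in means exactly,
# and the intercept reproduces the control-group mean.
diff_in_means = y[T == 1].mean() - y[T == 0].mean()
```

With a binary regressor and an intercept, this identity holds exactly, not just approximately; regression is the difference in means, packaged with its inferential statistics.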

Regression Theory

Linear regression minimizes mean‑squared error (MSE) to find the best linear predictor. The optimal coefficient vector β satisfies the normal equations, which can be expressed in closed form. When there is only one regressor (the treatment), the coefficient directly estimates the causal effect if the regressor is randomly assigned.
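The closed-form solution can be verified directly against a numerical least-squares fit (simulated data; the true coefficients 2.0 and 3.0 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# beta = (X'X)^{-1} X'y -- the closed-form solution to the normal equations.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(size=n)

# Solve the normal equations (X'X) beta = X'y ...
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)
# ... and compare with numpy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```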

With multiple regressors, the coefficient on the treatment variable represents the effect of the treatment holding all other covariates constant. By the Frisch-Waugh-Lovell theorem, it can be obtained by regressing the treatment on the other covariates, taking the residual, and then regressing the outcome on that residual.
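This residual-on-residual logic can be checked numerically; the data below are simulated for illustration, with one covariate W that is correlated with the treatment:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# One covariate W and a treatment T correlated with it (true effect of T is 2).
W = rng.normal(size=n)
T = 0.5 * W + rng.normal(size=n)
y = 1.0 + 2.0 * T + 3.0 * W + rng.normal(size=n)

# Full multiple regression: y ~ 1 + T + W
X = np.column_stack([np.ones(n), T, W])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Frisch-Waugh-Lovell: residualize T on (1, W), then regress y on
# that residual alone. The slope matches the full-regression coefficient.
Z = np.column_stack([np.ones(n), W])
T_res = T - Z @ np.linalg.lstsq(Z, T, rcond=None)[0]
beta_fwl = (T_res @ y) / (T_res @ T_res)
```

The two coefficients agree exactly (up to floating-point error), which is why the multiple-regression coefficient can be read as "the effect of T after partialling out W."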

Regression with Non‑Random Data

Randomized data are often unavailable, so we turn to observational data. As an example, we estimate the effect of an additional year of education on hourly wages using a log‑wage model. The simple regression yields a coefficient of 0.0536 (95% CI: 0.039–0.068), suggesting a 5.3% wage increase per extra year of schooling.

However, this estimate may be biased because education is not randomly assigned; higher‑educated individuals may differ in unobserved ways (e.g., parental wealth, innate ability). To address this, we include additional covariates such as parents' education, IQ, experience, tenure, marital status, and race.

After controlling for these factors, the coefficient on education drops to 0.0411, indicating a 4.11% wage increase per additional year of schooling for individuals with the same IQ, experience, tenure, etc. This demonstrates that the simple model was upwardly biased.

Omitted Variable Bias

If a relevant variable is omitted, the estimated coefficient on education equals the true effect plus a bias term: the effect of the omitted variable on the outcome multiplied by the slope from regressing the omitted variable on education.
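In symbols, writing the long (true) model and the short model that omits the confounder, with IQ as the omitted variable as in the wage example:

```latex
% Long model:  y = \alpha + \beta\,\mathrm{educ} + \gamma\,\mathrm{IQ} + \varepsilon
% Short model: y = \alpha_s + \beta_s\,\mathrm{educ} + u
% The short-regression coefficient picks up an omitted-variable bias term:
\beta_s = \beta + \gamma\,\delta
% where \delta is the slope from the auxiliary regression
% \mathrm{IQ} = \delta_0 + \delta\,\mathrm{educ} + v
```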

When the omitted variable has no impact on the outcome or on education, the bias term is zero. Otherwise, bias arises, often because the omitted factor influences both treatment and outcome (a confounder).

In our wage example, IQ is a confounder: higher IQ leads to more education and higher wages. Failing to control for IQ inflates the estimated education effect (positive bias). Similar logic applies to negative bias scenarios.
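A small simulation makes the direction of the bias concrete. All coefficients below are invented for illustration (the true education effect is set to 0.04); the point is only that omitting a positive confounder inflates the estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# IQ is a confounder: it raises both education and wages.
IQ = rng.normal(100, 15, n)
educ = 0.1 * IQ + rng.normal(0, 2, n)
log_wage = 0.04 * educ + 0.01 * IQ + rng.normal(0, 0.3, n)

def ols(X, y):
    """Return OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_short = ols(np.column_stack([ones, educ]), log_wage)[1]      # omits IQ
b_long = ols(np.column_stack([ones, educ, IQ]), log_wage)[1]   # controls for IQ
```

Because IQ raises both education and wages, the short regression overstates the return to schooling, while the long regression recovers a coefficient close to the true 0.04.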

Graphical causal models help visualize these relationships, showing how RCTs cut the link between confounders and treatment, while regression adjusts for confounders by holding them constant.

Key Takeaways

We have covered how regression can be used for A/B testing, provide confidence intervals, serve as the best linear approximation to causal effects, and yield unbiased estimates when all confounders are included. Omitted variable bias arises when a confounder affecting both treatment and outcome is left out, which can be visualized with causal diagrams.

Source: https://github.com/xieliaing/CausalInferenceIntro

Tags: causal inference, observational data, linear regression, average treatment effect, omitted variable bias
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
