Why Causal Relationships Matter: From Prediction to Counterfactuals
This overview explains why causal relationships matter: it reveals the limits of predictive machine learning, introduces counterfactual reasoning, defines potential outcomes, treatment effects, and bias, and shows how to distinguish correlation from causation using simple examples like tablet distribution in schools.
Why Causal Relationships Matter
Machine learning excels at prediction, but it cannot answer counterfactual "what‑if" questions that require causal reasoning. As Ajay Agrawal, Joshua Gans, and Avi Goldfarb note in *Prediction Machines*, AI’s current wave provides powerful prediction tools, not true intelligence.
Prediction works when the problem can be framed as estimating a future outcome (e.g., translating English to Portuguese, detecting faces, or steering an autonomous car). However, ML fails when the data distribution shifts or when we need to know the effect of an intervention.
Such intervention problems demand counterfactual reasoning: what would happen if we changed a price, a diet, or a credit limit, or gave every child a tablet? These questions are fundamentally causal and cannot be answered by correlation‑based prediction alone.
Every decision—whether about business revenue, education policy, immigration, or personal choices—poses a causal question. Unfortunately, causal questions are harder to answer because they involve unobservable potential outcomes.
When Correlation Is Causal
Intuitively we know that correlation does not imply causation. For example, schools that provide tablets often have more resources, so their better performance may be due to wealth, not the tablets themselves.
To discuss causality formally we introduce notation: let t denote the treatment (e.g., giving a tablet), and y the observed outcome (e.g., test score). The fundamental problem of causal inference is that we can never observe both the treated and untreated outcome for the same individual.
> Two roads diverged in a yellow wood,
> And sorry I could not travel both
> And be one traveler, long I stood
> And looked down one as far as I could
> To where it bent in the undergrowth;
>
> — Robert Frost, "The Road Not Taken"
We therefore consider potential outcomes: y₀ (outcome without treatment) and y₁ (outcome with treatment). The observed outcome is either y₀ or y₁, never both.
The individual treatment effect is τᵢ = y₁ᵢ - y₀ᵢ, which is unobservable. Instead we estimate aggregate quantities such as the average treatment effect (ATE):
ATE = E[y₁ - y₀]
and the average treatment effect on the treated (ATT):
ATT = E[y₁ - y₀ | t = 1]
Consider a toy dataset of four schools, where t indicates whether tablets were provided, y is the observed test score, and te = y₁ - y₀ is the (normally unobservable) individual treatment effect:
| i | y₀  | y₁  | t | y   | te   |
|---|-----|-----|---|-----|------|
| 0 | 500 | 450 | 0 | 500 | -50  |
| 1 | 600 | 600 | 0 | 600 | 0    |
| 2 | 800 | 600 | 1 | 600 | -200 |
| 3 | 700 | 750 | 1 | 750 | 50   |
The average of the last column (te) is –50, meaning that, on average, tablets would reduce scores by 50 points in this artificial example. The ATT (average effect among treated schools) is –75.
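As a sketch, the toy table and these averages can be reproduced with pandas (column names `y0`, `y1`, `t`, `y`, `te` simply mirror the notation above; all values come from the example):

```python
import pandas as pd

# Toy data: potential outcomes y0, y1 and treatment indicator t
df = pd.DataFrame({
    "y0": [500, 600, 800, 700],
    "y1": [450, 600, 600, 750],
    "t":  [0,   0,   1,   1],
})
df["y"] = df["y1"] * df["t"] + df["y0"] * (1 - df["t"])  # observed outcome
df["te"] = df["y1"] - df["y0"]                           # individual effect

ate = df["te"].mean()                    # average treatment effect: -50.0
att = df.loc[df["t"] == 1, "te"].mean()  # effect among the treated: -75.0
```

Note that this computation is only possible because the toy data lists both potential outcomes; with real data the `te` column cannot be formed.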
In reality we cannot observe the missing potential outcomes, so naïvely comparing treated and untreated averages conflates causal effect with bias.
Bias
Bias arises when the treated and untreated groups differ in ways other than the treatment itself. In the tablet example, richer schools are more likely to receive tablets, so their higher scores may stem from wealth, not the tablets.
Formally, the observed difference in means between treated and untreated groups equals the treatment effect on the treated plus a bias term that captures their pre‑treatment differences:

E[y | t = 1] - E[y | t = 0] = ATT + Bias, where Bias = E[y₀ | t = 1] - E[y₀ | t = 0]
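This decomposition can be checked on the toy numbers (a sketch; the bias term involves the counterfactual y₀ of treated schools, which is visible only because the data is simulated):

```python
import pandas as pd

df = pd.DataFrame({
    "y0": [500, 600, 800, 700],
    "y1": [450, 600, 600, 750],
    "t":  [0,   0,   1,   1],
})
df["y"] = df["y1"] * df["t"] + df["y0"] * (1 - df["t"])

treated, control = df[df["t"] == 1], df[df["t"] == 0]

observed_diff = treated["y"].mean() - control["y"].mean()  # 675 - 550 = 125
att = (treated["y1"] - treated["y0"]).mean()               # -75
bias = treated["y0"].mean() - control["y0"].mean()         # 750 - 550 = 200

assert observed_diff == att + bias  # 125 = -75 + 200
```

The naive comparison suggests tablets add 125 points, even though their true effect on the treated schools is -75: the wealth-driven bias of 200 swamps the effect.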
If the groups are comparable before treatment (i.e., the bias term is zero), the observed difference equals the causal effect.
Randomized assignment eliminates bias because treatment is independent of other factors, making the simple difference in means an unbiased estimate of the average causal effect.
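A minimal simulation illustrates why (hypothetical numbers: a wealth-like confounder drives the potential outcomes, and the true effect is -50 by construction):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Wealth-like confounder drives both potential outcomes
wealth = rng.normal(0, 1, n)
y0 = 550 + 100 * wealth + rng.normal(0, 10, n)
y1 = y0 - 50  # true treatment effect is -50 for everyone

# Confounded assignment: wealthier schools more likely to get tablets
t_conf = (wealth + rng.normal(0, 0.5, n) > 0).astype(int)
y_conf = np.where(t_conf == 1, y1, y0)
conf_diff = y_conf[t_conf == 1].mean() - y_conf[t_conf == 0].mean()
# conf_diff comes out large and positive: bias swamps the -50 effect

# Randomized assignment: independent of wealth, so the bias term vanishes
t_rand = rng.integers(0, 2, n)
y_rand = np.where(t_rand == 1, y1, y0)
rand_diff = y_rand[t_rand == 1].mean() - y_rand[t_rand == 0].mean()
# rand_diff lands close to the true effect of -50
```

Under randomization, E[y₀ | t = 1] ≈ E[y₀ | t = 0], so the bias term in the decomposition above is (approximately) zero and the difference in means recovers the causal effect.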
Visual illustrations (omitted here) show how bias and treatment effects combine, and how randomization removes the bias component.
Understanding these concepts is the first step toward mastering methods that identify causal relationships and eliminate bias.
Next, we will explore basic techniques for estimating causal effects, starting with randomized experiments as the gold standard.
Source: https://github.com/xieliaing/CausalInferenceIntro
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".