
Why Understanding Causal Relationships Is Crucial for Machine Learning

This article explains why causal inference matters beyond prediction, introduces potential outcomes notation, demonstrates how bias separates correlation from causation, and outlines the conditions under which observed differences can be interpreted as true causal effects.


Why Causal Relationships Matter

Machine learning excels at prediction, but turning a problem into a prediction task does not automatically yield insight into cause‑and‑effect. Predictive models can fail when data shift or when the question actually requires counterfactual reasoning, such as estimating the impact of price changes, diet adjustments, or credit limits.

Causal questions ask "what would happen if we changed X?" and cannot be answered by correlation‑based predictions alone.

When Correlation Is Causation

Intuitively we know that correlation does not imply causation. For example, schools that give tablets to students often have more resources, so better test scores may stem from wealth rather than the tablets themselves.

We introduce symbols: t_i for the treatment (e.g., providing a tablet), y_i for the observed outcome (e.g., test score), and potential outcomes y_i(0) (no treatment) and y_i(1) (treatment). The fundamental problem of causal inference is that we can observe only one of the two potential outcomes for each individual.
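This notation can be made concrete with a small table. The numbers below are hypothetical (chosen only for illustration): both potential outcomes are listed for each school, something we can do in a simulation but never in real data, which is exactly the fundamental problem.

```python
import pandas as pd

# Hypothetical data for four schools. In reality we would never see
# both y0 and y1 for the same unit -- only one of them is realized.
df = pd.DataFrame({
    "t":  [0, 0, 1, 1],          # treatment: tablet given?
    "y0": [500, 600, 800, 700],  # potential outcome without tablets
    "y1": [450, 600, 600, 750],  # potential outcome with tablets
})

# The observed outcome y reveals only one potential outcome per unit:
# y0 where t == 0, y1 where t == 1.
df["y"] = df["y0"].where(df["t"] == 0, df["y1"])
print(df)
```

For untreated schools the `y1` column is the unobserved counterfactual, and for treated schools the `y0` column is; only the `y` column is available in practice.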

Two roads diverged in a yellow wood, … I could not travel both.

Potential outcomes let us define the individual treatment effect, τ_i = y_i(1) − y_i(0), as the difference between the treated and untreated potential outcomes; because only one of the two is ever observed, this quantity is fundamentally unobservable.

We therefore focus on estimable quantities such as the Average Treatment Effect, ATE = E[y(1) − y(0)], and the Average Treatment Effect on the Treated, ATT = E[y(1) − y(0) | t = 1], defined using expectations E[·].

Example data for four schools illustrate how the observed average difference can be misleading if we ignore the unobserved counterfactuals.
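A minimal sketch of this point, using hypothetical potential outcomes for four schools (not the article's own table), computes the ATE, the ATT, and the naive difference in observed means side by side:

```python
import pandas as pd

# Hypothetical four-school table; both potential outcomes are visible
# only because the data are simulated.
df = pd.DataFrame({
    "t":  [0, 0, 1, 1],
    "y0": [500, 600, 800, 700],
    "y1": [450, 600, 600, 750],
})
df["y"] = df["y0"].where(df["t"] == 0, df["y1"])  # observed outcome

ate = (df["y1"] - df["y0"]).mean()                # E[y1 - y0]
att = (df["y1"] - df["y0"])[df["t"] == 1].mean()  # E[y1 - y0 | t = 1]
naive = df.loc[df["t"] == 1, "y"].mean() - df.loc[df["t"] == 0, "y"].mean()

print(ate, att, naive)  # -50.0, -75.0, 125.0
```

Here the true effects are negative, yet the naive comparison of observed group means is positive: ignoring the counterfactuals flips even the sign of the conclusion.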

Bias

Bias arises when the treated and untreated groups differ in ways other than the treatment itself. In the tablet example, richer schools are more likely to receive tablets, so the observed correlation conflates the effect of wealth with the effect of tablets.

Mathematically, the observed difference in averages decomposes into two parts:

- the actual treatment effect (e.g., tablets improving scores), and
- a bias term capturing other pre‑treatment differences between groups (e.g., higher tuition, better teachers).

If the groups are comparable before treatment (i.e., no systematic differences in their untreated potential outcomes), the bias term vanishes and the observed difference equals the causal effect.
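The decomposition can be checked numerically. With the same hypothetical four-school numbers used above, the naive difference in observed means equals the ATT plus a bias term, E[y0 | t = 1] − E[y0 | t = 0], which measures how the groups would have differed even without treatment:

```python
import pandas as pd

df = pd.DataFrame({
    "t":  [0, 0, 1, 1],
    "y0": [500, 600, 800, 700],  # hypothetical untreated outcomes
    "y1": [450, 600, 600, 750],  # hypothetical treated outcomes
})
df["y"] = df["y0"].where(df["t"] == 0, df["y1"])

naive = df.loc[df["t"] == 1, "y"].mean() - df.loc[df["t"] == 0, "y"].mean()
att = (df["y1"] - df["y0"])[df["t"] == 1].mean()
# Bias: how treated schools would have scored without tablets,
# compared with the untreated schools.
bias = df.loc[df["t"] == 1, "y0"].mean() - df.loc[df["t"] == 0, "y0"].mean()

assert naive == att + bias  # 125 == -75 + 200
```

The large positive bias (treated schools were already stronger) is what makes the naive comparison misleading.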

Random assignment eliminates bias, making the treated‑control difference a valid estimate of the causal effect.
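A quick simulation illustrates why. The distributions and the true effect of −50 below are assumed purely for demonstration; the point is that when treatment is assigned independently of the potential outcomes, the bias term vanishes and the naive comparison recovers the causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes with an assumed true effect of -50.
y0 = rng.normal(600, 50, n)
y1 = y0 - 50

# Random assignment: t is independent of (y0, y1), so the groups
# are comparable before treatment.
t = rng.integers(0, 2, n)
y = np.where(t == 1, y1, y0)

naive = y[t == 1].mean() - y[t == 0].mean()
print(naive)  # close to the true effect of -50
```

Under randomization the naive estimate differs from −50 only by sampling noise, which shrinks as n grows.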

Key Ideas

We have shown that correlation is not causation, introduced potential‑outcome notation, and explained why bias must be addressed to infer causality. The next steps involve learning methods—starting with randomized experiments—that help estimate causal effects reliably.

Source: https://github.com/xieliaing/CausalInferenceIntro

Tags: Machine Learning, causal inference, treatment effect, prediction, bias, potential outcomes
Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
