Causal Inference for Machine Learning: Paradigms, Differentiable Discovery, and OOD Applications
The article reviews the limitations of association‑based AI, explains the two main causal inference paradigms, introduces differentiable causal discovery, and shows how these ideas address out‑of‑distribution challenges and stable learning in recommendation systems, citing recent research.
Background
Artificial intelligence is increasingly used in risk‑sensitive domains, but current models rely on associative statistics. This leads to sample‑selection bias, poor stability, lack of explainability, fairness issues, and decisions that cannot be traced back to their causes. The author argues that these problems stem from relying on correlation instead of causation.
Two Basic Paradigms of Causal Inference
Structural Causal Model (SCM)
An SCM represents causal relationships as a directed acyclic graph (DAG). Causal effects are identified with the back‑door and front‑door criteria and with do‑calculus. The main difficulty is that the causal graph is rarely known in observational studies, which makes causal discovery the bottleneck.
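To make the back‑door criterion concrete, here is a minimal sketch on simulated data. The graph (Z → T, Z → Y, T → Y), the coefficients, and the helper `ols_coefficients` are assumptions chosen for illustration and do not come from the article. In a linear model, back‑door adjustment reduces to including the confounder Z in the regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical SCM: Z confounds T and Y; the true effect of T on Y is 2.0.
z = rng.normal(size=n)
t = 1.5 * z + rng.normal(size=n)
y = 2.0 * t + 3.0 * z + rng.normal(size=n)


def ols_coefficients(features, target):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Regressing Y on T alone mixes the causal effect with the back-door path T <- Z -> Y.
naive_effect = ols_coefficients([t], y)[1]

# Conditioning on Z blocks the back-door path, so the coefficient of T recovers the effect.
adjusted_effect = ols_coefficients([t, z], y)[1]

print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

The naive estimate is biased upward because Z pushes T and Y in the same direction, while the adjusted estimate lands close to the simulated effect of 2.0.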
Causal discovery addresses this by learning the graph from observed data, typically through conditional independence tests. The search is NP‑hard, and the number of candidate graphs explodes combinatorially as variables are added. Recent work proposes differentiable causal discovery to alleviate this issue.
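The scale of the search space can be illustrated by counting labeled DAGs with Robinson's recurrence. This is a standard combinatorial result rather than something taken from the article, and the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, count_dags(n))
```

Five variables already admit 29,281 candidate graphs, which is why exhaustive search over structures is impractical beyond a handful of variables.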
Potential Outcome Framework (POF)
POF does not require a full causal graph; it only needs to know whether a specific treatment variable influences the outcome, assuming all confounders are observed. This framework focuses on estimating the effect of a single variable while ignoring other causal relations.
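A small simulation can illustrate the potential‑outcome view. Everything in it is an assumption made for illustration: a single binary confounder `x`, treatment probabilities of 0.8 and 0.2, and a constant treatment effect of 1.0. Because the only confounder is observed, averaging within‑stratum differences recovers the average treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical setup: x is the only confounder and it is observed.
x = rng.integers(0, 2, size=n)
treated = rng.random(n) < np.where(x == 1, 0.8, 0.2)

# Both potential outcomes exist for every unit; only one is ever observed.
y0 = 1.0 + 2.0 * x + rng.normal(size=n)
y1 = y0 + 1.0                      # true average treatment effect = 1.0
y = np.where(treated, y1, y0)

# Comparing group means ignores that treated units mostly have x == 1.
naive_difference = y[treated].mean() - y[~treated].mean()

# Stratify on x, then weight each stratum's difference by its share of the data.
stratified_ate = 0.0
for value in (0, 1):
    stratum = x == value
    difference = y[stratum & treated].mean() - y[stratum & ~treated].mean()
    stratified_ate += stratum.mean() * difference

print(f"naive: {naive_difference:.2f}, stratified: {stratified_ate:.2f}")
```

No causal graph over the remaining variables is needed here, which matches the framework's focus on a single treatment.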
Differentiable Causal Discovery and Its Application in Recommendation Systems
The author defines Functional Causal Models (FCMs), in which each variable is a function of its parents plus noise. For linear FCMs, the goal is to find a weight matrix that minimizes the error of reconstructing each variable from its parents.
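One common way to write the linear case is sketched below. The notation is an assumption borrowed from the convention of reference [1]: X is the n × d data matrix, W the d × d weighted adjacency matrix with W_{ij} ≠ 0 meaning an edge from variable i to variable j, and E the noise matrix.

```latex
% Each variable is a weighted sum of its parents plus noise
x_j = \sum_{i \in \mathrm{pa}(j)} W_{ij}\, x_i + \varepsilon_j, \qquad j = 1, \dots, d

% Matrix form over n samples
X = XW + E

% Reconstruction error minimized over W
\min_{W \in \mathbb{R}^{d \times d}} \; \frac{1}{2n} \, \lVert X - XW \rVert_F^2
```

Without a further restriction on W this objective is trivially minimized by W = I, which is why the acyclicity requirement matters.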
Reference [1] (Zheng et al., 2018) introduces NOTEARS, a continuous optimization method for DAG learning. It minimizes reconstruction error subject to a differentiable acyclicity (DAG) constraint, with ℓ₁/ℓ₂ regularization for sparsity. However, the method assumes Gaussian noise of similar scale; when this assumption is violated, the learned structure can deviate from the ground truth. The author notes that adding an independence constraint can mitigate this limitation, as discussed in reference [2] (He et al., 2021).
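The acyclicity function used in reference [1] is h(W) = tr(exp(W ∘ W)) − d, which equals zero exactly when W encodes a DAG. The sketch below evaluates it on two hand‑made matrices. The truncated Taylor series in `matrix_exponential` is a simplification assumed here to keep the example dependency‑free, and the example matrices are hypothetical.

```python
import numpy as np


def matrix_exponential(a, terms=30):
    """Truncated Taylor series for exp(a); adequate for small, well-scaled matrices."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result


def acyclicity(w):
    """h(W) = tr(exp(W * W)) - d, where * is the elementwise product."""
    d = w.shape[0]
    return np.trace(matrix_exponential(w * w)) - d


# Chain 0 -> 1 -> 2: strictly upper triangular, so it is a DAG.
w_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])

# Adding the edge 2 -> 0 closes the cycle 0 -> 1 -> 2 -> 0.
w_cyclic = w_dag.copy()
w_cyclic[2, 0] = 0.7

print(acyclicity(w_dag), acyclicity(w_cyclic))
```

Because h is smooth in W, it can be enforced as an equality constraint with gradient‑based solvers instead of searching over discrete graphs.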
In recommendation systems, the IID assumption often fails. Natural shifts occur across populations (e.g., models trained on Beijing data performing poorly in Chongqing), and artificial shifts are introduced by the recommendation mechanism itself. The author proposes a causal‑inspired stable learning approach that seeks user preferences that remain invariant across environments. By treating invariant structures as causal, the method transforms preference learning into a causal discovery problem, enabling more explainable and stable recommendations. Empirical comparisons (see Figures 11‑13) show noticeable performance gains over existing baselines, and the approach is detailed in reference [3] (He et al., 2022).
Thoughts on OOD Generalization and Stable Learning
Out‑of‑Distribution (OOD) problems arise when training and test distributions differ. OOD can be categorized as adaptation (test distribution partially known) or true generalization (test distribution unknown). The author distinguishes this from the usual ML notion of “generalization,” which concerns interpolation rather than extrapolation.
Two paths to achieve OOD robustness are presented:
1. Leverage causal inference, since causal structures are invariant across environments; learning based on causality thus yields stable models.
2. Exploit heterogeneity in data to discover invariant sub‑structures, akin to finding common patterns across heterogeneous clusters (e.g., dogs on beach vs. grass).
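The second path can be illustrated with a toy simulation. The two environments, their names, and all coefficients are hypothetical, and comparing per‑environment regression slopes is a simplified stand‑in for the invariance‑seeking methods the author refers to. The causal feature keeps the same relationship to the label in both environments, while the spurious feature does not.

```python
import numpy as np

rng = np.random.default_rng(2)


def make_environment(n, spurious_strength, causal_scale):
    """Y depends on x_causal in a fixed way; x_spurious depends on Y differently per environment."""
    x_causal = causal_scale * rng.normal(size=n)
    y = 2.0 * x_causal + rng.normal(size=n)
    x_spurious = spurious_strength * y + rng.normal(size=n)
    return x_causal, x_spurious, y


def slope(x, y):
    """Slope of the simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


environments = {
    "beach": make_environment(50_000, spurious_strength=1.0, causal_scale=1.0),
    "grass": make_environment(50_000, spurious_strength=-0.5, causal_scale=2.0),
}

causal_slopes = {name: slope(xc, y) for name, (xc, xs, y) in environments.items()}
spurious_slopes = {name: slope(xs, y) for name, (xc, xs, y) in environments.items()}

print("causal:", causal_slopes)
print("spurious:", spurious_slopes)
```

A feature whose fitted relationship shifts between environments is a candidate spurious correlate, whereas one that stays stable is a candidate invariant structure.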
Stable learning aims to minimize performance variance across multiple unknown test distributions, assuming the training data contain intrinsic heterogeneity. The author’s recent survey [4] (Shen et al., 2021) provides a systematic analysis of OOD generalization methods.
References
[1] X. Zheng et al., “DAGs with NO TEARS: Continuous Optimization for Structure Learning,” NeurIPS 2018.
[2] Y. He, P. Cui et al., “DARING: Differentiable Causal Discovery with Residual Independence,” KDD 2021.
[3] Y. He et al., “CausPref: Causal Preference Learning for Out‑of‑Distribution Recommendation,” The WebConf 2022.
[4] Z. Shen et al., “Towards Out‑Of‑Distribution Generalization: A Survey,” arXiv 2021.
Meituan Technology Team