
Observational Data Causal Inference: Fundamentals, Quasi‑Experimental Methods, and Tencent Case Studies

This article provides a comprehensive overview of causal inference on observational data, explaining confounding and collider structures, experimental solutions, the differences between observational and experimental data, challenges such as Simpson's paradox, and detailed Tencent case studies using DID, regression discontinuity, and uplift modeling to guide practical analysis.

DataFunTalk

The presentation introduces the topic of causal inference on observational data, aiming to give a complete understanding of its fundamentals, current challenges, and practical solutions.

1. Basic Knowledge of Observational Causal Inference – Causal relationships are a subset of correlations: causation implies correlation, but not the reverse, because confounding and collider structures can also create spurious correlations. The example of "wearing shoes while sleeping" versus "headache the next morning" shows how a hidden common cause (drinking alcohol the night before) generates a misleading association.

2. Solutions – Randomized experiments break the dependence of treatment on its confounding parents: when treatment assignment is randomized, the average treatment effect (ATE) equals the simple difference in mean outcomes between the treatment and control groups.
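This can be checked with a small simulation. A hidden confounder (drinking) drives both the treatment (sleeping with shoes on) and the outcome (morning headache), so the observational contrast is large even though the true effect is zero, while randomizing the treatment recovers the null ATE. All probabilities below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder: drinking raises both "slept with shoes on" and "headache".
drunk = rng.random(n) < 0.3
headache = rng.random(n) < np.where(drunk, 0.7, 0.1)  # shoes have NO causal effect

# Observational world: treatment depends on the confounder.
shoes_obs = rng.random(n) < np.where(drunk, 0.8, 0.1)
naive_diff = headache[shoes_obs].mean() - headache[~shoes_obs].mean()

# Experimental world: randomization severs the confounder -> treatment edge.
shoes_rct = rng.random(n) < 0.5
rct_diff = headache[shoes_rct].mean() - headache[~shoes_rct].mean()

print(f"observational difference: {naive_diff:+.3f}")  # large, spurious
print(f"randomized difference:    {rct_diff:+.3f}")    # ~0, the true ATE
```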

3. Observational vs. Experimental Data – A comparison using the same shoe‑sleep example shows how observational data can be biased by imbalanced confounders, while a randomized experiment reveals the true null causal effect.

4. Limitations of Experiments – Ethical, technical, or historical constraints often prevent true experiments, necessitating causal inference on observational data.

5. Challenges in Observational Causal Inference – Issues such as Simpson’s paradox, omitted confounders, and collider bias are illustrated with the smoking‑lung‑cancer example.
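Simpson's paradox is easy to reproduce numerically. The snippet below uses the classic kidney-stone treatment data (Charig et al., 1986) rather than the smoking example: treatment A wins inside every stratum yet loses in the pooled table, because stone size confounds both treatment choice and outcome.

```python
# Classic kidney-stone numbers: A beats B in every stratum, loses overall.
data = {
    # stratum: {treatment: (successes, total)}
    "small": {"A": (81, 87),   "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(stratum, treatment):
    ok, total = data[stratum][treatment]
    return ok / total

for s in data:
    print(s, f"A={rate(s, 'A'):.2f}  B={rate(s, 'B'):.2f}")

# Pooling over strata flips the comparison.
overall = {t: sum(data[s][t][0] for s in data) / sum(data[s][t][1] for s in data)
           for t in ("A", "B")}
print("overall", f"A={overall['A']:.2f}  B={overall['B']:.2f}")
```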

6. Overall Causal Inference Framework – The framework positions quasi‑experimental methods (DID, instrumental variables, regression discontinuity) before propensity‑score matching (PSM) and confounder‑controlled PSM, emphasizing the preference for methods that avoid confounding.
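As a reference point for the PSM stage, here is a minimal one-nearest-neighbour propensity-score matcher in plain numpy. The logistic propensity model, learning rate, caliper, and toy data are all illustrative choices, not the setup used in the talk:

```python
import numpy as np

def psm_att(X, t, y, caliper=0.05):
    """1-NN propensity-score matching for the ATT, with a plain
    logistic-regression propensity model fit by gradient ascent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(500):                       # fit P(T=1 | X)
        p = 1 / (1 + np.exp(-Xb @ w))
        w += 0.1 * Xb.T @ (t - p) / len(t)
    ps = 1 / (1 + np.exp(-Xb @ w))

    treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
    diffs = []
    for i in treated:                          # nearest control in ps, caliper-bounded
        j = control[np.argmin(np.abs(ps[control] - ps[i]))]
        if abs(ps[j] - ps[i]) <= caliper:
            diffs.append(y[i] - y[j])
    return float(np.mean(diffs))

# Toy data: confounder x raises both treatment probability and outcome;
# the true treatment effect is 1.0, but the naive contrast is inflated.
rng = np.random.default_rng(3)
n = 3000
x = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)
y = 2 * x + 1.0 * t + rng.normal(scale=0.5, size=n)

naive = y[t == 1].mean() - y[t == 0].mean()
att = psm_att(x.reshape(-1, 1), t, y)
print(f"naive diff: {naive:.2f}   PSM ATT: {att:.2f}")  # ATT close to 1.0
```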

7. Quasi‑Experimental Case Studies at Tencent

• DID Weather‑Info Analysis: Using a natural experiment on extreme weather (Typhoon Hagibis) to assess the causal impact of weather‑news exposure on user retention, revealing a 1.4% uplift after correcting for selection bias.

• Regression Discontinuity in the Novel‑Reading Business: Identifying a breakpoint at ~115 seconds of first‑day reading time, showing that increasing first‑chapter completion causally improves new‑user retention.

• Startup‑Reset Problems: Defining three questions (short‑term impact, long‑term impact, user heterogeneity) and proposing a unified observational analysis pipeline using regression discontinuity for short‑term effects and constructed quasi‑experimental variables for long‑term and heterogeneous effects.
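The canonical 2x2 DID estimator behind the weather case compares the pre/post change of the exposed group against the pre/post change of the unexposed group, differencing away both the level gap between groups (selection) and the common time trend. The sketch below uses synthetic data with a known +0.02 effect; the numbers are illustrative, not Tencent's:

```python
import numpy as np

def did_estimate(y, treated, post):
    """Canonical 2x2 difference-in-differences:
    (treated post - treated pre) - (control post - control pre)."""
    y, treated, post = map(np.asarray, (y, treated, post))
    cell = lambda g, p: y[(treated == g) & (post == p)].mean()
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Synthetic retention data: groups differ in level (selection bias, +0.10)
# and share a common +0.05 time trend; the true treatment effect is +0.02.
rng = np.random.default_rng(1)
n = 200_000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
p = 0.30 + 0.10 * treated + 0.05 * post + 0.02 * treated * post
y = rng.random(n) < p

est = did_estimate(y, treated, post)
print(f"DID estimate: {est:.3f}")  # close to 0.02 despite the 0.10 level gap
```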

8. Analysis Pipeline for Startup‑Reset – Short‑term impact is estimated via regression discontinuity around a 40‑minute visit‑interval threshold; long‑term impact uses propensity‑score matching; heterogeneity analysis combines uplift modeling with CatBoost after transforming outcomes (Y* and G*).
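The short‑term step can be sketched as a sharp regression discontinuity: fit a local line on each side of the threshold and read off the jump at the cutoff. Apart from the 40‑minute cutoff, which comes from the talk, everything below (bandwidth, outcome model, jump size) is synthetic and for illustration only:

```python
import numpy as np

def rdd_estimate(x, y, cutoff, bandwidth):
    """Sharp regression discontinuity: fit a separate line on each side of
    the cutoff within the bandwidth; the effect is the gap between the two
    intercepts at the threshold."""
    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    fit = lambda mask: np.polyfit(x[mask] - cutoff, y[mask], 1)  # [slope, intercept]
    return fit(right)[1] - fit(left)[1]

# Synthetic data: retention drifts smoothly with the visit interval, plus a
# -0.05 jump once the interval crosses the 40-minute reset threshold.
rng = np.random.default_rng(2)
x = rng.uniform(0, 80, 100_000)                  # visit interval, minutes
p = 0.45 - 0.002 * x - 0.05 * (x >= 40)
y = (rng.random(x.size) < p).astype(float)

jump = rdd_estimate(x, y, cutoff=40, bandwidth=20)
print(f"estimated jump at 40 min: {jump:+.3f}")  # close to -0.05
```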

9. Algorithmic Steps – (1) Transform outcomes and fit CatBoost models; (2) Extract top features; (3) Classify users into four uplift quadrants and compare feature means; (4) Perform single‑dimensional searches to obtain quantitative uplift and confidence.
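Step (1) works because, with propensity e = P(T=1|X), the transformed outcome Y* = Y·T/e − Y·(1−T)/(1−e) satisfies E[Y*|X] = CATE, so any regressor fit on Y* estimates individual uplift. The sketch below substitutes a trivial per‑segment mean for the CatBoost regressor used in the talk; the data and uplift values are invented:

```python
import numpy as np

def transformed_outcome(y, t, e):
    """Ystar = y*t/e - y*(1-t)/(1-e); its conditional expectation given X
    equals the individual uplift (CATE) when e = P(T=1 | X) is correct."""
    return y * t / e - y * (1 - t) / (1 - e)

# Toy data: a 50/50 randomized strategy with heterogeneous uplift.
# Users with x > 0 gain +0.2 from the strategy; users with x <= 0 lose 0.1.
rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(size=n)
t = rng.integers(0, 2, n)
tau = np.where(x > 0, 0.2, -0.1)
y = (rng.random(n) < 0.4 + tau * t).astype(float)

ystar = transformed_outcome(y, t, e=0.5)
# Stand-in for the CatBoost regressor: per-segment means of Ystar.
pos, neg = ystar[x > 0].mean(), ystar[x <= 0].mean()
print(f"estimated uplift  x>0: {pos:+.3f}   x<=0: {neg:+.3f}")
```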

10. Uplift Modeling Insights – Comparing Transform‑outcome + CatBoost against other uplift methods shows the former achieves the highest Gini score (0.1387), roughly double the baseline.
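One common way to compute an uplift Gini of this kind (there are several variants in the literature): rank users by predicted uplift, trace the cumulative incremental‑gain (Qini‑style) curve, and average its gap over the random‑targeting diagonal. The scoring function and data below are illustrative, not the talk's exact metric:

```python
import numpy as np

def uplift_gini(score, y, t):
    """Gini-style uplift metric: rank by predicted uplift, accumulate the
    normalized treated-minus-control success gap, and average the distance
    between that curve and the random-targeting diagonal."""
    order = np.argsort(-score)
    y, t = y[order], t[order]
    cum_t = np.cumsum(y * t) / t.sum()
    cum_c = np.cumsum(y * (1 - t)) / (1 - t).sum()
    gain = cum_t - cum_c                                   # incremental gain curve
    diagonal = gain[-1] * np.arange(1, len(y) + 1) / len(y)
    return (gain - diagonal).mean()

# Synthetic check: a perfect ranking scores far above a random one.
rng = np.random.default_rng(5)
n = 100_000
x = rng.normal(size=n)
t = rng.integers(0, 2, n)
tau = np.where(x > 0, 0.2, 0.0)                 # uplift only for x > 0
y = (rng.random(n) < 0.3 + tau * t).astype(float)

g_model = uplift_gini(x, y, t)                  # x ranks uplift perfectly here
g_random = uplift_gini(rng.random(n), y, t)
print(f"model Gini: {g_model:.3f}   random Gini: {g_random:.3f}")
```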

11. Conclusions – Both short‑term and long‑term startup‑reset strategies have side effects; user‑level heterogeneity analysis enables selective deployment (e.g., keep strategy for low‑search‑activity users, remove for high‑activity users). Recommendations include UI cues for context restoration and refined targeting based on uplift results.

The talk ends with a Q&A covering uplift vs. correlation, p‑value computation, DID vs. A/B testing, and practical tips for building unbiased uplift models.

machine learning, causal inference, observational data, uplift modeling, DID, quasi-experiment
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
