Causal Inference and Experiment Design in Kuaishou Live Streaming

This article presents Dr. Jin Yaran's overview of causal inference challenges, frameworks, and practical case studies in Kuaishou's live‑streaming product, including difference‑in‑differences (DID), double machine learning, causal forests, and meta‑learners. It also discusses complex experimental designs such as bilateral and time‑slice experiments.

DataFunTalk

Guest Speaker: Dr. Jin Yaran, Economist at Kuaishou (DataFunTalk).

01. Causal Inference Problems and Technical Framework in Kuaishou Live

Kuaishou faces four main problems: user incentive design, recommendation strategy evaluation, product feature iteration, and long‑term value estimation. Solutions include (1) causal inference from observational data, (2) well‑designed A/B experiments, and (3) combining economic models, machine‑learning algorithms, and experiments to conduct counterfactual reasoning.

The core of causal inference is to separate causal relationships from mere correlations, estimate the effect size, and validate the inference statistically.
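The last two steps (estimating the effect size and validating it statistically) can be illustrated on a simple A/B comparison. This is a minimal sketch on simulated data: the metric, the sample sizes, and the true effect of +0.5 are hypothetical, and the interval uses a normal approximation with Welch's standard error.

```python
import math
import random
import statistics

def diff_in_means_ci(treated, control, z=1.96):
    """Return (effect, lower, upper): difference in means with a normal-approximation
    confidence interval based on Welch's (unequal-variance) standard error."""
    effect = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return effect, effect - z * se, effect + z * se

# Hypothetical simulated experiment: the true effect is +0.5 on a noisy metric.
rng = random.Random(0)
control = [rng.gauss(10.0, 2.0) for _ in range(5000)]
treated = [rng.gauss(10.5, 2.0) for _ in range(5000)]

effect, lo, hi = diff_in_means_ci(treated, control)
print(f"effect={effect:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
# The effect is significant at the 5% level if the interval excludes zero.
print("significant:", lo > 0 or hi < 0)
```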

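Rubin Potential‑Outcome Model: Each unit has one potential outcome under treatment and one under control, but only one of them is ever observed. The framework estimates the missing counterfactual with a suitable control group, obtained from randomized controlled trials (RCTs), A/B tests, or matching methods on observational data.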

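Pearl Causal‑Graph Model: Uses a directed acyclic graph to encode how variables affect one another. The graph determines which variables to condition on (for example, a set that blocks every backdoor path between treatment and outcome), and the resulting adjusted conditional distributions remove confounding bias.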

Both frameworks are complementary: Rubin focuses on estimating average treatment effects, while Pearl emphasizes identifying causal structures.
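A small simulation shows how the two views meet. In Rubin's terms, each simulated user carries both potential outcomes, so the true average treatment effect is known. In Pearl's terms, the graph is Z → T, Z → Y, T → Y, and adjusting for Z (the backdoor adjustment) recovers that effect while a naive comparison does not. The confounder, treatment probabilities, and effect size are all hypothetical.

```python
import random
import statistics

rng = random.Random(42)
n = 20_000
rows = []  # (z, t, y0, y1, y_observed)
for _ in range(n):
    z = rng.random() < 0.5                   # confounder, e.g. "highly active user"
    t = rng.random() < (0.8 if z else 0.2)   # active users are treated more often
    y0 = 1.0 + 2.0 * z + rng.gauss(0, 1)     # potential outcome without treatment
    y1 = y0 + 1.0                            # potential outcome with treatment (true effect = 1)
    rows.append((z, t, y0, y1, y1 if t else y0))  # only one outcome is ever observed

# Only possible in simulation: both potential outcomes are known.
true_ate = statistics.fmean(r[3] - r[2] for r in rows)

# Naive comparison of observed outcomes is confounded by z.
naive = (statistics.fmean(r[4] for r in rows if r[1])
         - statistics.fmean(r[4] for r in rows if not r[1]))

# Backdoor adjustment: compare within strata of z, then weight by P(z).
adjusted = 0.0
for z_val in (False, True):
    stratum = [r for r in rows if r[0] == z_val]
    diff = (statistics.fmean(r[4] for r in stratum if r[1])
            - statistics.fmean(r[4] for r in stratum if not r[1]))
    adjusted += len(stratum) / n * diff

print(f"true ATE={true_ate:.2f}  naive={naive:.2f}  adjusted={adjusted:.2f}")
```

With these settings the naive gap is roughly 2.2 because active users are over-represented among the treated, while the adjusted estimate sits near the true value of 1.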

02. Causal Inference Techniques on Observational or Experimental Data

1. Product Feature Evaluation – DID and Extensions

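Difference‑in‑Differences (DID) removes unobserved, time‑invariant differences between groups (fixed effects), assuming parallel trends: without treatment, the treated and control groups would have followed the same trend over time. Extensions correct for heterogeneous (staggered) treatment timing and, when no single suitable control group exists, build a synthetic control as a weighted combination of untreated units.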
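The basic two-group, two-period DID estimate can be sketched on a simulated panel. The group sizes, fixed effects, common trend, and true effect of 3.0 are hypothetical; parallel trends hold by construction, which is exactly the assumption real data must justify.

```python
import random
import statistics

rng = random.Random(7)
TRUE_EFFECT = 3.0

def outcome(fixed_effect, period, in_treated_group):
    trend = 5.0 * period  # common shock, so parallel trends hold by construction
    effect = TRUE_EFFECT if (in_treated_group and period == 1) else 0.0
    return fixed_effect + trend + effect + rng.gauss(0, 1)

# Each unit has an unobserved, time-invariant fixed effect; treated units start higher.
units = ([(True, 10.0 + rng.gauss(0, 2)) for _ in range(2000)]
         + [(False, 4.0 + rng.gauss(0, 2)) for _ in range(2000)])

data = [(g, p, outcome(fe, p, g)) for g, fe in units for p in (0, 1)]

def cell_mean(group, period):
    return statistics.fmean(y for g, p, y in data if g == group and p == period)

# Post-period comparison alone mixes the treatment effect with the baseline gap.
naive_post = cell_mean(True, 1) - cell_mean(False, 1)
# DID: change in the treated group minus change in the control group.
did = ((cell_mean(True, 1) - cell_mean(True, 0))
       - (cell_mean(False, 1) - cell_mean(False, 0)))
print(f"post-period gap={naive_post:.2f}  DID={did:.2f}  (true effect {TRUE_EFFECT})")
```

Differencing within each group cancels the fixed effects, and differencing across groups cancels the common trend, leaving the treatment effect.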

2. Recommendation Strategy Evaluation – Causal Inference + Machine Learning

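Machine‑learning models excel at prediction, but prediction alone does not identify causal effects. Double Machine Learning (DML) uses ML models to predict both the outcome and the treatment from high‑dimensional confounders, then regresses the outcome residuals on the treatment residuals. This orthogonalization removes the confounders' influence and, under standard regularity conditions, yields approximately unbiased treatment‑effect estimates with valid confidence intervals.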

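Key steps: split the data into folds; fit the nuisance models (outcome on confounders, treatment on confounders) on one part and compute residuals and the effect estimate on the held‑out part; then apply cross‑fitting, swapping the roles of the parts and averaging, so that overfitting in the nuisance models does not bias the estimate.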
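The partialling-out form of DML with two-fold cross-fitting can be sketched as follows. This is an illustrative implementation under stated assumptions, not Kuaishou's code: the data-generating process is hypothetical, and the nuisance "learner" is least squares on linear and squared terms, chosen so the example needs only NumPy; in practice any ML regressor can fill that role.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 4000, 5, 2.0

# Hypothetical data: confounders X drive both the treatment D (e.g. exposure
# from a recommendation strategy) and the outcome Y, nonlinearly.
X = rng.normal(size=(n, p))
D = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
Y = theta * D + 2.0 * X[:, 0] + X[:, 2] ** 2 - X[:, 3] + rng.normal(size=n)

def basis(A):
    # Nuisance-model features: intercept, linear and squared terms (an assumption;
    # in practice any ML regressor, such as a random forest, could be used).
    return np.column_stack([np.ones(len(A)), A, A ** 2])

def fit_predict(A_train, y_train, A_test):
    coef, *_ = np.linalg.lstsq(basis(A_train), y_train, rcond=None)
    return basis(A_test) @ coef

def dml_partialling_out(X, D, Y, n_folds=2, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(Y)), n_folds)
    d_res, y_res = np.empty_like(D), np.empty_like(Y)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Cross-fitting: nuisance models are fit only on the other folds.
        d_res[test] = D[test] - fit_predict(X[train], D[train], X[test])
        y_res[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
    # Final stage: regress outcome residuals on treatment residuals.
    theta_hat = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta_hat * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / len(Y))
    return theta_hat, se

theta_hat, se = dml_partialling_out(X, D, Y)
naive = np.cov(D, Y)[0, 1] / np.var(D, ddof=1)
print(f"naive slope={naive:.2f}  DML={theta_hat:.2f} (se {se:.3f}), true={theta}")
```

The naive slope of Y on D absorbs the confounding through `X[:, 0]` (about 2.8 here), while the residual-on-residual estimate lands close to the true value of 2.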

3. Causal Forests

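Causal forests, built from decision trees, estimate heterogeneous treatment effects. Each tree is grown on a training sample and its leaf effects are estimated on a separate estimation sample (so‑called honest estimation); node splits are chosen with a variance‑based objective that rewards differences in treatment effect between child nodes and penalizes noisy estimates.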
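A single honest split, the building block of a causal tree, can be sketched in NumPy. This is a simplified illustration, not a full causal forest: the split score is a variance-penalized heterogeneity criterion modeled on the honest causal-tree objective, the candidate cut points are a hypothetical grid of quantiles, and the data are simulated so that only users with `x0 > 0.5` respond to treatment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8000
X = rng.uniform(size=(n, 2))
W = rng.integers(0, 2, size=n)               # randomized treatment, p = 0.5
tau = np.where(X[:, 0] > 0.5, 2.0, 0.0)      # true effect: only users with x0 > 0.5 respond
Y = X[:, 1] + tau * W + rng.normal(size=n)

def leaf_effect(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

def split_score(y, w, left, n_est, p=0.5):
    """Heterogeneity reward minus a variance penalty for a left/right split."""
    n_tr = len(y)
    score = 0.0
    for mask in (left, ~left):
        yl, wl = y[mask], w[mask]
        score += mask.sum() * leaf_effect(yl, wl) ** 2 / n_tr
        noise = yl[wl == 1].var(ddof=1) / p + yl[wl == 0].var(ddof=1) / (1 - p)
        score -= (1 / n_tr + 1 / n_est) * noise
    return score

# Honesty: one half chooses the split, the other half estimates the leaf effects.
perm = rng.permutation(n)
tr, est = perm[: n // 2], perm[n // 2:]

best = None
for j in range(X.shape[1]):
    for cut in np.quantile(X[tr, j], np.linspace(0.1, 0.9, 17)):
        s = split_score(Y[tr], W[tr], X[tr, j] <= cut, n_est=len(est))
        if best is None or s > best[0]:
            best = (s, j, cut)

_, feat, cut = best
left_est = X[est, feat] <= cut
tau_left = leaf_effect(Y[est][left_est], W[est][left_est])
tau_right = leaf_effect(Y[est][~left_est], W[est][~left_est])
print(f"split on x{feat} at {cut:.2f}: effect left={tau_left:.2f}, right={tau_right:.2f}")
```

A forest repeats this recursively on many subsamples and averages the trees; keeping split selection and effect estimation on different samples is what prevents the tree from overstating the heterogeneity it found.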

4. Meta‑Learner for Uplift Modeling

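Uplift modeling estimates the conditional average treatment effect (CATE), that is, the effect for users with given features, typically from experimental data. Meta‑learners (S‑Learner, T‑Learner, X‑Learner) do this indirectly: they fit standard outcome models and derive the uplift from differences in their predictions. This is fast and reuses off‑the‑shelf models, but errors from the separate models can add up, so accuracy may be lower than that of direct uplift models in some scenarios.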
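The three meta-learners can be compared on a simulated randomized experiment. This sketch assumes ordinary least squares as the base learner (any regressor could be swapped in) and a hypothetical true CATE of `1 + x0`; the S-learner includes treatment-by-feature interactions, since without them a linear S-learner cannot express a varying effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6000
X = rng.normal(size=(n, 3))
W = rng.integers(0, 2, size=n)                 # randomized experiment, p = 0.5
true_cate = 1.0 + X[:, 0]                      # effect varies with user feature x0
Y = X[:, 1] + true_cate * W + rng.normal(size=n)

def fit(A, y):
    """Base learner: OLS with an intercept (an assumption; any regressor works)."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return lambda B: np.column_stack([np.ones(len(B)), B]) @ coef

# S-learner: one model with treatment as a feature, then toggle W.
s_model = fit(np.column_stack([X, W, W[:, None] * X]), Y)
ones, zeros = np.ones(n), np.zeros(n)
cate_s = (s_model(np.column_stack([X, ones, X]))
          - s_model(np.column_stack([X, zeros, 0 * X])))

# T-learner: separate outcome models for treated and control users.
mu1 = fit(X[W == 1], Y[W == 1])
mu0 = fit(X[W == 0], Y[W == 0])
cate_t = mu1(X) - mu0(X)

# X-learner: impute individual effects, model them, and blend by propensity g.
d1 = Y[W == 1] - mu0(X[W == 1])     # imputed effects for treated users
d0 = mu1(X[W == 0]) - Y[W == 0]     # imputed effects for control users
tau1 = fit(X[W == 1], d1)
tau0 = fit(X[W == 0], d0)
g = W.mean()                        # propensity; constant in a randomized test
cate_x = g * tau0(X) + (1 - g) * tau1(X)

def rmse(est):
    return float(np.sqrt(np.mean((est - true_cate) ** 2)))

for name, est in [("S", cate_s), ("T", cate_t), ("X", cate_x)]:
    print(f"{name}-learner CATE RMSE: {rmse(est):.3f}")
```

With a correctly specified linear base learner all three do well here; the differences described above show up with misspecified models, small or imbalanced treatment groups, or effects that are small relative to the outcome.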

03. Complex Experiment Designs

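Live streaming has network effects: a change applied to some viewers or streamers can affect the behavior of others, so a standard user‑level A/B split can produce biased estimates. This calls for more advanced designs such as bilateral experiments, time‑slice rotation, and optimal experiment design.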

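1. Bilateral Experiments: Streamers (anchors) and viewers are randomized simultaneously, so treatment can be applied on the streamer side, the viewer side, or both. Comparing these cells reveals spillover effects between the two sides and attributes effects more accurately.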
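The 2×2 structure of a bilateral experiment can be sketched with cell means. Everything here is hypothetical: the numbers of streamers and viewers, the watch-time effects on each side, and the simplification that each viewer watches one randomly chosen streamer. A production analysis would also cluster standard errors by streamer.

```python
import random
import statistics

rng = random.Random(11)
N_STREAMERS, N_VIEWERS = 500, 20000
streamer_treated = [rng.random() < 0.5 for _ in range(N_STREAMERS)]  # streamer-side split

# Hypothetical effect sizes (minutes watched), for illustration only.
BASE, VIEWER_EFF, STREAMER_EFF, BOTH_EFF = 30.0, 2.0, 1.0, 1.5

cells = {(v, s): [] for v in (False, True) for s in (False, True)}
for _ in range(N_VIEWERS):
    v = rng.random() < 0.5                            # viewer-side split
    s = streamer_treated[rng.randrange(N_STREAMERS)]  # arm of the streamer watched
    y = BASE + VIEWER_EFF * v + STREAMER_EFF * s + BOTH_EFF * (v and s) + rng.gauss(0, 5)
    cells[(v, s)].append(y)

m = {k: statistics.fmean(ys) for k, ys in cells.items()}
viewer_only = m[(True, False)] - m[(False, False)]
streamer_only = m[(False, True)] - m[(False, False)]  # spillover onto untreated viewers
interaction = m[(True, True)] - m[(True, False)] - m[(False, True)] + m[(False, False)]
print(f"viewer-side={viewer_only:.2f}  streamer-side={streamer_only:.2f}  "
      f"interaction={interaction:.2f}")
```

A viewer-only experiment would see only the `viewer_only` contrast; randomizing both sides separates the streamer-side effect and the interaction between the two.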

2. Time‑Slice Rotation Experiments: The same group is repeatedly switched between treatment and control across time slices. The design has to trade off the length of each time slice, the total experiment duration, and randomized switch timing.
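A basic time-slice (switchback) schedule and its difference-in-means estimate can be sketched as follows. The function names, the 2-hour slice length, the four-week duration, the evening-peak seasonality, and the effect of 0.8 are all hypothetical. Each slice gets an independent coin flip, so switch times are random, and the sketch ignores carryover between slices.

```python
import random
import statistics

def switchback_schedule(total_hours, slice_hours, seed=0):
    """Hypothetical helper: assign each time slice to treatment (1) or control (0)."""
    rng = random.Random(seed)
    n_slices = total_hours // slice_hours
    return [rng.randint(0, 1) for _ in range(n_slices)]

def simulate_metric(schedule, slice_hours, effect=0.8, seed=1):
    """Hypothetical platform-level metric with an evening peak and a treatment effect."""
    rng = random.Random(seed)
    out = []
    for i, w in enumerate(schedule):
        hour_of_day = (i * slice_hours) % 24
        seasonality = 3.0 if 19 <= hour_of_day <= 23 else 0.0  # evening peak
        out.append(10.0 + seasonality + effect * w + rng.gauss(0, 1))
    return out

schedule = switchback_schedule(total_hours=24 * 28, slice_hours=2, seed=5)
metric = simulate_metric(schedule, slice_hours=2)
treated = [y for y, w in zip(metric, schedule) if w == 1]
control = [y for y, w in zip(metric, schedule) if w == 0]
estimate = statistics.fmean(treated) - statistics.fmean(control)
print(f"{len(schedule)} slices, estimated effect={estimate:.2f}")
```

Shorter slices give more randomization units over the same duration but make carryover between adjacent slices more likely, which is the trade-off the design has to balance.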

Optimal design assumes that outcomes are bounded, that users cannot anticipate treatment periods, and that interference between slices (carryover from earlier slices) is limited to a fixed number of slices.
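The carryover assumption can be illustrated with a simulation in which the effect lingers for exactly one slice (m = 1) and outcomes depend only on current and past assignments (no anticipation). This sketch does not implement the optimal design itself: it flips a coin at every slice for simplicity, and its estimator (a ratio comparison of slices whose last m + 1 assignments are all treatment versus all control) is one simple choice under these hypothetical settings.

```python
import random
import statistics

def lag_m_estimate(schedule, metric, m):
    """Compare slices whose current and previous m assignments are all treatment
    against slices where they are all control (a ratio-style estimator)."""
    all_t, all_c = [], []
    for t in range(m, len(schedule)):
        window = schedule[t - m: t + 1]
        if all(w == 1 for w in window):
            all_t.append(metric[t])
        elif all(w == 0 for w in window):
            all_c.append(metric[t])
    return statistics.fmean(all_t) - statistics.fmean(all_c)

rng = random.Random(9)
T, M, EFFECT = 2000, 1, 2.0
schedule = [rng.randint(0, 1) for _ in range(T)]
metric = []
for t in range(T):
    prev = schedule[t - 1] if t > 0 else 0
    # Carryover: half of the effect appears now, half lingers one slice (m = 1).
    # The outcome depends only on current and past assignments (no anticipation).
    metric.append(10.0 + EFFECT * (0.5 * schedule[t] + 0.5 * prev) + rng.gauss(0, 1))

naive = (statistics.fmean(y for y, w in zip(metric, schedule) if w == 1)
         - statistics.fmean(y for y, w in zip(metric, schedule) if w == 0))
lagged = lag_m_estimate(schedule, metric, M)
print(f"naive={naive:.2f}  carryover-aware={lagged:.2f}  (sustained effect {EFFECT})")
```

The naive per-slice comparison recovers only about half of the sustained effect, because half of it spills into the following slice; knowing that carryover lasts a fixed number of slices is what makes the lagged comparison possible.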

04. Q&A Session

Questions covered the differences between DID and A/B testing, how double machine learning differs from propensity‑score matching, how to handle violations of the conditional independence assumption (CIA), and whether causal graphs are specified in advance or learned from data.

Overall, the talk illustrated how Kuaishou integrates causal inference theory, statistical methods, and machine‑learning tools to evaluate product changes, understand user behavior, and design robust experiments in a live‑streaming environment.
