Paired Data Based A/B Experiments: Causal Inference in Network Experiments
The DataFun Data Science Summit on May 25 will feature Tencent data scientist Li Yilin presenting a comprehensive overview of paired‑data A/B experiments, covering causal inference challenges, unbiased estimators under various randomization designs, theoretical analysis, and practical insights for network‑based online experiments.
On May 25, the DataFun‑produced Data Science Summit will bring together eight experts and producers to share the latest practices in data science, with a live broadcast and QR‑code registration for interested participants.
One of the featured speakers is Li Yilin, a data scientist at Tencent. Li is a Ph.D. candidate in Statistics at Peking University, focusing on causal inference, especially in the presence of interference, and observational data analysis. He works on the WeChat experiment platform, and his research has been published in venues such as Biometrics, ACM/IMS Journal of Data Science, and ICML.
Talk Title: Paired Data Based A/B Experiments
Talk Outline: Paired data is a data type that records interactions between two entities, and it supports deeper analysis of complex relationships in fields ranging from international trade to social network communication. With the rise of big data, interest in causal inference for paired data has grown, yet methodological research remains scarce. Traditional causal inference relies on the Stable Unit Treatment Value Assumption (SUTVA), which often fails in networked settings because of interference, so estimates of the global average treatment effect become biased. The talk considers randomized experiments in which units are assigned to treatment or control and outcomes are observed on pairs of units, a setting common in online A/B testing (e.g., message forwarding, link sharing). It introduces a novel paired interference assumption and shows that unbiased estimators of the global average treatment effect based on unit‑level outcomes generally do not exist under heterogeneity. Leveraging the structure of paired data, it then designs estimators of the global causal effect and proves they are unbiased under several randomization schemes (Bernoulli, complete, and cluster randomization). The theoretical analysis covers convergence rates, their connection to network structure, and asymptotic normality via Stein’s method. Confidence interval construction under Bernoulli randomization and the associated statistical inference methods are also provided. Extensive numerical experiments validate the estimators’ accuracy and demonstrate their application to large‑scale online randomized controlled trials.
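To make the idea of a pair‑level estimator concrete, here is a minimal Python sketch. It is not the speaker's estimator; it is a standard Horvitz‑Thompson‑style construction under assumptions made only for illustration: each pair outcome depends solely on the treatments of its two endpoints, the estimand is the sum over pairs of Y_ij(1,1) - Y_ij(0,0) divided by the number of units, and units are treated independently with probability p (Bernoulli randomization). The toy graph, the potential‑outcome table, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical toy network: 4 units, 4 interacting pairs (i, j) with i != j.
N_UNITS = 4
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3)]

# Hypothetical potential outcomes per pair, keyed by (z_i, z_j).
# Assumption: a pair's outcome depends only on its two endpoints' treatments.
POTENTIAL = [
    {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 4.0},
    {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0},
    {(0, 0): 2.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 2.5},
    {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 5.0},
]


def true_global_effect(potential, n_units):
    """All-treated minus all-control pair outcomes, averaged over units."""
    return sum(po[(1, 1)] - po[(0, 0)] for po in potential) / n_units


def observe(edges, potential, z):
    """Pair outcomes actually observed under the assignment z."""
    return [po[(z[i], z[j])] for (i, j), po in zip(edges, potential)]


def paired_ht_estimate(edges, outcomes, z, p):
    """Horvitz-Thompson-style estimate from observed pair outcomes.

    A pair contributes only when both endpoints are treated (weight 1/p^2)
    or both are in control (weight -1/(1-p)^2); mixed pairs get weight 0.
    """
    total = 0.0
    for (i, j), y in zip(edges, outcomes):
        weight = z[i] * z[j] / p**2 - (1 - z[i]) * (1 - z[j]) / (1 - p) ** 2
        total += y * weight
    return total / len(z)


def exact_expectation(edges, potential, n_units, p):
    """Average the estimator over all 2^n Bernoulli(p) assignments."""
    expectation = 0.0
    for z in product((0, 1), repeat=n_units):
        k = sum(z)
        prob = p**k * (1 - p) ** (n_units - k)
        outcomes = observe(edges, potential, z)
        expectation += prob * paired_ht_estimate(edges, outcomes, z, p)
    return expectation


print("true effect:      ", true_global_effect(POTENTIAL, N_UNITS))
print("estimator's mean: ", exact_expectation(EDGES, POTENTIAL, N_UNITS, p=0.3))
```

Enumerating every assignment, rather than simulating, shows that the estimator's expectation matches the estimand up to floating‑point error in this toy setting. The sketch says nothing about variance, which is where the network structure and the convergence rates mentioned in the outline would come into play.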
Audience Benefits:
Understanding the methods available for estimating global causal effects in network experiments.
Learning what paired data analysis entails.
Grasping how to conduct A/B experiments and causal inference with paired data, including the underlying theory and existing challenges.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.