Key AB Testing Interview Questions and Answers for Data Science Candidates
The article reviews common AB‑testing interview questions for data‑science candidates, explaining the role of p‑values, Type I/II errors, the difference between statistical and business significance, why effects can vanish when scaling, and how to improve experiment sensitivity through larger samples, variance‑reduction methods, and careful metric design.
As AB testing becomes increasingly important in decision‑making, interview expectations for AB testing skills have also risen. This article, based on learning approaches and practical experience from Didi’s scientific platform, summarizes several interview‑style questions that are commonly asked of junior to mid‑level data science candidates.
Q: Why do we look at the P‑value in an AB experiment?
Answer: The P‑value quantifies how likely we would be to observe a difference at least as extreme as the one measured between the control (A) and treatment (B) groups if the null hypothesis (no true difference) were true. It helps answer whether the observed difference is plausibly due to random chance and whether there is enough evidence to reject the null hypothesis, and it provides a standardized way to compare results across experiments. Note that a small P‑value does not by itself mean the effect size is practically important; that is a separate judgment.
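As a concrete sketch (with invented counts, not from the article), a pooled two‑proportion z‑test produces a p‑value for a conversion‑rate difference between groups A and B:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided normal tail probability: erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical experiment: 10% vs 13% conversion on 1,000 users per group
p = two_proportion_p_value(100, 1000, 130, 1000)
print(f"p-value = {p:.4f}")
```

With these made-up numbers the p-value falls below 0.05, so the difference would conventionally be called significant; whether a 3-point lift matters for the business is a separate question.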
Q: What are Type I and Type II errors? Which is more serious in AB testing, and can they be reduced simultaneously?
Answer: A Type I error (false positive) occurs when the null hypothesis is true but is incorrectly rejected; a Type II error (false negative) occurs when a real effect exists but the test fails to detect it. In AB testing, Type II error is often considered more serious because it means missing a beneficial strategy. For a fixed sample size, reducing one error rate typically increases the other; the trade‑off can be eased by increasing the sample size, reducing metric variance (e.g., stratification, CUPED), or restricting analysis to users actually exposed to the change (trigger analysis).
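The trade‑off can be seen with a normal approximation: tightening the significance level α (fewer Type I errors) lowers power, i.e., raises the Type II error rate, at a fixed sample size. A minimal sketch with a hypothetical effect size:

```python
from statistics import NormalDist

nd = NormalDist()

def power(effect_z, alpha):
    """Approximate power of a two-sided z-test, with the true effect
    expressed in standard-error units (the far tail is ignored)."""
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(effect_z - z_crit)

effect_z = 2.1  # hypothetical true effect, in standard errors
p05 = power(effect_z, 0.05)
p01 = power(effect_z, 0.01)
print(f"power at alpha=0.05: {p05:.2f}; at alpha=0.01: {p01:.2f}")
# Tightening alpha reduces power, so the Type II error rate (1 - power) rises.
```

The same function also shows the remedy: increasing sample size or cutting variance raises `effect_z`, lifting power at both significance levels.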
The discussion also highlights the distinction between statistical significance and business significance, illustrating that a statistically significant uplift (e.g., ¥57 increase with P‑value = 0.001) may be meaningless if the ROI is negative.
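The arithmetic behind that point can be sketched with invented figures (the margin and cost below are assumptions, not from the article): a statistically certain revenue uplift can still lose money once costs are counted.

```python
# Hypothetical figures: a significant uplift can still have negative ROI.
incremental_revenue_per_user = 57.0  # significant uplift (yuan), p = 0.001
margin = 0.10                        # assumed profit margin on that revenue
cost_per_user = 8.0                  # assumed subsidy/infrastructure cost

profit_per_user = incremental_revenue_per_user * margin - cost_per_user
roi = profit_per_user / cost_per_user
print(f"profit per user = {profit_per_user:.2f} yuan, ROI = {roi:.1%}")
# 57 * 0.10 - 8.0 = -2.30 yuan per user: negative ROI despite p = 0.001
```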
Q: An experiment runs on 1 % traffic, shows a significant core metric, and is rolled out to 100 % with a 5 % holdout, but the effect disappears. Why?
Answer: Potential reasons include lack of AA testing leading to bucket bias, carry‑over effects, uneven traffic allocation, violation of the SUTVA assumption, and differences between the experimental traffic slice and the overall population. These factors can cause the observed effect to vanish when the experiment is scaled.
Q: What is MDE and how can we improve AB experiment sensitivity?
Answer: MDE (Minimum Detectable Effect) is the smallest effect size that can be reliably detected at a given significance level. Sensitivity can be improved by either increasing sample size or reducing variance. Practical variance‑reduction techniques include data cleaning, trigger analysis, careful metric selection, and using methods like CUPED or stratified sampling.
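One of the variance‑reduction methods named above, CUPED, adjusts each user's metric by a pre‑experiment covariate. A minimal sketch on simulated data (all numbers synthetic):

```python
import random
from statistics import mean, variance

random.seed(42)

# Simulated data: X is a pre-experiment covariate (e.g., last month's spend),
# Y is the in-experiment metric, correlated with X.
n = 5000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [x + random.gauss(0, 0.5) for x in X]

# CUPED: theta = cov(X, Y) / var(X); adjusted metric Y - theta * (X - mean(X))
mx, my = mean(X), mean(Y)
theta = (sum((x - mx) * (y - my) for x, y in zip(X, Y))
         / sum((x - mx) ** 2 for x in X))
Y_cuped = [y - theta * (x - mx) for x, y in zip(X, Y)]

print(f"var(Y) = {variance(Y):.3f}, var(Y_cuped) = {variance(Y_cuped):.3f}")
# The adjusted metric keeps the same mean but has lower variance,
# which shrinks the MDE achievable at a fixed sample size.
```

The stronger the correlation between the covariate and the metric, the larger the variance reduction, which is why pre‑experiment values of the same metric are a common choice of covariate.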
The article concludes with a reminder that understanding both statistical and business significance is crucial for effective experiment evaluation.
Didi Tech
Official Didi technology account