How to Eliminate Pre‑Experiment Bias and Find the Optimal AB Test Grouping
This article explains how pre‑experiment bias can distort AB test results and introduces a suite of techniques—including AA retrospective analysis, SeedFinder optimal random grouping, variance reduction, and an offline splitting algorithm—to create homogeneous test groups and improve experiment reliability.
Introduction
In many AB experiments at Huolala, coarse segmentation or too few experimental units leave the treatment and control groups noticeably different before any intervention is applied. This gap is called pre‑experiment bias, and it can be mistaken for a treatment effect.
Pre‑Experiment Bias and Mitigation Techniques
Microsoft’s Edge experiment showed that historical data can reveal and address pre‑experiment bias, reducing false‑positive results. Three key techniques are introduced:
AA Retrospective – compute metric differences between groups before the experiment to detect bias.
SeedFinder (Optimal Random Grouping) – generate many random seeds, evaluate group homogeneity on core metrics, and select the seed with minimal difference.
Variance Reduction – increase metric sensitivity and further diminish pre‑experiment differences.
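The first two techniques can be illustrated together. The following is a minimal sketch, not Huolala's or Microsoft's implementation: the function names (assign_groups, relative_diff, seed_finder), the two-group round-robin assignment, and the synthetic lognormal metric are all assumptions made for illustration.

```python
import random
import statistics


def assign_groups(unit_ids, seed, n_groups=2):
    """Shuffle units with a seeded RNG and deal them round-robin into groups."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    return {uid: i % n_groups for i, uid in enumerate(shuffled)}


def relative_diff(history, assignment):
    """AA-style check: relative gap in pre-period means, group 1 vs group 0."""
    g0 = [history[u] for u, g in assignment.items() if g == 0]
    g1 = [history[u] for u, g in assignment.items() if g == 1]
    m0 = statistics.fmean(g0)
    return (statistics.fmean(g1) - m0) / m0


def seed_finder(history, seeds):
    """Return (best_seed, abs_relative_diff) for the seed with the smallest gap."""
    unit_ids = sorted(history)  # fixed order so each seed is reproducible
    scored = [
        (abs(relative_diff(history, assign_groups(unit_ids, s))), s)
        for s in seeds
    ]
    best_diff, best_seed = min(scored)
    return best_seed, best_diff


# Synthetic, heavy-tailed pre-period metric (hypothetical data, one core metric).
data_rng = random.Random(0)
history = {f"driver_{i}": data_rng.lognormvariate(3.0, 1.0) for i in range(200)}

best_seed, best_diff = seed_finder(history, range(200))
print(f"seed={best_seed} abs_relative_diff={best_diff:.5f}")
```

With several core metrics, the score would need to combine the gaps, for example by taking the largest absolute relative difference across metrics. That choice is also an assumption, since the source does not specify how metrics are combined.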
Offline Splitting Concept
Building on SeedFinder, an offline splitting method groups users or drivers into multiple cohorts with minimal historical differences, leveraging the correlation between past and experimental behavior to improve AB test reliability.
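The role of that correlation can be made explicit with a short derivation. This is a sketch under a linear-projection assumption that the source does not state. The symbols are introduced here for illustration: Y_i is a unit's metric during the experiment, X_i is the same metric in the historical window, T and C denote the treatment and control groups, and rho is the correlation between X and Y.

```latex
\begin{aligned}
Y_i &= \alpha + \theta X_i + \varepsilon_i,
  \qquad \theta = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},
  \qquad \operatorname{Cov}(X, \varepsilon) = 0 \\
\bar{Y}_T - \bar{Y}_C &= \theta\,(\bar{X}_T - \bar{X}_C)
  + (\bar{\varepsilon}_T - \bar{\varepsilon}_C)
  \quad \text{(no treatment effect)} \\
\operatorname{Var}(\varepsilon) &= (1 - \rho^2)\,\operatorname{Var}(Y),
  \qquad \rho = \operatorname{Corr}(X, Y)
\end{aligned}
```

Choosing a split whose historical gap is close to zero removes the first term of the second line, leaving only the residual term, whose per‑unit variance shrinks as the correlation grows. When the correlation is weak, balancing on history offers little benefit.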
Step‑by‑Step Workflow
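Define Experiment Subjects – ensure subjects satisfy SUTVA (the stable unit treatment value assumption: one unit's treatment does not affect another unit's outcome) and choose an appropriate splitting unit (user, driver, or spatial region).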
Set Splitting Parameters – decide number of groups, traffic ratios, core metrics, historical data windows, homogeneity criteria, random seed range, and, for spatiotemporal splits, the rotation order.
Generate Splitting Schemes – use random search or operations‑research optimization (e.g., genetic algorithms) to create candidate schemes, evaluate homogeneity on training data, and retain those meeting the criteria.
Evaluate Designs – on test dates, output relative metric differences and p‑values for each viable scheme, allowing users to select the best one.
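The last two steps can be sketched with a plain random search. This is an illustration under stated assumptions, not the production algorithm: the helper names (split, compare, candidate_schemes, evaluate), the 1% homogeneity threshold, the 50/50 split, the synthetic data, and the normal approximation used for the p‑value are all choices made here.

```python
import math
import random
import statistics
from statistics import NormalDist


def split(unit_ids, seed):
    """Seeded 50/50 split of the units into (control, treatment)."""
    ids = list(unit_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def compare(metric, control, treatment):
    """Relative difference of means and a two-sided p-value.

    Uses the Welch statistic with a normal approximation (large samples).
    """
    c = [metric[u] for u in control]
    t = [metric[u] for u in treatment]
    mc, mt = statistics.fmean(c), statistics.fmean(t)
    se = math.sqrt(statistics.variance(c) / len(c) + statistics.variance(t) / len(t))
    z = (mt - mc) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (mt - mc) / mc, p_value


def candidate_schemes(train, seeds, max_rel_diff=0.01):
    """Keep seeds whose training-window gap meets the homogeneity criterion."""
    kept = []
    for seed in seeds:
        control, treatment = split(sorted(train), seed)
        rel, _ = compare(train, control, treatment)
        if abs(rel) <= max_rel_diff:
            kept.append(seed)
    return kept


def evaluate(test, seeds):
    """Report (seed, relative_diff, p_value) on test dates, smallest gap first."""
    rows = []
    for seed in seeds:
        control, treatment = split(sorted(test), seed)
        rows.append((seed, *compare(test, control, treatment)))
    return sorted(rows, key=lambda r: abs(r[1]))


# Hypothetical data: a stable per-unit level observed with noise in two windows.
rng = random.Random(42)
level = {f"u{i}": rng.lognormvariate(3.0, 0.8) for i in range(400)}
train = {u: v * rng.uniform(0.8, 1.2) for u, v in level.items()}
test = {u: v * rng.uniform(0.8, 1.2) for u, v in level.items()}

kept = candidate_schemes(train, range(300))
report = evaluate(test, kept)
for seed, rel, p in report[:3]:
    print(f"seed={seed} relative_diff={rel:+.4f} p_value={p:.3f}")
```

Evaluating on dates that were not used for selection guards against picking a seed that merely fits noise in the training window.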
Case Study
A comparison between simple random sampling and controlled pre‑experiment difference sampling shows that the latter yields tighter distributions around zero, indicating higher homogeneity and more reliable experimental outcomes.
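The shape of that comparison can be reproduced on synthetic data. The sketch below is not the article's case study and uses no Huolala data: the lognormal unit levels, the noise model, the 1% pre‑period threshold, and the rel_diff helper are all assumptions chosen for illustration.

```python
import random
import statistics


def rel_diff(metric, seed):
    """Relative difference of group means for a seeded 50/50 split."""
    ids = sorted(metric)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    control = statistics.fmean([metric[u] for u in ids[:half]])
    treatment = statistics.fmean([metric[u] for u in ids[half:]])
    return (treatment - control) / control


# Hypothetical units: a stable level observed with noise before and during
# an AA period (no treatment effect in either window).
rng = random.Random(7)
level = {f"u{i}": rng.lognormvariate(3.0, 0.8) for i in range(400)}
pre = {u: v * rng.uniform(0.8, 1.2) for u, v in level.items()}
post = {u: v * rng.uniform(0.8, 1.2) for u, v in level.items()}

seeds = range(500)
# Simple random sampling: every seed is accepted.
simple = [rel_diff(post, s) for s in seeds]
# Controlled sampling: accept only seeds with a small pre-period gap.
controlled = [rel_diff(post, s) for s in seeds if abs(rel_diff(pre, s)) <= 0.01]

print("accepted seeds:", len(controlled))
print("spread, simple random:", statistics.pstdev(simple))
print("spread, controlled:   ", statistics.pstdev(controlled))
```

A smaller spread for the controlled seeds corresponds to the tighter distribution around zero described above. How much tighter depends on how strongly the pre‑period metric predicts the later one.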
Summary
Pre‑experiment bias can severely affect AB test validity. By combining AA retrospective analysis, SeedFinder optimal random grouping, variance reduction, and offline splitting, Huolala achieved more reliable experiments and better evaluation of intervention effects.
