
How to Eliminate Pre‑Experiment Bias and Find the Optimal AB Test Grouping

This article explains how pre‑experiment bias can distort AB test results and introduces a set of techniques for creating homogeneous test groups and improving experiment reliability: AA retrospective analysis, SeedFinder optimal random grouping, variance reduction, and an offline splitting algorithm.

Huolala Tech

Jan 19, 2024


Introduction

In many AB experiments at Huolala, coarse‑grained splitting units or too few experimental units lead to large pre‑experiment bias between treatment and control groups: the groups already differ on key metrics before any intervention is applied.

Pre‑Experiment Bias and Mitigation Techniques

Microsoft’s Edge experiment showed that historical data can reveal and address pre‑experiment bias, reducing false‑positive results. Three key techniques are introduced:

AA Retrospective – compute metric differences between the groups on historical data from before the experiment starts, to detect bias.

SeedFinder (Optimal Random Grouping) – generate many random seeds, evaluate group homogeneity on core metrics, and select the seed with minimal difference.

Variance Reduction – increase metric sensitivity and further diminish pre‑experiment differences.

Offline Splitting Concept

Building on SeedFinder, an offline splitting method groups users or drivers into multiple cohorts with minimal historical differences, leveraging the correlation between past and experimental behavior to improve AB test reliability.
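Why balance on historical data carries over to the experiment period can be seen with a simple linear model. This model is an illustrative assumption, not one stated in the article: X_i is unit i's historical core metric, Y_i its metric during the experiment in the absence of any treatment effect, ε_i noise independent of X_i, and T and C the two groups with sizes n_T and n_C.

```latex
\begin{aligned}
Y_i &= \alpha + \theta X_i + \varepsilon_i, \qquad \varepsilon_i \perp X_i,\\
\bar{Y}_T - \bar{Y}_C &= \theta\,(\bar{X}_T - \bar{X}_C) + (\bar{\varepsilon}_T - \bar{\varepsilon}_C),\\
\operatorname{Var}(\bar{\varepsilon}_T - \bar{\varepsilon}_C) &= (1 - \rho^2)\,\sigma_Y^2 \left(\frac{1}{n_T} + \frac{1}{n_C}\right), \qquad \rho = \operatorname{Corr}(X_i, Y_i).
\end{aligned}
```

A split whose historical group means are nearly equal removes the first term, leaving only noise whose variance is smaller by the factor 1 − ρ² than the σ_Y²(1/n_T + 1/n_C) of an unchecked random split. The stronger the correlation between past and experimental behavior, the more of the pre‑experiment gap offline splitting can remove.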

Step‑by‑Step Workflow

Define Experiment Subjects – ensure subjects satisfy SUTVA (the Stable Unit Treatment Value Assumption, i.e., no interference between units) and choose appropriate splitting units (user, driver, or spatial region).

Set Splitting Parameters – decide number of groups, traffic ratios, core metrics, historical data windows, homogeneity criteria, random seed range, and, for spatiotemporal splits, the rotation order.

Generate Splitting Schemes – use random search or operations‑research optimization (e.g., genetic algorithms) to create candidate schemes, evaluate homogeneity on training data, and retain those meeting the criteria.

Evaluate Designs – on held‑out test dates outside the training window, output relative metric differences and p‑values for each viable scheme, allowing users to select the best one.
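The four workflow steps can be sketched end to end as a random search over seeds. This is a hypothetical illustration: the `SplitConfig` fields, the SHA‑256 bucketing, using group 0 as the reference group, the `data[unit_id][metric][day_index]` layout, and the normal‑approximation p‑value are all assumptions. It covers only random search, not the genetic‑algorithm variant or spatiotemporal rotation.

```python
import hashlib
import math
import statistics
from dataclasses import dataclass


@dataclass
class SplitConfig:
    """Hypothetical splitting parameters (step 2 of the workflow)."""
    ratios: tuple[float, ...]      # traffic ratio per group; group 0 is the reference
    core_metrics: tuple[str, ...]  # metrics that must be balanced
    train_days: range              # day indices used to search for schemes
    test_days: range               # held-out day indices used for evaluation
    max_rel_diff: float            # homogeneity criterion on training days
    seeds: range                   # random seed search range


def assign(unit_id: str, seed: int, ratios: tuple[float, ...]) -> int:
    """Hash a unit into [0, 1) and map it to a group by cumulative traffic ratio."""
    h = int(hashlib.sha256(f"{seed}:{unit_id}".encode()).hexdigest(), 16)
    u = (h % 10**8) / 10**8
    cumulative = 0.0
    for group, ratio in enumerate(ratios):
        cumulative += ratio
        if u < cumulative:
            return group
    return len(ratios) - 1  # guard against ratios summing to slightly below 1


def group_values(data, metric, days, seed, ratios):
    """Per-group lists of each unit's average `metric` over `days`."""
    groups = [[] for _ in ratios]
    for unit_id, series in data.items():
        value = statistics.fmean([series[metric][d] for d in days])
        groups[assign(unit_id, seed, ratios)].append(value)
    return groups


def compare(ref, other):
    """Relative difference of `other` vs `ref` and a two-sided normal-approx. p-value."""
    m_ref, m_other = statistics.fmean(ref), statistics.fmean(other)
    se = math.sqrt(statistics.variance(ref) / len(ref) + statistics.variance(other) / len(other))
    return (m_other - m_ref) / m_ref, math.erfc(abs(m_other - m_ref) / se / math.sqrt(2))


def generate_and_evaluate(data, cfg: SplitConfig):
    """Steps 3-4: random search over seeds, filter on training days, report test days."""
    report = []
    for seed in cfg.seeds:
        worst = 0.0
        for metric in cfg.core_metrics:
            groups = group_values(data, metric, cfg.train_days, seed, cfg.ratios)
            for g in groups[1:]:
                worst = max(worst, abs(compare(groups[0], g)[0]))
        if worst > cfg.max_rel_diff:
            continue  # scheme fails the homogeneity criterion on training data
        test = {}
        for metric in cfg.core_metrics:
            groups = group_values(data, metric, cfg.test_days, seed, cfg.ratios)
            test[metric] = [compare(groups[0], g) for g in groups[1:]]
        report.append({"seed": seed, "train_worst": worst, "test": test})
    return sorted(report, key=lambda r: r["train_worst"])
```

Sorting the surviving schemes by their worst training‑period gap is just one possible default; as described above, the final choice is left to the user after inspecting the test‑date differences and p‑values.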

Case Study

A comparison between simple random sampling and controlled pre‑experiment difference sampling shows that, under the latter, the between‑group metric differences are more tightly distributed around zero, indicating higher homogeneity and more reliable experimental outcomes.
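The case‑study comparison can be reproduced in spirit with a small simulation on synthetic data (not Huolala's data). It assumes a historical metric `x` and an AA‑period metric `y` correlated with it through an assumed parameter `rho`. Simple random sampling takes one shuffled split per replicate; controlled sampling picks, among `candidates` shuffled splits, the one with the smallest historical gap. The function returns the root‑mean‑square distance from zero of the AA‑period difference under each approach. All names and parameter values are hypothetical.

```python
import random
import statistics


def split(n: int, seed: int) -> tuple[list[int], list[int]]:
    """Simple random 50/50 split of unit indices via a seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2 :]


def mean_diff(values: list[float], t: list[int], c: list[int]) -> float:
    """Difference in group means (treatment minus control)."""
    return statistics.fmean([values[i] for i in t]) - statistics.fmean([values[i] for i in c])


def rms(diffs: list[float]) -> float:
    """Root-mean-square distance of the differences from zero."""
    return statistics.fmean([d * d for d in diffs]) ** 0.5


def simulate(n=400, rho=0.9, replicates=40, candidates=30, seed=7):
    """Return (spread under simple random sampling, spread under controlled sampling)."""
    rng = random.Random(seed)
    # Synthetic population: historical metric x and AA-period metric y with corr(x, y) ≈ rho.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + (1 - rho**2) ** 0.5 * rng.gauss(0, 1) for xi in x]
    simple, controlled = [], []
    for r in range(replicates):
        # Simple random sampling: one split, no look at history.
        simple.append(mean_diff(y, *split(n, seed=r)))
        # Controlled sampling: among `candidates` splits, keep the smallest historical gap.
        pool = range(10_000 + r * candidates, 10_000 + (r + 1) * candidates)
        best = min(pool, key=lambda s: abs(mean_diff(x, *split(n, s))))
        controlled.append(mean_diff(y, *split(n, best)))
    return rms(simple), rms(controlled)
```

With a strong assumed correlation the controlled spread should come out clearly smaller, because the remaining gap is driven only by the part of `y` that history does not explain; with `rho` near zero the two approaches would look alike.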

Summary

Pre‑experiment bias can severely affect AB test validity. By combining AA retrospective analysis, SeedFinder optimal random grouping, variance reduction, and offline splitting, Huolala achieved more reliable experiments and better evaluation of intervention effects.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


