How the Delta Method Improves AB Test Variance Estimation When Units Differ
This article explains why traditional hypothesis-testing methods can mis-estimate variance in AB experiments when the splitting unit and the analysis unit differ. It introduces the Delta Method as a variance estimator that remains valid in that setting, compares it with Bootstrap and other corrections through simulations and real-world case studies, and highlights its computational efficiency.
Introduction
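In freight AB experiments, the splitting unit (e.g., order ID) often differs from the analysis unit (e.g., transaction unit). Standard hypothesis tests assume that the analysis units are independent observations. When the analysis units are grouped or aggregated under a different splitting unit, that assumption fails, so the variance is calculated incorrectly and the test statistics and p-values are distorted.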
Theoretical Basis
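The article proposes using the Delta Method to obtain a consistent (asymptotically unbiased) variance estimate. The metric is written as a continuously differentiable function of sample means, and a first-order Taylor (linear) approximation of that function expresses the metric's variance in terms of the variances and covariances of those means. The resulting test statistic is reliable even when the splitting and analysis units are inconsistent.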
Formula (multivariate case):
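For reference, the standard multivariate statement of the Delta Method is given here, followed by the special case of a ratio of two means, which is the shape of the pairing-rate metrics discussed in this article. The symbols ($T_n$, $\theta$, $\Sigma$, $g$, $\bar{X}$, $\bar{Y}$) are notation chosen for this sketch and may differ from the original article's.

```latex
% Multivariate Delta Method: T_n is a vector of sample means over n splitting units.
\sqrt{n}\,\bigl(T_n - \theta\bigr) \xrightarrow{d} \mathcal{N}(0, \Sigma)
\;\Longrightarrow\;
\sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d}
\mathcal{N}\bigl(0,\ \nabla g(\theta)^{\top}\, \Sigma\, \nabla g(\theta)\bigr)

% Ratio of means: g(\mu_y, \mu_x) = \mu_y / \mu_x,
% with gradient (1/\mu_x,\ -\mu_y/\mu_x^2).
\operatorname{Var}\!\left(\frac{\bar{Y}}{\bar{X}}\right) \approx
\frac{1}{n\,\mu_x^{2}}
\left[\sigma_y^{2}
 - 2\,\frac{\mu_y}{\mu_x}\,\sigma_{xy}
 + \frac{\mu_y^{2}}{\mu_x^{2}}\,\sigma_x^{2}\right]
```

Here $\sigma_y^2$, $\sigma_x^2$ and $\sigma_{xy}$ are the per-splitting-unit variances and covariance of the numerator and denominator, and $n$ is the number of splitting units.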
Application to GTV Pairing Rate
The GTV pairing rate is defined as paired GTV / executed GTV. In an order-ID split experiment, the original method underestimates this metric's variance, which inflates the type-I error.
Using the Delta Method, the variance is correctly estimated, producing a more accurate test statistic.
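A minimal sketch of how such a variance could be computed for a ratio metric, assuming the data have already been aggregated to one row per splitting unit (here, per order). The function name `ratio_delta_variance` and the synthetic GTV data are hypothetical illustrations, not the article's code or data.

```python
import numpy as np

def ratio_delta_variance(num, den):
    """Delta-method variance of sum(num) / sum(den).

    num, den: one value per splitting unit (e.g. paired GTV and
    executed GTV per order). Returns (ratio, variance of the ratio).
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    n = num.size
    mean_den = den.mean()
    ratio = num.mean() / mean_den
    var_num = num.var(ddof=1)
    var_den = den.var(ddof=1)
    cov = np.cov(num, den, ddof=1)[0, 1]
    # First-order Taylor expansion of the ratio around the true means.
    var_ratio = (var_num - 2 * ratio * cov + ratio**2 * var_den) / (n * mean_den**2)
    return ratio, var_ratio

# Hypothetical per-order data: executed GTV, and the part of it that was paired.
rng = np.random.default_rng(42)
executed_gtv = rng.gamma(shape=2.0, scale=50.0, size=10_000)
paired_gtv = executed_gtv * rng.binomial(1, 0.8, size=10_000)

ratio, var_ratio = ratio_delta_variance(paired_gtv, executed_gtv)
print(f"pairing rate = {ratio:.4f}, standard error = {np.sqrt(var_ratio):.5f}")
```

The computation needs only means, variances and one covariance over the splitting units, which is why its cost is a single pass over the data.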
Bootstrap is also introduced as a non‑parametric alternative that resamples with replacement to build an empirical distribution of the statistic.
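As an illustration, a percentile bootstrap for a ratio metric might look like the sketch below. It assumes resampling is done at the level of the splitting unit, which the summary does not state explicitly; the function name `bootstrap_ratio` and its parameters are hypothetical.

```python
import numpy as np

def bootstrap_ratio(num, den, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap for sum(num) / sum(den).

    num, den: one value per splitting unit. Whole units are resampled
    with replacement, so correlation inside a unit is preserved.
    Returns (bootstrap variance, (lower, upper) percentile interval).
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    rng = np.random.default_rng(seed)
    n = num.size
    # One row of resampled unit indices per bootstrap replicate.
    # Note: this index matrix has n_boot * n entries, so memory grows quickly.
    idx = rng.integers(0, n, size=(n_boot, n))
    ratios = num[idx].sum(axis=1) / den[idx].sum(axis=1)
    lower, upper = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
    return ratios.var(ddof=1), (lower, upper)
```

Every replicate recomputes the metric on a full resample, so the cost scales with `n_boot` times the number of units. That is the overhead the Delta Method avoids.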
Simulation Comparison
Simulated AA experiments (1000 runs) compare four methods: original, unit‑adjusted, Delta Method, and quantile Bootstrap. Results show that the Delta Method controls type‑I error similarly to Bootstrap but with far lower computational cost.
Key findings:
The original method severely underestimates variance, causing excessive type-I error.
The Delta Method and the unit-adjusted method both return non-significant results, as expected in an AA experiment; of the two, the Delta Method gives the larger and more realistic variance estimate.
Bootstrap is accurate but computationally intensive.
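The kind of AA simulation described above can be sketched as follows. This is not the article's simulation: the data-generating process (a user-ID split with per-user pairing probabilities), the function names and the parameters are all assumptions, and only two of the four methods are compared. The naive test treats every order as independent, while the Delta Method works on per-user totals.

```python
import numpy as np

def simulate_group(rng, n_users):
    # Hypothetical process: each user has their own pairing probability,
    # so orders placed by the same user are correlated.
    orders = rng.poisson(5, size=n_users) + 1
    p_user = rng.beta(2, 2, size=n_users)
    paired = rng.binomial(orders, p_user)
    return paired, orders

def naive_var(paired, orders):
    # Treats every order as an independent Bernoulli trial.
    p = paired.sum() / orders.sum()
    return p * (1 - p) / orders.sum()

def delta_var(paired, orders):
    # Delta-method variance of the ratio, via per-user linearized values.
    r = paired.sum() / orders.sum()
    z = (paired - r * orders) / orders.mean()
    return z.var(ddof=1) / orders.size

def aa_false_positive_rates(n_runs=1000, n_users=2000, seed=0):
    rng = np.random.default_rng(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    rejections = {"naive": 0, "delta": 0}
    for _ in range(n_runs):
        a = simulate_group(rng, n_users)
        b = simulate_group(rng, n_users)  # same distribution: an AA test
        diff = a[0].sum() / a[1].sum() - b[0].sum() / b[1].sum()
        for name, var_fn in (("naive", naive_var), ("delta", delta_var)):
            se = np.sqrt(var_fn(*a) + var_fn(*b))
            rejections[name] += int(abs(diff / se) > z_crit)
    return {name: count / n_runs for name, count in rejections.items()}

if __name__ == "__main__":
    print(aa_false_positive_rates())
```

Under these assumptions the naive test should reject well above the nominal 5% level, while the Delta Method should stay close to it, which is the pattern the findings above describe.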
Real‑World Case Studies
Case 1 – Order-ID split (GTV pairing rate): The experiment shows a +0.4 p.p. increase in GTV pairing rate. The Delta Method confirms that the increase is statistically significant while keeping type-I error under control.
Case 2 – User-ID split (order pairing rate): The experiment shows a -2 p.p. change in order pairing rate and a 0.6% increase in order volume. The Delta Method again provides a reliable significance assessment.
Summary
Accurate variance estimation is crucial for reliable AB test conclusions. When splitting and analysis units differ, both Bootstrap and the Delta Method yield scientifically sound results, but the Delta Method achieves comparable accuracy with substantially lower computational overhead, making it preferable for large‑scale data.
