How the Delta Method Improves AB Test Variance Estimation When Units Differ

This article explains why standard hypothesis tests mis-estimate variance when the splitting unit and the analysis unit differ in AB experiments, introduces the Delta Method as a variance estimator that accounts for the mismatch, compares it with Bootstrap and other corrections in simulations and real-world case studies, and highlights its computational efficiency.

Huolala Tech

Feb 2, 2024

Introduction

In freight AB experiments, the splitting unit (e.g., order ID) often differs from the analysis unit (e.g., transaction unit). Standard hypothesis tests assume that the analysis units are independent observations. When randomization happens at a different unit, that assumption no longer holds, so the variance is computed incorrectly and the test statistic and p-value are distorted.

Theoretical Basis

The Delta Method gives an asymptotically valid variance estimate in this setting. It takes a first-order (linear) Taylor approximation of a continuously differentiable function of sample means, so the variance of the function follows from the variances and covariances of those means. The resulting test statistic stays reliable even when the splitting and analysis units are inconsistent.

Formula (multivariate case):
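In its standard textbook form, the multivariate Delta Method can be stated as below. The notation is chosen here for illustration and is not taken from the original: \bar{X}_n is the vector of sample means over n independent units, \mu its expectation, \Sigma the covariance matrix of one unit, and g a continuously differentiable function with a non-zero gradient at \mu.

```latex
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{d} \mathcal{N}(0,\ \Sigma)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\bar{X}_n) - g(\mu)\bigr) \xrightarrow{d}
\mathcal{N}\!\bigl(0,\ \nabla g(\mu)^{\top}\,\Sigma\,\nabla g(\mu)\bigr)
```

In practice this means \operatorname{Var}\bigl(g(\bar{X}_n)\bigr) \approx \tfrac{1}{n}\,\nabla g(\mu)^{\top}\,\Sigma\,\nabla g(\mu), with \mu and \Sigma replaced by their sample estimates.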

Application to GTV Pairing Rate

The GTV pairing rate is defined as paired GTV / executed GTV. In an order-ID split experiment, the original method mis-estimates this metric's variance, which inflates the type-I error rate.
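For a ratio metric like this one, the Delta Method variance has a closed form. As a sketch, assume X_i and Y_i are the paired and executed GTV attributed to splitting unit i, with n such units in a group (these symbols are introduced here, not in the original). Applying the method to g(x, y) = x / y, whose gradient is (1/\mu_Y,\ -\mu_X/\mu_Y^2), gives:

```latex
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
\approx \frac{1}{n}\left[
  \frac{\sigma_X^{2}}{\mu_Y^{2}}
  - \frac{2\,\mu_X\,\sigma_{XY}}{\mu_Y^{3}}
  + \frac{\mu_X^{2}\,\sigma_Y^{2}}{\mu_Y^{4}}
\right]
```

The covariance term \sigma_{XY} is what a per-row variance formula leaves out: numerator and denominator come from the same units and move together.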

The Delta Method estimates the variance correctly and therefore produces a more accurate test statistic.
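A minimal Python sketch of this calculation follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and each array element is assumed to hold the numerator or denominator total of one splitting unit, so that elements can be treated as independent.

```python
import math
import numpy as np

def delta_ratio_variance(num, den):
    """Delta Method variance of sum(num) / sum(den).

    Assumption: num[i] and den[i] are totals for splitting unit i.
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    n = num.size
    mu_y = den.mean()
    ratio = num.mean() / mu_y
    cov = np.cov(num, den, ddof=1)  # 2x2 sample covariance matrix
    var_x, cov_xy, var_y = cov[0, 0], cov[0, 1], cov[1, 1]
    # First-order Taylor expansion of x / y around the sample means
    return (var_x - 2 * ratio * cov_xy + ratio**2 * var_y) / (n * mu_y**2)

def ratio_z_test(num_t, den_t, num_c, den_c):
    """Two-sided z-test for the difference of two ratio metrics."""
    r_t = np.sum(num_t) / np.sum(den_t)
    r_c = np.sum(num_c) / np.sum(den_c)
    se = math.sqrt(delta_ratio_variance(num_t, den_t)
                   + delta_ratio_variance(num_c, den_c))
    z = (r_t - r_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return r_t - r_c, z, p_value
```

Calling `ratio_z_test(paired_t, executed_t, paired_c, executed_c)` with per-unit totals for the treatment and control groups returns the difference in rates, the z statistic, and the two-sided p-value. The normal approximation relies on each group having many splitting units.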

Bootstrap is also introduced as a non‑parametric alternative that resamples with replacement to build an empirical distribution of the statistic.
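The resampling idea can be sketched in Python as follows. This is a minimal percentile (quantile) Bootstrap under assumptions of my own: the function name is hypothetical, each array element is the total for one splitting unit, and resampling is done over splitting units so that within-unit correlation is preserved.

```python
import numpy as np

def bootstrap_ratio_diff_ci(num_t, den_t, num_c, den_c,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile Bootstrap interval for the difference of two ratio metrics."""
    rng = np.random.default_rng(seed)

    def resampled_ratios(num, den):
        num = np.asarray(num, dtype=float)
        den = np.asarray(den, dtype=float)
        # Each row of idx is one resample of splitting units, with replacement
        idx = rng.integers(0, num.size, size=(n_boot, num.size))
        return num[idx].sum(axis=1) / den[idx].sum(axis=1)

    # The two groups are resampled independently of each other
    diffs = resampled_ratios(num_t, den_t) - resampled_ratios(num_c, den_c)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

The difference is judged significant at level `alpha` when the interval excludes zero. The index matrix holds `n_boot` times the number of units, which is a direct view of why this approach gets expensive on large data; a loop over resamples trades that memory for time.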

Simulation Comparison

Simulated AA experiments (1000 runs) compare four methods: original, unit‑adjusted, Delta Method, and quantile Bootstrap. Results show that the Delta Method controls type‑I error similarly to Bootstrap but with far lower computational cost.

Key findings:

The original method severely underestimates variance, causing excessive type-I error.

Both the Delta Method and the unit-adjusted method return non-significant results in the AA runs, as expected, and the Delta Method's variance estimate is larger and more realistic.

Bootstrap is accurate but computationally intensive.
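A toy AA simulation along these lines can be sketched in Python. The data-generating process here is my own assumption and not the article's setup: each splitting unit has 1 + Poisson(9) rows with a binary outcome, and a unit-level success probability drawn from Beta(2, 2) makes rows within a unit correlated. Only the original (row-level) method and the Delta Method are compared, and all names are hypothetical.

```python
import numpy as np

def ratio_var_delta(x, y):
    """Delta Method variance of sum(x) / sum(y) over splitting units."""
    r = x.mean() / y.mean()
    c = np.cov(x, y, ddof=1)
    return (c[0, 0] - 2 * r * c[0, 1] + r**2 * c[1, 1]) / (x.size * y.mean() ** 2)

def aa_false_positive_rates(n_runs=1000, n_units=1000, z_crit=1.96, seed=0):
    """Share of AA runs flagged significant: (row-level method, Delta Method)."""
    rng = np.random.default_rng(seed)
    naive_hits = 0
    delta_hits = 0
    for _ in range(n_runs):
        rows = (1 + rng.poisson(9, n_units)).astype(float)  # rows per unit
        p_unit = rng.beta(2, 2, n_units)                    # unit-level rate
        wins = rng.binomial(rows.astype(int), p_unit).astype(float)
        in_a = rng.random(n_units) < 0.5                    # split by unit
        stats = []
        for mask in (in_a, ~in_a):
            x, y = wins[mask], rows[mask]
            rate = x.sum() / y.sum()
            naive_var = rate * (1 - rate) / y.sum()         # rows treated as i.i.d.
            stats.append((rate, naive_var, ratio_var_delta(x, y)))
        diff = abs(stats[0][0] - stats[1][0])
        naive_hits += int(diff / np.sqrt(stats[0][1] + stats[1][1]) > z_crit)
        delta_hits += int(diff / np.sqrt(stats[0][2] + stats[1][2]) > z_crit)
    return naive_hits / n_runs, delta_hits / n_runs
```

Because there is no true effect, a well-calibrated test should flag roughly 5% of runs at `z_crit=1.96`. In this toy setup the row-level rate should land well above that level, while the Delta Method rate should stay close to it.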

Real‑World Case Studies

Case 1 – Order-ID split (GTV pairing rate): The experiment shows a +0.4 p.p. increase. The Delta Method confirms that the increase is statistically significant while keeping type-I error under control.

Case 2 – User-ID split (order pairing rate): The experiment shows a –2 p.p. change and a 0.6% increase in order volume. The Delta Method again provides a reliable significance assessment.

Summary

Accurate variance estimation is crucial for reliable AB test conclusions. When splitting and analysis units differ, both Bootstrap and the Delta Method yield scientifically sound results, but the Delta Method achieves comparable accuracy with substantially lower computational overhead, making it preferable for large‑scale data.
