
Mastering Switchback Experiments: Random Rotation Methods for Reliable A/B Testing

This article, the fourth in the Trusted Experiment Whitepaper series, explains coin‑flip, complete, and paired random rotation designs, their grouping mechanisms, evaluation principles, and how to handle spillover and carryover effects to improve the reliability and power of AB experiments.

Meituan Technology Team

Jun 5, 2025


This article is the fourth in the "Trusted Experiment Whitepaper" series. The previous article introduced basic concepts of randomized controlled experiments and common methods to increase experimental power. This piece focuses on random rotation experiments, covering coin‑flip random rotation, complete random rotation, and paired random rotation.

Chapter Contents

4.1 Coin Flip Random Rotation

4.1.1 Overview

4.1.2 Grouping Mechanism

4.1.3 Evaluation Principle

4.2 Complete Random Rotation

4.2.1 Overview

4.2.2 Grouping Mechanism

4.2.3 Evaluation Principle

4.2.4 Stratified Random Rotation

4.3 Paired Random Rotation

4.3.1 Overview

4.3.2 Grouping Mechanism

4.3.3 Evaluation Principle

4.4 Extensions and Outlook

4.4.1 Handling Abnormal Scenarios

4.4.2 Carryover Effects at Hourly Granularity

4.4.3 Other Rotation Designs

A switchback experiment (time‑based randomization) repeatedly switches experimental units between treatment and control across time periods. This makes effects detectable while mitigating two limitations of ordinary AB testing: spillover effects and insufficient sample size.

Spillover Effect: A violation of the SUTVA assumption that occurs when units influence each other (e.g., via social networks or shared resources), which biases effect estimates. Rotating an entire city by time slice can eliminate spatial spillover.

Insufficient Sample Size: Combining time‑slice rotation with randomization increases the number of independent samples, which improves experimental efficiency.

Because of these characteristics, switchback experiments are widely used in fulfillment scenarios, but they should not be applied when the experimental strategy is perceptible to users.

4.1 Coin Flip Random Rotation

4.1.1 Overview

When sample size is limited, adding time‑slice rotation can increase the number of independent samples. In a coin‑flip random rotation, each combination of experimental unit and time slice undergoes an independent Bernoulli trial that decides treatment or control assignment. This simple design reduces variance, but it is unsuitable for extremely small samples (e.g., a single city with a 14‑day experiment) because the numbers of treatment and control slices can end up imbalanced.
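To make the imbalance risk concrete, the following minimal sketch computes how often 14 independent fair coin flips produce a balanced or a lopsided split. The 0.5 treatment probability and the "4 or fewer, or 10 or more" threshold are illustrative assumptions, not values from the original design.

```python
from math import comb

def prob_treated_days(n_days: int, k: int, p: float = 0.5) -> float:
    """Probability that exactly k of n_days independent coin flips land on treatment."""
    return comb(n_days, k) * p**k * (1 - p) ** (n_days - k)

n = 14  # single city, 14 daily slices
p_balanced = prob_treated_days(n, 7)
# Illustrative definition of "lopsided": at most 4 or at least 10 treatment days.
p_lopsided = sum(prob_treated_days(n, k) for k in range(n + 1) if k <= 4 or k >= 10)

print(f"P(exactly 7 treatment days) = {p_balanced:.3f}")
print(f"P(4 or fewer, or 10 or more) = {p_lopsided:.3f}")
```

Under these assumptions, only about 21% of coin‑flip schedules are perfectly balanced, and about 18% are at least as lopsided as a 4/10 split.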

4.1.2 Grouping Mechanism

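Each unit‑time slice is assigned to treatment independently with a fixed probability. Writing $W_i \in \{0, 1\}$ for the assignment of slice $i$, $X_i$ for its covariates, and $e(X_i)$ for its propensity score, the mechanism can be expressed as:

$$\Pr(\mathbf{W} \mid \mathbf{X}) = \prod_{i=1}^{N} e(X_i)^{W_i} \, \bigl[1 - e(X_i)\bigr]^{1 - W_i}$$

Here, $W_i$ denotes whether the unit falls in the treatment group (1) or the control group (0). The propensity score $e(X_i)$ represents the probability of receiving treatment given covariates; in a coin‑flip rotation it is the same fixed value for every slice.

The sample sizes of the treatment and control groups are denoted $N_t = \sum_{i=1}^{N} W_i$ and $N_c = \sum_{i=1}^{N} (1 - W_i)$, and the grouping expression used by the platform (MurmurHash3) is shown below: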

Control Group Expression: (murmur332(murmur332(aoi_id, seedA)+murmur332(dt, seedB), seedC)%2) in (0)

Treatment Group Expression: (murmur332(murmur332(aoi_id, seedA)+murmur332(dt, seedB), seedC)%2) in (1)
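The nested hash expression can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's implementation: `murmur3_32` is a pure‑Python version of the 32‑bit MurmurHash3 routine, and the way `aoi_id`, `dt`, and the intermediate sum are encoded as bytes, as well as the seed values, are hypothetical choices, so the resulting groups will not necessarily match the platform's.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant) of a byte string."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h, n = seed & mask, len(data)
    rounded = n - n % 4

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    for i in range(0, rounded, 4):          # body: 4-byte little-endian blocks
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    if n % 4:                               # tail: remaining 1 to 3 bytes
        h ^= mix(int.from_bytes(data[rounded:], "little"))
    h ^= n                                  # finalization (avalanche)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    return h ^ (h >> 16)

def assign(aoi_id: str, dt: str, seed_a: int, seed_b: int, seed_c: int) -> int:
    """Return 1 for treatment, 0 for control, mirroring the expression above."""
    inner = murmur3_32(aoi_id.encode(), seed_a) + murmur3_32(dt.encode(), seed_b)
    # Assumption: the integer sum is hashed via its decimal string representation.
    return murmur3_32(str(inner).encode(), seed_c) % 2

# Hypothetical unit, dates, and seeds.
schedule = {dt: assign("aoi_001", dt, 11, 22, 33)
            for dt in ("2025-06-01", "2025-06-02", "2025-06-03")}
```

Because the assignment depends only on the unit, the date, and the seeds, it is reproducible without storing a schedule, and changing any seed produces a fresh randomization.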

4.1.3 Evaluation Principle

The coin‑flip random rotation shares the same evaluation methods as ordinary randomized experiments, including the CUPED variance‑reduction technique. Care must be taken when the experimental unit differs from the analysis unit to avoid under‑estimating variance and producing false positives.
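As an illustration of the CUPED idea mentioned above, the sketch below adjusts an in‑experiment metric with a pre‑experiment covariate using the common form y - theta * (x - mean(x)), where theta = cov(x, y) / var(x). The data and variable names are hypothetical, and in practice theta would be estimated on data pooled across both groups before comparing the adjusted group means.

```python
from statistics import mean, variance

def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x))."""
    mx, my = mean(x), mean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [b - theta * (a - mx) for a, b in zip(x, y)]

# Hypothetical per-slice data: x is the pre-experiment metric, y the in-experiment metric.
x = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 13.0]
y = [11.0, 13.5, 9.5, 16.0, 11.5, 15.5, 8.5, 14.0]
y_adj = cuped_adjust(y, x)

print(f"variance before: {variance(y):.3f}, after: {variance(y_adj):.3f}")
```

The adjustment leaves the mean unchanged and shrinks the variance by a factor of one minus the squared correlation between x and y, which is why a strongly predictive pre‑period covariate helps most.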

4.2 Complete Random Rotation

4.2.1 Overview

When strong spatial spillover exists and hourly rotation introduces carryover effects, a city‑level daily random rotation can be used. However, because the experiment duration is limited, independent coin flips may leave the number of treatment days imbalanced. Complete random rotation allows pre‑specifying the exact number of treatment days (e.g., 7 out of 14) and can be combined with stratification (e.g., by weekday/weekend or by city).

4.2.2 Grouping Mechanism

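The mechanism assigns exactly k of the N unit‑time slices to treatment and the rest to control. Writing $W_i \in \{0, 1\}$ for the assignment of slice $i$, this is:

$$\Pr(\mathbf{W} = \mathbf{w}) = \begin{cases} \binom{N}{k}^{-1} & \text{if } \sum_{i=1}^{N} w_i = k, \\ 0 & \text{otherwise.} \end{cases}$$

Every assignment with exactly k treated slices is therefore equally likely.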

Stratified complete random rotation first partitions units (e.g., by city or weekday) and then applies complete random rotation within each stratum.
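A minimal sketch of complete and stratified complete random rotation, assuming a single city with 14 daily slices. The slice labels, the weekday/weekend split, the per‑stratum treatment counts, and the seed are hypothetical.

```python
import random

def complete_rotation(slices, k, rng):
    """Assign exactly k of the given slices to treatment (1) and the rest to control (0)."""
    treated = set(rng.sample(list(slices), k))
    return {s: int(s in treated) for s in slices}

def stratified_rotation(strata, k_per_stratum, rng):
    """Run a complete rotation independently inside each stratum."""
    assignment = {}
    for name, slices in strata.items():
        assignment.update(complete_rotation(slices, k_per_stratum[name], rng))
    return assignment

rng = random.Random(2025)  # fixed seed only so the sketch is reproducible
days = [f"day{d:02d}" for d in range(1, 15)]
plan = complete_rotation(days, k=7, rng=rng)

# Hypothetical strata: 10 weekdays and 4 weekend days.
strata = {"weekday": days[:10], "weekend": days[10:]}
strat_plan = stratified_rotation(strata, {"weekday": 5, "weekend": 2}, rng)
```

Unlike the coin‑flip design, the number of treatment days is fixed in advance, and stratification additionally fixes it within each stratum.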

4.2.3 Evaluation Principle

For small sample sizes, the non‑parametric Fisher exact test (a randomization test of the sharp null hypothesis of no effect on any slice) and Neyman variance estimation are recommended. A detailed calculation table is provided in the original document.
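The two tools can be sketched for a 14‑day rotation with 7 treatment days. This is an illustration on hypothetical data, not the original document's calculation table: the test enumerates all C(14, 7) = 3432 possible assignments and uses the absolute difference in means as the test statistic, and the standard error is the usual conservative Neyman estimate.

```python
from itertools import combinations
from statistics import mean, variance

def split(y, treated_idx):
    """Split outcomes into treated and control lists."""
    t = [y[i] for i in treated_idx]
    c = [y[i] for i in range(len(y)) if i not in treated_idx]
    return t, c

def diff_in_means(y, treated_idx):
    t, c = split(y, treated_idx)
    return mean(t) - mean(c)

def fisher_exact_p(y, treated_idx):
    """Two-sided p-value under the sharp null, enumerating every assignment of the same size."""
    observed = abs(diff_in_means(y, set(treated_idx)))
    stats = [abs(diff_in_means(y, set(c)))
             for c in combinations(range(len(y)), len(treated_idx))]
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)

def neyman_se(y, treated_idx):
    """Conservative Neyman standard error: sqrt(s_t^2 / n_t + s_c^2 / n_c)."""
    t, c = split(y, set(treated_idx))
    return (variance(t) / len(t) + variance(c) / len(c)) ** 0.5

# Hypothetical daily metric; days 0, 2, 4, ... were treated.
y = [102, 98, 110, 95, 107, 99, 111, 96, 108, 97, 112, 94, 109, 100]
treated = [0, 2, 4, 6, 8, 10, 12]

effect = diff_in_means(y, set(treated))
p_value = fisher_exact_p(y, treated)
se = neyman_se(y, treated)
```

With only 3432 possible assignments, the smallest attainable two‑sided p‑value is 2/3432, which is one reason very short experiments have limited power.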

4.2.4 Stratified Random Rotation

When multiple independent regions or cities are involved, stratification can be applied. Units are divided into strata based on predefined variables (e.g., city, day of week), and within each stratum a complete random rotation is performed.

4.3 Paired Random Rotation

4.3.1 Overview

When city‑level daily rotation leads to large day‑to‑day differences, a half‑city paired rotation can be used. The city is split into two similar halves; each day one half is assigned to treatment and the other to control, reducing day‑level variance while tolerating a small spatial spillover at the boundary.

4.3.2 Grouping Mechanism

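Pairs of experimental units are formed based on key features. Within each pair, one unit is randomly assigned to treatment and the other to control. With $J$ pairs and $W_{j,1}, W_{j,2} \in \{0, 1\}$ denoting the assignments of the two units in pair $j$, the mathematical representation is:

$$\Pr(\mathbf{W} = \mathbf{w}) = \begin{cases} 2^{-J} & \text{if } w_{j,1} + w_{j,2} = 1 \text{ for every pair } j, \\ 0 & \text{otherwise.} \end{cases}$$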
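A minimal sketch of the paired mechanism for a half‑city daily rotation: on each day, a fair coin decides which half receives treatment. The half names, the 14‑day horizon, and the seed are hypothetical.

```python
import random

def paired_rotation(pairs, days, rng):
    """For each day and each pair, flip a fair coin to pick which member is treated."""
    plan = {}
    for day in days:
        for a, b in pairs:
            treated = a if rng.random() < 0.5 else b
            plan[(day, a)] = int(treated == a)
            plan[(day, b)] = int(treated == b)
    return plan

rng = random.Random(7)  # fixed seed only so the sketch is reproducible
pairs = [("north_half", "south_half")]
days = list(range(14))
plan = paired_rotation(pairs, days, rng)
```

Every day contributes exactly one treated and one control unit, so day‑level shocks affect both arms equally.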

4.3.3 Evaluation Principle

Paired random rotation uses the same evaluation methods as paired random experiments: Fisher exact test for p‑values and Neyman variance estimation.
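For paired designs, both quantities can be computed from the within‑pair differences (treated minus control). The sketch below, on hypothetical data, enumerates all 2^J sign flips for the randomization test and uses the sample variance of the differences divided by J for the Neyman variance.

```python
from itertools import product
from statistics import mean, variance

def paired_fisher_p(diffs):
    """Two-sided sharp-null p-value from all 2^J sign flips of the pair differences."""
    observed = abs(mean(diffs))
    count = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            count += 1
    return count / total

def paired_neyman_se(diffs):
    """Standard error of the mean difference: sqrt(s_d^2 / J)."""
    return (variance(diffs) / len(diffs)) ** 0.5

# Hypothetical daily differences between the treated half and the control half.
diffs = [2.1, 1.4, 3.0, 0.8, 2.5, 1.9, 2.2, 1.1, 2.8, 1.6]

effect = mean(diffs)
p_value = paired_fisher_p(diffs)
se = paired_neyman_se(diffs)
```

Flipping the sign of a pair's difference corresponds to swapping which unit in that pair was treated, which is exactly the set of assignments the paired mechanism could have produced.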

4.4 Extensions and Outlook

4.4.1 Handling Abnormal Scenarios

When unexpected external disturbances occur during a daily rotation experiment, two main approaches are suggested:

Method 1: Outlier Removal – Identify and discard abnormal days using statistical tests (e.g., 3‑sigma or IQR) based on the past 45 days of data.
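A minimal sketch of Method 1, assuming a single daily metric. The 45‑day history and experiment values are hypothetical, and the 1.5 multiplier is the conventional IQR fence rather than a value taken from the original.

```python
from statistics import mean, stdev, quantiles

def three_sigma_bounds(history):
    """Bounds at mean +/- 3 standard deviations of the reference window."""
    m, s = mean(history), stdev(history)
    return m - 3 * s, m + 3 * s

def iqr_bounds(history, k=1.5):
    """Bounds at Q1 - k*IQR and Q3 + k*IQR of the reference window."""
    q1, _, q3 = quantiles(history, n=4)
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)

def abnormal_days(history, experiment, bounds_fn):
    """Return the experiment days whose metric falls outside the bounds."""
    lo, hi = bounds_fn(history)
    return [day for day, value in experiment.items() if not lo <= value <= hi]

# Hypothetical data: 45 days of history and three experiment days.
history = [100 + (i % 7) - 3 for i in range(45)]
experiment = {"d1": 101, "d2": 140, "d3": 99}

flagged_sigma = abnormal_days(history, experiment, three_sigma_bounds)
flagged_iqr = abnormal_days(history, experiment, iqr_bounds)
```

Computing the bounds from pre‑experiment history only keeps the removal rule independent of the treatment assignment.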

Method 2: Covariate Analysis + CRSE – Use regression with covariates (e.g., environmental disturbance level) and cluster‑robust standard errors (CRSE) to account for intra‑city correlation.
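Method 2 can be sketched as an ordinary least squares regression of the metric on an intercept, the treatment indicator, and a disturbance covariate, with standard errors clustered by city. This is a simplified illustration: it uses the basic sandwich form without small‑sample corrections, and the data, the covariate, and all function names are hypothetical.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def ols_cluster_robust(X, y, clusters):
    """Return OLS coefficients and cluster-robust standard errors."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    bread = invert(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    # Sum the score vectors x_i * u_i within each cluster, then add up their outer products.
    scores = {}
    for r, u, g in zip(X, resid, clusters):
        s = scores.setdefault(g, [0.0] * k)
        for i in range(k):
            s[i] += r[i] * u
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)]
            for i in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    return beta, [max(cov[i][i], 0.0) ** 0.5 for i in range(k)]

# Hypothetical data: 4 cities x 4 days; columns are intercept, treatment, disturbance level.
X, y, clusters = [], [], []
for c, city in enumerate(["A", "B", "C", "D"]):
    for d in range(4):
        w = float((c + d) % 2)
        x = float((3 * c + d) % 4)
        noise = 0.1 if (c + 2 * d) % 3 == 0 else -0.05
        X.append([1.0, w, x])
        y.append(2.0 + 3.0 * w + 0.5 * x + noise)
        clusters.append(city)

beta, se = ols_cluster_robust(X, y, clusters)
```

Here `beta[1]` is the estimated treatment effect and `se[1]` its clustered standard error. With only a handful of cities, clustered standard errors are themselves noisy, so the result is best read alongside the randomization‑based tests.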

4.4.2 Carryover Effects at Hourly Granularity

Finer time‑slice granularity increases sample size but may introduce carryover effects, where the strategy in one slice influences the next. Examples include traffic‑signal optimization where a short‑green‑light period creates queue spillover into the following slice. Three mitigation strategies are discussed: model‑based adjustment, wash‑out periods, and time‑series modeling.
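Of the three strategies, the wash‑out period is the simplest to sketch: observations that fall shortly after a switch between treatment and control are dropped before analysis. The 60‑minute slice length, the 15‑minute wash‑out window, and the data layout below are hypothetical.

```python
def apply_washout(observations, assignment, slice_minutes=60, washout_minutes=15):
    """Drop observations in the first washout_minutes of any slice whose
    assignment differs from that of the previous slice."""
    kept = []
    for minute, value in observations:
        idx = minute // slice_minutes
        switched = idx > 0 and assignment[idx] != assignment[idx - 1]
        in_washout = (minute % slice_minutes) < washout_minutes
        if not (switched and in_washout):
            kept.append((minute, value, assignment[idx]))
    return kept

# Hypothetical hourly schedule (1 = treatment) and (minute, metric) observations.
assignment = [1, 1, 0, 1]
observations = [(5, 9.8), (65, 10.1), (125, 12.4), (140, 10.3), (185, 8.9), (200, 10.0)]

kept = apply_washout(observations, assignment)
```

A longer wash‑out window removes more contaminated data but also shrinks the effective sample, so the window length is a trade‑off between bias and power.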

4.4.3 Other Rotation Designs

Alternating rotation experiments assign treatment and control in alternating time slices (e.g., treatment, control, treatment). This design fits scenarios with strong periodicity but requires careful modeling assumptions and sufficient sample size.

Industry examples include DoorDash’s daily alternating rotation evaluated via bootstrap‑based t‑tests and a domestic company’s hourly alternating rotation using Varying Coefficient Models (VCM) or Varying Coefficient Decision Processes (VCDP).

Note that alternating designs are less random and may lead to variance‑estimation errors if the deterministic assignment mechanism is ignored.


