
How to Calculate Minimum Sample Size for Reliable A/B Tests

This article explains common pain points in A/B testing, introduces essential statistical concepts such as sampling distribution, parameter estimation, confidence intervals, and hypothesis testing, and provides step‑by‑step formulas and a concrete example for calculating the minimum sample size needed to run a trustworthy experiment.

ByteDance Data Platform

Sep 7, 2022


Preface

A/B experiments are forward‑looking, statistically grounded, and scientific. Used correctly, they make full use of data analysis to solve problems and provide strong evidence for decisions, yet users often run into pain points and doubts.

Pain Points

Not knowing how much traffic each experiment needs.

No clear idea of how long an experiment should run.

Solutions

Determine the required traffic to verify a specific feature.

Decide the appropriate experiment duration.

Statistical Basics

Research Object

Population X: a metric of interest.

Individual: an element $x_i$ of the population.

Sample: a subset of n individuals $X_1, X_2, \dots, X_n$ drawn from the population.

Statistical Tools

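(1) Sample Mean – reflects the population mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

(2) Sample Variance – the average of squared deviations from the sample mean, $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$; it reflects the population variance.

Sample correction: dividing by n − 1 instead of n (Bessel's correction) gives an unbiased estimate of the population variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

(3) Sample Standard Deviation – the square root of the sample variance: $S = \sqrt{S^2}$.

(4) Sample K‑th Moment: $A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$.

(5) Sample K‑th Central Moment: $B_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$.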
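The statistics above are simple to compute directly. The sketch below uses plain Python; the function names and the sample data are hypothetical and chosen only for illustration. It shows the difference between dividing by n and by n − 1.

```python
import math


def sample_stats(xs):
    """Return mean, biased variance (divide by n), corrected variance S^2 (divide by n - 1), and S."""
    n = len(xs)
    mean = sum(xs) / n
    squared_devs = sum((x - mean) ** 2 for x in xs)
    var_biased = squared_devs / n          # plain average of squared deviations
    var = squared_devs / (n - 1)           # Bessel-corrected, unbiased estimate
    std = math.sqrt(var)                   # sample standard deviation S
    return mean, var_biased, var, std


def raw_moment(xs, k):
    """Sample k-th moment A_k = (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in xs) / len(xs)


def central_moment(xs, k):
    """Sample k-th central moment B_k = (1/n) * sum((x_i - mean) ** k)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mean, var_biased, var, std = sample_stats(data)
# mean 5.0, biased variance 4.0, corrected variance 32/7 (about 4.571)
print(mean, var_biased, var, std)
```

Note that the second central moment $B_2$ equals the biased variance, while $S^2$ is slightly larger because of the n − 1 divisor.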

Sampling Distribution

Detailed discussion is omitted; the concepts are used later in derivations.

Standard normal distribution N(0,1)

Chi‑square distribution

t‑distribution

F‑distribution

Parameter Estimation

Using sample statistics to estimate population parameters, e.g., sample mean estimates population mean, sample proportion estimates population proportion, sample variance estimates population variance.

(1) Point estimation vs. interval estimation

Point estimation directly uses the sample statistic as the estimate.

Interval estimation provides a range (confidence interval) for the population parameter.

(2) Confidence interval and confidence level

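A confidence interval is a range, computed from the sample statistic, that is likely to contain the true parameter; the confidence level is how often this procedure succeeds. Example: if we drew 100 independent samples and built a 95% confidence interval from each, about 95 of those intervals would contain the true value.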
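The repeated‑sampling interpretation can be checked with a small simulation. This is a sketch assuming a normal population with known standard deviation; the values of mu, sigma, n, the number of trials, and the seed are arbitrary choices for illustration. It builds a z‑interval from each sample and counts how often the interval covers the true mean.

```python
import math
import random
from statistics import NormalDist


def ci_coverage(mu=10.0, sigma=2.0, n=30, trials=2000, level=0.95, seed=42):
    """Fraction of known-sigma z-intervals (one per simulated sample) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print(ci_coverage())  # close to 0.95, but not exactly 0.95
```

Any single interval either contains mu or it does not; the 95% describes the long‑run hit rate of the procedure.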

Hypothesis Testing Example

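Rice yield example: the expected yield is 310 kg/acre, and a sample of 10 plots averages 320 kg/acre. Assuming yield follows a normal distribution N(μ, 144) (known variance 144, so σ = 12), test at α = 0.05 whether the yield differs from expectation. The critical values are $Z_{0.05} = 1.645$ for a one‑sided test and $Z_{0.025} = 1.96$ for a two‑sided test. Use a Z‑test when the population variance is known and a t‑test otherwise.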
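The example stops before the test itself. A minimal sketch of the Z‑test follows, assuming H0: μ = 310 against either H1: μ > 310 (one‑sided) or H1: μ ≠ 310 (two‑sided). The statistic is $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0 = 310.0              # expected yield under H0 (kg/acre)
xbar = 320.0             # observed mean of the 10 plots
sigma = math.sqrt(144)   # known variance 144 -> sigma = 12
n = 10
alpha = 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))

std_normal = NormalDist()
z_one_sided = std_normal.inv_cdf(1 - alpha)       # Z_{0.05}, about 1.645
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)   # Z_{0.025}, about 1.96
p_one_sided = 1 - std_normal.cdf(z)
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

print(f"z = {z:.3f}")
print("reject H0 (one-sided, H1: mu > 310):", z > z_one_sided)
print("reject H0 (two-sided, H1: mu != 310):", abs(z) > z_two_sided)
```

With these numbers Z is about 2.64, which exceeds both 1.645 and 1.96, so H0 is rejected under either alternative at α = 0.05. If σ were unknown, the sample standard deviation and a t critical value with n − 1 degrees of freedom would replace σ and the Z quantiles.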

A Simple Complete A/B Test Example

Background and Setup

The web app integrates the Volcano Engine A/B testing SDK to report events.

Goal: improve the registration conversion rate.

The current flow uses an image captcha; the new flow replaces it with SMS verification to reduce user friction.

Core metric: registration conversion rate.

Two versions: control (image captcha) and experiment (SMS code).

Traffic split: 50% of total traffic enters the experiment, split evenly (25% per version).

Result Analysis

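After about two weeks, with each version receiving 25% of users, the new version increased the conversion rate by about 10%, with a 95% confidence interval of [8%, 12%]. Because the interval lies entirely above 0, the uplift is statistically significant at the 5% level and is unlikely to be due to chance alone.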
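The article does not publish the raw counts behind this interval, so the sketch below uses hypothetical counts purely to show the mechanics. It computes a Wald (normal‑approximation) confidence interval for the absolute difference of two conversion rates and checks whether it excludes 0; the function name is hypothetical. If the reported ~10% is a relative uplift, a different interval method (for example the delta method on the ratio) would be needed.

```python
import math
from statistics import NormalDist


def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald confidence interval for the absolute difference p_b - p_a."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts for illustration only (not the article's data).
diff, (lo, hi) = diff_in_proportions_ci(conv_a=500, n_a=1000, conv_b=600, n_b=1000)
significant = lo > 0 or hi < 0  # the interval excludes 0
print(f"uplift = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}], significant: {significant}")
```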

Decision

The product manager therefore decides to roll out SMS verification to all users to raise the registration conversion rate.

Detailed Sample Size Calculation

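The minimum sample size per group is calculated by:

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{\Delta^2}$$

where n is the sample size per group, α and β are the type‑I and type‑II error probabilities (commonly 0.05 and 0.2, i.e. 95% confidence and 80% power), $Z_q$ is the q‑quantile of the standard normal distribution, Δ is the expected difference between the group means, and σ₁ and σ₂ are the standard deviations of the metric in the two groups.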

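Assuming equal variances (σ₁ = σ₂ = σ), the formula simplifies to:

$$n = \frac{2\sigma^2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\Delta^2}$$

Example: registration rates e₁ = 50% and e₂ = 60%, power 0.8 (β = 0.2), α = 0.05. For a conversion rate p the variance is p(1 − p), so σ₁² = 0.25, σ₂² = 0.24 and Δ = 0.10, giving n = (1.96 + 0.8416)² × 0.49 / 0.01 ≈ 384.6, which rounds up to 385: each group needs at least 385 samples.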
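The worked example can be reproduced in a few lines. This is a sketch for conversion‑rate metrics, assuming the per‑group formula with σ² = p(1 − p) for each group; the function name is hypothetical, and the quantiles come from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def min_sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Per-group n = (Z_{1-alpha/2} + Z_{1-beta})^2 * (s1^2 + s2^2) / delta^2 for two rates."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = std_normal.inv_cdf(power)           # Z_{1-beta}, about 0.8416 for power 0.8
    var1 = p1 * (1 - p1)                         # Bernoulli variance of group 1
    var2 = p2 * (1 - p2)                         # Bernoulli variance of group 2
    delta = p2 - p1
    # Round up: a fractional user cannot be sampled, and rounding down loses power.
    return math.ceil((z_alpha + z_beta) ** 2 * (var1 + var2) / delta ** 2)


print(min_sample_size_per_group(0.50, 0.60))  # 385
```

Plugging in a smaller expected difference shows how quickly the requirement grows: n scales with 1/Δ², so halving Δ roughly quadruples the sample.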

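If the two versions have unequal traffic weights, the total sample N must be adjusted, since the precision of the comparison depends on each group's share of the traffic. With shares w₁ and w₂ (w₁ + w₂ = 1):

$$N = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\Delta^2}\left(\frac{\sigma_1^2}{w_1} + \frac{\sigma_2^2}{w_2}\right)$$

For an even split (w₁ = w₂ = 1/2) this gives N = 2n.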
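A sketch of the unequal‑split case, assuming the adjustment comes from the variance of the difference, σ₁²/(w₁N) + σ₂²/(w₂N). This is one standard form and is not necessarily the exact expression in the original source. The function name and the 30/70 split are hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(p1, p2, w1, alpha=0.05, power=0.8):
    """Total N when a fraction w1 of experiment traffic goes to group 1 and 1 - w1 to group 2."""
    w2 = 1 - w1
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    var1 = p1 * (1 - p1)
    var2 = p2 * (1 - p2)
    delta = p2 - p1
    # Each variance is inflated by 1 / share: the smaller group limits precision.
    return math.ceil((z_alpha + z_beta) ** 2 * (var1 / w1 + var2 / w2) / delta ** 2)


print(total_sample_size(0.50, 0.60, w1=0.5))  # 770, i.e. 2 x 385
print(total_sample_size(0.50, 0.60, w1=0.3))  # 924 for a 30/70 split
```

The 30/70 split needs roughly 20% more total traffic than the even split for the same power, which is why an even split is usually preferred when the variances are similar.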

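Method 2 uses the power of the hypothesis test. For a two‑sided test with n samples per group, the power is approximately

$$1 - \beta \approx \Phi\left(\frac{\Delta}{\sqrt{(\sigma_1^2 + \sigma_2^2)/n}} - Z_{1-\alpha/2}\right)$$

where Φ is the standard normal CDF. Setting the power to 1 − β, so that the argument of Φ equals $Z_{1-\beta}$, and solving for n gives the required sample per version:

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{\Delta^2}$$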
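The power‑based view can also be used numerically: compute the power for a candidate n and search for the smallest n that reaches the target. This sketch assumes conversion‑rate metrics (σ² = p(1 − p)) and a two‑sided z‑test; unlike the approximation above, it also adds the tiny probability of rejecting in the wrong direction. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(p1, p2, n, alpha=0.05):
    """Power of a two-sided z-test comparing two rates with n users per group."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    d = abs(p2 - p1) / se
    # Main term plus the (negligible) chance of rejecting in the opposite direction.
    return STD_NORMAL.cdf(d - z_alpha) + STD_NORMAL.cdf(-d - z_alpha)


def smallest_n(p1, p2, alpha=0.05, target=0.8):
    """Smallest per-group n whose power reaches the target, found by linear search."""
    n = 2
    while power(p1, p2, n, alpha) < target:
        n += 1
    return n


print(round(power(0.50, 0.60, 385), 4))
print(smallest_n(0.50, 0.60))  # 385
```

For the registration example the search lands on the same 385 per group as the closed‑form formula, which is a useful sanity check on either calculation.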

Conclusion

For typical A/B scenarios, assuming equal population variance, the presented formulas allow practitioners to compute the minimum sample size needed to achieve a desired confidence level and statistical power, guiding traffic allocation and experiment duration.
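To turn a sample size into an experiment duration, one simple approach (an assumption here, not a formula from the source) is to divide the required users per group by the number of users each group receives per day. The daily traffic figure below is hypothetical, and the sketch assumes each day brings new, distinct users, which fits a registration funnel.

```python
import math


def experiment_days(n_per_group, daily_users, traffic_share_per_group):
    """Days until each group has collected n_per_group users (assumes distinct users per day)."""
    users_per_group_per_day = daily_users * traffic_share_per_group
    return math.ceil(n_per_group / users_per_group_per_day)


# Hypothetical: 200 new visitors per day, 25% of traffic to each version.
print(experiment_days(385, daily_users=200, traffic_share_per_group=0.25))  # 8
```

In practice, teams often round the duration up to whole weeks so that both weekday and weekend behavior are represented in each group.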


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

ByteDance Data Platform

The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
