Design and Practice of Yanxuan A/B Scientific Experiment Platform
The article presents the design, scientific methodology, system architecture, and case studies of Yanxuan's A/B testing platform, detailing how statistical principles, automated tracking, traffic allocation models, and unified reporting accelerate decision‑making and reduce development effort in e‑commerce experiments.
01 Project Background and Pain Points
In e‑commerce many decisions—such as removing homepage modules, changing recommendation algorithms, adding activity prompts in the payment flow, or issuing coupons—affect conversion and revenue, yet they are often made by intuition, leading to uncertain outcomes. These scenarios typically require A/B experiments to support decisions.
Should the homepage remove certain modules and what impact will it have?
What effect does a parameter change in the recommendation model produce?
Can activity prompts in the payment flow increase transactions without hurting margin?
How large is the impact of coupons or red packets on users?
Relying on guesswork introduces unpredictable positive or negative optimizations; A/B testing provides a systematic way to evaluate these decisions.
The Six Stages of an A/B Experiment: experiment design, random traffic allocation, event tagging, online execution, data collection and statistics, and analyst reporting.
Challenges in the Six Stages
Ensuring random traffic allocation across independently developed services.
Preventing duplicate or missing event tagging that leads to errors.
Determining appropriate experiment duration.
Maintaining consistent statistical definitions and scientific evaluation.
Yanxuan A/B Platform Solutions
Provide a minimum sample size estimator before the experiment.
Centralize traffic control with a unified random allocation algorithm and automated tagging.
Automate report generation and offer scientific analysis to validate conclusions.
02 Scientific Design of Experiments
The scientific design focuses on three key questions: required sample size and duration, whether the test variant truly outperforms the control, and the magnitude of improvement. Statistical concepts such as minimum sample size, significance, power, and confidence intervals are applied.
1. Minimum Sample Size Estimation
Using Z‑distribution formulas with predefined Type‑I error (α) and Type‑II error (β), the platform calculates the smallest sample needed for a given expected lift. For example, detecting a lift from 1% to 2% requires about 2,000 samples, while detecting a lift from 1.1% to 1.2% needs roughly 170,000 samples.
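As a sketch of the estimator described above, the classic two-proportion minimum-sample-size formula can be computed with nothing but the Python standard library (the function name and defaults here are illustrative, not the platform's actual API):

```python
import math
from statistics import NormalDist


def min_sample_size(p1: float, p2: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest per-group sample size to detect a conversion-rate change
    from p1 to p2 with two-sided Type-I error `alpha` and power
    1 - beta = `power` (standard two-proportion z-test formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for power = 0.8
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Large lifts need few samples; tiny lifts need many more:
print(min_sample_size(0.01, 0.02))    # on the order of the article's ~2,000
print(min_sample_size(0.011, 0.012))  # on the order of the article's ~170,000
```

The key intuition the numbers above illustrate: required sample size grows with the inverse square of the effect size, so halving the detectable lift roughly quadruples the traffic (and time) an experiment needs.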
2. Statistical Significance of Differences
Hypothesis testing is used: the null hypothesis assumes no difference between variants. A Type‑I error (α) occurs when the null is true but rejected; a Type‑II error (β) occurs when the null is false but not rejected. The platform adopts α ≤ 0.05 and power ≥ 80% (β ≤ 0.2) as thresholds for significance.
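The hypothesis test above can be sketched as a two-proportion z-test using only the standard library (a minimal illustration, not the platform's engine; the example counts are made up):

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for H0: the two conversion rates are equal.
    Returns (z statistic, p-value); reject H0 when p < alpha (0.05)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: control converts 2.0%, variant 2.6%.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05: difference is significant
```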
3. Effect Size Significance
After confirming significance, the platform reports the 95% confidence interval of the effect size, providing a range that quantifies the practical impact of the variant.
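A minimal sketch of that confidence interval, using the common Wald interval for a difference of proportions (again an illustration with made-up counts, not the platform's exact method):

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a: int, n_a: int,
                             conv_b: int, n_b: int,
                             level: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for the difference in conversion rates
    (p_b - p_a), quantifying the practical size of the effect."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # ~1.96 at 95%
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


lo, hi = diff_confidence_interval(200, 10_000, 260, 10_000)
print(f"95% CI for the lift: [{lo:+.4f}, {hi:+.4f}]")
```

An interval that excludes zero corroborates the significance test, while its width tells the decision-maker how precisely the lift is known; a barely-positive lower bound may still argue for a longer experiment.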
4. Random Traffic Allocation Model
To guarantee randomization, the platform adopts Google’s traffic model, enabling orthogonal experiments across layers and mutually exclusive traffic within the same layer, thus avoiding interference.
Experiments qualify as orthogonal when their direct metrics differ and they share no downstream impact; experiments whose paths overlap must instead be mutually exclusive, which can be enforced by partitioning either time or traffic.
03 System Design
Once an experiment scheme is ready, the platform supports a rapid end-to-end workflow: configure the experiment, select key metrics, control traffic, launch the test, and view automated reports.
Key automation challenges addressed:
Business teams need to add experiment tags to code.
Data engineers must write cleaning and aggregation jobs, which involve metric definitions and reuse.
Report generation must be automated.
1. Tagging Automation
The existing client‑side auto‑tagging system is extended to automatically attach experiment identifiers to all events on pages involved in an experiment, including cross‑page propagation.
2. Unified Metrics and Automated Reporting
Unified automatic tags define core business metrics, which are registered on the platform. Data engineers implement real‑time and offline metric pipelines, enabling rapid detection of negative impacts or validation of expected gains.
With tagging and metric automation, reports are generated automatically, presenting scientific analysis and conclusions directly on the platform.
3. System Architecture
Beyond the end-to-end experiment workflow, the platform exposes its statistical engine as a generic service that other business scenarios can call directly.
04 Case Studies
Case 1: Fast Validation of Homepage Product Image Changes
Operations uploaded two sets of product images and launched an experiment to measure conversion impact. The platform quickly determined the effect of visual changes.
Case 2: Rapid Verification of a Major Homepage Redesign
The new dynamic client‑side configuration allowed business to reorder modules and toggle features without code changes. Four experiment versions were validated within two weeks.
Since launch, the platform has supported thousands of experiments, achieving an effective-decision rate of over 40%, reducing average experiment cycles to 3-7 days, and cutting development effort by 1-2 weeks per experiment.
Overall, the Yanxuan scientific experiment platform demonstrates how statistical rigor, automated tagging, unified metrics, and traffic control can accelerate data‑driven decision making in large‑scale e‑commerce.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.