Design and Practice of Yanxuan A/B Scientific Experiment Platform
The article presents the design, scientific methodology, system architecture, and case studies of Yanxuan's A/B testing platform, detailing how statistical principles, automated tracking, traffic allocation models, and unified reporting accelerate decision‑making and reduce development effort in e‑commerce experiments.
01 Project Background and Pain Points
In e‑commerce many decisions—such as removing homepage modules, changing recommendation algorithms, adding activity prompts in the payment flow, or issuing coupons—affect conversion and revenue, yet they are often made by intuition, leading to uncertain outcomes. These scenarios typically require A/B experiments to support decisions.
Should the homepage remove certain modules and what impact will it have?
What effect does a parameter change in the recommendation model produce?
Can activity prompts in the payment flow increase transactions without hurting margin?
How large is the impact of coupons or red packets on users?
Relying on guesswork introduces unpredictable positive or negative optimizations; A/B testing provides a systematic way to evaluate these decisions.
The Six Stages of an A/B Experiment: experiment design, random traffic allocation, event tagging, online execution, data collection and statistics, and analyst reporting.
Challenges in the Six Stages
Ensuring random traffic allocation across independently developed services.
Preventing duplicate or missing event tagging that leads to errors.
Determining appropriate experiment duration.
Maintaining consistent statistical definitions and scientific evaluation.
Yanxuan A/B Platform Solutions
Provide a minimum sample size estimator before the experiment.
Centralize traffic control with a unified random allocation algorithm and automated tagging.
Automate report generation and offer scientific analysis to validate conclusions.
02 Scientific Design of Experiments
The scientific design focuses on three key questions: required sample size and duration, whether the test variant truly outperforms the control, and the magnitude of improvement. Statistical concepts such as minimum sample size, significance, power, and confidence intervals are applied.
1. Minimum Sample Size Estimation
Using Z‑distribution formulas with predefined Type‑I error (α) and Type‑II error (β), the platform calculates the smallest sample needed for a given expected lift. For example, detecting a lift from 1% to 2% requires about 2,000 samples, while detecting a lift from 1.1% to 1.2% needs roughly 170,000 samples.
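As a sketch of the estimator described above, the classic two-proportion minimum-sample-size formula can be computed with nothing but the Python standard library (the function name and defaults here are illustrative, not the platform's actual API):

```python
import math
from statistics import NormalDist


def min_sample_size(p1: float, p2: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest per-group sample size to detect a conversion-rate change
    from p1 to p2 with two-sided Type-I error `alpha` and power
    1 - beta = `power` (standard two-proportion z-test formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for power = 0.8
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Large lifts need few samples; tiny lifts need many more:
print(min_sample_size(0.01, 0.02))    # on the order of the article's ~2,000
print(min_sample_size(0.011, 0.012))  # on the order of the article's ~170,000
```

The key intuition the numbers above illustrate: required sample size grows with the inverse square of the effect size, so halving the detectable lift roughly quadruples the traffic (and time) an experiment needs.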
2. Statistical Significance of Differences
Hypothesis testing is used: the null hypothesis assumes no difference between variants. A Type‑I error (α) occurs when the null is true but rejected; a Type‑II error (β) occurs when the null is false but not rejected. The platform adopts α ≤ 0.05 and power ≥ 80% (β ≤ 0.2) as thresholds for significance.
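The hypothesis test above can be sketched as a two-proportion z-test using only the standard library (a minimal illustration, not the platform's engine; the example counts are made up):

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for H0: the two conversion rates are equal.
    Returns (z statistic, p-value); reject H0 when p < alpha (0.05)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: control converts 2.0%, variant 2.6%.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05: difference is significant
```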
3. Effect Size Significance
After confirming significance, the platform reports the 95% confidence interval of the effect size, providing a range that quantifies the practical impact of the variant.
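A minimal sketch of that confidence interval, using the common Wald interval for a difference of proportions (again an illustration with made-up counts, not the platform's exact method):

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a: int, n_a: int,
                             conv_b: int, n_b: int,
                             level: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for the difference in conversion rates
    (p_b - p_a), quantifying the practical size of the effect."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # ~1.96 at 95%
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


lo, hi = diff_confidence_interval(200, 10_000, 260, 10_000)
print(f"95% CI for the lift: [{lo:+.4f}, {hi:+.4f}]")
```

An interval that excludes zero corroborates the significance test, while its width tells the decision-maker how precisely the lift is known; a barely-positive lower bound may still argue for a longer experiment.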
4. Random Traffic Allocation Model
To guarantee randomization, the platform adopts Google’s traffic model, enabling orthogonal experiments across layers and mutually exclusive traffic within the same layer, thus avoiding interference.
Experiments qualify as orthogonal when their direct metrics differ and they share no downstream impact; experiments whose paths overlap must instead be mutually exclusive, which can be enforced by partitioning either time or traffic.
03 System Design
Once an experiment scheme is ready, the platform supports a rapid end-to-end workflow: configure the experiment, select key metrics, control traffic, launch the test, and view automated reports.
Key automation challenges addressed:
Business teams need to add experiment tags to code.
Data engineers must write cleaning and aggregation jobs, which involve metric definitions and reuse.
Report generation must be automated.
1. Tagging Automation
The existing client‑side auto‑tagging system is extended to automatically attach experiment identifiers to all events on pages involved in an experiment, including cross‑page propagation.
2. Unified Metrics and Automated Reporting
Unified automatic tags define core business metrics, which are registered on the platform. Data engineers implement real‑time and offline metric pipelines, enabling rapid detection of negative impacts or validation of expected gains.
With tagging and metric automation, reports are generated automatically, presenting scientific analysis and conclusions directly on the platform.
3. System Architecture
Beyond the end-to-end experiment workflow, the platform exposes its statistical engine as a generic service that other business scenarios can call directly.
04 Case Studies
Case 1: Fast Validation of Homepage Product Image Changes
Operations uploaded two sets of product images and launched an experiment to measure conversion impact. The platform quickly determined the effect of visual changes.
Case 2: Rapid Verification of a Major Homepage Redesign
The new dynamic client‑side configuration allowed business to reorder modules and toggle features without code changes. Four experiment versions were validated within two weeks.
Since launch, the platform has supported thousands of experiments, achieving an effective-decision rate of over 40%, reducing average experiment cycles to 3-7 days, and cutting development effort by 1-2 weeks per experiment.
Overall, the Yanxuan scientific experiment platform demonstrates how statistical rigor, automated tagging, unified metrics, and traffic control can accelerate data‑driven decision making in large‑scale e‑commerce.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.