From Simple Split Tests to Real‑Time Multi‑Layer Experiments: The Evolution of an AB Testing Platform

This article traces the step‑by‑step evolution of an AB testing platform—from its initial 1.0 version with basic traffic splitting, through the 2.0 era that introduced multi‑layer orthogonal traffic models and real‑time metric pipelines, to the 3.0 era focused on usability, stability, and advanced analysis—while sharing concrete design decisions, implementation details, and lessons learned.

DeWu Technology

Introduction

AB testing is a standard method for evaluating product and algorithm changes in large‑scale internet services. The platform described here evolved from a minimal split‑test system (1.0) into a full‑featured framework (3.0) that supports hierarchical traffic allocation, real‑time metric monitoring, and advanced experiment management.

1.0 – Basic AB Experiment

Core elements

Goal & hypothesis – Define a target metric (e.g., payment rate +5%) and a causal hypothesis (e.g., button color influences payment intent).

Experiment subject – User, request, or any exposure point.

Control and treatment groups – At least one control (no change) and one or more treatment groups; each group must be large enough to reach the required statistical power, and this should be verified before launch.
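The power requirement can be checked before launch with a standard sample‑size calculation. The following is a minimal sketch, assuming a two‑sided two‑proportion z‑test; the 10% baseline payment rate, the 0.05 significance level, and the 0.8 power target are illustrative assumptions and do not come from the platform itself.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(baseline: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)          # rate expected in treatment
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Assumed scenario: 10% baseline payment rate, +5% relative lift.
users_per_group = required_sample_size(baseline=0.10, relative_lift=0.05)
print(users_per_group)
```

Under these assumptions the result is on the order of 58,000 users per group, which shows why small relative lifts on low base rates need substantial traffic.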

Key functions

Traffic split driven by the configuration center; split rules are replayed offline to attribute users to groups for metric reporting.

Engineering pipeline that lets algorithm engineers control traffic and strategy on their own through configuration.

Strategy activation flow

Define experiment parameters in the central configuration service.

Online services pull the config, apply the new split from the next traffic batch, and log experiment events.

Nightly batch jobs copy the config to ODPS, recompute user‑level split assignments for the previous day, and generate reports.
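The flow above only works if the online service and the nightly job derive the same group for the same user from the same config. The following is a minimal sketch of such a deterministic split; the config shape, the experiment id, and the choice of MD5 bucketing are assumptions made for illustration, not the platform's actual implementation.

```python
import hashlib

# Hypothetical experiment config, as it might be stored in the config center.
EXPERIMENT = {
    "id": "button_color_v1",
    "groups": [("control", 50), ("treatment", 50)],  # percentages sum to 100
}


def bucket_of(user_id: str, experiment_id: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets)."""
    digest = hashlib.md5(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def assign_group(user_id: str, config: dict) -> str:
    """Return the group whose cumulative percentage range contains the bucket."""
    bucket = bucket_of(user_id, config["id"])
    upper = 0
    for name, percent in config["groups"]:
        upper += percent
        if bucket < upper:
            return name
    raise ValueError("group percentages must sum to 100")


# Online: called per request. Offline: the nightly job replays the same call
# for every user seen the previous day, using the copied config.
online = assign_group("user-42", EXPERIMENT)
offline = assign_group("user-42", EXPERIMENT)
```

Because the assignment depends only on the user id and the config, the offline recomputation reproduces the online decision as long as the config copied to the warehouse matches the one that was live.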

2.0 – Comprehensive Support for Complex Business Needs

New business demands

Higher traffic pressure – multiple concurrent experiments require an orthogonal, layered traffic model.

Real‑time, accurate reporting – replace next‑day batch reports with Flink‑based streaming pipelines.

Complex experiment forms – joint experiments, multi‑cluster experiments, and user‑specific experiments.

Multi‑Layer Orthogonal Traffic Model

The model partitions traffic into nested layers and domains. A user traverses from outer to inner layers, hitting a bucket at each level; each bucket may contain its own split configuration, enabling fine‑grained control (user‑level, request‑level, device‑level, region‑level, etc.).

Characteristics

Layers and domains can be nested to arbitrary depth.

Each layer uses a hash template plus traffic slots; slots define the proportion allocated to experiments.

Supports white‑list overrides, conditional layers (e.g., new users only), and custom split rules.

Since deployment, the model has served more than 300 algorithmic scenarios.
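The orthogonality property can be illustrated with a small sketch. It assumes that each layer salts the hash with its own name and divides traffic into 100 slots; the layer names, experiment ids, and the SHA‑256 hash are hypothetical, and nested domains and conditional layers are left out for brevity.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str                                   # also used as the hash salt
    slot_ranges: dict[str, range]               # experiment id -> slots out of 100
    whitelist: dict[str, str] = field(default_factory=dict)  # user -> experiment

    def assign(self, user_id: str) -> str | None:
        """Return the experiment this user hits in this layer, if any."""
        if user_id in self.whitelist:           # whitelist overrides hashing
            return self.whitelist[user_id]
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        slot = int.from_bytes(digest[:8], "big") % 100
        for experiment, slots in self.slot_ranges.items():
            if slot in slots:
                return experiment
        return None                             # slot not allocated


def assign_all(user_id: str, layers: list[Layer]) -> dict[str, str | None]:
    """Assign the user independently in every layer."""
    return {layer.name: layer.assign(user_id) for layer in layers}


recall = Layer("recall", {"recall_a": range(0, 50), "recall_b": range(50, 100)})
ranking = Layer("ranking", {"rank_a": range(0, 50), "rank_b": range(50, 100)},
                whitelist={"qa-user": "rank_b"})

print(assign_all("user-42", [recall, ranking]))
```

Because every layer uses a different salt, users who land in one recall experiment are spread evenly over the ranking experiments, so experiments in different layers can share the same traffic without biasing each other.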

Standard Experiment Engineering Chain (2.0)

Backend emits experiment logs directly (no offline config lookup).

Experiment identifiers are returned to the client so that exposures and clicks can be tracked in real time.

Client behavior is reported instantly.

Flink processes the logs in real time, producing metric streams.

Real‑time metrics provide feedback within seconds for trend observation; final decisions still rely on statistically robust offline analysis.
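The kind of aggregation a streaming job performs on these events can be sketched in plain Python. This is not Flink code: it is a stand‑in that assumes tumbling windows of a fixed length and a hypothetical event shape with a timestamp, an experiment id, and an event kind.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    timestamp: int      # epoch seconds
    experiment: str     # experiment identifier carried by the event
    kind: str           # "exposure" or "click"


def windowed_ctr(events: Iterable[Event], window_seconds: int = 10) -> dict:
    """Click-through rate per (window start, experiment) in tumbling windows."""
    counts = defaultdict(lambda: {"exposure": 0, "click": 0})
    for event in events:
        window_start = event.timestamp - event.timestamp % window_seconds
        counts[(window_start, event.experiment)][event.kind] += 1
    return {
        key: c["click"] / c["exposure"]
        for key, c in counts.items()
        if c["exposure"] > 0            # skip windows with clicks only
    }


events = [
    Event(100, "exp-1", "exposure"),
    Event(101, "exp-1", "exposure"),
    Event(103, "exp-1", "click"),
    Event(112, "exp-1", "exposure"),
]
print(windowed_ctr(events))
```

Short windows like these are noisy, which is consistent with using them for trend observation only and leaving decisions to the offline analysis.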

ACM Common Logging Standard

To propagate experiment identifiers to the client, a unified field acm is defined. The format is:

version.businessDomain.resourceType.position.experiment.customValue

Fields:

version – ACM schema version.

businessDomain – Short system alias (e.g., srh for search).

resourceType – Content type or resource ID (e.g., spu_1009).

position – Ad or ranking slot.

experiment – AB experiment identifier; multiple experiments are hyphen‑separated.

customValue – Optional extension made of key_value pairs joined by hyphens (e.g., channel_hot-position_2); individual keys and values must not contain ., -, or _.

Example string:

acm:1.srh.spu_1009.sh.kka3b.10089-1929-100.channel_hot-position_2

Example backend response in which each item carries its own acm value:
{
  "code": 200,
  "data": {
    "total": 3730,
    "hits": 10,
    "searchId": "161113175619737242413163",
    "items": [{
      "spuId": "xxx",
      "acm": "1.ms.prd-10092.v1ss.exp-1.kka.12"
    }]
  },
  "requestId": "f2ca7c08693acd54",
  "time": 1611131759
}
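A client or log consumer has to split the acm value back into its parts. The sketch below follows the six fields listed above and treats customValue as hyphen‑joined key_value pairs, which is an assumption based on the channel_hot-position_2 example. The example strings in this article appear to carry one more dot‑separated segment than the field list describes; the sketch does not model that segment, and the input string used here is hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Acm(NamedTuple):
    version: str
    business_domain: str
    resource_type: str
    position: str
    experiments: list[str]
    custom: dict[str, str]


def parse_acm(value: str) -> Acm:
    """Parse version.businessDomain.resourceType.position.experiment[.customValue]."""
    parts = value.split(".")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 dot-separated fields, got {len(parts)}")
    version, domain, resource, position, experiment = parts[:5]
    custom: dict[str, str] = {}
    if len(parts) == 6 and parts[5]:
        for pair in parts[5].split("-"):        # assumed key_value pairs
            key, _, val = pair.partition("_")
            custom[key] = val
    return Acm(version, domain, resource, position, experiment.split("-"), custom)


# Hypothetical value built from the documented field list.
acm = parse_acm("1.srh.spu_1009.slot3.10089-1929.channel_hot-position_2")
print(acm.experiments, acm.custom)
```

Splitting on the dot first is what makes the restriction on reserved characters matter: a dot inside any field would shift every field after it.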

Two feedback paths are used:

ACM field returned in backend responses → client reports exposure/click events → Flink computes real‑time metrics.

Backend logs containing full experiment metadata → offline or streaming pipelines compute stable, high‑precision reports.

3.0 – Optimized User Experience and Operational Efficiency

Usability improvements

Unified experiment management UI with one‑click publish, rollback, and parameter editing.

Parameter tooling: traffic‑color analysis visualizes where a parameter is configured and its effective traffic share.

Parameter comparison view shows differing values of the same parameter across experiments.

Tree‑style layout visualization separates “layer view” (shows buckets and sub‑layers) from “bucket view” (shows nested layers inside a bucket).

Stability enhancements

Dynamic whitelist decoupled from experiment config; updates take effect instantly without a full experiment redeploy.

Concurrency control for Ark configuration pushes – a multi‑step rollout prevents simultaneous push failures.
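The decoupling of the whitelist from experiment config can be pictured as a separate lookup that runs before the normal split. The following is a minimal in‑memory sketch; the class name, the method names, and the callable used for the hashed split are all hypothetical, and a real deployment would load the entries from a shared store.

```python
from __future__ import annotations

import threading
from typing import Callable


class DynamicWhitelist:
    """Whitelist that can be replaced without touching experiment config."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, str] = {}      # user id -> forced group

    def replace(self, entries: dict[str, str]) -> None:
        """Swap in a new whitelist; visible to the next lookup."""
        with self._lock:
            self._entries = dict(entries)

    def lookup(self, user_id: str) -> str | None:
        with self._lock:
            return self._entries.get(user_id)


def resolve_group(user_id: str, whitelist: DynamicWhitelist,
                  hashed_assignment: Callable[[str], str]) -> str:
    """Whitelist entries win; everyone else goes through the normal split."""
    forced = whitelist.lookup(user_id)
    return forced if forced is not None else hashed_assignment(user_id)


whitelist = DynamicWhitelist()
before = resolve_group("qa-user", whitelist, lambda user: "control")
whitelist.replace({"qa-user": "treatment"})     # no experiment redeploy
after = resolve_group("qa-user", whitelist, lambda user: "control")
```

Keeping the whitelist in its own structure means a tester can be moved between groups while the experiment definition, and therefore everyone else's assignment, stays untouched.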

Experiment effect analysis

Standardized experiment duration, core metrics, and auxiliary metrics enable automated report generation.

Guidelines cover statistical power, p‑values, confidence levels, and Simpson’s paradox (a trend that holds within each day can reverse once the days are aggregated).
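Simpson’s paradox is easiest to see with numbers. The figures below are invented for illustration and assume that the traffic ratio between the groups changed from one day to the next, which is the usual cause in an experiment setting.

```python
# (conversions, users) per day and group; illustrative numbers only.
DAILY = {
    "day1": {"treatment": (81, 87), "control": (234, 270)},
    "day2": {"treatment": (192, 263), "control": (55, 80)},
}


def rate(conversions: int, users: int) -> float:
    return conversions / users


def daily_rates(day: str) -> dict[str, float]:
    """Conversion rate of each group on a single day."""
    return {group: rate(*counts) for group, counts in DAILY[day].items()}


def aggregate_rate(group: str) -> float:
    """Conversion rate of a group with all days pooled together."""
    conversions = sum(DAILY[day][group][0] for day in DAILY)
    users = sum(DAILY[day][group][1] for day in DAILY)
    return rate(conversions, users)


for day in DAILY:
    print(day, daily_rates(day))
print("pooled", {g: aggregate_rate(g) for g in ("treatment", "control")})
```

Treatment converts better than control on both days, yet the pooled rate favors control, because most treatment users arrived on the day with the lower overall conversion rate. Keeping the split ratio fixed for the whole experiment, or comparing groups within each day, avoids the reversal.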

Future Directions

Planned work includes:

Integration with the data‑warehouse team’s universal metric visualization framework for richer traffic‑allocation and user‑segment analysis.

Cross‑environment operation (dev, test, prod) from a single UI.

Sidecar‑based split mechanisms to improve stability and iteration speed.
