Pre‑Experiment User Stratification Model for Improving AB Test Uniformity in Vivo Game Center
The paper introduces a pre‑experiment user stratification model that uses covariate‑balancing algorithms to build separate strata for distribution metrics and revenue metrics. Drawing equal numbers of users from each stratum into the treatment and control groups of Vivo game‑center AB tests reduces metric variance, improves gray‑release effectiveness, and saves significant investigation effort.
AB testing is a core method for validating product and version updates in the Vivo game center. However, user imbalance—differences in the distribution of user attributes between treatment and control groups—can severely bias the evaluation of experiment effects, leading to misleading business decisions.
The article first defines user imbalance, explains its causes (grouping methods, sample size, and metric characteristics such as sparsity and non‑normality), and illustrates how it manifests in both version‑gray‑scale experiments and strategy‑optimization experiments.
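As an illustration of why hash‑based grouping struggles with sparse, heavy‑tailed metrics, the following minimal simulation (not from the paper; the payer rate, Pareto spend distribution, and bucketing salt are illustrative assumptions) assigns users by a hash of their ID and measures the resulting ARPU gap in an AA setting:

```python
import hashlib
import random

random.seed(7)

def hash_bucket(user_id: str, salt: str = "exp1") -> str:
    """Conventional hash-based grouping: bucket by a hash of the user ID."""
    h = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16)
    return "treatment" if h % 2 == 0 else "control"

# Simulate a sparse, heavy-tailed revenue metric: most users pay nothing,
# a few "whales" dominate total revenue (typical of game ARPU).
users = [f"u{i}" for i in range(20_000)]
revenue = {u: (random.paretovariate(1.2) * 10 if random.random() < 0.02 else 0.0)
           for u in users}

groups = {"treatment": [], "control": []}
for u in users:
    groups[hash_bucket(u)].append(revenue[u])

arpu = {g: sum(v) / len(v) for g, v in groups.items()}
overall = sum(revenue.values()) / len(users)
diff_pct = abs(arpu["treatment"] - arpu["control"]) / overall * 100
print(f"AA-test ARPU gap under hash grouping: {diff_pct:.1f}% of overall ARPU")
```

Even with identical treatment and control logic, the gap is driven by which group the few big spenders happen to hash into, which is exactly the imbalance the article describes.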
To address this, the authors propose a “pre‑experiment user stratification” solution that leverages a stratified sampling (covariate‑balancing) algorithm developed by the Hawking experiment team. The approach builds separate stratification models for distribution‑related metrics and revenue‑related metrics, then draws equal numbers of users from each stratum into the treatment and control groups.
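The allocation step can be sketched as follows. This is a simplified illustration, not the Hawking team's implementation; the covariate (historical 30‑day spend) and the stratum boundaries are hypothetical:

```python
import random
from collections import defaultdict

random.seed(42)

def stratify(users, covariate, boundaries):
    """Assign each user to a stratum based on a covariate and bin boundaries."""
    strata = defaultdict(list)
    for u in users:
        k = sum(covariate[u] >= b for b in boundaries)  # stratum index
        strata[k].append(u)
    return strata

def allocate_balanced(strata):
    """Draw equal numbers of users from each stratum into both groups."""
    treatment, control = [], []
    for members in strata.values():
        shuffled = list(members)
        random.shuffle(shuffled)
        half = len(shuffled) // 2
        treatment.extend(shuffled[:half])
        control.extend(shuffled[half:2 * half])  # drop an odd leftover user
    return treatment, control

# Hypothetical covariate: historical 30-day spend per user.
users = [f"u{i}" for i in range(10_000)]
spend = {u: max(0.0, random.gauss(5, 10)) for u in users}
strata = stratify(users, spend, boundaries=[1.0, 20.0, 100.0])  # 4 spend tiers
treatment, control = allocate_balanced(strata)
print(len(treatment), len(control))  # equal group sizes by construction
```

Because every stratum contributes the same number of users to each group, the covariate distribution is balanced by construction rather than left to the luck of a hash.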
The conventional stratified sampling formula is presented, followed by detailed designs of the revenue‑stratification model (using intermediate variables to segment users for ARPU balance) and the distribution‑stratification model (using similar variables for download/activation metrics). Visual diagrams of both models are included.
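The summary does not reproduce the formula, but the textbook form of stratified sampling, which the paper's presentation presumably follows, estimates the overall mean and its variance from per‑stratum statistics:

```latex
\bar{y}_{\mathrm{st}} = \sum_{h=1}^{H} W_h \,\bar{y}_h,
\qquad W_h = \frac{N_h}{N},
\qquad \operatorname{Var}(\bar{y}_{\mathrm{st}}) = \sum_{h=1}^{H} W_h^2 \,\frac{s_h^2}{n_h},
```

where $N_h$ and $n_h$ are the population and sample sizes of stratum $h$, $\bar{y}_h$ its sample mean, and $s_h^2$ its sample variance; under proportional allocation, $n_h = n \, N_h / N$. Because the between‑stratum component of the variance is removed, $\operatorname{Var}(\bar{y}_{\mathrm{st}})$ is at most the variance of simple random sampling.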
Implementation required integration with the Hawking experiment platform and the version‑release system. The stratification logic was embedded into the platforms to ensure uniform user allocation during experiments and gray releases.
Two AA tests were conducted:
- On the Hawking experiment platform, the stratified model preserved the stability of distribution metrics while reducing revenue‑metric fluctuation from 11.6% under hash‑based grouping to 4.8%/1.9% and 3.3%/1.5% for two ARPU calculations.
- On the version‑release system, the stratified model eliminated the significant fluctuations in distribution metrics seen under the previous phone‑ID‑tail‑number grouping.
After deployment, the model yielded measurable business benefits: gray‑release effectiveness increased by 9 percentage points, roughly 35 person‑days of annual anomaly‑investigation effort were saved, and positive strategy experiments contributed an estimated +0.2% to yearly revenue.
The authors acknowledge remaining challenges—subjectivity in manual stratification and limited indicator coverage—and suggest future work incorporating more features and machine‑learning‑based stratification.
Overall, the pre‑user stratification model provides a practical, data‑driven method to improve AB test uniformity and reliability in large‑scale game analytics.
vivo Internet Technology