
Intelligent Gray Release Data System for Vivo Game Center: Methodology and Solutions

This article presents Vivo Game Center's end‑to‑end intelligent gray‑release data system, covering the experimental mindset, statistical methods, data models, and product solutions that make version evaluation scientifically sound, keep projects on schedule, and close abnormal‑metric issues quickly through root‑cause analysis and full‑process automation.


1. Introduction

The game business has massive user scale, long business chains, and complex data logic. Vivo Game Center, as the core user product of the game platform, releases versions frequently and must conduct small‑scale gray verification before each launch. Since 2021, important versions undergo gray testing every 1–2 weeks, often with multiple versions running concurrently.

Gray testing at the data layer raises three key questions:

How can we ensure that gray version evaluation is scientifically sound?

How can we improve the efficiency of gray data production so that project schedules are met?

When a gray version shows abnormal metrics, how can we quickly locate the root cause and close the loop?

Over the past two years, we have systematically applied gray evaluation methods to agile BI products, building a gray data system that now addresses these three problems.

2. Development of the Gray Data System

2.1 What is a gray release? A gray release means selecting a subset of users to experience a new version before full rollout, gathering feedback on usability and performance, and rolling back if serious issues are found.

2.2 Evolution stages of gray evaluation solutions

Three stages illustrate how control of variables improves scientific evaluation:

Stage 1: Gray and control groups are compared over the same time window, but users upgrade at different speeds, so the samples are not homogeneous.

Stage 2: The same user group is compared across different periods, but user behavior can change over time, so time‑related differences remain.

Stage 3: Both the time window and the user group are identical, which brings three advantages: identical sample attributes, a sound basis for sample size calculation, and fast silent installation that shortens the gray cycle.

2.3 Content of the gray data system

The system typically involves two parts: pre‑gray traffic strategy (sample size calculation and gray duration control) and post‑gray data validation (core metric comparison, product‑level metric changes, and new‑feature performance).

Beyond conventional gray evaluation, we introduce root‑cause analysis to improve interpretability.

2.4 Vivo Game Center's practice

We built the "Game Center Intelligent Gray Data System" consisting of dashboards for metric inspection, dimension drill‑down, user‑attribute validation, and anomaly diagnosis, plus an automated gray‑report push. After deployment, the system achieved a closed loop of automated data production, effect verification, interpretation, and decision recommendation, greatly reducing manual effort.

3. Methodology in the Gray Data System

3.1 A gray experiment involves two statistical tasks: effect verification, handled with hypothesis testing, and sampling validation, handled with sample‑history difference verification.

3.1.1 Hypothesis testing formulates a hypothesis about a population parameter and uses sample performance to accept or reject it.

3.1.2 Sample‑history difference verification validates that random sampling does not introduce bias by checking historical differences, typically using a 7‑day sliding window.
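The sliding‑window check above can be sketched in a few lines. This is a minimal illustration, not the production model: the function name, the 5% tolerance, and the example daily rates are all assumptions made for the sketch.

```python
def history_bias_check(gray_daily, control_daily, threshold=0.05):
    """Verify that the gray and control cohorts behaved alike BEFORE the
    gray release: compare their daily metric values over a trailing window
    (7 days in the article) and require the mean relative gap to stay
    within `threshold`. A large gap suggests biased sampling."""
    rel_gaps = [abs(g - c) / c for g, c in zip(gray_daily, control_daily)]
    return sum(rel_gaps) / len(rel_gaps) <= threshold

# Example: 7 days of a conversion-rate metric prior to the gray release
gray_week    = [0.101, 0.099, 0.103, 0.098, 0.100, 0.102, 0.097]
control_week = [0.100, 0.100, 0.101, 0.099, 0.101, 0.100, 0.099]
print(history_bias_check(gray_week, control_week))  # True: sampling looks unbiased
```

Only when this check passes is a post‑release metric difference attributed to the new version rather than to the sampling itself.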

3.2 Root‑cause analysis addresses metric anomalies that are linked to multiple dimensions. We combine two methods:

Metric Logic Analysis: Decompose rate or mean metrics into numerator and denominator factors across dimensions, then logically break down the metric.

Adtributor algorithm: A multi‑dimensional time‑series anomaly root‑cause method from Microsoft Research (2014), adapted for our root‑cause stage.

These methods cross‑validate each other to ensure reliable analysis.
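Metric logic decomposition can be illustrated for a rate metric whose numerator and denominator are summed over one dimension. The sketch below is an assumption‑laden toy (the function name, channel labels, and counts are invented for illustration); it attributes the overall rate change to each dimension value by swapping in that value's cell alone.

```python
def decompose_rate_change(before, after):
    """Break the change of an overall rate metric (num / den) into
    per-dimension-value contributions. `before` / `after` map a dimension
    value to a (numerator, denominator) pair. The contribution of value v
    is the change in the overall rate obtained by moving only v's cell
    from `before` to `after`."""
    def overall(cells):
        num = sum(n for n, _ in cells.values())
        den = sum(d for _, d in cells.values())
        return num / den

    base = overall(before)
    contrib = {}
    for v in before:
        mixed = dict(before)
        mixed[v] = after[v]                # swap only this slice
        contrib[v] = overall(mixed) - base
    return base, overall(after), contrib

# Toy example: the conversion rate drops, and only channel_A changed
before = {"channel_A": (50, 1000), "channel_B": (30, 500)}
after  = {"channel_A": (40, 1000), "channel_B": (30, 500)}
base, new, contrib = decompose_rate_change(before, after)
# contrib["channel_A"] is negative, contrib["channel_B"] is zero,
# pointing the analyst at channel_A
```

In practice the decomposition runs across every monitored dimension, and its verdicts are cross‑checked against Adtributor's, as described above.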

4. Intelligent Gray Solution

4.1 Overall framework

The gray process is divided into three phases—pre‑gray, in‑gray, and post‑gray—forming the productized framework shown in the diagram.

4.2 Process design

The workflow diagram (below) illustrates the end‑to‑end implementation based on the framework.

4.3 Core components

4.3.1 Sample size estimation

The dashboard provides multi‑confidence‑level (default 95% confidence, 80% power) sample size estimates based on recent metric performance and expected change magnitude.

Multiple standards allow flexible adjustment of expected effect.

Automatically selects the most recent full‑release data as input.

Mean and rate metrics use differentiated calculation logic.
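For rate metrics, the estimation behind the dashboard is the standard two‑proportion sample‑size formula. The sketch below is a plausible reconstruction, not the dashboard's exact code; the function name and the example baseline are assumptions.

```python
from statistics import NormalDist
import math

def sample_size_rate(p_base, mde_rel, alpha=0.05, power=0.80):
    """Per-group sample size for a rate metric: baseline rate `p_base`,
    relative minimum detectable effect `mde_rel` (e.g. 0.05 = a 5% lift),
    two-sided test at significance `alpha` with the given `power`.
    Standard two-proportion formula:
    n = (z_{a/2} + z_b)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2"""
    p_new = p_base * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 at 80% power
    var_sum = p_base * (1 - p_base) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * var_sum / (p_base - p_new) ** 2
    return math.ceil(n)

# e.g. detecting a 5% relative lift on a 10% baseline conversion rate
n = sample_size_rate(0.10, 0.05)
```

A smaller expected effect or a higher confidence level drives the required sample size up, which is why the dashboard exposes multiple standards for the expected effect.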

4.3.2 Effect‑metric significance testing

The model answers whether the metric change between gray and control versions is statistically significant. Three confidence levels are supported for 20 business metrics.

Python implementation for rate‑type metrics:

import math

# Illustrative inputs (values here are placeholders)
variation_visitors = 100000   # denominator (users) for the gray version
control_visitors = 100000     # denominator (users) for the control version
variation_p = 0.105           # metric value for the gray version
control_p = 0.100             # metric value for the control version
z = 1.96                      # z-value for the confidence level (1.64 / 1.96 / 2.58 for 90% / 95% / 99%)

# Per-user standard deviation of a rate metric (Bernoulli)
variation_se = math.sqrt(variation_p * (1 - variation_p))
control_se = math.sqrt(control_p * (1 - control_p))

gap = variation_p - control_p        # absolute difference
rate = variation_p / control_p - 1   # relative difference

# Confidence interval of the absolute difference
pooled_sd = math.sqrt(control_se**2 / control_visitors + variation_se**2 / variation_visitors)
gap_interval_sdown = gap - z * pooled_sd
gap_interval_sup = gap + z * pooled_sd

# Scale to a relative interval for reporting
confidence_interval_sdown = gap_interval_sdown / control_p
confidence_interval_sup = gap_interval_sup / control_p

# Significant when the interval excludes zero (both bounds share a sign)
if confidence_interval_sdown * confidence_interval_sup > 0:
    print("significant")
else:
    print("not significant")

Python implementation for mean‑type metrics:

import math
import numpy as np

# Per-user metric values (illustrative placeholders)
variation_x = np.array([3.2, 4.1, 2.8, 5.0, 3.6])   # gray version
control_x = np.array([3.0, 3.9, 2.9, 4.6, 3.4])     # control version

variation_visitors = len(variation_x)   # sample size for the gray version
control_visitors = len(control_x)       # sample size for the control version
variation_p = variation_x.mean()        # mean metric value for the gray version
control_p = control_x.mean()            # mean metric value for the control version
z = 1.96                                # z-value for the confidence level (90/95/99%)

# Sample standard deviations of the per-user metric
variation_se = np.std(variation_x, ddof=1)
control_se = np.std(control_x, ddof=1)

gap = variation_p - control_p        # absolute difference
rate = variation_p / control_p - 1   # relative difference

# Confidence interval of the difference in means
pooled_sd = math.sqrt(control_se**2 / control_visitors + variation_se**2 / variation_visitors)
gap_interval_sdown = gap - z * pooled_sd
gap_interval_sup = gap + z * pooled_sd

# Scale to a relative interval for reporting
confidence_interval_sdown = gap_interval_sdown / control_p
confidence_interval_sup = gap_interval_sup / control_p

# Significant when the interval excludes zero
if confidence_interval_sdown * confidence_interval_sup > 0:
    print("significant")
else:
    print("not significant")

4.3.3 Negative‑metric automatic root‑cause analysis

The workflow includes anomaly detection, sample‑history verification, metric logic decomposition, and Adtributor automatic root‑cause analysis.

Adtributor identifies the dimension contributing most to the metric drift, and we map dimension hierarchies to generate automated business‑level root‑cause conclusions.
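A stripped‑down, single‑metric version of Adtributor can convey the core idea: rank each dimension's values by explanatory power (share of the overall change they account for), accumulate candidates until a cumulative‑power threshold is reached, and score dimensions by the total "surprise" (a Jensen‑Shannon‑style divergence between forecast and actual proportions). This sketch follows the 2014 paper's structure but simplifies it heavily; the thresholds and example data are illustrative assumptions, not our production configuration.

```python
import math

def js_divergence(p, q):
    """Symmetric surprise score for one dimension value, computed on its
    forecast proportion p and actual proportion q (Adtributor, NSDI 2014)."""
    def d(a, b):
        return 0.0 if a == 0 else a * math.log(2 * a / (a + b))
    return 0.5 * (d(p, q) + d(q, p))

def adtributor(forecast, actual, teep=0.1, tep=0.67):
    """Minimal single-metric Adtributor sketch. `forecast` / `actual` map
    dimension -> {value: metric amount}. Returns (dimension, candidate
    values, total surprise) tuples, most surprising dimension first."""
    results = []
    for dim in forecast:
        f_tot = sum(forecast[dim].values())
        a_tot = sum(actual[dim].values())
        delta = a_tot - f_tot                      # overall metric drift
        candidates, surprise, ep_sum = [], 0.0, 0.0
        # walk this dimension's values by descending explanatory power (EP)
        for v in sorted(forecast[dim],
                        key=lambda v: -(actual[dim][v] - forecast[dim][v]) / delta if delta else 0):
            ep = (actual[dim][v] - forecast[dim][v]) / delta if delta else 0.0
            if ep > teep:                          # value explains enough drift
                candidates.append(v)
                surprise += js_divergence(forecast[dim][v] / f_tot,
                                          actual[dim][v] / a_tot)
                ep_sum += ep
            if ep_sum >= tep:                      # enough of the drift explained
                break
        if candidates:
            results.append((dim, candidates, surprise))
    return sorted(results, key=lambda r: -r[2])

# Toy example: the metric drops from 1000 to 900, driven by channel A
forecast = {"channel": {"A": 600, "B": 400}, "os": {"x": 500, "y": 500}}
actual   = {"channel": {"A": 510, "B": 390}, "os": {"x": 455, "y": 445}}
results = adtributor(forecast, actual)   # "channel" ranks first, candidate "A"
```

On top of a ranking like this, our system maps the winning dimension values through the dimension hierarchy to emit a business‑level root‑cause statement automatically.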

4.3.4 Intelligent gray‑report stitching and push

Version information (version number, install count, days since release) is automatically fetched from the release platform and placed at the report header.

Based on metric polarity (all positive, partially negative, all negative) and sample uniformity, the system selects from over ten pre‑defined conclusion templates.
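The template‑selection rule can be sketched as a simple classifier over metric polarity and the sample‑uniformity check. This is a toy reduction of the production logic (which chooses among more than ten templates); the function name, the sign encoding, and the conclusion strings are assumptions for illustration.

```python
def pick_conclusion(metric_results, sample_uniform):
    """Choose a report conclusion from the polarity of the significant
    metric changes plus the sample-uniformity check. `metric_results`
    maps metric name -> +1 (significantly improved), -1 (significantly
    degraded), or 0 (not significant)."""
    signs = [s for s in metric_results.values() if s != 0]
    if not sample_uniform:
        return "Sample groups differ historically; re-draw samples before concluding."
    if not signs:
        return "No significant metric changes; continue gray observation."
    if all(s > 0 for s in signs):
        return "All significant metrics improved; recommend full rollout."
    if all(s < 0 for s in signs):
        return "All significant metrics degraded; recommend rollback and root-cause analysis."
    return "Mixed results; trigger root-cause analysis on degraded metrics."

print(pick_conclusion({"ctr": 1, "arpu": 0, "crash_rate": -1}, True))
# -> Mixed results; trigger root-cause analysis on degraded metrics.
```

The chosen conclusion is stitched under the auto‑fetched version header to form the pushed gray report.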

5. Conclusion

To meet the scientific evaluation and rapid decision‑making needs of business gray releases, we provide a complete intelligent gray data system covering experimental thinking, mathematical methods, data models, and product solutions. The approach is intended as a reference for other businesses, though each should tailor the design to its own characteristics.

Future improvements include exploring stratified sampling to reduce sample imbalance, enhancing the multi‑dimensional root‑cause model with qualitative factors, and investigating sequential testing methods such as mSPRT to relax minimum sample‑size constraints.

References

茆诗松, 王静龙, 濮晓龙. Advanced Mathematical Statistics, 2nd ed. (《高等数理统计(第二版)》)

是老李没错了. "Master A/B Experiments and Sample Size Calculation in Five Minutes" (《五分钟掌握AB实验和样本量计算原理》). CSDN blog.

Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, et al. "Adtributor: Revenue Debugging in Advertising Systems." NSDI 2014.

Tags: gray release, data analysis, A/B testing, product analytics, root cause analysis, sample size estimation
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
