How Vivo Built an Intelligent Gray‑Release Data System for Faster, Scientific Game Updates
This article details Vivo Game Center's end‑to‑end intelligent gray‑release data framework—covering experiment design, statistical methods, data models, and automated product solutions—to ensure scientific version evaluation, accelerate project timelines, and quickly close the gray‑testing loop.
Introduction
Game platforms handle massive user bases, long business chains, and complex data logic. Frequent version releases require small‑scale gray testing before full rollout. Since 2021, Vivo has performed gray releases every 1–2 weeks, often running several versions concurrently. The data‑layer challenges are:
Ensuring scientific evaluation of gray releases.
Improving data production efficiency to protect project schedules.
Rapidly locating the root cause when abnormal metrics appear, and closing the loop.
Gray Release Definition and Evolution
What is a Gray Release?
A gray release pushes a new version to a selected subset of users, collects usability, performance, and issue feedback, and rolls back if serious problems are found. Otherwise the rollout is gradually expanded.
Evolution Stages
Stage 1: Same time window but different user groups, causing sample bias.
Stage 2: Same user group but different time, vulnerable to temporal effects.
Stage 3: Identical time and user group, providing consistent sample attributes, optimal sample‑size calculation, and fast upgrade via silent installation.
System Architecture
The intelligent gray‑release data system consists of two parts:
Pre‑release traffic strategy: sample‑size calculation and gray‑period control.
Post‑release validation: core metric comparison, product‑level metric changes, and new‑feature performance assessment. Root‑cause analysis is added to improve interpretability.
Vivo implemented three iterative versions of the system, delivering dashboards for metric inspection, dimension drill‑down, user‑attribute validation, anomaly diagnosis, and automated gray‑release conclusion reports.
Methodology
Gray Experiments
Gray experiments consist of sampling and effect verification, which rely on hypothesis testing and on validating historical differences between samples.
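The sampling half can be illustrated with deterministic hash bucketing. This is a minimal sketch, not Vivo's implementation: the salt string, SHA‑256, the 10,000‑bucket layout, and the helper names `bucket_of` and `assign_group` are all hypothetical. Because gray and control users are carved out of the same hashed population at the same moment, it mirrors the Stage 3 "same time, same user group" setup.

```python
import hashlib

BUCKETS = 10_000  # hypothetical granularity: each bucket holds ~0.01% of users


def bucket_of(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket; a new salt reshuffles users for a new experiment."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:15], 16) % BUCKETS


def assign_group(user_id: str, salt: str, gray_ratio: float, control_ratio: float) -> str:
    """Return 'gray', 'control', or 'other' for one user (hypothetical helper)."""
    bucket = bucket_of(user_id, salt)
    gray_cut = round(gray_ratio * BUCKETS)
    control_cut = gray_cut + round(control_ratio * BUCKETS)
    if bucket < gray_cut:
        return "gray"
    if bucket < control_cut:
        return "control"
    return "other"
```

For example, `assign_group("user42", "gray-demo", 0.05, 0.05)` always returns the same group for that user and salt, and over many users roughly 5% land in each of gray and control.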
Hypothesis Testing
A hypothesis about a population parameter (for example, that the gray and control versions have the same metric mean) is formulated, and sample data are used to decide whether to reject it or fail to reject it.
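As a concrete instance, consider a mean‑type metric such as playtime per user. The sketch below is a two‑sided two‑sample z‑test computed from summary statistics; it assumes samples large enough for the normal approximation, and the function name and inputs are hypothetical.

```python
import math


def two_sample_z_test(mean_gray, sd_gray, n_gray, mean_control, sd_control, n_control, alpha=0.05):
    """Two-sided z-test of H0: mean_gray == mean_control, using summary statistics."""
    # Standard error of the difference between the two sample means
    se = math.sqrt(sd_gray**2 / n_gray + sd_control**2 / n_control)
    z = (mean_gray - mean_control) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Reject H0 when the p-value falls below the significance level
    return z, p_value, p_value < alpha
```

With gray and control means of 10.5 and 10.0, a standard deviation of 5.0, and 10,000 users per group, z is about 7.07 and H0 is rejected; with equal means, z is 0 and H0 is not rejected.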
Sample Historical Difference Validation
Even with hash‑based sampling, a 7‑day sliding window is used to verify and eliminate historical sample differences that could cause metric fluctuations.
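One simple way to use the 7‑day window, assuming daily metric values are available for both groups before the gray version ships, is to measure their pre‑existing relative gap and subtract it from the lift observed after release (a difference‑in‑differences style correction). The correction itself, the function names, and the 1% tolerance are illustrative assumptions rather than the documented method.

```python
from statistics import mean

WINDOW_DAYS = 7  # sliding-window length described in the article


def baseline_gap(gray_daily, control_daily):
    """Mean relative gap between the groups over the pre-release window."""
    if len(gray_daily) != WINDOW_DAYS or len(control_daily) != WINDOW_DAYS:
        raise ValueError(f"expected {WINDOW_DAYS} daily values per group")
    return mean(g / c - 1 for g, c in zip(gray_daily, control_daily))


def corrected_lift(gray_now, control_now, gray_daily, control_daily, tolerance=0.01):
    """Observed lift minus the historical gap; flags groups that already differed."""
    gap = baseline_gap(gray_daily, control_daily)
    observed = gray_now / control_now - 1
    return {
        "observed_lift": observed,
        "historical_gap": gap,
        "corrected_lift": observed - gap,
        "historical_bias": abs(gap) > tolerance,  # hypothetical threshold
    }
```

If the gray group already ran 2% above control before release and shows a 5% lift afterwards, the corrected lift is about 3%, and the historical gap is flagged as bias.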
Root‑Cause Analysis
When metric anomalies appear, the system identifies underlying causes across multiple dimensions using two techniques:
Metric Logic Analysis: Decompose rate or mean metrics into numerator and denominator factors, then further into dimension‑level contributions.
Adtributor Algorithm: A root‑cause method for anomalies in multi‑dimensional time series (Microsoft Research, 2014), applied with cross‑validation for reliability.
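The metric‑logic step can be sketched for a rate metric split along one dimension (say, channel). The sketch assumes a standard mix/rate decomposition, which may differ from Vivo's exact formula: the overall rate is a share‑weighted sum of segment rates, R = Σ_i w_i·r_i with w_i = D_i/D and r_i = N_i/D_i, so the gray‑minus‑control gap splits exactly into ΔR = Σ_i [(w_i^g − w_i^c)·r_i^c + w_i^g·(r_i^g − r_i^c)]. The first term is the mix effect (a shift in the segment's share of the denominator) and the second is the rate effect (a change in the segment's own rate). Using shares keeps the comparison fair even though the gray group is much smaller than control.

```python
def decompose_rate_gap(control, gray):
    """control/gray: {segment: (numerator, denominator)} over the same segments.

    Returns (total_gap, {segment: (mix_effect, rate_effect)}); all effects sum to total_gap.
    """
    if set(control) != set(gray):
        raise ValueError("both groups must report the same segments")
    den_c = sum(d for _, d in control.values())
    den_g = sum(d for _, d in gray.values())
    total_gap = (sum(n for n, _ in gray.values()) / den_g
                 - sum(n for n, _ in control.values()) / den_c)
    effects = {}
    for seg in control:
        n_c, d_c = control[seg]
        n_g, d_g = gray[seg]
        share_c, share_g = d_c / den_c, d_g / den_g  # segment share of the denominator
        rate_c, rate_g = n_c / d_c, n_g / d_g        # segment-level rate
        effects[seg] = ((share_g - share_c) * rate_c, share_g * (rate_g - rate_c))
    return total_gap, effects
```

For instance, if control converts 10% in both segments A and B, while gray keeps the same 50/50 mix but segment A drops to 8%, the whole 1‑point gap is attributed to A's rate effect.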
Intelligent Gray‑Release Solution
Overall Framework
The process is divided into pre‑gray, in‑gray, and post‑gray stages, forming a productized workflow.
Sample Size Estimation
The dashboard estimates the minimum sample size needed to detect a predefined effect under multiple confidence levels (default 95% confidence, 80% power) using recent metric performance. Features include:
Multiple standards for flexible effect‑size adjustments.
Automatic selection of the most recent full‑release data as input.
Separate calculation logic for mean‑type and rate‑type metrics.
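The per‑group sample size for a two‑sided test can be sketched with the usual normal‑approximation formula n = (z_{1−α/2} + z_{1−β})² · (σ_g² + σ_c²) / δ², where δ is the absolute effect to detect. Below, the rate version takes σ² = p(1 − p) at both the baseline and the shifted rate, and the mean version assumes both groups share the baseline standard deviation. The function names and the "relative minimum detectable effect" parameter are assumptions; the baseline inputs would come from the most recent full‑release data.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    """z_{1-alpha/2} + z_{power} for a two-sided test."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def sample_size_rate(p_baseline, relative_mde, alpha=0.05, power=0.80):
    """Per-group users needed to detect a relative lift in a rate-type metric."""
    p_new = p_baseline * (1 + relative_mde)
    variance = p_baseline * (1 - p_baseline) + p_new * (1 - p_new)
    delta = p_new - p_baseline
    return math.ceil(_z_total(alpha, power) ** 2 * variance / delta**2)


def sample_size_mean(mean_baseline, sd_baseline, relative_mde, alpha=0.05, power=0.80):
    """Per-group users needed to detect a relative lift in a mean-type metric."""
    delta = mean_baseline * relative_mde
    return math.ceil(_z_total(alpha, power) ** 2 * 2 * sd_baseline**2 / delta**2)
```

Confidence levels of 90/95/99% correspond to alpha = 0.10/0.05/0.01. Detecting a 5% relative lift on a 10% baseline rate needs roughly 58,000 users per group at the defaults, while a mean of 10 with standard deviation 5 needs about 1,570.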
Significance Testing
Statistical models determine whether metric changes between gray and control versions are statistically significant. The implementation supports three confidence levels for 20 business metrics.
import math
# Two-sided z values for the three supported confidence levels (90/95/99%)
Z_VALUES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}
def significance_test(variation_p, variation_visitors, control_p, control_visitors, confidence=0.95):
    # variation_*: gray version; control_*: control version
    # *_p: metric value (a rate); *_visitors: metric denominator
    z = Z_VALUES[confidence]
    # Per-user standard deviation of a rate (Bernoulli) metric
    variation_sd = math.sqrt(variation_p * (1 - variation_p))
    control_sd = math.sqrt(control_p * (1 - control_p))
    # Absolute gap and relative lift of gray over control
    gap = variation_p - control_p
    rate = variation_p / control_p - 1
    # Standard error of the gap, then its confidence interval
    gap_se = math.sqrt(control_sd**2 / control_visitors + variation_sd**2 / variation_visitors)
    gap_interval_sdown = gap - z * gap_se
    gap_interval_sup = gap + z * gap_se
    # Same interval expressed relative to the control value
    confidence_interval_sdown = gap_interval_sdown / control_p
    confidence_interval_sup = gap_interval_sup / control_p
    # Significant when the interval excludes 0, i.e. both bounds share a sign
    significant = confidence_interval_sdown > 0 or confidence_interval_sup < 0
    return rate, (confidence_interval_sdown, confidence_interval_sup), significant
# Example: gray rate 10.5% vs control 10.0%, 20,000 users each
rate, interval, significant = significance_test(0.105, 20000, 0.100, 20000)
print("Significant" if significant else "Not Significant")
Automated Negative‑Metric Root‑Cause Analysis
The pipeline performs anomaly detection, historical sample validation, metric logic decomposition, and Adtributor analysis, automatically outputting the most influential dimension for each abnormal metric.
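A compact sketch of Adtributor's core selection step, following Bhagwan et al. (2014): within each dimension, elements are ranked by surprise (a Jensen‑Shannon‑style divergence between expected and actual shares); elements whose explanatory power (share of the total change) exceeds a per‑element threshold are accumulated until the set explains enough of the change; dimensions are then ranked by total surprise. The input layout, the threshold values (0.1 and 0.67 here), and taking "expected" values from the control version or history are assumptions; the cross‑validation step is not shown.

```python
import math


def _surprise(p, q):
    """Jensen-Shannon-style surprise between expected share p and actual share q."""
    m = p + q

    def term(x):
        # x * log(2x / (p + q)), taken as 0 when x == 0
        return 0.0 if x == 0 else x * math.log(2 * x / m)

    return 0.5 * (term(p) + term(q))


def adtributor(data, t_eep=0.1, t_ep=0.67, top_k=3):
    """data: {dimension: {element: (expected, actual)}}.

    Returns up to top_k (dimension, elements, surprise) tuples, most surprising first.
    """
    candidates = []
    for dim, elements in data.items():
        expected_total = sum(e for e, _ in elements.values())
        actual_total = sum(a for _, a in elements.values())
        change = actual_total - expected_total
        if change == 0:
            continue  # nothing to explain along this dimension
        scored = []
        for elem, (e, a) in elements.items():
            explanatory_power = (a - e) / change  # share of the total change
            surprise = _surprise(e / expected_total, a / actual_total)
            scored.append((surprise, explanatory_power, elem))
        scored.sort(reverse=True)  # most surprising elements first
        chosen, explained, total_surprise = [], 0.0, 0.0
        for surprise, explanatory_power, elem in scored:
            if explanatory_power > t_eep:
                chosen.append(elem)
                explained += explanatory_power
                total_surprise += surprise
            if explained > t_ep:
                candidates.append((total_surprise, dim, chosen))
                break
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(dim, elems, s) for s, dim, elems in candidates[:top_k]]
```

For example, if a "store" channel falls from an expected 1,000 to 600 while other channels hold steady, and two OS versions drop evenly, the sketch ranks the channel dimension with `["store"]` first, because the even OS drop carries no surprise.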
Intelligent Report Generation
Version information (version number, install count, days since release) is fetched automatically from the release platform. Based on metric signs (all positive, mixed, all negative) and sample uniformity, the system selects from over ten predefined conclusion templates to compose the final gray‑release report.
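The template selection can be sketched as a small classifier over the signs of the significant metric changes plus a sample‑uniformity flag. The template texts, the "neutral" case, and the function names are hypothetical (the real system chooses among more than ten templates), and version fields are passed in directly instead of being fetched from the release platform.

```python
# Hypothetical conclusion templates keyed by the sign pattern of significant changes
TEMPLATES = {
    "all_positive": "All significant changes are positive; full rollout is recommended.",
    "mixed": "Results are mixed; review the negative metrics before expanding the rollout.",
    "all_negative": "Significant changes are negative; hold the rollout and diagnose root causes.",
    "neutral": "No significant change was detected in the core metrics.",
}
NON_UNIFORM = "Gray and control samples are not uniform; treat metric changes with caution."


def classify_signs(changes):
    """changes: {metric: signed relative change, 0 when not significant}."""
    has_up = any(v > 0 for v in changes.values())
    has_down = any(v < 0 for v in changes.values())
    if has_up and has_down:
        return "mixed"
    if has_up:
        return "all_positive"
    if has_down:
        return "all_negative"
    return "neutral"


def compose_report(version, installs, days_since_release, changes, sample_uniform):
    """Combine version information with the selected conclusion template."""
    header = f"Version {version}: {installs} installs, {days_since_release} days since release."
    body = TEMPLATES[classify_signs(changes)] if sample_uniform else NON_UNIFORM
    return f"{header} {body}"
```

Checking sample uniformity first means a biased sample overrides any sign pattern, so a report never recommends rollout on top of a non‑uniform comparison.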
Future Work
Adopt stratified sampling to reduce inherent sample bias.
Enhance the multi‑dimensional root‑cause model by incorporating qualitative factors.
Explore sequential testing methods such as mSPRT to lessen reliance on pre‑estimated minimum sample sizes.
References
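Mao Shisong, Wang Jinglong, Pu Xiaolong (茆诗松, 王静龙, 濮晓龙). Advanced Mathematical Statistics (高等数理统计), 2nd ed.
是老李没错了. Understanding A/B Testing and Sample‑Size Calculation in Five Minutes (五分钟掌握AB实验和样本量计算原理). CSDN blog.
Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, et al. Adtributor: Revenue Debugging in Advertising Systems. USENIX NSDI 2014.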
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
