
How Vivo Built an Intelligent Gray‑Release Data System for Faster, Scientific Game Updates

This article details Vivo Game Center's end‑to‑end intelligent gray‑release data framework—covering experiment design, statistical methods, data models, and automated product solutions—to ensure scientific version evaluation, accelerate project timelines, and quickly close the gray‑testing loop.

ITPUB

Jul 2, 2022

Introduction

Game platforms handle massive user bases, long business chains, and complex data logic. Frequent version releases require small‑scale gray testing before full rollout. Since 2021, Vivo has performed gray releases every 1–2 weeks, often running several versions concurrently. The data‑layer challenges are:

Ensuring scientific evaluation of gray releases.

Improving data production efficiency to protect project schedules.

Rapidly locating the root causes of abnormal metrics and closing the gray‑testing loop.

Gray Release Definition and Evolution

What is a Gray Release?

A gray release pushes a new version to a selected subset of users, collects usability, performance, and issue feedback, and rolls back if serious problems are found. Otherwise the rollout is gradually expanded.

Evolution Stages

Stage 1: Same time window but different user groups, which introduces sample bias.

Stage 2: Same user group but different time windows, which leaves the comparison vulnerable to temporal effects.

Stage 3: Same time window and same user population for both gray and control, which yields consistent sample attributes, allows optimal sample‑size calculation, and enables fast upgrades through silent installation.
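Stage 3 requires that gray and control users be drawn from the same population during the same time window. The article does not describe its exact assignment scheme; it mentions hash‑based sampling later. The sketch below therefore assumes a salted SHA‑256 hash of the user ID split into 100 buckets. The salt, the bucket count, the 5% slices, and the `assign_group` helper are all hypothetical.

```python
import hashlib


def assign_group(user_id: str, salt: str = "gray-exp-1", gray_pct: int = 5) -> str:
    """Deterministically assign a user to "gray", "control", or "rest" (hypothetical scheme).

    Gray and control are equal-sized slices of the same hashed population, so both
    groups are observed over the same time window with comparable user attributes.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in [0, 100), roughly uniform
    if bucket < gray_pct:
        return "gray"
    if bucket < 2 * gray_pct:
        return "control"
    return "rest"  # users who stay on the current version and are not analyzed


if __name__ == "__main__":
    counts = {"gray": 0, "control": 0, "rest": 0}
    for i in range(100_000):
        counts[assign_group(f"user-{i}")] += 1
    print(counts)  # expect roughly 5,000 / 5,000 / 90,000
```

Because the assignment depends only on the salt and the user ID, a user keeps the same group for the whole gray period. Changing the salt reshuffles users for the next experiment.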

System Architecture

The intelligent gray‑release data system consists of two parts:

Pre‑release traffic strategy: sample‑size calculation and gray‑period control.

Post‑release validation: core metric comparison, product‑level metric changes, and new‑feature performance assessment, supplemented by root‑cause analysis to improve interpretability.

Vivo implemented three iterative versions of the system, delivering dashboards for metric inspection, dimension drill‑down, user‑attribute validation, anomaly diagnosis, and automated gray‑release conclusion reports.

Methodology

Gray Experiments

Gray experiments consist of sampling and effect verification. These steps rest on two techniques: hypothesis testing and historical sample‑difference validation.

Hypothesis Testing

A null hypothesis about a population parameter is formulated, for example that a metric has the same value in the gray and control versions. Sample data are then used to decide whether to reject it.
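For a rate metric, the textbook form of this procedure is a two‑proportion z‑test. The sketch below takes the null hypothesis to be "gray rate equals control rate" and uses a pooled standard error. The function `two_proportion_z_test` and the counts in the example are illustrative and do not come from Vivo's system.

```python
from statistics import NormalDist


def two_proportion_z_test(x_gray, n_gray, x_ctrl, n_ctrl):
    """Test H0: gray rate == control rate. Returns (z statistic, two-sided p-value)."""
    p_gray, p_ctrl = x_gray / n_gray, x_ctrl / n_ctrl
    p_pool = (x_gray + x_ctrl) / (n_gray + n_ctrl)  # common rate if H0 holds
    se = (p_pool * (1 - p_pool) * (1 / n_gray + 1 / n_ctrl)) ** 0.5
    z = (p_gray - p_ctrl) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 1,150 of 10,000 gray users vs 1,000 of 10,000 control users converted
z, p = two_proportion_z_test(1_150, 10_000, 1_000, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```

A small p‑value means the observed gap would be unlikely if the two versions truly performed the same. It does not by itself say how large or important the gap is.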

Sample Historical Difference Validation

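Even with hash‑based sampling, the gray and control groups can differ before the release. A 7‑day sliding window of pre‑release data is therefore used to detect and eliminate such historical sample differences, which would otherwise show up as metric fluctuations.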
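The article does not specify how the 7‑day window is used to eliminate historical differences. One plausible reading, sketched here purely as an assumption, works in two steps:

- Measure the average relative gap between the two groups over the 7 days before release.
- If that gap exceeds a tolerance, subtract it from the post‑release lift. This is a difference‑in‑differences style correction.

The 1% tolerance, the function names, and the example values are hypothetical.

```python
from statistics import mean

WINDOW_DAYS = 7  # length of the pre-release sliding window described in the article


def pre_release_gap(gray_hist, control_hist):
    """Mean relative gap (gray / control - 1) over the last WINDOW_DAYS days before release."""
    if len(gray_hist) != WINDOW_DAYS or len(control_hist) != WINDOW_DAYS:
        raise ValueError(f"expected {WINDOW_DAYS} daily values per group")
    return mean(g / c - 1 for g, c in zip(gray_hist, control_hist))


def corrected_lift(post_gray, post_control, gray_hist, control_hist, tolerance=0.01):
    """Return (lift, pre_gap, corrected) for one metric.

    Assumption for illustration: if the groups already differed by more than
    `tolerance` before release, the pre-existing gap is subtracted from the lift.
    """
    pre_gap = pre_release_gap(gray_hist, control_hist)
    raw_lift = post_gray / post_control - 1
    if abs(pre_gap) <= tolerance:
        return raw_lift, pre_gap, False
    return raw_lift - pre_gap, pre_gap, True


# Hypothetical rate metric: the gray group already ran about 2% higher before the release
print(corrected_lift(0.110, 0.100, [0.102] * 7, [0.100] * 7))
```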

Root‑Cause Analysis

When metric anomalies appear, the system identifies underlying causes across multiple dimensions using two techniques:

Metric Logic Analysis: Decompose rate or mean metrics into numerator and denominator factors, then further into dimension‑level contributions.

Adtributor Algorithm: A multi‑dimensional root‑cause method for time‑series anomalies (Microsoft Research, 2014), applied with cross‑validation for reliability.
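The metric logic decomposition can be made concrete with a log identity. For a rate R = N / D, ln(R1 / R0) = ln(N1 / N0) - ln(D1 / D0) holds exactly, so a rate change splits cleanly into a numerator part and a denominator part. Each additive factor can then be split further by dimension value. This is a minimal sketch of that idea, not the article's own formulas; the metric, channel names, and counts are made up.

```python
import math


def decompose_rate_change(num_before, den_before, num_after, den_after):
    """Split the log-change of a rate R = N / D into numerator and denominator parts.

    ln(R_after / R_before) = ln(N_after / N_before) - ln(D_after / D_before), exactly.
    """
    num_part = math.log(num_after / num_before)
    den_part = -math.log(den_after / den_before)
    total = math.log((num_after / den_after) / (num_before / den_before))
    return total, num_part, den_part


def dimension_contributions(before, after):
    """Contribution of each dimension value to the relative change of an additive factor.

    before / after: {dimension value: factor value}, e.g. the numerator split by channel.
    The contributions sum to (sum(after) - sum(before)) / sum(before).
    """
    base_total = sum(before.values())
    keys = sorted(before.keys() | after.keys())
    return {k: (after.get(k, 0) - before.get(k, 0)) / base_total for k in keys}


# Hypothetical example: download conversion rate = downloads / detail-page visits
total, num_part, den_part = decompose_rate_change(5_000, 50_000, 4_600, 50_500)
print(f"rate change {total:+.3f} = numerator {num_part:+.3f} + denominator {den_part:+.3f}")
print(dimension_contributions({"home": 3_000, "search": 1_500, "push": 500},
                              {"home": 2_950, "search": 1_150, "push": 500}))
```

In this made‑up case the "search" channel accounts for most of the numerator drop. That is the kind of dimension‑level finding the decomposition is meant to surface.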

Intelligent Gray‑Release Solution

Overall Framework

The process is divided into pre‑gray, in‑gray, and post‑gray stages, forming a productized workflow.

Sample Size Estimation

The dashboard estimates the minimum sample size needed to detect a predefined effect under multiple confidence levels (default 95% confidence, 80% power) using recent metric performance. Features include:

Multiple minimum‑effect standards, so that the detectable effect size can be adjusted flexibly.

Automatic selection of the most recent full‑release data as input.

Separate calculation logic for mean‑type and rate‑type metrics.
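Such estimators are typically built on the standard two‑sided formulas, with n counted per group:

- Rate metrics: n = (z_{1-α/2} + z_{1-β})² × [p0(1 - p0) + p1(1 - p1)] / (p1 - p0)².
- Mean metrics with a shared standard deviation σ: n = 2σ²(z_{1-α/2} + z_{1-β})² / δ².

The sketch assumes these textbook formulas; the dashboard's exact variant is not described. The function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def z_total(alpha, power):
    """z_{1-alpha/2} + z_{power} for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def sample_size_rate(p_base, rel_lift, alpha=0.05, power=0.8):
    """Users per group needed to detect a relative lift in a rate metric."""
    p_new = p_base * (1 + rel_lift)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil(z_total(alpha, power) ** 2 * variance / (p_new - p_base) ** 2)


def sample_size_mean(std, min_diff, alpha=0.05, power=0.8):
    """Users per group needed to detect an absolute difference in a mean metric."""
    return math.ceil(2 * z_total(alpha, power) ** 2 * std ** 2 / min_diff ** 2)


# Hypothetical inputs: 10% baseline rate from the latest full release, 10% relative lift
for alpha in (0.10, 0.05, 0.01):  # 90 / 95 / 99% confidence, 80% power
    print(f"confidence {1 - alpha:.0%}: {sample_size_rate(0.10, 0.10, alpha=alpha)} users per group")
```

Looping over several confidence levels mirrors the dashboard's multi‑level view. The required sample grows as the confidence level rises or as the effect to detect shrinks.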

Significance Testing

Statistical models determine whether metric changes between gray and control versions are statistically significant. The implementation supports three confidence levels for 20 business metrics.

```python
import math


def significance_test(variation_visitors, control_visitors, variation_p, control_p, z=1.96):
    """Compare a rate metric between the gray (variation) and control versions.

    variation_visitors / control_visitors: gray / control denominators
    variation_p / control_p: gray / control metric values (rates in [0, 1])
    z: z-value for the confidence level (1.645, 1.96, 2.576 for 90/95/99%)
    """
    # Per-user standard deviation of a rate (Bernoulli) metric
    variation_sd = math.sqrt(variation_p * (1 - variation_p))
    control_sd = math.sqrt(control_p * (1 - control_p))

    # Absolute gap and relative change
    gap = variation_p - control_p
    rate = variation_p / control_p - 1

    # Confidence interval of the gap, then expressed relative to the control value
    se_gap = math.sqrt(control_sd**2 / control_visitors + variation_sd**2 / variation_visitors)
    gap_interval_sdown = gap - z * se_gap
    gap_interval_sup = gap + z * se_gap
    confidence_interval_sdown = gap_interval_sdown / control_p
    confidence_interval_sup = gap_interval_sup / control_p

    # Significant when the interval excludes 0, i.e. both bounds have the same sign
    significant = confidence_interval_sdown > 0 or confidence_interval_sup < 0
    return rate, (confidence_interval_sdown, confidence_interval_sup), significant


rate, ci, significant = significance_test(10_000, 10_000, 0.115, 0.100)
print(f"lift={rate:.1%}, CI=({ci[0]:.1%}, {ci[1]:.1%}):", "Significant" if significant else "Not Significant")
```

Automated Negative‑Metric Root‑Cause Analysis

The pipeline performs anomaly detection, historical sample validation, metric logic decomposition, and Adtributor analysis, automatically outputting the most influential dimension for each abnormal metric.
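The core of Adtributor (Bhagwan et al.) can be sketched in a few dozen lines:

1. For each dimension, compare the forecast and actual share of every element.
2. Score each element's surprise with a Jensen‑Shannon divergence term.
3. Keep elements whose explanatory power (their share of the total deviation) exceeds a per‑element threshold, until the kept set explains most of the anomaly.
4. Rank dimensions by total surprise.

The thresholds below (0.67 and 0.1) are illustrative defaults and the input format is assumed. The cross‑validation step the article mentions is omitted.

```python
import math


def surprise(p, q):
    """Jensen-Shannon divergence term between forecast share p and actual share q."""
    m = (p + q) / 2
    s = 0.0
    if p > 0:
        s += 0.5 * p * math.log(p / m)
    if q > 0:
        s += 0.5 * q * math.log(q / m)
    return s


def adtributor(data, tep=0.67, teep=0.1, k=3):
    """Rank dimensions that explain an anomaly in one additive metric.

    data: {dimension: {element: (forecast, actual)}}; every dimension partitions the same total.
    Returns up to k tuples (dimension, elements, explanatory_power, surprise), most surprising first.
    """
    results = []
    for dim, elems in data.items():
        f_total = sum(f for f, _ in elems.values())
        a_total = sum(a for _, a in elems.values())
        deviation = a_total - f_total
        if deviation == 0:
            continue  # nothing to explain in this dimension
        scored = [
            (surprise(f / f_total, a / a_total), (a - f) / deviation, elem)
            for elem, (f, a) in elems.items()
        ]
        scored.sort(key=lambda t: t[0], reverse=True)  # most surprising elements first
        chosen, ep_sum, s_sum = [], 0.0, 0.0
        for s, ep, elem in scored:
            if ep > teep:  # element explains a meaningful share of the deviation
                chosen.append(elem)
                ep_sum += ep
                s_sum += s
                if ep_sum > tep:  # the kept set explains most of the anomaly
                    results.append((dim, chosen, ep_sum, s_sum))
                    break
    results.sort(key=lambda r: r[3], reverse=True)
    return results[:k]


# Hypothetical example: downloads fell from a forecast of 1,000 to an actual 850
data = {
    "channel": {"home": (500, 500), "search": (300, 150), "push": (200, 200)},
    "os_version": {"android12": (600, 510), "android13": (400, 340)},
}
print(adtributor(data))  # "channel" with ["search"] ranks first
```

The "os_version" dimension explains the drop too, but both of its elements fell proportionally, so its surprise is zero. Adtributor therefore ranks the channel that changed its share ahead of it.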

Intelligent Report Generation

Version information (version number, install count, days since release) is fetched automatically from the release platform. Based on the direction of the metric changes (all positive, mixed, or all negative) and on sample uniformity, the system selects one of more than ten predefined conclusion templates to compose the final gray‑release report.
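The article lists the inputs to template selection but not the templates themselves. The sketch assumes each significant metric's change is oriented so that a positive value means an improvement. Under that assumption, selection reduces to a few sign checks plus the sample‑uniformity flag. The template keys, their texts, and the version number in the example are hypothetical.

```python
# Hypothetical template texts; the article's actual templates are not published.
TEMPLATES = {
    "samples_biased": "Gray and control samples differed before release; interpret results with caution.",
    "no_change": "No core metric changed significantly.",
    "all_positive": "All significantly changed core metrics improved: {metrics}.",
    "all_negative": "All significantly changed core metrics declined: {metrics}.",
    "mixed": "Core metrics moved in mixed directions: {metrics}.",
}


def pick_template(changes, samples_uniform):
    """changes: {metric: relative change} for significant metrics only, oriented so > 0 is better."""
    if not samples_uniform:
        return "samples_biased"
    if not changes:
        return "no_change"
    if all(v > 0 for v in changes.values()):
        return "all_positive"
    if all(v < 0 for v in changes.values()):
        return "all_negative"
    return "mixed"


def render_report(version, installs, days_since_release, changes, samples_uniform):
    """Combine the release-platform fields with the selected conclusion template."""
    key = pick_template(changes, samples_uniform)
    metrics = ", ".join(f"{m} {v:+.1%}" for m, v in changes.items())
    header = f"Version {version} | installs: {installs:,} | days since release: {days_since_release}"
    return header + "\n" + TEMPLATES[key].format(metrics=metrics)


print(render_report("5.2.0", 120_000, 3, {"dau": 0.021, "downloads": -0.012}, True))
```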

Future Work

Adopt stratified sampling to reduce inherent sample bias.

Enhance the multi‑dimensional root‑cause model by incorporating qualitative factors.

Explore sequential testing methods such as mSPRT to lessen reliance on pre‑estimated minimum sample sizes.
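For reference, the common normal‑mixture form of mSPRT computes a statistic after n observations with running mean difference x̄:

Λn = sqrt(σ² / (σ² + nτ²)) × exp(n²τ²(x̄ - θ0)² / (2σ²(σ² + nτ²)))

It rejects H0: θ = θ0 as soon as Λn ≥ 1/α. Because the test stays valid no matter when it is checked, it does not need a fixed sample size up front.

The sketch assumes a known per‑observation variance σ² and a normal mixing distribution with variance τ². The parameter values in the example are hypothetical, and the sketch is not part of Vivo's current system.

```python
import math


def msprt_statistic(mean_diff, n, sigma2, tau2, theta0=0.0):
    """Normal-mixture mSPRT statistic after n observations.

    mean_diff: running mean of the observed effect (e.g. gray minus control per pair)
    sigma2:    assumed known variance of one observation
    tau2:      variance of the normal mixing distribution centred on theta0
    """
    denom = sigma2 + n * tau2
    exponent = (n ** 2 * tau2 * (mean_diff - theta0) ** 2) / (2 * sigma2 * denom)
    return math.sqrt(sigma2 / denom) * math.exp(exponent)


def msprt_reject(mean_diff, n, sigma2, tau2, alpha=0.05, theta0=0.0):
    """Reject H0: theta == theta0 once the statistic reaches 1 / alpha; can be checked after every batch."""
    return msprt_statistic(mean_diff, n, sigma2, tau2, theta0) >= 1 / alpha


# Hypothetical monitoring: check after each batch instead of waiting for a fixed sample size
for n, mean_diff in [(100, 0.05), (400, 0.12), (1_000, 0.15)]:
    print(n, msprt_reject(mean_diff, n, sigma2=1.0, tau2=0.01))
```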

References

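Mao Shisong, Wang Jinglong, Pu Xiaolong. Advanced Mathematical Statistics, 2nd ed.

是老李没错了. "Master the Principles of A/B Testing and Sample Size Calculation in Five Minutes." CSDN Blog.

Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, et al. "Adtributor: Revenue Debugging in Advertising Systems." USENIX NSDI, 2014.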


