How NetEase Gaming Cut Data Warehouse Costs by 85%: A Data Governance Case Study
This case study details how NetEase Interactive Entertainment's data team tackled massive log-management chaos, storage bloat, and high overseas costs by standardizing logs, sharing a real-time ODS layer, automating lifecycle management, tiering storage, and peak-shaving compute scheduling, ultimately saving millions of yuan annually.
Project Background
Our team provides a wide range of game analytics services for NetEase Interactive Entertainment, supporting over 300 games with BI dashboards, custom metrics, and various analysis topics. The offline data warehouse executes up to 4 million SQL queries per month.
Project Challenges
Rapid business growth led to more than 3 million monthly offline warehouse jobs and highlighted several problems:
Chaotic log management with inconsistent standards, making parsing and maintenance difficult.
Years of continuous operation produced long-running jobs with no clear data lifecycle.
Duplicate storage of raw logs and formatted tables, inflating costs.
Overseas market expansion caused storage costs abroad to far exceed domestic costs.
Project Implementation
We pursued optimization in two major areas: storage and compute.
Storage Optimizations
The ODS layer accounts for 75% of total storage, so we focused on pre‑storage, during‑storage, and post‑storage improvements.
Pre‑storage: Log Format Standards
We introduced three principles for log printing:
Standard: Follow a unified format to ensure downstream parsability and avoid wasted space.
On-Demand: Emit only logs required by downstream business, minimizing transmission volume.
Compact: For frequently repeated fields, print full information only once per session and reference subsequent logs via session ID or dimension tables.
Using a log‑reporting module that maps abbreviated column names to full English names, we reduced log storage by roughly 10%‑20%.
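The abbreviated-name approach can be sketched as a reversible field map: logs are emitted with short keys to save space, and a shared module expands them back to full English names downstream. The field names below are hypothetical, not NetEase's actual schema.

```python
# Hypothetical abbreviation map; a real module would cover the full schema.
ABBREV_MAP = {"uid": "user_id", "ts": "timestamp", "ev": "event_name", "sid": "session_id"}

def expand_log(record: dict) -> dict:
    """Expand abbreviated column names to full names; pass unknown keys through."""
    return {ABBREV_MAP.get(k, k): v for k, v in record.items()}

compact = {"uid": 42, "ts": 1660000000, "ev": "login", "sid": "abc123"}
expanded = expand_log(compact)  # {'user_id': 42, 'timestamp': ..., ...}
```

Because the short keys are repeated in every log line, shrinking them is where the 10%-20% storage reduction comes from.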
During‑storage: Shared Warehouse Solution
Previously, each department maintained its own ETL‑derived ODS, causing duplicate raw logs. We built a shared real‑time ODS layer where streams are ingested, ETL‑processed, and stored as separate tables with minute‑level freshness. Departments now query the shared ODS directly, eliminating redundant raw‑log storage and improving query efficiency.
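The shared-ODS idea can be illustrated with a toy router that splits one raw stream into per-type tables that all departments read. This is only a sketch; the article's real pipeline is a streaming ETL system with minute-level freshness, and the line format here is assumed.

```python
from collections import defaultdict

# Toy stand-in for per-type ODS tables; assumes lines look like "<log_type>|<payload>".
ods_tables = defaultdict(list)

def ingest(raw_line: str) -> None:
    """Route a raw log line into its log-type table in the shared ODS layer."""
    log_type, _, payload = raw_line.partition("|")
    ods_tables[log_type].append(payload)

for line in ["login|uid=1", "battle|uid=1;score=9", "login|uid=2"]:
    ingest(line)
# Each department now queries ods_tables["login"] etc. instead of re-parsing raw logs.
```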
Post‑storage: Dynamic Lifecycle Management
Manual ODS table lifecycle configuration caused conflicts, maintenance overhead, and legacy issues. We automated expiration by monitoring SQL audit logs and automatically dropping long-unread partitions, then recreating them on demand. Between June and September 2022, we reclaimed 2,229.5 TB of overseas data, reducing total storage to 386.3 TB, an 85% reduction.
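The expiration logic can be sketched as a retention check over last-read timestamps mined from the audit logs. The 90-day window and partition names below are assumptions for illustration; the article does not state the actual retention policy.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # hypothetical retention window

def partitions_to_drop(last_read: dict, now: datetime) -> list:
    """Return partitions whose last recorded read is older than RETENTION."""
    return sorted(p for p, t in last_read.items() if now - t > RETENTION)

now = datetime(2022, 9, 1)
last_read = {
    "ods.logs/dt=2022-01-01": datetime(2022, 2, 1),   # stale, eligible to drop
    "ods.logs/dt=2022-08-01": datetime(2022, 8, 30),  # recently read, keep
}
stale = partitions_to_drop(last_read, now)
```

Dropped partitions are not lost for good: as the article notes, they can be recreated on demand if a query later needs them.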
Post‑storage: Tiered and Compressed Storage
Based on data age and access frequency, we introduced three storage tiers:
Tier 1: Hot Hadoop cluster for high‑performance reads/writes.
Tier 2: Cold Hadoop cluster with slightly lower performance.
Tier 3: Cold S3 cluster, read‑only with the lowest performance.
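A tiering policy like the one above amounts to a placement function over data age and access frequency. The thresholds here are invented for illustration; the article does not specify the actual cutoffs.

```python
def choose_tier(age_days: int, reads_last_30d: int) -> str:
    """Pick a storage tier; thresholds are hypothetical."""
    if age_days <= 30 or reads_last_30d >= 100:
        return "hot-hadoop"   # Tier 1: high-performance reads/writes
    if reads_last_30d > 0:
        return "cold-hadoop"  # Tier 2: slightly lower performance
    return "cold-s3"          # Tier 3: read-only, lowest performance
```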
We also applied two levels of compression:
Level 1: ORC or TextFile with Snappy compression, achieving 30%‑48% size reduction.
Level 2: Erasure coding to reduce replication from 3× to 1.5×, saving about 50% of space.
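A back-of-envelope calculation shows how the two levels compound, using the article's figures (a 40% Snappy reduction is picked from the stated 30%-48% range; RS(6,3), with 6 data blocks plus 3 parity blocks, is one common erasure-coding policy that yields the stated 1.5x overhead).

```python
raw_tb = 100.0
after_snappy = raw_tb * (1 - 0.40)    # Level 1: ORC/TextFile + Snappy, ~40% smaller
replicated = after_snappy * 3.0       # classic HDFS 3x replication: 180 TB
erasure_coded = after_snappy * 1.5    # Level 2: e.g. RS(6,3) -> 9/6 = 1.5x: 90 TB
# Erasure coding halves the replicated footprint, matching the ~50% saving above.
```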
Compute Optimizations
Pre‑compute: Efficiency Improvements
We prioritized long‑running, widely used P1 metrics. By redesigning data models to reduce cross‑layer references and reusing intermediate layers, we cut required compute resources for targeted jobs by 70%‑90% and lowered overall compute consumption for regular operational metrics by 30%‑50%.
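The reuse idea can be sketched with memoization: an expensive intermediate layer is computed once and shared by several metrics instead of each metric rescanning raw data. The metric names and caching mechanism here are illustrative stand-ins, not the team's actual job graph.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def daily_active_users(dt: str) -> frozenset:
    """Stand-in for an expensive scan over the shared ODS layer."""
    return frozenset({"u1", "u2", "u3"})

def retention_metric(dt: str) -> int:
    return len(daily_active_users(dt))      # reuses the cached intermediate

def engagement_metric(dt: str) -> int:
    return len(daily_active_users(dt)) * 2  # same intermediate, no second scan
```

In a real warehouse the equivalent is materializing an intermediate table once per day and pointing downstream metric jobs at it, which is what cuts the cross-layer recomputation.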
During‑compute: Peak‑Shaving Scheduling
Previously, many jobs ran during busy hours (02:00-08:00), when compute costs four times as much as during idle periods. By classifying tasks, moving non-critical analytics to idle times, and adjusting priorities, we reduced overall compute cost.
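The peak-shaving rule reduces to: keep critical jobs where they are, and shift everything else out of the 4x-priced window. The target idle hour chosen below is an assumption for illustration.

```python
PEAK_HOURS = range(2, 8)  # 02:00-08:00, billed at 4x the idle rate per the article

def schedule_hour(requested_hour: int, critical: bool) -> int:
    """Defer non-critical jobs out of the peak window; keep critical jobs in place."""
    if critical or requested_hour not in PEAK_HOURS:
        return requested_hour
    return 10  # hypothetical cheap idle slot

def cost(hour: int, units: float) -> float:
    """Compute cost at 4x during peak hours, 1x otherwise."""
    return units * (4.0 if hour in PEAK_HOURS else 1.0)
```

For example, a non-critical job requested at 03:00 moves to 10:00 and its 10 compute units cost 10.0 instead of 40.0.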
Project Results
After the September 2021 optimization, storage savings reached up to 65% and compute savings up to 71%, for an estimated annual cost reduction of about 9.6 million CNY. From June 2022 onward, focusing on overseas projects, storage and compute improvements contributed 45% and 55% respectively, with an estimated annual saving of over 15 million CNY.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
