Data Governance Practices and Experiences at NetEase Cloud Music

This article details NetEase Cloud Music's comprehensive data governance journey, covering data warehouse architecture, data standards, event tracking (埋点) governance, asset lifecycle management, and future automation plans, illustrating how systematic governance improves data quality, cost efficiency, and business insight.

DataFunSummit

In the era of big data, NetEase Cloud Music recognized the value of its data assets and began exploring application scenarios and business models while building the supporting technical platforms. Data governance became a crucial tool for unlocking the value of that data.

1. Music Data Warehouse Overview

The data warehouse faces massive data volume, complex business scenarios, and historical baggage, which make it difficult to ensure data quality, control compute usage, and manage costs.

2. Data Standards

Data standards are the foundation of governance. Design standards address the lack of top‑level design, data silos, and data quality issues. Development standards focus on eliminating file‑based reads and writes, requiring pure SQL for public models, limiting each workflow to a single output table, and declaring cross‑workflow dependencies so that lineage and operations are easier to manage.
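The development standards above lend themselves to automated checks. The following is a minimal sketch, assuming a workflow can be described as a list of tasks with a type, inputs, and outputs. The `Task` and `Workflow` classes, their field names, and the file-path heuristic are all hypothetical and are not taken from the talk; cross‑workflow dependency checks are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One step in a workflow. All field names here are hypothetical."""
    name: str
    kind: str  # e.g. "sql", "spark_jar", "shell"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    is_public_model: bool
    tasks: list


def is_file_path(ref: str) -> bool:
    # Assumption: file-based I/O is recognisable by a URI scheme or a leading slash.
    return ref.startswith(("hdfs://", "file://", "s3://", "/"))


def check_workflow(wf: Workflow) -> list:
    """Return human-readable violations; an empty list means compliant."""
    violations = []
    for task in wf.tasks:
        for ref in task.inputs + task.outputs:
            if is_file_path(ref):
                violations.append(f"{task.name}: file-based read/write on {ref}")
        if wf.is_public_model and task.kind != "sql":
            violations.append(
                f"{task.name}: public model must be pure SQL, got {task.kind}"
            )
    # Tables written by the workflow as a whole; file outputs are flagged above.
    output_tables = {o for t in wf.tasks for o in t.outputs if not is_file_path(o)}
    if len(output_tables) != 1:
        violations.append(
            f"{wf.name}: expected exactly 1 output table, found {len(output_tables)}"
        )
    return violations
```

A check like this could run when a workflow is submitted, so that violations are reported before the workflow is scheduled.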

The NetEase DataFan development platform enforces naming conventions, domain definitions, and table and field standards through built‑in controls.
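As an illustration of how a naming convention can be enforced mechanically, here is a small sketch of a table-name validator. The convention itself (`<layer>_<domain>_<subject>_<suffix>`), the domain list, and the suffixes are assumptions made for the example; the talk does not spell out the actual rules.

```python
import re
from typing import Optional

# Assumed convention: <layer>_<domain>_<subject>_<suffix>, e.g. dwd_user_login_di.
LAYERS = ("ods", "dwd", "dws", "dim", "ads")
DOMAINS = ("user", "content", "traffic", "trade")  # hypothetical domain list
SUFFIXES = ("di", "df")  # assumed: daily increment / daily full

TABLE_NAME = re.compile(
    rf"^(?P<layer>{'|'.join(LAYERS)})_"
    rf"(?P<domain>{'|'.join(DOMAINS)})_"
    r"(?P<subject>[a-z][a-z0-9]*(?:_[a-z0-9]+)*)_"
    rf"(?P<suffix>{'|'.join(SUFFIXES)})$"
)


def parse_table_name(name: str) -> Optional[dict]:
    """Split a compliant table name into its parts, or return None if it does not comply."""
    match = TABLE_NAME.match(name)
    return match.groupdict() if match else None
```

Returning the parsed parts rather than a plain boolean lets the same function feed domain and layer metadata into a catalog.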

3. Event Tracking (埋点) Governance

Key problems include inconsistent formats, low data quality, low development efficiency, and difficulty inspecting data. The solution introduces a standardized SDK that models SPM/SCM as objects, makes those objects reusable, and enriches each event with refer information about where the user came from. The log format moves from flat JSON to nested JSON that separates global parameters from event parameters.
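The move from flat to nested JSON can be sketched as follows. This is only an illustration: the key names, the set of global parameters, and the dot-joined SPM string built from a page/module/element object path are assumptions, not the SDK's actual schema.

```python
import json
from typing import Optional

# Assumption: these keys describe the device or session and are shared by every event.
GLOBAL_KEYS = {"user_id", "device_id", "app_version", "os"}


def spm_from_path(object_path: list) -> str:
    """Join a page > module > element object path into one position string."""
    return ".".join(object_path)


def to_nested(flat: dict, object_path: list, refer: Optional[str] = None) -> dict:
    """Regroup a flat event into global parameters and event parameters."""
    global_params = {k: v for k, v in flat.items() if k in GLOBAL_KEYS}
    event_params = {
        k: v for k, v in flat.items() if k not in GLOBAL_KEYS and k != "event"
    }
    return {
        "event": flat["event"],
        "spm": spm_from_path(object_path),
        "refer": refer,  # position of the event that led here, if known
        "global": global_params,
        "params": event_params,
    }


if __name__ == "__main__":
    flat_event = {"event": "click", "user_id": "u1", "os": "ios", "song_id": "s9"}
    nested = to_nested(
        flat_event, ["home", "daily_rec", "song_card"], refer="search.result.song_card"
    )
    print(json.dumps(nested, indent=2))
```

Keeping global parameters in one sub-object means downstream parsing can treat them uniformly across event types.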

Technical solutions include SDK implementation, data warehouse integration for unified offline and real‑time data, and attribution design covering channel, content, search, and strategy attribution.
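One way to picture attribution on top of refer information is to walk the refer chain back from a target event to the event that started it, then classify the entry point. The sketch below assumes each event carries an `spm` string and a `refer` pointing at the previous event's `spm`. The page-to-attribution mapping is entirely hypothetical and only reuses the four attribution types named above.

```python
# Hypothetical mapping from the entry page (first SPM segment) to an attribution type.
ATTRIBUTION_BY_PAGE = {
    "push": "channel",
    "search": "search",
    "playlist": "content",
    "daily_rec": "strategy",
}


def trace_origin(events_by_spm: dict, spm: str) -> str:
    """Follow refer links back until an event has no refer; return that event's spm."""
    seen = set()  # guards against cycles in malformed data
    while spm in events_by_spm and spm not in seen:
        seen.add(spm)
        refer = events_by_spm[spm].get("refer")
        if refer is None:
            break
        spm = refer
    return spm


def attribute(events: list, target_spm: str) -> str:
    """Classify the target event by the page where its refer chain began."""
    events_by_spm = {e["spm"]: e for e in events}
    origin = trace_origin(events_by_spm, target_spm)
    entry_page = origin.split(".")[0]
    return ATTRIBUTION_BY_PAGE.get(entry_page, "unknown")
```

In a warehouse this chain walk would more likely be precomputed per session than done event by event; the function form is used here only to keep the logic visible.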

Process management coordinates planners, BI analysts, data warehouse developers, front‑end developers, and the SDK team, followed by QA testing and final deployment.

4. Asset Governance

Asset governance targets compute and storage resources, addressing high CPU and memory utilization, storage growth, and a large number of unused tables. It covers data flow governance (a layered model and single‑task flows) and lifecycle governance (business scenario analysis, data lineage, and cost analysis), and relies on the DataFan asset center to monitor and clean up resources. So far the effort has decommissioned over 100 reports and saved more than 1 PB of storage.
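Lifecycle governance of this kind can be approximated by combining lineage and access statistics. The sketch below is an assumption-laden example: the `TableStats` fields, the 90-day idle threshold, and the rule "no downstream consumers and no recent reads" are illustrative choices, not the criteria used by the asset center.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableStats:
    name: str
    size_gb: float
    last_read: date        # most recent read seen in access logs
    downstream_count: int  # tables or reports that depend on it, from lineage


def decommission_candidates(tables, today, idle_days=90):
    """Tables with no downstream consumers and no reads in the last idle_days days."""
    return [
        t for t in tables
        if t.downstream_count == 0 and (today - t.last_read).days > idle_days
    ]


def reclaimable_gb(candidates):
    """Total storage that would be freed by dropping the candidates."""
    return sum(t.size_gb for t in candidates)
```

Candidates would normally go through owner review before deletion; the function only produces the shortlist and the potential saving.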

5. Outlook

Future plans aim for automated data governance: visualizing data assets, implementing static SQL code checks with performance alerts, and establishing health scores for data and data workers to guide efficient production and analysis.
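A static SQL check can start as a handful of pattern rules run before a job is submitted. The sketch below is a deliberately simple, regex-based example. The rule set and the partition column name `dt` are assumptions, and a production checker would more likely work on a parsed SQL syntax tree than on raw text.

```python
import re

# Each rule: (rule id, compiled pattern, message). The rule set is illustrative only.
RULES = [
    ("select-star", re.compile(r"\bselect\s+\*", re.I),
     "avoid SELECT *; list columns explicitly"),
    ("cross-join", re.compile(r"\bcross\s+join\b", re.I),
     "CROSS JOIN may explode row counts"),
]

# Heuristic: a WHERE clause that mentions the assumed partition column `dt`.
PARTITION_FILTER = re.compile(r"\bwhere\b.*\bdt\b", re.I | re.S)


def lint_sql(sql: str) -> list:
    """Return (rule id, message) pairs for every rule the statement violates."""
    findings = [(rule_id, msg) for rule_id, pattern, msg in RULES if pattern.search(sql)]
    if not PARTITION_FILTER.search(sql):
        findings.append(
            ("no-partition-filter", "no WHERE filter on partition column dt")
        )
    return findings
```

The number of findings per statement could also feed a health score, for example by subtracting a fixed penalty per violation from a starting score.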

Q&A Highlights

Data governance has resulted in standardized data domains, restructured DWD/DWS/DIM layers, and ongoing SDK rollout.

Metrics such as storage growth, table lifecycle, and decommissioned tables are tracked via the platform.

The architecture emphasizes a DWS layer with horizontal and vertical splits for snapshots, first‑time events, and retention analysis.
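To show why a first‑time events table helps retention analysis, here is a small sketch of day‑N retention computed from two inputs: each user's first‑seen date and the set of days on which each user was active. Both input shapes and the function name are assumptions for illustration; in the warehouse this would typically be a SQL join between a first‑time table and a daily activity table.

```python
from datetime import date, timedelta


def day_n_retention(first_seen: dict, active_days: dict, cohort_day: date, n: int) -> float:
    """Share of users first seen on cohort_day who were active again n days later."""
    cohort = {user for user, day in first_seen.items() if day == cohort_day}
    if not cohort:
        return 0.0
    target_day = cohort_day + timedelta(days=n)
    retained = {u for u in cohort if target_day in active_days.get(u, set())}
    return len(retained) / len(cohort)
```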

Data and business are interdependent; data supports business decisions like traffic allocation and copyright ROI analysis.

Overall, NetEase Cloud Music's data governance initiatives have improved data quality, reduced costs, and enhanced analytical capabilities across the organization.
