Data Governance Practices and Experiences at NetEase Cloud Music
This article details NetEase Cloud Music's comprehensive data governance journey, covering data warehouse architecture, data standards, event tracking (埋点) governance, asset lifecycle management, and future automation plans, illustrating how systematic governance improves data quality, cost efficiency, and business insight.
In the era of big data, NetEase Cloud Music recognized the value of its data assets. It began exploring application scenarios and business models and building the supporting technical platforms, with data governance as a crucial tool for unlocking data value.
1. Music Data Warehouse Overview
The data warehouse faces challenges such as massive data volume, complex business scenarios, and historical baggage, leading to difficulties in ensuring data quality, controlling computation, and managing costs.
2. Data Standards
Data standards are the foundation of governance. Design standards address the lack of top‑level design, data silos, and data quality issues. Development standards eliminate file‑based reads and writes, require pure SQL for public models, limit each workflow to a single output table, and establish cross‑workflow dependencies, which together make lineage and operations easier to manage.
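The single‑output and cross‑workflow dependency rules can be illustrated with a minimal sketch. The `Workflow` structure, the table names, and both helper functions are assumptions made for illustration; they are not part of the platform described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    inputs: list[str]   # tables the workflow reads
    outputs: list[str]  # tables the workflow writes


def multi_output_violations(workflows: list[Workflow]) -> list[str]:
    """Names of workflows that do not write exactly one table."""
    return [w.name for w in workflows if len(w.outputs) != 1]


def cross_workflow_dependencies(workflows: list[Workflow]) -> dict[str, list[str]]:
    """Map each workflow to the upstream workflows that produce its inputs."""
    # With one output table per workflow, table -> producer is unambiguous.
    producer = {w.outputs[0]: w.name for w in workflows if len(w.outputs) == 1}
    deps = {}
    for w in workflows:
        upstream = {producer[t] for t in w.inputs if t in producer}
        upstream.discard(w.name)  # ignore self-references
        deps[w.name] = sorted(upstream)
    return deps


# Hypothetical workflows; the last one breaks the single-output rule.
workflows = [
    Workflow("wf_play_detail", ["ods_play_log"], ["dwd_play_detail"]),
    Workflow("wf_play_daily", ["dwd_play_detail", "dim_song"], ["dws_play_daily"]),
    Workflow("wf_legacy", ["dwd_play_detail"], ["tmp_a", "tmp_b"]),
]

violations = multi_output_violations(workflows)
deps = cross_workflow_dependencies(workflows)
```

Because each compliant workflow owns exactly one table, the table‑to‑workflow mapping is one‑to‑one, which is what makes the derived dependency graph usable as lineage.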
Platform controls via the NetEase DataFan development platform enforce naming conventions, domain definitions, and table/field standards.
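As a rough idea of what a platform‑side naming check might look like, the sketch below validates table names against an assumed pattern `<layer>_<domain>_<subject>`. The pattern, the layer list, and the domain list are hypothetical; the actual conventions enforced by the platform are not given in the source.

```python
import re

# Assumed convention: <layer>_<domain>_<subject>[_<suffix>], lowercase snake_case.
LAYERS = {"ods", "dwd", "dws", "dim", "ads"}
DOMAINS = {"user", "content", "traffic", "trade"}  # illustrative data domains

NAME_RE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def check_table_name(name: str) -> list[str]:
    """Return a list of violations; an empty list means the name passes."""
    if not NAME_RE.match(name):
        return ["name must be lowercase snake_case"]
    problems = []
    parts = name.split("_")
    if parts[0] not in LAYERS:
        problems.append(f"unknown layer prefix: {parts[0]}")
    if len(parts) < 3:
        problems.append("expected <layer>_<domain>_<subject>")
    elif parts[1] not in DOMAINS:
        problems.append(f"unknown data domain: {parts[1]}")
    return problems


ok = check_table_name("dwd_user_login_di")
bad_layer = check_table_name("tmp_user_login")
bad_case = check_table_name("DWD_User_Login")
too_short = check_table_name("dwd_user")
```

Running such a check when a table is created, rather than in later audits, keeps non‑conforming names out of the warehouse in the first place.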
3. Event Tracking (埋点) Governance
Key problems include inconsistent formats, low data quality, low development efficiency, and difficulty inspecting data. The solution introduces a standardized SDK that models SPM/SCM as objects, makes those objects reusable, and enriches events with refer information. The log format moves from flat JSON to nested JSON that separates global parameters from event parameters.
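The move from flat to nested JSON can be sketched as follows. The field names, the set of global keys, and the way an SPM string is built from an object path are all assumptions for illustration; the real SDK schema is not shown in the source.

```python
import json

# Assumed split: these keys describe the device/session and go into "global".
GLOBAL_KEYS = {"user_id", "device_id", "app_version", "os"}


def build_spm(object_path: list[str]) -> str:
    """Join object ids from the tapped element up to the page root."""
    return "|".join(object_path)


def to_nested_event(flat: dict, object_path: list[str]) -> dict:
    """Convert a flat tracking record into a nested layout."""
    reserved = GLOBAL_KEYS | {"event", "refer"}
    return {
        "event": flat["event"],
        "spm": build_spm(object_path),
        "global": {k: v for k, v in flat.items() if k in GLOBAL_KEYS},
        "params": {k: v for k, v in flat.items() if k not in reserved},
        "refer": flat.get("refer"),  # where the user came from, if known
    }


# Hypothetical flat record as an older client might have logged it.
flat = {
    "event": "click",
    "user_id": "u1",
    "os": "android",
    "song_id": "s42",
    "refer": "page_search",
}
nested = to_nested_event(flat, ["btn_play", "mod_daily_recommend", "page_home"])
print(json.dumps(nested, indent=2))
```

Separating global parameters from event parameters means downstream parsing can treat the shared fields uniformly and only needs per‑event logic for the `params` block.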
Technical solutions include SDK implementation, data warehouse integration for unified offline and real‑time data, and attribution design covering channel, content, search, and strategy attribution.
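One simple way to use refer information for attribution is to walk the refer chain backward from a target event until an attributable source is found. The sketch below assumes each event carries an `id`, an optional `refer` pointing at the previous event, and an optional `source_type`; this is a last‑touch illustration, not the attribution logic described in the talk.

```python
from typing import Optional

# The four attribution categories named in the article.
ATTRIBUTION_TYPES = {"channel", "content", "search", "strategy"}


def attribute(event_id: str, events: dict[str, dict]) -> Optional[tuple[str, str]]:
    """Walk the refer chain from event_id and return (source_type, source_event_id).

    Returns None when no attributable event is found on the chain.
    """
    seen = set()
    current = events.get(event_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])  # guard against refer cycles
        if current.get("source_type") in ATTRIBUTION_TYPES:
            return current["source_type"], current["id"]
        current = events.get(current.get("refer"))
    return None


# Hypothetical session: search -> result page view -> play.
events = {
    "e1": {"id": "e1", "source_type": "search", "refer": None},
    "e2": {"id": "e2", "source_type": None, "refer": "e1"},
    "e3": {"id": "e3", "refer": "e2"},
}

result = attribute("e3", events)
missing = attribute("e9", events)
```

Here the play event `e3` is credited to the search event `e1`, the nearest attributable event on its refer chain.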
Process management involves coordinated steps among planners, BI, data warehouse developers, front‑end developers, SDK, QA testing, and final deployment.
4. Asset Governance
Asset governance targets compute and storage resources, addressing high CPU/memory utilization, storage growth, and a large number of unused tables. It has two parts: data flow governance (a layered model and single‑task flows) and lifecycle governance (business scenario analysis, data lineage, and cost analysis). The DataFan asset center is used to monitor and clean up resources. So far, over 100 reports have been decommissioned and more than 1 PB of storage has been saved.
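Lifecycle governance of this kind usually starts from a list of cleanup candidates. The sketch below flags tables that have not been read recently and have no downstream consumers in lineage, then ranks them by size. The `TableStats` fields, the 90‑day threshold, and the sample figures are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableStats:
    name: str
    last_read: date        # most recent read seen in access logs
    downstream_count: int  # number of downstream tables/reports in lineage
    size_tb: float         # storage footprint


def cleanup_candidates(
    tables: list[TableStats], today: date, idle_days: int = 90
) -> list[TableStats]:
    """Tables idle for longer than idle_days with no downstream, largest first."""
    cutoff = today - timedelta(days=idle_days)
    idle = [t for t in tables if t.last_read < cutoff and t.downstream_count == 0]
    return sorted(idle, key=lambda t: t.size_tb, reverse=True)


# Made-up sample data, not figures from the talk.
tables = [
    TableStats("ads_old_report_a", date(2023, 6, 1), 0, 120.0),
    TableStats("dws_play_daily", date(2023, 12, 20), 5, 800.0),
    TableStats("dwd_legacy_used", date(2023, 1, 1), 3, 50.0),
    TableStats("ads_old_report_b", date(2023, 9, 1), 0, 300.0),
]

candidates = cleanup_candidates(tables, today=date(2024, 1, 1))
names = [t.name for t in candidates]
```

Sorting by size puts the tables with the largest potential storage savings at the top of the review list; the final decision to decommission would still need business confirmation.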
5. Outlook
Future plans aim for automated data governance: visualizing data assets, implementing static SQL code checks with performance alerts, and establishing health scores for data and data workers to guide efficient production and analysis.
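A static SQL check feeding a health score could look roughly like the sketch below. The rules, the penalty weights, and the assumption that tables are partitioned by a column named `dt` are all hypothetical; a production checker would parse the SQL rather than rely on regular expressions.

```python
import re

# (rule name, pattern that signals a problem, penalty points); all assumed.
RULES = [
    ("select_star", re.compile(r"\bselect\s+\*", re.I), 20),
    ("cross_join", re.compile(r"\bcross\s+join\b", re.I), 30),
]
# Assumed partition column "dt"; a query should mention it after WHERE.
PARTITION_FILTER = re.compile(r"\bwhere\b.*\bdt\b", re.I | re.S)
MISSING_PARTITION_PENALTY = 40


def lint_sql(sql: str) -> list[tuple[str, int]]:
    """Return (rule, penalty) pairs for every rule the statement violates."""
    findings = [(name, pen) for name, pattern, pen in RULES if pattern.search(sql)]
    if not PARTITION_FILTER.search(sql):
        findings.append(("missing_partition_filter", MISSING_PARTITION_PENALTY))
    return findings


def health_score(sql: str) -> int:
    """Start from 100 and subtract penalties, never going below 0."""
    return max(0, 100 - sum(pen for _, pen in lint_sql(sql)))


bad_sql = "SELECT * FROM dwd_play_detail"
good_sql = (
    "SELECT song_id, COUNT(*) AS plays FROM dwd_play_detail "
    "WHERE dt = '2024-01-01' GROUP BY song_id"
)

bad_findings = [name for name, _ in lint_sql(bad_sql)]
bad_score = health_score(bad_sql)
good_score = health_score(good_sql)
```

Scores like these could be aggregated per table or per developer to produce the health scores mentioned above, with low scores triggering an alert before the job runs.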
Q&A Highlights
Data governance has resulted in standardized data domains, restructured DWD/DWS/DIM layers, and ongoing SDK rollout.
Metrics such as storage growth, table lifecycle, and decommissioned tables are tracked via the platform.
The architecture emphasizes a DWS layer with horizontal and vertical splits for snapshots, first‑time events, and retention analysis.
Data and business are interdependent; data supports business decisions like traffic allocation and copyright ROI analysis.
Overall, NetEase Cloud Music's data governance initiatives have improved data quality, reduced costs, and enhanced analytical capabilities across the organization.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
