Building an Intelligent Data Governance Platform at NetEase Cloud Music: Architecture, Practices, and Future Plans
This article shares NetEase Cloud Music's experience building a metadata-driven intelligent data governance platform, covering four parts: the scale and status of the music data platform, the background and goals of the governance platform, its construction and implementation, and future planning toward a sustainable data ecosystem.
Platform scale and status: Built on NetEase's unified data platform (Shufen), the music data platform serves 800+ users with 200+ daily UVs across diverse roles (frontend, backend, QA, operations). It provides capabilities such as JAR tasks, SQL, real-time notebooks, offline data transfer, and FastX, a low-code platform for integrated batch and stream processing. Workloads include 7,000+ offline tasks, 1,500+ real-time tasks, 2,000+ compute nodes, and logs reaching billions of entries per day.
Governance platform background and goals: Challenges include non-professional users who lack data development knowledge, weak data awareness, poor code quality, wasted resources, and a heavy operational burden (an average of 100 tickets per month, 60% of them basic usage questions). The goal is an automated, intelligent governance platform that detects problems early, assigns clear ownership, and keeps the data ecosystem healthy.
Construction and implementation: The platform is built around four concepts: resource objects, metadata, rules, and governance. The metadata module gathers features through iterator and feature plugins (warehouse-based or plugin-based). The rule module scans resources, applies configurable if-else or model-based logic, and supports both scheduled and on-demand execution. The governance module bridges rule results to third-party systems, enabling actions such as one-click or batch table decommissioning, transparent decision data, automatic effect collection, rollback, and feedback mechanisms. A typical workflow: collect metadata → apply rules → report results → user action → effect callback.
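The scan-and-apply behavior of the rule module can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the `Resource`, `Rule`, and `run_rules` names, the metadata fields, and the example table names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical resource record: a governed object plus its collected metadata.
@dataclass
class Resource:
    resource_id: str
    resource_type: str                  # e.g. "hive_table", "task"
    metadata: Dict[str, object] = field(default_factory=dict)

# A rule pairs a resource type with a configurable check (if-else logic).
@dataclass
class Rule:
    name: str
    applies_to: str
    check: Callable[[Resource], bool]   # True -> resource violates the rule

def run_rules(resources: List[Resource], rules: List[Rule]) -> List[dict]:
    """Scan resources, apply each matching rule, and report findings."""
    findings = []
    for res in resources:
        for rule in rules:
            if rule.applies_to == res.resource_type and rule.check(res):
                findings.append({"resource": res.resource_id, "rule": rule.name})
    return findings

# Illustrative rule: flag tables not accessed in more than 90 days.
stale_rule = Rule(
    name="stale_table",
    applies_to="hive_table",
    check=lambda r: r.metadata.get("days_since_last_access", 0) > 90,
)

tables = [
    Resource("music.dws_play_log", "hive_table", {"days_since_last_access": 120}),
    Resource("music.dim_user", "hive_table", {"days_since_last_access": 3}),
]
print(run_rules(tables, [stale_rule]))  # flags only music.dws_play_log
```

In this shape, scheduled execution would simply invoke `run_rules` on a timer over a resource snapshot, while on-demand execution calls it for a single resource; the findings list is what the governance module would forward to downstream systems.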
Practical example, Hive idle-table governance: The process covers metadata collection (basic table info, lineage, business attributes), rule configuration (exclude tables under 60 days old or on a whitelist; check usage via lineage, task dependencies, and file access), and governance actions (one-click decommissioning, batch operations, impact analysis). The platform also uses alerts, quality scores, red/black lists, and permission restrictions to motivate users to act.
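The exclusion and usage checks above can be expressed as a small predicate. This is a sketch under assumed metadata field names (`created`, `whitelisted`, the usage counters); the real platform gathers these signals from its metadata, lineage, and file-access plugins.

```python
from datetime import date

# Hypothetical flat metadata record for one Hive table; fields are illustrative.
table = {
    "name": "music.ods_play_log_bak",
    "created": date(2022, 1, 10),
    "whitelisted": False,
    "downstream_lineage_count": 0,   # downstream tables/tasks per lineage
    "scheduled_task_refs": 0,        # scheduled tasks that depend on it
    "hdfs_reads_last_60d": 0,        # raw file-level access
}

def is_idle_candidate(t, today=None, min_age_days=60):
    """Apply the exclusions, then the usage checks, to one table record."""
    today = today or date.today()
    # Exclusions: too new, or explicitly whitelisted by its owner.
    if (today - t["created"]).days < min_age_days or t["whitelisted"]:
        return False
    # Usage checks: any sign of consumption disqualifies the table.
    return (t["downstream_lineage_count"] == 0
            and t["scheduled_task_refs"] == 0
            and t["hdfs_reads_last_60d"] == 0)

print(is_idle_candidate(table, today=date(2024, 1, 1)))  # → True
```

A table that passes this predicate would then enter the governance actions described above (impact analysis, then one-click or batch decommissioning, with rollback available).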
Future planning: Enhancements focus on richer and more accurate feature data; friendlier rule configuration (visual drag-and-drop, low-code); and extending governance to other scenarios such as models, machines, services, middleware, Kafka topics, and tables in other databases, toward a unified, extensible governance solution.
Q&A: Topics included HDFS governance plugins (idle data, small files), handling data inflation, and the criteria for judging a table idle.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.