How Cloud Music Turned 60k Tables into Valuable Data Assets
This article details Cloud Music's year‑long data assetization journey, covering the background, practical achievements, governance methods, and future roadmap for turning massive data warehouses into high‑value, well‑governed assets that drive cost reduction and business insight.
1. Typical Problems
Data consumers and managers keep asking the same questions: Is there a ready-made table for this count? How do I get the detailed records behind a specific report metric? Of the many similar tables, which one holds the correct data? How well built is the data warehouse: how far along is it, and how reusable and extensible are its models? How can data quality (completeness, consistency, accuracy, timeliness) be evaluated and quantified? How many tables have been built, who uses them, and what value do they provide?
2. Initial External and Internal Environment
2.1 External Environment
Amid industry-wide pressure to cut costs and raise efficiency, the company needs to act as well; data assetization serves that mission and guides end-to-end data construction.
2.2 Internal Situation
When I joined Cloud Music, the data warehouse already had over eight years of accumulation: more than 60,000 tables, 70+ databases, 10+ business lines, over 100 PB of storage, hundreds of staff, more than 100,000 online/offline compute tasks, and annual big-data costs exceeding 150 million RMB. That put it among the most complex and costly warehouses in the industry.
In recent years, continuous demand from business, commercial, technical, and functional departments, combined with a chronic talent shortage and insufficient infrastructure, has created a vicious cycle in which the problems keep worsening.
3. My Thoughts and Actions
3.1 Start from Data Consumption
We adopted a “build while governing” approach, akin to changing an aircraft engine mid-flight, and focused on ROI to get quick, visible results. We started on the consumption side for three reasons: consumers perceive changes in data assets directly, a bottom-up overhaul carries prohibitive cost and risk, and the existing assets already contain hidden “treasure” worth surfacing.
A key issue is that many valuable tables have been built, yet consumers still feel the data is insufficient, usually because they cannot find it.
We addressed this with three actions:
Streamline data models: identify core tables per business, eliminate unused, obsolete, or over‑engineered tables.
Reshape information structure: reorganize core table listings from a consumption perspective, produce a data‑asset whitepaper, and keep it updated.
Productize operations: build a data‑asset portal linking production and consumption, with usage tracking.
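The first action, separating core tables from unused ones, can be sketched from usage-tracking data. The following is a minimal illustration, not the rules we actually applied: the `TableUsage` fields, the 90-day staleness window, and the 100-read threshold for a "core" table are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TableUsage:
    """Hypothetical usage record for one warehouse table."""
    name: str
    last_read: Optional[date]  # None means the table has never been read
    reads_90d: int             # read count over the last 90 days
    downstream_tasks: int      # number of tasks that depend on this table


def classify(table: TableUsage, today: date,
             stale_days: int = 90, core_reads: int = 100) -> str:
    """Label a table as core, keep, review, or retire_candidate.

    The thresholds are illustrative assumptions, not production values.
    """
    stale = (table.last_read is None
             or (today - table.last_read).days > stale_days)
    if stale:
        # Unread tables with no dependents are the safest to retire;
        # unread tables that still feed tasks need a human decision.
        return "retire_candidate" if table.downstream_tasks == 0 else "review"
    if table.reads_90d >= core_reads:
        return "core"
    return "keep"
```

Tables labeled `core` would be the candidates for the whitepaper and portal, while `review` keeps a person in the loop before anything with dependents is removed.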
Initially we drafted the whitepaper in Lingxi Docs, created a simple portal, and added tracking.
Collaboration with NetEase DataFane led to a data map and data album, helping business units organize assets by consumption scenarios.
Thus the data warehouse team now has a product platform to host core assets and build authority in consumers' minds.
3.2 Data Production Governance
Unlike the lightweight consumption side, production governance requires detailed, incremental work. We focus on “establish standards, build tools” to implement governance.
We introduced three metrics (high quality, strong standards, low cost) to quantify data-warehouse construction.
For historical reasons, many standards had not been fully applied, so bringing tables into line required manual effort; different stages also emphasize different metrics.
After a year, the team agreed on this “three-degree” metric system as its north star.
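To make the three metrics concrete, here is one way such a scorecard could be computed. This is a sketch under stated assumptions: the article does not define the metrics, so the `TableAudit` fields and all three formulas below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableAudit:
    """Hypothetical audit record for one table."""
    name: str
    passed_checks: int     # data-quality checks that passed
    total_checks: int      # data-quality checks configured
    has_owner: bool
    has_description: bool
    follows_naming: bool
    storage_tb: float
    reads_30d: int


def quality_score(tables: List[TableAudit]) -> float:
    """Share of configured quality checks that passed (assumed definition)."""
    total = sum(t.total_checks for t in tables)
    return sum(t.passed_checks for t in tables) / total if total else 0.0


def standards_score(tables: List[TableAudit]) -> float:
    """Share of tables with an owner, a description, and a compliant name."""
    if not tables:
        return 0.0
    ok = sum(t.has_owner and t.has_description and t.follows_naming
             for t in tables)
    return ok / len(tables)


def cold_storage_share(tables: List[TableAudit]) -> float:
    """Share of storage held by tables with no reads in 30 days (cost proxy)."""
    total = sum(t.storage_tb for t in tables)
    cold = sum(t.storage_tb for t in tables if t.reads_30d == 0)
    return cold / total if total else 0.0
```

Tracking the three numbers per business line over time is what would turn them into a shared target rather than a one-off audit.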
We set principles to ensure sustainable governance: evidence‑based, clear responsibilities, sustainable mechanisms, recoverable outcomes, and methodical knowledge capture.
Partnering with NetEase DataFane gave us metadata lineage for production, enabling modeling and governance tools, visual dashboards, and monitoring.
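Metadata lineage is what makes governance actions safe: before a table is changed or retired, its downstream dependents can be listed. The sketch below illustrates the idea with a plain breadth-first traversal over (upstream, downstream) pairs. The edge format and table names are hypothetical, and this is not the platform's actual API.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def downstream(edges: Iterable[Tuple[str, str]], start: str) -> Set[str]:
    """Return every table reachable downstream of `start`.

    `edges` holds (upstream, downstream) pairs, an assumed lineage format.
    """
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # Skip tables already visited so cycles cannot loop forever.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: two independent chains.
edges = [
    ("ods_play_log", "dwd_play"),
    ("dwd_play", "dws_user_play"),
    ("dws_user_play", "ads_report"),
    ("ods_order", "dwd_order"),
]
impacted = downstream(edges, "dwd_play")
```

Here `impacted` contains `dws_user_play` and `ads_report`, the tables that would need checking before `dwd_play` is modified.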
4. Achievements
One picture summarizes the results, showing notable improvements in absolute numbers, growth trends, output stability, and daily awareness among developers.
5. Long‑Term Vision for the Data System
The data system comprises upstream production, the data middle platform, downstream insight and reporting, and intelligent services; the data middle platform is the core that links all layers.
After the first phase of assetization, we need to reassess efficiency on the application side, with the aim of reducing downstream complexity; this work is ongoing.
6. Phase‑wise Practice Summary
A diagram recaps the past year’s production practice, showing delivered outcomes, methodologies, and tools that reinforce infrastructure, reduce cost, increase efficiency, and explore data‑driven business opportunities.
7. Looking Ahead
Data assetization is just the starting point; together with partner teams we will continue to expand the data business landscape.
Our mission and vision: with a data-asset and data-service mindset, continuously advance construction of the data middle platform, delivering a unified, reliable, convenient, and secure platform for data-asset management and services.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.