Building Kuaishou’s Scalable Metadata Management Platform for Big Data
This article details Kuaishou’s evolution of its metadata management platform—from early Hive‑centric beginnings to a unified 2.0 architecture and a forward‑looking 3.0 vision—highlighting challenges, key technologies, and how metadata drives data production, consumption, governance, and cost optimization across the big‑data middle platform.
In a big‑data middle platform, metadata management is a cross‑cutting system: it spans the entire data lifecycle, from production to consumption, and touches every big‑data engine and system along the way, which makes it a key component of the platform. This talk shares the challenges Kuaishou faced in building its metadata management system, the solutions it adopted, and how metadata improves data applications and resource governance.
1. Background of Kuaishou Metadata Management
Metadata is data that describes data: for example, the schemas of big‑data tables and the definitions of BI dashboards, datasets, and metric models. Metadata management touches many stages of the data middle platform, including data synchronization, processing, and services. Kuaishou’s platform now collects dozens of metadata types, including tables, metric models, A/B testing tasks, and analysis dashboards.
Metadata management at Kuaishou has progressed through three stages:
Initial stage. In 2017 the data platform was still young: the warehouse was Hive‑centric, and the metadata platform integrated only Hive, with a limited set of metadata types.
Platform 1.0. In 2019, as the business grew, the platform added engines such as Kafka and Druid, extended metadata coverage to multiple storage engines, and began offering product capabilities such as data discovery.
Platform 2.0. In 2020, Kuaishou invested heavily in the data middle platform, adding real‑time and batch development, unified scheduling, data services, metric models, and BI analysis. The metadata platform then began collecting metadata across the full chain from production to consumption and provided richer product capabilities such as lineage analysis and data governance.
2. Metadata Management 2.0 Architecture and Key Technologies
2.1 System Architecture
Kuaishou’s 2.0 platform follows two principles: unification and proactivity.
Unification: Consolidate siloed components into a single platform to improve efficiency.
Proactivity: Build metadata services driven by business value from the outset.
During construction we faced several challenges:
Business complexity: Dozens of heterogeneous entity types and many relationship types.
Massive scale: Billions of entities, with a large volume of incremental updates.
Collaboration overhead: Extensive cross‑department communication and coordination.
Diverse applications: Varied metadata use cases with high quality requirements for data services.
To address these challenges, we adopted a “3+1” architecture: three unified layers plus one application layer.
Unified Ingestion: A unified pipeline (ingest → parse → process → output) that supports both incremental and full‑batch reporting of entities and lineage.
Unified Storage: JanusGraph + Atlas as the primary store, backed by a data‑quality assurance system that keeps metadata highly consistent.
Unified Service: Consolidated services, grouped into unified API, messaging, and data‑warehouse services.
Metadata Application: Metadata‑driven products such as data discovery, governance, and remediation.
2.2 Key Technologies
Unified Ingestion: A standardised ingestion specification and workflow eliminate siloed, per‑source development. Every entity type inherits a common set of base attributes and extends it with its own. ETL processing follows the same pipeline for all sources, which simplifies implementation and keeps the architecture clear.
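A minimal Python sketch of what a shared ingestion specification can look like, assuming hypothetical names throughout (`BaseEntity`, `HiveTable`, `OfflineTask`, `PARSERS`, `run_pipeline`) and an in‑memory dict standing in for the metadata store. It shows entity types inheriting base attributes, one ingest → parse → process → output pipeline shared by every source, and the difference between incremental and full‑batch reports; it is an illustration, not Kuaishou's actual specification.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Iterable


@dataclass
class BaseEntity:
    # Base attributes every entity type inherits (hypothetical schema).
    qualified_name: str
    owner: str
    type_name: str = "base"


@dataclass
class HiveTable(BaseEntity):
    # Type-specific extension: a Hive table also carries its columns.
    type_name: str = "hive_table"
    columns: list[str] = field(default_factory=list)


@dataclass
class OfflineTask(BaseEntity):
    # Type-specific extension: an offline task carries its schedule.
    type_name: str = "offline_task"
    cron: str = ""


# Parse stage: one small parser per report type, all producing BaseEntity subclasses.
PARSERS: dict[str, Callable[[dict], BaseEntity]] = {
    "hive_table": lambda r: HiveTable(r["name"], r["owner"], columns=r.get("columns", [])),
    "offline_task": lambda r: OfflineTask(r["name"], r["owner"], cron=r.get("cron", "")),
}


def process(entity: BaseEntity) -> BaseEntity:
    # Process stage: shared normalization applied to every entity type.
    return replace(entity, qualified_name=entity.qualified_name.strip().lower())


def run_pipeline(
    reports: Iterable[dict],
    store: dict[str, BaseEntity],
    mode: str = "incremental",
) -> dict[str, BaseEntity]:
    # Ingest -> parse -> process, identical for every source.
    entities = [process(PARSERS[r["type"]](r)) for r in reports]
    # Output stage: upsert into the metadata store.
    for e in entities:
        store[e.qualified_name] = e
    if mode == "full":
        # A full-batch report is authoritative for the types it contains,
        # so entities of those types that were not reported are removed.
        reported = {e.qualified_name for e in entities}
        types = {e.type_name for e in entities}
        stale = [n for n, e in store.items() if e.type_name in types and n not in reported]
        for name in stale:
            del store[name]
    return store
```

Under this design, onboarding a new source means writing one parser; normalization, storage, and full‑batch cleanup come from the shared pipeline.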
Unified Storage: Over 30 heterogeneous entity types (e.g., Hive tables with columns, offline tasks with their own distinct attributes) required a flexible schema. We adopted Atlas + JanusGraph: Atlas provides a type system in which schema definitions are stored directly, and JanusGraph stores entities and relationships as a graph. The underlying HBase store handles billions of entities and relationships, while an Elasticsearch index speeds up queries in latency‑sensitive scenarios.
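Apache Atlas expresses this flexibility through type definitions that declare attributes and supertypes, so a new entity type is data rather than code. The sketch below imitates that idea in plain Python and does not call Atlas's API; `TypeDef`, `TypeRegistry`, and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    name: str
    attributes: dict[str, type]  # attribute name -> expected Python type
    super_types: list[str] = field(default_factory=list)


class TypeRegistry:
    """Stores schema definitions as data, so new entity types need no code changes."""

    def __init__(self) -> None:
        self._types: dict[str, TypeDef] = {}

    def define(self, typedef: TypeDef) -> None:
        # Supertypes must already be registered.
        for parent in typedef.super_types:
            if parent not in self._types:
                raise KeyError(f"unknown supertype {parent!r}")
        self._types[typedef.name] = typedef

    def all_attributes(self, type_name: str) -> dict[str, type]:
        # Merge inherited attributes first; the subtype's own definitions win.
        typedef = self._types[type_name]
        merged: dict[str, type] = {}
        for parent in typedef.super_types:
            merged.update(self.all_attributes(parent))
        merged.update(typedef.attributes)
        return merged

    def validate(self, type_name: str, entity: dict) -> list[str]:
        # Return a list of problems; an empty list means the entity conforms.
        expected = self.all_attributes(type_name)
        errors = []
        for attr, attr_type in expected.items():
            if attr not in entity:
                errors.append(f"missing {attr}")
            elif not isinstance(entity[attr], attr_type):
                errors.append(f"{attr} should be {attr_type.__name__}")
        for attr in sorted(entity.keys() - expected.keys()):
            errors.append(f"unexpected {attr}")
        return errors
```

For example, registering an `asset` type with `qualifiedName` and `owner`, then a `hive_table` type with `columns` and `super_types=["asset"]`, makes every Hive table entity require all three attributes.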
Quality Assurance: Many P0 (highest‑priority) business scenarios demand strict data quality. The platform ensures quality through:
Entity consistency: Full‑batch repair jobs periodically compare stored metadata against the external source systems and fix discrepancies. Repair frequency is tiered by importance, so critical entities are checked most often and stay the most consistent.
Lineage accuracy: SQL is parsed with ANTLR to build table‑, column‑, and event‑level lineage. Changes to important lineage are compared against the previous version and automatically attributed to a cause; a change that cannot be attributed is blocked.
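The two quality mechanisms can be sketched as follows. The tier‑to‑interval table, the snapshot format, and the attribution rule (an edge change counts as explained when the task that produces the edge changed in the same version) are assumptions for illustration; the article does not specify them.

```python
from datetime import datetime, timedelta

# Hypothetical repair intervals: more important tiers are reconciled more often.
REPAIR_INTERVAL = {"P0": timedelta(hours=1), "P1": timedelta(hours=6), "P2": timedelta(days=1)}

Edge = tuple[str, str]  # (upstream entity, downstream entity)


def due_for_repair(tier: str, last_repaired: datetime, now: datetime) -> bool:
    """True when an entity's tier says it is time for another full comparison."""
    return now - last_repaired >= REPAIR_INTERVAL[tier]


def reconcile(stored: dict[str, dict], source: dict[str, dict]) -> dict[str, list[str]]:
    """Diff the metadata store against a full snapshot from the source system."""
    return {
        "create": sorted(source.keys() - stored.keys()),  # missing from the store
        "delete": sorted(stored.keys() - source.keys()),  # gone from the source
        "update": sorted(k for k in stored.keys() & source.keys() if stored[k] != source[k]),
    }


def gate_lineage_change(
    old: set[Edge],
    new: set[Edge],
    changed_tasks: set[str],
    edge_task: dict[Edge, str],
) -> tuple[bool, list[Edge]]:
    """Accept a new lineage version only if every edge change can be attributed.

    Assumed attribution rule: an added or removed edge is explained when the
    task that produces it (edge_task) was modified in this version.
    """
    unexplained = sorted(e for e in old ^ new if edge_task.get(e) not in changed_tasks)
    return (not unexplained, unexplained)
```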
Lineage Analysis: A global lineage view enables impact analysis (identifying downstream dashboards when upstream tables change) and fault tracing (locating upstream failures when metrics deviate). Two analysis modes are provided:
Simple sync query: Quickly preview immediate upstream/downstream entities via direct graph queries.
Multi‑dimensional async analysis: Handles complex, multi‑condition scenarios with a BFS‑based traversal that prunes early; it is decoupled from the graph engine and easy to extend.
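A sketch of the two modes, assuming lineage is exposed through a caller‑supplied `neighbours` function so the traversal stays independent of the graph engine. `one_hop`, `impact_analysis`, and the `prune` and `match` predicates are hypothetical; the article only states that the async mode uses BFS with early pruning.

```python
from collections import deque
from typing import Callable, Iterable


def one_hop(edges: dict[str, list[str]], node: str) -> list[str]:
    """Simple sync query: direct downstream neighbours only."""
    return list(edges.get(node, []))


def impact_analysis(
    start: str,
    neighbours: Callable[[str], Iterable[str]],
    prune: Callable[[str, int], bool],
    match: Callable[[str], bool],
    max_depth: int = 10,
) -> list[str]:
    """BFS over lineage; `prune` stops expanding a branch, `match` selects results."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, depth = queue.popleft()
        if node != start and match(node):
            hits.append(node)
        if depth >= max_depth or prune(node, depth):
            continue  # early pruning: nothing below this node is visited
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return hits


downstream = {
    "ods.log": ["dw.orders", "dw.tmp"],
    "dw.orders": ["dash.gmv", "dw.agg"],
    "dw.tmp": ["dash.debug"],
    "dw.agg": ["dash.weekly"],
}
# Which dashboards are affected if ods.log changes, ignoring temporary tables?
affected = impact_analysis(
    "ods.log",
    neighbours=lambda n: downstream.get(n, []),
    prune=lambda n, d: n.startswith("dw.tmp"),
    match=lambda n: n.startswith("dash."),
)
```

Here `affected` is `["dash.gmv", "dash.weekly"]`: pruning `dw.tmp` skips `dash.debug` and anything else beneath it, which is how early pruning reduces the work on large graphs.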
Metadata Automatic Tiering: Tiering lets a resource‑limited system concentrate its guarantees on high‑value entities. Baseline entities are assigned authoritative tiers, and these tiers are automatically propagated upward, so that parent (upstream) nodes inherit the priority of the entities that depend on them. As a result, P0 entities receive the strongest consistency, timeliness, and availability guarantees.
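One way to implement upward propagation is a worklist algorithm, sketched here under the assumption that tiers are integers (0 for P0) and that a baseline tier acts as a lower bound: a producer is raised to the most important tier among its consumers. `propagate_tiers` and the default tier of 3 are hypothetical.

```python
from collections import deque


def propagate_tiers(
    upstream: dict[str, list[str]],
    baseline: dict[str, int],
    default: int = 3,
) -> dict[str, int]:
    """Push each node's tier to everything it reads from.

    upstream[n] lists the nodes that n depends on. Lower numbers are more
    important, so a parent takes the minimum tier of its dependents.
    Nodes missing from the result keep the default tier.
    """
    tier = dict(baseline)
    queue = deque(baseline)
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if tier.get(parent, default) > tier[node]:
                tier[parent] = tier[node]
                queue.append(parent)  # re-propagate from the upgraded parent
    return tier
```

Because a tier can only decrease and is bounded below by 0, each node is re‑queued a bounded number of times, so the loop terminates even if the lineage graph contains cycles.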
3. Metadata‑Driven Asset Applications and Data Governance
Metadata can be applied to three main scenarios: data production (e.g., assisting task configuration), data consumption (e.g., query optimisation), and data management (e.g., discovery, governance).
3.1 Data Map
Using the full set of collected metadata, Kuaishou built a data map with three core capabilities:
Searchable: Indexes multiple kinds of attributes (basic, profile, usage) and applies weighted matching to reach a search hit rate of about 90%.
Findable: Provides a hierarchical business catalogue and curated entity profiles (tags, certifications) so users can browse the tree to find data.
Understandable: Supplies basic information plus enriched context (production lineage, usage examples, sample data) to help users understand the assets they find.
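Weighted matching across attribute groups can be sketched as below. The field weights, the logarithmic usage boost, and the `Asset` fields are illustrative assumptions, not the data map's actual ranking formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    name: str             # basic attribute
    description: str      # basic attribute
    tags: list[str]       # profile attribute
    weekly_queries: int   # usage attribute


# Hypothetical weights: a name match counts more than a description match.
FIELD_WEIGHTS = {"name": 3.0, "tags": 2.0, "description": 1.0}


def score(asset: Asset, terms: list[str]) -> float:
    text = {
        "name": asset.name.lower(),
        "tags": " ".join(asset.tags).lower(),
        "description": asset.description.lower(),
    }
    relevance = sum(
        weight
        for term in terms
        for fld, weight in FIELD_WEIGHTS.items()
        if term.lower() in text[fld]
    )
    # Usage signal: a small boost that grows slowly with query volume,
    # applied only to assets that match at least one term.
    popularity = math.log1p(asset.weekly_queries) * 0.1
    return relevance + popularity if relevance else 0.0


def search(assets: list[Asset], query: str, limit: int = 10) -> list[str]:
    terms = query.split()
    scored = [(score(a, terms), a.name) for a in assets]
    ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
    return [name for s, name in ranked if s > 0][:limit]
```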
3.2 Asset Management
The asset management platform offers more than 20 capabilities, including:
Manage: Create, update, and delete assets.
Analyze: Provide asset‑level analytics such as inventory, cost analysis, and ranking.
Mine: Perform automated discovery, e.g., infer owners for unassigned assets to support governance.
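Owner mining can be approximated with lineage and usage metadata. The heuristic below (prefer the most common owner among the tasks that write a table, otherwise its heaviest reader) is a hypothetical example of this kind of inference, not the platform's actual rule; `infer_owner` and its inputs are assumed names.

```python
from collections import Counter
from typing import Optional


def infer_owner(
    table: str,
    writers: dict[str, list[str]],     # table -> tasks that write it (from lineage)
    task_owner: dict[str, str],        # task -> owner
    query_counts: dict[str, Counter],  # table -> Counter of querying users
) -> Optional[str]:
    """Suggest an owner for an unassigned table (hypothetical heuristic).

    1. Prefer the most common owner among the tasks that produce the table.
    2. Otherwise fall back to the user who queries it most.
    The result is a suggestion for a human to confirm, not an assignment.
    """
    producer_owners = Counter(
        task_owner[t] for t in writers.get(table, []) if t in task_owner
    )
    if producer_owners:
        return producer_owners.most_common(1)[0][0]
    readers = query_counts.get(table)
    if readers:
        return readers.most_common(1)[0][0]
    return None
```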
3.3 Data Cost Governance
Kuaishou stores several exabytes of data, making cost a major concern. Metadata drives the cost‑governance loop by linking resource billing to governance actions across the data lifecycle (task creation, resource usage, billing, and remediation).
Governance strategies include:
Manual governance: Encourage developers to clean up low‑value or high‑cost assets, and build governance checks into task creation and resource‑request processes.
Automatic governance: Use metadata to automate actions such as lifecycle‑based data deletion, tiered‑storage recommendations, and hot/cold data placement.
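The automatic strategies can be sketched as simple rules over partition metadata, ranked by estimated savings so that governance effort follows billing. The prices, the 90‑day cold threshold, and the `Partition` fields are assumptions chosen for illustration; real thresholds and billing rates would come from the platform's own policies.

```python
from dataclasses import dataclass

# Hypothetical unit prices per TB-month; real billing rates will differ.
PRICE_PER_TB = {"hot": 20.0, "cold": 5.0}
COLD_AFTER_DAYS = 90  # assumed threshold for "not accessed recently"


@dataclass
class Partition:
    name: str
    size_tb: float
    age_days: int          # days since the partition was created
    last_access_days: int  # days since it was last read
    lifecycle_days: int    # retention period declared in the table metadata
    storage: str = "hot"


def recommend(p: Partition) -> tuple[str, float]:
    """Return (action, estimated monthly saving) for one partition."""
    current = p.size_tb * PRICE_PER_TB[p.storage]
    if p.age_days > p.lifecycle_days:
        return "delete", current  # past retention: the whole storage cost is saved
    if p.storage == "hot" and p.last_access_days > COLD_AFTER_DAYS:
        return "move_to_cold", current - p.size_tb * PRICE_PER_TB["cold"]
    return "keep", 0.0


def governance_plan(parts: list[Partition]) -> list[tuple[str, str, float]]:
    """Actionable recommendations, largest estimated saving first."""
    actions = [(p.name, *recommend(p)) for p in parts]
    return sorted((a for a in actions if a[1] != "keep"), key=lambda a: -a[2])
```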
4. Outlook for Metadata Management Platform 3.0
Version 1.0 was largely manual; version 2.0 introduced a unified platform. The upcoming 3.0 focuses on low‑code, automation, and intelligence to create a proactive metadata platform and enable smart data management.
Low‑code & Automation: The system will automatically harvest all kinds of metadata (data lake, metric models, security information) without requiring each source to build its own integration.
Intelligence: Collected metadata forms a “metadata cloud” that can be accessed at any time, supporting intelligent applications such as global data‑scheduling optimisation and resource‑aware latency improvements.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
