Kuaishou's Metric Middle Platform (Gaia) and OneService: Architecture, Implementation, and Future Roadmap
This article details Kuaishou's construction of a unified metric middle‑platform called Gaia, the OneMetric and OneService systems for standardized metric management and service, their architectural design, key technologies, achieved outcomes, and plans for expanding data services across the company.
Kuaishou faces a common big‑data problem: metrics (measurements of business data) are fragmented, inconsistently defined, and have no unified export mechanism, which leads to high management costs, duplicate metrics, and unreliable processes.
To address these issues, Kuaishou built a metric middle‑platform (code‑named Gaia) that standardizes metric definitions, governance, and service delivery across the organization, forming the foundation of its enterprise data‑middle‑platform.
1. Problems and Solutions
The existing data‑service architecture, built on offline data warehouses with numerous applications (ABTest, reporting, analysis), suffers from non‑uniform metric management, inconsistent definitions, and disjointed processes. Industry practice suggests a unified metric management system, but many implementations lack comprehensive metadata, quality assurance, lineage, and global change propagation.
2. Metric Standardization (OneMetric)
Kuaishou introduced the OneMetric system to manage metric definitions centrally, enforce naming conventions, and ensure consistency. It includes modules for metadata management, model construction, consistency checking (real‑time and routine), model search, and ranking. The system builds a metadata layer linking logical and physical definitions, constructs dimensional models, and provides search capabilities to locate optimal models for a given metric‑dimension query.
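The model search and ranking step can be sketched as follows. This is a minimal illustration, not Kuaishou's implementation: the `Model` fields, the `cost` ranking score, and the sample model names are all hypothetical. The article only states that OneMetric locates the best models for a metric‑dimension query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A physical data model registered in the metadata layer (hypothetical shape)."""
    name: str
    metrics: frozenset
    dimensions: frozenset
    cost: int  # assumed ranking score: lower means cheaper to query


def search_models(models, metric, dimensions):
    """Return the models that can answer the query, best-ranked first."""
    wanted = frozenset(dimensions)
    candidates = [
        m for m in models
        if metric in m.metrics and wanted <= m.dimensions
    ]
    # Rank by the assumed cost, then by name for a deterministic order.
    return sorted(candidates, key=lambda m: (m.cost, m.name))


# Hypothetical registered models.
MODELS = [
    Model("order_detail", frozenset({"paid_order_amount"}),
          frozenset({"region", "category", "user_id"}), cost=100),
    Model("order_region_daily", frozenset({"paid_order_amount"}),
          frozenset({"region", "category"}), cost=10),
    Model("play_daily", frozenset({"play_count"}),
          frozenset({"region"}), cost=5),
]

best = search_models(MODELS, "paid_order_amount", ["region"])
print([m.name for m in best])
```

A pre‑aggregated model ranks ahead of the detail table here only because the sample data gives it a lower `cost`; a real ranking could weigh freshness, engine, or data volume instead.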
3. Metric Service Platform (OneService)
OneService offers a unified query interface that translates metric‑dimension requests into engine‑specific execution plans. It provides language translation (DSL to engine SQL), secondary calculations (e.g., period‑over‑period), and orchestrates tasks across heterogeneous engines (Hive, Druid, HBase). The architecture separates an engine layer (plan generation, scheduling, OneSQL translation) from a service layer (metadata and query APIs).
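A secondary calculation such as period‑over‑period runs on results that an engine has already returned. The sketch below is one plausible shape for it: the function name, the dictionary layout (dimension value to metric value), and the sample numbers are assumptions made for illustration.

```python
def period_over_period(current, previous):
    """Return the relative change per dimension value.

    A value with no baseline (missing or zero in `previous`) maps to None
    rather than raising or dividing by zero.
    """
    changes = {}
    for key, value in current.items():
        base = previous.get(key)
        if base is None or base == 0:
            changes[key] = None
        else:
            changes[key] = (value - base) / base
    return changes


# Hypothetical weekly results keyed by region.
this_week = {"north": 120.0, "south": 90.0, "west": 50.0}
last_week = {"north": 100.0, "south": 100.0}

result = period_over_period(this_week, last_week)
print(result)
```

Doing this in the service layer, rather than in each engine's SQL, keeps the calculation identical whichever engine produced the two result sets.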
Key Technologies
Language abstraction: a DSL composed of data set, time range, metric, dimension, and filter. For example, the request "paid order amount for maternal and infant products in e‑commerce this week, by region" decomposes into:
E‑commerce: data set
This week: time range = [20201019 – 20201025]
By region: dimension = [region]
Maternal and infant products: filter = [second‑level product category = maternal and infant products]
Paid order amount: metric = [paid order amount]
Execution plan: a DAG that first performs semantic splitting (job → tasks per metric/dimension) and then engine splitting (task → queries per underlying engine).
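The DSL decomposition and the two‑stage split can be sketched as follows. This is a simplified illustration under stated assumptions: the `MetricQuery` field names, the `METRIC_ENGINE` lookup, and the second metric `refund_amount` are hypothetical, and the semantic split here is per metric only, whereas the article describes splitting per metric/dimension.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricQuery:
    """The five DSL parts named in the article (field names are assumptions)."""
    dataset: str
    time_range: tuple
    metrics: tuple
    dimensions: tuple
    filters: dict = field(default_factory=dict)


# Hypothetical metadata: which engine serves each metric.
METRIC_ENGINE = {"paid_order_amount": "Druid", "refund_amount": "Hive"}


def semantic_split(job):
    """Job -> one task per metric, each keeping the job's dimensions and filters."""
    return [
        MetricQuery(job.dataset, job.time_range, (m,), job.dimensions, job.filters)
        for m in job.metrics
    ]


def engine_split(task):
    """Task -> (engine, task) queries for the engine assumed to hold the metric."""
    return [(METRIC_ENGINE[task.metrics[0]], task)]


def build_plan(job):
    """Build a small DAG as an adjacency map: job -> tasks -> engine queries."""
    dag = {"job": []}
    for i, task in enumerate(semantic_split(job)):
        task_id = f"task{i}:{task.metrics[0]}"
        dag["job"].append(task_id)
        dag[task_id] = [
            f"{engine}:{t.metrics[0]}" for engine, t in engine_split(task)
        ]
    return dag


# The article's example request, plus one hypothetical extra metric.
job = MetricQuery(
    dataset="ecommerce",
    time_range=("20201019", "20201025"),
    metrics=("paid_order_amount", "refund_amount"),
    dimensions=("region",),
    filters={"second_level_category": "maternal_and_infant"},
)
plan = build_plan(job)
print(plan)
```

Each leaf of the map is a query that a scheduler could dispatch independently, so tasks bound for different engines can run in parallel before their results are merged.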
OneSQL: converts the unified AST of a data model into engine‑specific SQL, applying rule‑based optimizations (RBO) before generation.
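The idea of rewriting an AST with rules and then emitting engine‑specific SQL can be sketched as follows. This is not OneSQL itself: the `SelectAst` shape, the single `dedupe_filters` rule, and the table and column names are hypothetical. The one dialect difference shown is identifier quoting, which uses backticks in Hive and double quotes in Druid SQL.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SelectAst:
    """A tiny unified AST for a grouped SUM query (hypothetical shape)."""
    table: str
    dimensions: tuple  # assumed non-empty in this sketch
    metric: str        # column summed to produce the metric
    filters: tuple     # (column, value) equality predicates


def dedupe_filters(ast):
    """Example rewrite rule: drop repeated predicates, keeping first occurrence."""
    return replace(ast, filters=tuple(dict.fromkeys(ast.filters)))


RULES = [dedupe_filters]

# Identifier quote character per engine dialect.
QUOTE = {"hive": "`", "druid": '"'}


def to_sql(ast, engine):
    """Apply the rewrite rules, then generate SQL in the engine's dialect."""
    for rule in RULES:
        ast = rule(ast)
    q = QUOTE[engine]

    def ident(name):
        return f"{q}{name}{q}"

    dims = ", ".join(ident(d) for d in ast.dimensions)
    sql = (
        f"SELECT {dims}, SUM({ident(ast.metric)}) AS {ident(ast.metric)} "
        f"FROM {ident(ast.table)}"
    )
    if ast.filters:
        # Values are inlined for readability; real code should bind parameters.
        preds = " AND ".join(f"{ident(c)} = '{v}'" for c, v in ast.filters)
        sql += f" WHERE {preds}"
    return sql + f" GROUP BY {dims}"


ast = SelectAst(
    table="orders",
    dimensions=("region",),
    metric="paid_order_amount",
    filters=(("category", "maternal_and_infant"),
             ("category", "maternal_and_infant")),
)
hive_sql = to_sql(ast, "hive")
druid_sql = to_sql(ast, "druid")
print(hive_sql)
print(druid_sql)
```

Running the rules before generation means each optimization is written once against the unified AST instead of once per engine.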
4. Achievements and Future Plans
Within a year, Kuaishou delivered OneMetric and OneService, migrating all analytics, dashboards, and reporting to metric‑dimension queries, supporting multiple engines, achieving minute‑level offline and sub‑second online query latency, and serving core businesses such as the main site, e‑commerce, and games. Future work includes expanding to more business lines, building a comprehensive OneService data‑as‑a‑service platform, developing a OneDSL for data services, introducing OneCache for performance, and exploring automation and intelligence in data governance.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.