How Kuaishou Built a Standardized Data Governance Evaluation System

This article explains Kuaishou's comprehensive approach to data governance, detailing the background challenges, a standardized evaluation framework, practical solutions for model, quality, and cost issues, scoring strategies, operational mechanisms, measurable benefits, and future plans for continuous improvement.

Kuaishou Big Data

1. Data Governance Background

Kuaishou has become a large platform processing massive data daily, relying heavily on data for operations and analysis. Growing data volume and diverse application scenarios have introduced challenges such as high cost and data quality issues, necessitating targeted governance to simplify data usage and increase its value.

The governance process faces four main challenges: standardizing governance across long data pipelines, measuring governance effectiveness, shifting the governance focus as priorities change from stage to stage, and motivating teams that are already under business pressure.

2. Data Governance Evaluation System

The system addresses the four challenges by defining overall goals, a comprehensive assessment framework, operational mechanisms, and methods to measure benefits after implementation.

Overall Plan Goals and Implementation Strategies

Standardize and categorize governance issues using measurable dimensions such as asset health scores.

Quantify governance by scoring assets and tying those scores to governance processes.

Adjust strategy weights based on problem focus at different stages.

Provide tools and mechanisms to ensure effective rollout.

Implementation Strategies

Metadata‑driven governance.

Asset health scoring across five dimensions: model, quality, cost, service, and security.

Weight‑adjusted scoring to visualize asset health.

Operational mechanisms to sustain governance.

Score‑based incentives to ensure benefits.
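
The weight-adjusted scoring described above can be sketched as a weighted average over the five dimensions. This is a minimal illustration, not Kuaishou's implementation: the article names the dimensions but does not publish the weights or the score scale, so `DEFAULT_WEIGHTS`, the 0-100 scale, and the function name `health_score` are all assumptions.

```python
from typing import Dict

# Hypothetical weights: the article lists the five dimensions but does
# not say how they are weighted. Adjust per governance stage.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "model": 0.3,
    "quality": 0.3,
    "cost": 0.2,
    "service": 0.1,
    "security": 0.1,
}


def health_score(dimension_scores: Dict[str, float],
                 weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-dimension scores (assumed 0-100 each)."""
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(dimension_scores[d] * w for d, w in weights.items())
    # Dividing by the total lets callers pass weights that do not sum to 1.
    return weighted / total_weight


scores = {"model": 80, "quality": 60, "cost": 70, "service": 90, "security": 100}
overall = health_score(scores)

# A cost-focused stage can be expressed by passing different weights.
cost_stage = health_score(
    scores,
    {"model": 1, "quality": 1, "cost": 2, "service": 0, "security": 0},
)
```

Normalizing by the total weight means a stage-specific re-weighting only needs relative weights, which keeps the adjustment step simple.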

Issues in the Model Layer

Lack of usable data due to poor construction and organization.

Inconsistent standards for shared data.

Low efficiency in business data retrieval.

Kuaishou's Solution

The solution focuses on three goals: rich and complete data assets, easy data discovery, and efficient data usage.

Standardized model construction and production.

Unified data requirements and domain‑driven design.

Standardized metric definitions and model reviews.

During production, tools enforce standards, perform code checks, and ensure full test coverage.

Task publishing checks (review, baseline, dependencies).

Monitoring standards for different asset tiers.

Evaluation Dimensions of the Data Model

Standardization: Covers definition, design, development, and release, ensuring models are searchable and usable.

Reusability: Measured by downstream dependency count and model width.

Completeness: Assessed via cross‑layer reference rates and query coverage in different scenarios.
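
Two of these metrics can be computed directly from table lineage. The sketch below assumes lineage is available as `(upstream, downstream)` edges and that each table carries a layer label; the layer names (`ods`, `dwd`, `dws`, `ads`) and the exact definition of the cross-layer reference rate are assumptions, since the article does not define them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lineage representation: (upstream_table, downstream_table).
Edge = Tuple[str, str]


def downstream_counts(edges: List[Edge]) -> Dict[str, int]:
    """Reusability proxy: distinct direct downstream tables per table."""
    children: Dict[str, Set[str]] = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, set()).add(downstream)
    return {table: len(kids) for table, kids in children.items()}


def cross_layer_reference_rate(edges: List[Edge],
                               layer_of: Dict[str, str],
                               raw_layer: str = "ods",
                               next_layer: str = "dwd") -> float:
    """Share of references to raw-layer tables that skip the next layer.

    One possible definition (an assumption): a reference is "cross-layer"
    when a raw table is read by anything other than the detail layer.
    """
    raw_refs = [e for e in edges if layer_of[e[0]] == raw_layer]
    if not raw_refs:
        return 0.0
    skipping = [e for e in raw_refs if layer_of[e[1]] != next_layer]
    return len(skipping) / len(raw_refs)


edges = [
    ("ods_a", "dwd_a"),
    ("ods_a", "ads_x"),   # application table reading raw data directly
    ("dwd_a", "dws_a"),
    ("dwd_a", "ads_x"),
    ("ods_b", "dwd_b"),
]
layer_of = {
    "ods_a": "ods", "ods_b": "ods",
    "dwd_a": "dwd", "dwd_b": "dwd",
    "dws_a": "dws", "ads_x": "ads",
}
counts = downstream_counts(edges)
rate = cross_layer_reference_rate(edges, layer_of)
```

Under this definition a lower rate is better: it suggests that consumers find what they need in the modeled layers instead of bypassing them.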

Asset Measurement: Data Quality

Quality issues arise across the entire data chain, often due to lack of quality awareness, missing monitoring, and absent standardized processes.

Quality problems span production, processing, and service stages.

Insufficient quality awareness leads to reactive fixes.

Missing standard procedures cause repeated, inefficient issue resolution.

Solutions address three layers: source, processing, and online service, focusing on standard production, pre‑emptive monitoring, tool‑based validation, and accurate alerting.
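
As an illustration of tool-based validation before data is published, the sketch below runs two simple rules (minimum row count and maximum null rate) against a partition held in memory. The function name, rule set, and thresholds are hypothetical; the article does not describe the concrete checks Kuaishou's tooling performs.

```python
from typing import Dict, List, Optional, Sequence


def check_partition(rows: List[Dict[str, Optional[object]]],
                    required_columns: Sequence[str],
                    max_null_rate: float = 0.01,
                    min_rows: int = 1) -> List[str]:
    """Return human-readable rule violations; an empty list means pass."""
    violations: List[str] = []
    if len(rows) < min_rows:
        violations.append(f"row count {len(rows)} below minimum {min_rows}")
        return violations  # null rates are undefined without rows
    for column in required_columns:
        nulls = sum(1 for row in rows if row.get(column) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            violations.append(
                f"{column}: null rate {rate:.2%} exceeds {max_null_rate:.2%}"
            )
    return violations


rows = [
    {"user_id": 1, "video_id": 10},
    {"user_id": None, "video_id": 11},
]
problems = check_partition(rows, ["user_id", "video_id"], max_null_rate=0.1)
empty_problems = check_partition([], ["user_id"])
```

Returning a list of violations rather than raising on the first failure lets an alerting layer report every broken rule at once, which fits the goal of accurate alerting.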

Cost Control Plan

Kuaishou reduces cost through big‑data engine optimization, data asset tiering, storage lifecycle policies, duplicate model detection, and targeted performance tuning.

Define asset tiers (A1, A2, A3) with distinct storage and lifecycle strategies.

Detect and merge duplicate models.

Optimize large‑scale tasks for resource efficiency.

Manage high‑volume dimension tables with aggressive storage policies.

Quota mechanisms and cost‑bill notifications raise awareness, while governance leaderboards incentivize improvements.
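
Tier-specific lifecycle policies can be expressed as a small lookup plus a cutoff calculation. In the sketch below, the tier names A1 to A3 come from the article, but the retention periods, the assumption that A1 is the most critical tier, and the names `POLICIES` and `expired_partitions` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class LifecyclePolicy:
    retention_days: int


# Hypothetical retention periods; the article does not give actual values.
POLICIES: Dict[str, LifecyclePolicy] = {
    "A1": LifecyclePolicy(retention_days=730),
    "A2": LifecyclePolicy(retention_days=180),
    "A3": LifecyclePolicy(retention_days=30),
}


def expired_partitions(tier: str,
                       partition_dates: List[date],
                       today: date) -> List[date]:
    """Partitions strictly older than the tier's retention window."""
    cutoff = today - timedelta(days=POLICIES[tier].retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


partitions = [date(2023, 12, 1), date(2024, 1, 15), date(2024, 1, 1)]
today = date(2024, 1, 31)
to_delete_a3 = expired_partitions("A3", partitions, today)
to_delete_a1 = expired_partitions("A1", partitions, today)
```

With a 30-day window and `today` set to 2024-01-31, the cutoff is 2024-01-01, so only the 2023-12-01 partition is selected for the A3 table, while the same partitions are all retained under the longer A1 window.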

Scoring Strategy Issues and Solutions

Scoring must reconcile metrics measured on different scales, provide feedback loops, and adapt to stage‑specific priorities. Methods include min‑max normalization and percentile fitting to bring metrics onto a common scale, and the coefficient of variation to adjust weights dynamically.
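
The three techniques named here can be sketched with the standard library alone. How Kuaishou combines them is not stated in the article, so the function names and the choice to weight each metric in proportion to its coefficient of variation (a common convention, assumed here) are illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant metric carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def percentile_scores(values: List[float]) -> List[float]:
    """Percentile rank: fraction of observations less than or equal to v."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def cv_weights(metrics: Dict[str, List[float]]) -> Dict[str, float]:
    """Weight each metric by its coefficient of variation (std / |mean|).

    Metrics that vary more across assets discriminate better between
    them, so they receive a larger share of the total weight.
    """
    cvs: Dict[str, float] = {}
    for name, values in metrics.items():
        m = mean(values)
        cvs[name] = pstdev(values) / abs(m) if m != 0 else 0.0
    total = sum(cvs.values())
    if total == 0:
        return {name: 1 / len(metrics) for name in metrics}
    return {name: cv / total for name, cv in cvs.items()}


normalized = min_max_normalize([10, 20, 30])
ranks = percentile_scores([10, 20, 30, 40])
weights = cv_weights({"flat_metric": [1, 1, 1], "varied_metric": [1, 2, 3]})
```

Min-max normalization is sensitive to outliers, which is one reason to pair it with percentile ranks for heavily skewed metrics such as storage size.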

Operational Mechanisms

Governance awareness through regular training and promotion.

Periodic operations drive governance via score‑based leaderboards and incentives.

Combined soft and hard mechanisms, such as restricting production permissions for low‑scoring assets.
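
The soft and hard mechanisms can be pictured as a leaderboard plus a publishing gate. The sketch below is hypothetical throughout: the threshold value, the `OwnerScore` type, and the function names are assumptions, and the article does not say at what score permissions are restricted.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff below which production publishing is blocked.
PUBLISH_THRESHOLD = 60.0


@dataclass(frozen=True)
class OwnerScore:
    owner: str
    health_score: float


def can_publish(score: float, threshold: float = PUBLISH_THRESHOLD) -> bool:
    """Hard mechanism: block publishing for low-scoring assets."""
    return score >= threshold


def leaderboard(scores: List[OwnerScore]) -> List[OwnerScore]:
    """Soft mechanism: rank owners from highest to lowest health score."""
    return sorted(scores, key=lambda s: s.health_score, reverse=True)


team = [
    OwnerScore("team_a", 58.0),
    OwnerScore("team_b", 77.0),
    OwnerScore("team_c", 64.5),
]
ranking = [entry.owner for entry in leaderboard(team)]
blocked = [entry.owner for entry in team if not can_publish(entry.health_score)]
```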

Quantified Evaluation Benefits

Benefits are measured across four dimensions:

Cost savings from reduced storage and compute usage.

Quality improvements reflected in fewer incidents and better data timeliness.

Human‑efficiency gains via platform‑enabled automation.

Value increase measured by business satisfaction surveys.

Kuaishou Practice Effects

After the governance program was rolled out, the overall data warehouse health index rose from 58 to 77, with over 95% participation from data engineers. Reported gains include:

Cost savings in storage and compute.

Improved efficiency through platform‑driven governance.

Quality improvements, with incidents reduced by more than 45%.

Higher business satisfaction with data services.

Governance Architecture

The architecture integrates metadata (assets, processing, quality, service) with the standardized evaluation framework to identify issues and apply targeted remediation, all unified in a single governance platform.

Future Plans

Future work focuses on:

Pre‑emptive governance by embedding standards and tooling directly into production.

Increasing governance efficiency through one‑click, platform‑wide actions that close the governance loop.
