
Design and Implementation of a Data Warehouse Evaluation System for Governance and Performance

This article presents the motivation, design principles, architecture, metric system, and results of a data‑warehouse evaluation framework that quantifies efficiency, quality, cost, and model health to drive systematic governance and continuous improvement across the organization.

Zhuanzhuan Tech

1 Introduction

This article covers the systematic evaluation and governance of data-warehouse construction and business delivery. It addresses efficiency, quality, cost, and development issues, and emphasizes design ideas and practical outcomes over deep technical detail.

2 Background

2.1 Why Build a Data Warehouse Evaluation System

The data-warehouse evaluation system originated at the end of 2023. As the evaluation layer of Zhuanzhuan's data governance, it is a critical part of the overall effort. Historically, the data-warehouse team served fast-changing business needs, which led to siloed builds, coarse metric management, and unnecessary compute and storage costs. Managers, developers, and external users all struggled to see overall warehouse health, cost growth, data-usage efficiency, and governance compliance.

Manager perspective:

Cannot see overall warehouse construction status.

Cannot see cost growth and distribution.

Cannot see whether data-usage efficiency and quality are improving.

Unclear impact of development standards.

Data‑warehouse RD perspective:

Which assets have problems and what are they?

Model completeness and weak business processes.

Model reuse rate and unused models.

Which standards are not being followed.

External user perspective:

Difficulty finding and using data, low efficiency.

Frequent accuracy, timeliness, and consistency issues.

Hard to control report, email, and broadcast permissions.

To address these problems, a year-long data-governance project was launched in 2024 to establish standards, strengthen infrastructure, and build a metric library and a warehouse map. The evaluation system serves as the after-the-fact evaluation and review tool, and it runs through the entire governance process.

The evaluation system is positioned as an objective, fact-based quantitative analysis of the warehouse's current state and problems. It supports horizontal and vertical comparisons, drives governance through result metrics and governance items, and measures the effect of that governance afterwards.

2.2 Technical Problems to Solve

Design and implementation of evaluation metrics.

Acquisition and processing of "objective fact" data.

Governance of existing (stock) issues.

Control of incremental issues.

Continuous improvement and robustness of the evaluation system.

3 Design Approach

The system consists of two main modules: result evaluation and process evaluation. Result evaluation monitors efficiency, quality, cost, and model‑related outcome metrics, while process evaluation drives developers to address specific governance items.

3.1 Overall Architecture

The evaluation framework is divided into three layers:

Metric layer: Processes data‑model outputs into result and process metrics, producing metric tables for trend and comparative analysis.

Data processing layer: Cleans and models raw data from the source layer.

Data source layer: Integrates Hive metadata, query logs, alarm logs, billing data, governance rules, and organizational structures.
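The three layers can be read as a simple pipeline from raw records to a metric table. The following is a minimal sketch of that flow, not the production implementation: the record fields (`user`, `table`, `runtime_s`), the sample rows, and the two output metrics are all hypothetical and chosen only to make the layering concrete.

```python
from collections import defaultdict
from statistics import mean

# Data source layer: raw rows as they might arrive from a query log.
# All field names and values here are hypothetical.
RAW_QUERY_LOG = [
    {"user": "alice", "table": "dw.orders", "runtime_s": "12.5"},
    {"user": "alice", "table": "ods.orders_raw", "runtime_s": "48.0"},
    {"user": "bob", "table": "dw.users", "runtime_s": None},  # incomplete row
    {"user": "bob", "table": "dw.users", "runtime_s": "7.5"},
]


def process(raw_rows):
    """Data processing layer: drop incomplete rows and normalize types."""
    cleaned = []
    for row in raw_rows:
        if row["runtime_s"] is None:
            continue
        cleaned.append({
            "user": row["user"],
            "layer": row["table"].split(".", 1)[0],  # e.g. "ods" or "dw"
            "runtime_s": float(row["runtime_s"]),
        })
    return cleaned


def build_metrics(cleaned_rows):
    """Metric layer: aggregate cleaned rows into a per-user metric table."""
    by_user = defaultdict(list)
    for row in cleaned_rows:
        by_user[row["user"]].append(row)
    return {
        user: {
            "avg_runtime_s": mean(r["runtime_s"] for r in rows),
            "ods_query_share": sum(r["layer"] == "ods" for r in rows) / len(rows),
        }
        for user, rows in by_user.items()
    }


metric_table = build_metrics(process(RAW_QUERY_LOG))
```

Keeping each layer as a separate function mirrors the separation described above: a new source only touches the processing step, and a new metric only touches the aggregation step.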

3.2 Product Form Design

The framework is organized into result evaluation and process evaluation modules, each with its own metric set. Result evaluation observes current status and trends at an overall level; process evaluation drives daily governance, targeting specific asset issues.

Result Evaluation

Displays efficiency, quality, cost, and model metrics, supporting horizontal and vertical quantitative analysis and drill‑down by organization, component, or individual. Data sources include analyst and product‑operation query logs, data‑quality monitoring logs, and manually labeled incidents.

Process Evaluation

Abstracts daily problems and standards into governance items. Currently includes four categories: efficiency, quality, development, and cost, aggregated into a governance workbench that drives developers to resolve issues.

3.3 Metric System Construction

The metric system has two layers: governance items (process) and result metrics (outcome). Governance items are fine-grained checks derived from daily standards; the first phase defines 24 items across efficiency, quality, development, and cost. A governance item is the smallest unit of data governance.
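One way to picture a governance item is as a named rule that is checked against the metadata of a single asset. The sketch below is illustrative only: the `GovernanceItem` class, the two example rules, the asset fields, and the 30-day window are assumptions, not the article's actual 24 items.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GovernanceItem:
    """A single fine-grained check applied to one asset (hypothetical shape)."""
    code: str
    category: str  # efficiency, quality, development, or cost
    description: str
    violates: Callable[[dict], bool]


# Two illustrative items; the rules and thresholds are assumptions.
ITEMS = [
    GovernanceItem(
        code="missing_comment",
        category="development",
        description="Table has no comment",
        violates=lambda asset: not asset.get("comment"),
    ),
    GovernanceItem(
        code="unused_model",
        category="cost",
        description="Model had no reads in the last 30 days",
        violates=lambda asset: asset.get("reads_30d", 0) == 0,
    ),
]


def evaluate(asset):
    """Return the codes of all governance items that the asset violates."""
    return [item.code for item in ITEMS if item.violates(asset)]


problem_asset = {"name": "dw.order_detail", "comment": "", "reads_30d": 0}
healthy_asset = {"name": "dw.user_profile", "comment": "User profile", "reads_30d": 42}
```

Here `evaluate(problem_asset)` reports both items and `evaluate(healthy_asset)` reports none. Per-asset results of this kind are what a governance workbench could list for each developer.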

Result metrics aggregate issues into high-level outcomes for the manager, RD, and user perspectives. Example categories include:

Efficiency: Composite index based on average runtime, ODS penetration, and complex‑SQL proportion.

Quality: Number of online issues.

Cost: Overall growth rate, component‑level distribution, and per‑business/person cost breakdown.

Model: Completeness, reuse rate, stability, and compliance.
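The article does not give the formula for the efficiency index, so the following is only a sketch of how three inputs of this kind could be combined into one score. The weights, the 600-second runtime cap, and the 0 to 100 scale are assumptions made for illustration.

```python
def efficiency_score(avg_runtime_s, ods_penetration, complex_sql_ratio,
                     runtime_cap_s=600.0, weights=(0.5, 0.25, 0.25)):
    """Combine three penalties into a 0-100 score (higher is better).

    ods_penetration and complex_sql_ratio are fractions in [0, 1].
    The weights and the runtime cap are illustrative assumptions.
    """
    if avg_runtime_s < 0:
        raise ValueError("avg_runtime_s must be non-negative")
    for name, value in (("ods_penetration", ods_penetration),
                        ("complex_sql_ratio", complex_sql_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Normalize runtime to [0, 1]; anything above the cap counts as worst case.
    runtime_penalty = min(avg_runtime_s / runtime_cap_s, 1.0)
    penalties = (runtime_penalty, ods_penetration, complex_sql_ratio)
    weighted = sum(w * p for w, p in zip(weights, penalties))
    return round(100 * (1 - weighted), 2)


score = efficiency_score(avg_runtime_s=150, ods_penetration=0.2, complex_sql_ratio=0.1)
```

Normalizing each input to the same range before weighting keeps one input (for example, runtime in seconds) from dominating the index purely because of its unit.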

4 Result Presentation

4.1 Result Evaluation Module

Efficiency Evaluation

Provides horizontal and vertical comparison of the efficiency score, supporting drill‑down to business, individual, or specific query level. Enhancements in model completeness and efficiency tools help improve daily query performance.

Quality Evaluation

Shows trend analysis of online quality issues. Currently based on manual reporting, with plans to incorporate monitoring and alarm data for more objective assessment.

Cost Evaluation

Focuses on monthly cost growth trends and aims to reduce cost through systematic task and storage governance. Current emphasis is on internal model‑related cost control.
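As a rough illustration of the two views used in cost evaluation, the sketch below computes month-over-month growth and the share of cost per component. The figures and component names are made up for the example and do not come from the article.

```python
def monthly_growth(costs):
    """Month-over-month growth rate for (month, cost) pairs in date order.

    Returns (month, rate) pairs; rate is None when the previous cost is zero.
    """
    growth = []
    for (_, prev), (month, cur) in zip(costs, costs[1:]):
        rate = None if prev == 0 else (cur - prev) / prev
        growth.append((month, rate))
    return growth


def cost_share(cost_by_component):
    """Fraction of total cost attributed to each component."""
    total = sum(cost_by_component.values())
    if total == 0:
        return {name: 0.0 for name in cost_by_component}
    return {name: cost / total for name, cost in cost_by_component.items()}


# Illustrative figures only.
MONTHLY_COSTS = [("2024-01", 100.0), ("2024-02", 110.0), ("2024-03", 99.0)]
COMPONENT_COSTS = {"storage": 60.0, "compute": 40.0}

growth = monthly_growth(MONTHLY_COSTS)
shares = cost_share(COMPONENT_COSTS)
```

The same `cost_share` function applies unchanged to a breakdown by organization or by individual, since it only depends on a mapping from a key to a cost.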

Cost breakdown by component, organization, and individual:

Model Evaluation

Tracks completeness, reuse, stability, and compliance metrics to observe governance outcomes. The workbench drives developers to address specific governance items.
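The exact definitions of the model metrics are not spelled out here, so the sketch below uses two assumed definitions: completeness as the share of queries answered without touching ODS tables, and reuse rate as the share of models with at least two downstream consumers. The field names, the `ods.` prefix convention, and the threshold are hypothetical.

```python
def completeness(queries):
    """Share of queries served without reading any ODS table (assumed definition)."""
    if not queries:
        return 0.0
    served = sum(
        1 for q in queries
        if not any(table.startswith("ods.") for table in q["tables"])
    )
    return served / len(queries)


def reuse_rate(models, min_downstream=2):
    """Share of models with at least min_downstream consumers (assumed definition)."""
    if not models:
        return 0.0
    reused = sum(1 for m in models if m["downstream_count"] >= min_downstream)
    return reused / len(models)


# Illustrative inputs only.
QUERIES = [
    {"tables": ["dw.orders"]},
    {"tables": ["dw.orders", "dw.users"]},
    {"tables": ["ods.orders_raw"]},  # bypasses the model layer
    {"tables": ["dm.daily_sales"]},
]
MODELS = [
    {"name": "dw.orders", "downstream_count": 5},
    {"name": "dw.users", "downstream_count": 2},
    {"name": "dw.tmp_export", "downstream_count": 0},
    {"name": "dm.daily_sales", "downstream_count": 3},
]

completeness_value = completeness(QUERIES)
reuse_value = reuse_rate(MODELS)
```

Both functions return a fraction between 0 and 1, which can be multiplied by 100 to match the percentage form used when reporting these metrics.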

4.2 Process Evaluation Module

This module lists detailed asset issues for each developer; clicking an asset name opens the specific problem items. Filters allow focusing on particular governance items.

Asset issue details:

Governance reports show target achievement and weekly/monthly progress, with new asset monitoring for incremental issue control.
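A governance report of this kind separates stock issues from issues on newly created assets. The following sketch shows one possible way to compute both views; the issue fields, the baseline date, and the target rate are assumptions, not values from the article.

```python
from datetime import date


def governance_report(issues, baseline, target_rate):
    """Summarize stock-issue progress and list open issues on new assets.

    issues: dicts with hypothetical keys asset, asset_created, resolved.
    baseline: assets created on or after this date count as new.
    """
    stock = [i for i in issues if i["asset_created"] < baseline]
    new = [i for i in issues if i["asset_created"] >= baseline]
    resolved = sum(1 for i in stock if i["resolved"])
    rate = resolved / len(stock) if stock else 1.0
    return {
        "stock_resolution_rate": rate,
        "target_met": rate >= target_rate,
        "open_new_asset_issues": [i["asset"] for i in new if not i["resolved"]],
    }


# Illustrative issue records only.
ISSUES = [
    {"asset": "dw.orders", "asset_created": date(2023, 5, 1), "resolved": True},
    {"asset": "dw.users", "asset_created": date(2023, 8, 9), "resolved": True},
    {"asset": "dw.refunds", "asset_created": date(2023, 11, 2), "resolved": False},
    {"asset": "dw.coupons", "asset_created": date(2023, 12, 20), "resolved": True},
    {"asset": "dw.live_stream", "asset_created": date(2024, 3, 15), "resolved": False},
]

report = governance_report(ISSUES, baseline=date(2024, 1, 1), target_rate=0.7)
```

Tracking new assets separately makes it visible when fresh violations are being introduced even while the stock backlog shrinks.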

4.3 Phase Governance Benefits

Over the past year, the evaluation system has driven improvements in model completeness, reuse, stability, and compliance. As the saying goes, sharpening the axe does not delay the woodcutting: this investment in fundamentals lays the foundation for later efficiency and quality governance.

Governance gains

Completeness increased from 50.2% to 93.97% (↑43.77 percentage points)

Reuse rate increased from 51.65% to 88.61% (↑36.96 percentage points)

Stability increased from 76.27% to 94.45% (↑18.18 percentage points)

Compliance increased from 76.56% to 94.61% (↑18.05 percentage points)
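The gains listed above are differences between two percentages, that is, percentage points, which is not the same as relative growth. The short sketch below recomputes the point gains from the before and after values reported above and also derives the relative change; the helper name `gain` is hypothetical.

```python
def gain(before, after):
    """Return (percentage-point gain, relative change in percent)."""
    points = round(after - before, 2)
    relative = round((after - before) / before * 100, 1)
    return points, relative


# Before and after values as reported in the governance gains.
REPORTED = {
    "completeness": (50.2, 93.97),
    "reuse_rate": (51.65, 88.61),
    "stability": (76.27, 94.45),
    "compliance": (76.56, 94.61),
}

gains = {name: gain(before, after) for name, (before, after) in REPORTED.items()}
```

For completeness, for example, the point gain is 43.77, while the relative change against the 50.2% starting value is considerably larger.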

Core changes for the warehouse

The push for model completeness drove a 161% increase in newly launched models (about 2.6 times the earlier count), which improved reuse, while decommissioning unused models reduced storage. Overall completeness reached about 95%, meaning roughly 95% of internal demand can now be served from the model layer, up from about 50%. Long-standing stock issues, including cross-layer penetration, backflow references, references to other departments' tables, missing comments, and missing dependencies, have largely been resolved and brought under control.

5 Future Plans

Regular retrospectives to iteratively refine governance items for more accurate and comprehensive evaluation.

Gradually shift focus from internal to external efficiency and quality improvements.

Identify cost‑wasteful business units or individuals to drive cost‑reduction and efficiency gains.

Extend the framework beyond the data warehouse to broader data‑asset evaluation and governance.

About the author

Qiu Difan, Data Development Engineer at Zhuanzhuan, lead of the C2 & New Media data warehouse and primary owner of data-warehouse governance.
