Yipay Data Warehouse Construction and Data Governance Practices

This presentation by senior data warehouse engineer Huang Luo details Yipay's end‑to‑end data warehouse build, covering background challenges, governance framework, platform development, layered architecture, naming standards, monitoring, and future plans, offering practical insights for data engineers, architects, and business stakeholders.

DataFunSummit

Feb 19, 2024

Introduction

In an era of explosive data growth, building a robust data warehouse and implementing effective data governance are critical. Huang Luo, a senior data warehouse engineer at Yipay, shares the company's experience and best practices in data warehouse construction and governance.

Data Governance Background

Yipay faced several problems in the early stages of its data warehouse construction: redundant code, unstable task timeliness (outputs not reliably ready on schedule), a severe lack of metadata, high data security risks, and inconsistent data definitions across business lines.

Data Governance Construction Content

Organizational collaboration: establish data governance committees, technical architecture committees, and implementation groups to ensure cross‑department coordination.

Platform construction: build a data development platform that supports offline and online scheduling and data quality monitoring, plus a self‑service BI platform for ad‑hoc queries and visualization.

Data application governance: improve data usability, reduce storage and compute costs, and accelerate query performance.

Data standards: define naming conventions, metadata management, and data classification.

Data security: encrypt sensitive data, enforce download approvals, and retain audit logs.
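The talk does not publish Yipay's actual naming rules, so the sketch below assumes a hypothetical convention, `<layer>_<domain>_<subject>_<cadence>`, where the prefix is one of the warehouse layers named in this article and the suffix marks update cadence (`di` for daily incremental, `df` for daily full). It illustrates how a naming standard could be enforced mechanically, for example as a check on new table definitions before deployment. The length limit and all names are assumptions.

```python
import re
from typing import List, Optional

# Hypothetical naming convention, not Yipay's published rule:
#   <layer>_<domain>_<subject>[_<subject>...]_<cadence>
# layer   : one of the warehouse layers named in the article
# domain  : short business-line code (lowercase letters and digits)
# subject : one or more snake_case words describing the content
# cadence : update-cadence suffix, e.g. di = daily incremental, df = daily full
LAYERS = ("ods", "dwd", "dws", "dwm", "dm", "app")
CADENCES = ("di", "df", "hi", "hf")
MAX_LENGTH = 64  # assumed limit; set to whatever the metastore allows

TABLE_NAME = re.compile(
    r"(?P<layer>" + "|".join(LAYERS) + r")"
    r"_(?P<domain>[a-z][a-z0-9]*)"
    r"_(?P<subject>[a-z][a-z0-9]*(?:_[a-z][a-z0-9]*)*)"
    r"_(?P<cadence>" + "|".join(CADENCES) + r")"
)


def parse_table_name(name: str) -> Optional[dict]:
    """Split a conforming name into its parts, or return None."""
    match = TABLE_NAME.fullmatch(name)
    return match.groupdict() if match else None


def check_table_name(name: str) -> List[str]:
    """Return a list of problems; an empty list means the name conforms."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if parse_table_name(name.lower()) is None:
        problems.append("does not match <layer>_<domain>_<subject>_<cadence>")
    return problems
```

For instance, `check_table_name("dwd_pay_order_detail_di")` returns an empty list, while `"DWD_Pay_Orders"` is flagged both for its case and for the missing cadence suffix.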
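Reversible encryption would need a cryptography library and key management that the talk does not describe. As a stand-in, the sketch below shows two related protections available in the Python standard library: display masking (only the ends of a value stay visible) and pseudonymization with a keyed HMAC-SHA256 hash, which is one-way rather than reversible encryption. The key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load it
# from a key-management service and never hard-code it.
PSEUDONYM_KEY = b"replace-with-managed-secret"


def mask_middle(value: str, keep_head: int, keep_tail: int, fill: str = "*") -> str:
    """Show only the first keep_head and last keep_tail characters.

    Values too short to keep both ends visible are masked entirely.
    """
    if keep_head + keep_tail >= len(value):
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Deterministic keyed hash (HMAC-SHA256), returned as 64 hex characters.

    Equal inputs map to equal tokens, so tables can still be joined on the
    pseudonymized column, but the raw value cannot be read from the token.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

`mask_middle("13812345678", 3, 4)` gives `138****5678`. Because `pseudonymize` is deterministic for a given key, analysts can still join tables or count distinct users on the tokenized column without seeing raw values.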

Enterprise‑Level Data Warehouse Construction

The warehouse follows a typical layered architecture with ODS, DWD, DWS, DWM, DM, and APP layers. Each layer has a specific responsibility, ranging from raw data ingestion in ODS to aggregated, application-ready data in APP.
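One common way to keep a layered warehouse clean is to forbid reverse dependencies: a table may read from its own layer or lower layers, never from a layer above it. The sketch below checks that rule over a table-lineage map. It assumes the layer is encoded as the table-name prefix and ranks the layers in the order this article lists them; teams that place DWM between DWD and DWS should swap those two ranks. Both are assumptions, not details from the talk.

```python
from typing import Dict, List, Tuple

# Rank of each layer, bottom (raw) to top (application). Assumption: this
# follows the order in which the article lists the layers.
LAYER_RANK = {"ods": 0, "dwd": 1, "dws": 2, "dwm": 3, "dm": 4, "app": 5}


def layer_of(table: str) -> str:
    """Assumes the layer is the table-name prefix, e.g. dwd_pay_order_di -> dwd."""
    prefix = table.split("_", 1)[0].lower()
    if prefix not in LAYER_RANK:
        raise ValueError(f"unknown layer prefix in {table!r}")
    return prefix


def reverse_dependencies(lineage: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (target, source) pairs where a table reads from a higher layer.

    lineage maps each target table to the source tables it is built from.
    """
    violations = []
    for target, sources in lineage.items():
        target_rank = LAYER_RANK[layer_of(target)]
        for source in sources:
            if LAYER_RANK[layer_of(source)] > target_rank:
                violations.append((target, source))
    return violations
```

In a lineage where `dwd_pay_refund_di` is built from `dws_pay_merchant_trade_df`, the pair is reported, because a detail-layer table should not depend on an aggregate built above it.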

Key steps include:

Research phase: identify business pain points, investigate organizational needs, and map technical architecture.

Platform support: migrate the compute engine from Hive to Spark, and provide scheduling, monitoring, and self‑service BI capabilities.

Data modeling: adopt dimensional modeling with clear grain, dimensions, and facts.

Standardization: enforce naming rules for tables and fields, and maintain metadata.

Asset management: record table ownership, lifecycle, and partition retention policies.

Monitoring: ensure data quality, timeliness, and security through comprehensive monitoring and SLA enforcement.
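The data-modeling step in the list above calls for dimensional modeling with a clear grain, dimensions, and facts. The sketch below models one hypothetical fact table (`FactPaymentOrder`, declared grain: one row per payment order) and one dimension (`DimMerchant`), checks that the grain holds, and rolls the facts up through the dimension. Table and column names are illustrative, not Yipay's schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DimMerchant:
    """Dimension: descriptive attributes, one row per merchant."""
    merchant_key: int   # surrogate key referenced by facts
    merchant_name: str
    industry: str


@dataclass(frozen=True)
class FactPaymentOrder:
    """Fact at a declared grain: exactly one row per payment order."""
    order_id: str
    merchant_key: int   # foreign key to DimMerchant
    pay_date: date      # date dimension, kept as a plain date for brevity
    amount_cents: int   # additive measure


def grain_violations(facts: Iterable[FactPaymentOrder]) -> List[str]:
    """Order IDs that occur more than once break the one-row-per-order grain."""
    counts = Counter(f.order_id for f in facts)
    return [order_id for order_id, n in counts.items() if n > 1]


def amount_by_industry_and_day(
    facts: Iterable[FactPaymentOrder], merchants: Iterable[DimMerchant]
) -> Dict[Tuple[str, date], int]:
    """Roll order-grain facts up to (industry, pay_date) via the merchant dimension."""
    industry_of = {m.merchant_key: m.industry for m in merchants}
    totals: Dict[Tuple[str, date], int] = defaultdict(int)
    for f in facts:
        totals[(industry_of[f.merchant_key], f.pay_date)] += f.amount_cents
    return dict(totals)
```

Keeping measures additive and at a single declared grain is what makes roll-ups like this safe: summing order amounts by industry gives the right total only if no order appears twice.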
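For the asset-management step, a lifecycle policy becomes actionable once each table carries an owner and a retention period. The sketch below assumes a hypothetical registry entry (`TableAsset`) and Hive-style daily partitions named `dt=YYYYMMDD`, and lists the partitions that fall outside the retention window so they can be reviewed or dropped.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import List


@dataclass(frozen=True)
class TableAsset:
    """Hypothetical asset-registry entry: owner and lifecycle of one table."""
    table: str
    owner: str
    retention_days: int


def expired_partitions(partitions: List[str], retention_days: int, today: date) -> List[str]:
    """Return partition specs that fall outside the retention window.

    Assumes Hive-style daily partitions named dt=YYYYMMDD (hypothetical).
    The window counts today as day 1, so retention_days=3 keeps today and
    the two previous days.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for spec in partitions:
        key, _, value = spec.partition("=")
        if key != "dt":
            raise ValueError(f"unexpected partition spec {spec!r}")
        if datetime.strptime(value, "%Y%m%d").date() <= cutoff:
            expired.append(spec)
    return expired
```

With `TableAsset("dwd_pay_order_detail_di", "data-team", 3)` and `today=date(2024, 2, 19)`, partitions from 17 to 19 February are kept and anything dated 16 February or earlier is returned as expired.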
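The monitoring step combines timeliness (SLA) checks with data-quality checks. The sketch below shows three typical rules: a task must finish by its deadline, a key column's null rate must stay under a limit, and the daily row count must not swing beyond a threshold. The check names and thresholds are hypothetical; in practice such checks could run after each scheduled task and raise an alert on any result with `passed=False`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CheckResult:
    check: str
    passed: bool
    detail: str


def timeliness_check(finished_at: Optional[datetime], deadline: datetime) -> CheckResult:
    """SLA check: the task must have finished by its agreed deadline."""
    if finished_at is None:
        return CheckResult("timeliness", False, "not finished")
    on_time = finished_at <= deadline
    return CheckResult("timeliness", on_time, f"finished {finished_at:%H:%M}, deadline {deadline:%H:%M}")


def null_rate_check(rows: List[dict], column: str, max_rate: float) -> CheckResult:
    """Quality check: share of rows where `column` is missing or None."""
    name = f"null_rate[{column}]"
    if not rows:
        return CheckResult(name, False, "no rows")
    rate = sum(1 for row in rows if row.get(column) is None) / len(rows)
    return CheckResult(name, rate <= max_rate, f"{rate:.1%} null (limit {max_rate:.1%})")


def volume_check(today_rows: int, yesterday_rows: int, max_change: float) -> CheckResult:
    """Quality check: flag day-over-day row-count swings larger than max_change."""
    if yesterday_rows == 0:
        return CheckResult("volume", True, "no baseline, skipped")
    change = abs(today_rows - yesterday_rows) / yesterday_rows
    return CheckResult("volume", change <= max_change, f"{change:.1%} change (limit {max_change:.1%})")
```

Returning a uniform `CheckResult` for every rule keeps the alerting side simple: it only needs to collect results and report the ones that failed, whatever kind of check produced them.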

Data Governance Effectiveness

Since 2023, Yipay has reduced platform resource consumption by 86% and saved nearly ten million yuan per year. Report generation has also become faster, metadata is now complete, data handling meets security compliance requirements, and indicator management has improved.

Future Planning

Build a data‑warehouse cockpit for unified monitoring.

Develop an asset management dashboard covering scheduling, storage, resource consumption, and security.

Optimize indicator management to avoid duplicate processing.

Expand data enablement for the business through tag management, FTP-based data distribution, and data interfaces.

Conclusion

The session provides a comprehensive roadmap for constructing and governing a large‑scale data warehouse, emphasizing systematic planning, platform support, standards, and continuous monitoring.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
