Yipay Data Warehouse Construction and Data Governance Practices
In this talk, Huang Luo, a senior data warehouse engineer at Yipay, walks through the company's end‑to‑end data warehouse build: the problems that motivated it, the governance framework, platform development, the layered architecture, naming standards, monitoring, and future plans. It offers practical insights for data engineers, architects, and business stakeholders.
Introduction
As data volumes grow, a robust data warehouse and effective data governance become critical. Huang Luo, a senior data warehouse engineer at Yipay, shares the company's experience and practices in warehouse construction and governance.
Data Governance Background
Yipay faced several problems in the early stage of data warehouse construction, including code redundancy, unstable task timeliness, severe lack of metadata, high data security risks, and inconsistent data definitions across business lines.
Data Governance Construction Content
Organizational collaboration: establish data governance committees, technical architecture committees, and implementation groups to ensure cross‑department coordination.
Platform construction: build a data development platform that supports offline and online scheduling and data quality monitoring, plus a self‑service BI platform for ad‑hoc queries and visualization.
Data application governance: improve data usability, reduce storage and compute costs, and accelerate query performance.
Data standards: define naming conventions, metadata management, and data classification.
Data security: encrypt sensitive data, enforce download approvals, and retain audit logs.
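The download approval and audit log items above can be sketched in a few lines of Python. This is only an illustration: the sensitivity levels, the `DownloadGate` class, and its fields are hypothetical names, not Yipay's implementation, and encryption itself is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sensitivity levels; not taken from the talk.
LEVELS = {"public": 0, "internal": 1, "sensitive": 2}


@dataclass
class DownloadGate:
    """Decides whether a download may proceed and records every request."""

    approval_level: str = "sensitive"  # assumed threshold for requiring approval
    audit_log: list = field(default_factory=list)

    def request(self, user: str, table: str, level: str, approved: bool = False) -> bool:
        needs_approval = LEVELS[level] >= LEVELS[self.approval_level]
        allowed = approved or not needs_approval
        # Every request is logged, whether or not it was allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "table": table,
            "level": level,
            "allowed": allowed,
        })
        return allowed
```

Under these assumptions, a request for an internal table passes without approval, while a request for a sensitive table is refused until `approved=True` is supplied; both outcomes end up in `audit_log`.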
Enterprise‑Level Data Warehouse Construction
The warehouse follows a typical layered architecture: ODS, DWD, DWS, DWM, DM, and APP layers. Each layer has specific responsibilities, from raw data ingestion to aggregated data for various applications.
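One way to keep a layered design honest is to check that no table reads from a later layer. The sketch below assumes two things the talk does not state: that table names start with their layer prefix, and that a table may only read from its own layer or an earlier one in the order listed above. The table names are hypothetical.

```python
# Layer order as listed in the text, from raw ingestion to application.
LAYERS = ["ods", "dwd", "dws", "dwm", "dm", "app"]


def layer_of(table: str) -> int:
    """Return the layer index, assuming the name starts with '<layer>_'."""
    prefix = table.split("_", 1)[0].lower()
    if prefix not in LAYERS:
        raise ValueError(f"unknown layer prefix in table name: {table!r}")
    return LAYERS.index(prefix)


def find_reverse_dependencies(edges):
    """Return (downstream, upstream) pairs where upstream sits in a later layer."""
    return [(down, up) for down, up in edges if layer_of(up) > layer_of(down)]
```

Given the lineage edges of a scheduling system as `(downstream, upstream)` pairs, the function returns only the pairs that violate the assumed direction, which could then be reviewed or blocked.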
Key steps include:
Research phase: identify business pain points, investigate organizational needs, and map technical architecture.
Platform support: migrate from Hive to Spark, and provide scheduling, monitoring, and self‑service BI capabilities.
Data modeling: adopt dimensional modeling with clear grain, dimensions, and facts.
Standardization: enforce naming rules for tables and fields, and maintain metadata.
Asset management: record table ownership, lifecycle, and partition retention policies.
Monitoring: ensure data quality, timeliness, and security through comprehensive monitoring and SLA enforcement.
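The standardization and asset management steps above lend themselves to automated checks. The following is a minimal sketch, assuming a hypothetical naming pattern (`<layer>_<domain>_<entity>...` in lowercase snake case) and daily partitions named by ISO date; neither is confirmed by the talk.

```python
import re
from datetime import date, timedelta

# Hypothetical convention: layer prefix, then at least two lowercase segments.
TABLE_NAME = re.compile(r"^(ods|dwd|dws|dwm|dm|app)_[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def is_valid_table_name(name: str) -> bool:
    """Check a table name against the assumed naming pattern."""
    return TABLE_NAME.fullmatch(name) is not None


def expired_partitions(partitions, retention_days: int, today: date):
    """Return partitions (ISO date strings) older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [p for p in partitions if date.fromisoformat(p) < cutoff]
```

A check like `is_valid_table_name` could run when a table is created, and `expired_partitions` could feed a cleanup job driven by the retention policy recorded for each table.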
Data Governance Effectiveness
Since 2023, Yipay has reduced platform resource consumption by 86%, saved nearly ten million yuan annually, accelerated report generation, achieved full metadata coverage and data security compliance, and improved indicator management.
Future Planning
Build a data‑warehouse cockpit for unified monitoring.
Develop an asset management dashboard covering scheduling, storage, resource consumption, and security.
Optimize indicator management to avoid duplicate processing.
Expand data empowerment through tag management, FTP distribution, and data interfaces.
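As an illustration of how duplicate indicator processing might be detected, the sketch below groups indicators whose formulas are identical after ignoring case and whitespace. The indicator names and formulas are hypothetical, and real indicator management would also need to compare grain, filters, and source tables.

```python
from collections import defaultdict


def normalize(expr: str) -> str:
    """Lowercase the formula and strip all whitespace."""
    return "".join(expr.lower().split())


def find_duplicate_indicators(indicators: dict):
    """Return sorted groups of indicator names that share a normalized formula."""
    groups = defaultdict(list)
    for name, expr in indicators.items():
        groups[normalize(expr)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Each returned group is a candidate for merging into a single indicator that is computed once and reused.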
Conclusion
The session provides a comprehensive roadmap for constructing and governing a large‑scale data warehouse, emphasizing systematic planning, platform support, standards, and continuous monitoring.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
