Big Data 13 min read

Building a Data Warehouse: Architecture, ETL, Layering, Modeling, and Governance

This article explains how to build a data warehouse from scratch, covering its definition, system and collaboration layers, ETL requirements, data layering design, modeling steps, common challenges, and governance practices such as temporary table management and coding standards.

Big Data Technology & Architecture

A data warehouse (DW), as defined by Inmon, is a subject‑oriented, integrated, time‑variant, non‑volatile collection of data that supports enterprise decision making.

At the system layer, considerations include ETL (offline and real‑time), data layering (Stage, ODS, DIM/DW, DA), data integration, standardization, monitoring, service orientation, and the real‑time DW.

The collaboration layer emphasizes coordination with backend developers and business analysts, along with iterative updates that keep the DW aligned with evolving business needs.

Personal responsibilities cover data integration, accuracy, workflow simplification, communication, and skill development such as query optimization and storage efficiency.

ETL requirements: coverage of diverse data sources (RDBMS, NoSQL, files, APIs), performance under peak loads, and extensibility for schema changes and metadata management.

Typical ETL tools include DataX, Sqoop, Kettle, and Informatica. Whichever tool is chosen, it should ensure data continuity, support scheduled execution, and record comprehensive job metadata.
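As a rough illustration of how job metadata can back a data continuity check, the sketch below records one row per job run and lists the business dates that have no successful load. The `JobRun` fields, the job name `orders_to_ods`, and the daily partitioning are assumptions made for this example; they are not taken from any of the tools named above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class JobRun:
    """One execution of an ETL job (hypothetical metadata record)."""
    job_name: str
    partition_date: date  # business date that this run loaded
    rows_loaded: int
    status: str           # "success" or "failed"


def missing_partitions(runs, job_name, start, end):
    """Return the dates in [start, end] that have no successful run of job_name."""
    loaded = {
        run.partition_date
        for run in runs
        if run.job_name == job_name and run.status == "success"
    }
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [day for day in all_days if day not in loaded]


runs = [
    JobRun("orders_to_ods", date(2024, 1, 1), 1200, "success"),
    JobRun("orders_to_ods", date(2024, 1, 2), 0, "failed"),
    JobRun("orders_to_ods", date(2024, 1, 3), 1350, "success"),
]

# The failed run on 2 January leaves a gap that should be reloaded.
gaps = missing_partitions(runs, "orders_to_ods", date(2024, 1, 1), date(2024, 1, 3))
print(gaps)
```

A scheduler could run a check like this after each batch window and re-queue the missing dates, which is one way to make continuity something that is verified rather than assumed.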

Challenges include handling source schema changes, load errors, and ensuring low‑cost extensibility.
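One common way to catch source schema changes before they cause load errors is to compare the schema the pipeline expects with the schema the source currently reports. The following is a minimal sketch under the assumption that both schemas are available as `{column: type}` dictionaries; the column names and the rule that only removed or retyped columns count as breaking are illustrative choices, not something the article prescribes.

```python
def diff_schema(expected, actual):
    """Compare two {column: type} mappings and report the differences."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))
    retyped = sorted(
        col for col in set(expected) & set(actual)
        if expected[col] != actual[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def is_breaking(diff):
    """Assumed policy: removed or retyped columns break the load; additions do not."""
    return bool(diff["removed"] or diff["retyped"])


# Hypothetical schemas for an orders table.
expected = {"order_id": "bigint", "amount": "decimal(10,2)", "created_at": "timestamp"}
actual = {
    "order_id": "bigint",
    "amount": "varchar",          # type changed at the source
    "created_at": "timestamp",
    "channel": "varchar",         # new column added at the source
}

diff = diff_schema(expected, actual)
print(diff)
print("breaking:", is_breaking(diff))
```

Treating additions as non-breaking keeps the cost of extension low: a new source column can be ignored until the warehouse is ready to model it.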

Layering separates concerns much like the parts of a house: Stage (buffer), ODS (raw data), DIM/DW (modeled data), and DA (application data). Each layer has clear boundaries, so the design achieves high cohesion and low coupling.

Design principles: a clear hierarchy, explicit functionality for each layer, and no internal dependencies. A common layer structure consists of ODS, DIM/DW, and so on.
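Layer boundaries are easier to keep when they are checked mechanically. The sketch below assumes a hypothetical naming convention in which each table name starts with its layer prefix (`stg_`, `ods_`, `dim_`, `dw_`, `da_`), and it enforces one possible rule: a table must not read from a layer above its own. Both the prefixes and the rule are assumptions for illustration; a team may well choose a stricter policy.

```python
# Assumed ordering of layers, lowest first. DIM and DW share a rank
# because the article treats DIM/DW as one layer.
LAYER_ORDER = {"stg": 0, "ods": 1, "dim": 2, "dw": 2, "da": 3}


def layer_of(table):
    """Return the rank of the layer encoded in a table name's prefix."""
    prefix = table.split("_", 1)[0]
    if prefix not in LAYER_ORDER:
        raise ValueError(f"unknown layer prefix in table name: {table}")
    return LAYER_ORDER[prefix]


def find_violations(dependencies):
    """Return (target, source) pairs where the source sits in a higher layer.

    dependencies maps each target table to the list of tables it reads from.
    """
    violations = []
    for target, sources in dependencies.items():
        for source in sources:
            if layer_of(source) > layer_of(target):
                violations.append((target, source))
    return violations


# Hypothetical lineage: the last entry reads upward, from DA back into ODS.
deps = {
    "ods_orders": ["stg_orders"],
    "dw_order_fact": ["ods_orders", "dim_customer"],
    "ods_customer": ["da_customer_report"],
}

violations = find_violations(deps)
print(violations)
```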

Problems in layering: historical data reconstruction, performance bottlenecks, and layer redesign.

Model construction is the core of DW, providing integrated, historical, subject‑oriented views to support decision making and business process improvement.

Modeling steps: business model → domain model → logical model → physical model, each with specific activities such as defining concepts, abstracting entities, mapping to tables, and tuning for specific platforms.

Modeling methods include Inmon, Kimball (dimensional), and Data Vault; the article focuses on dimensional modeling.

Implementation steps: select business process, define grain, choose dimensions, identify facts, then build dimension tables and fact tables.
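The four steps can be made concrete with a very small star schema. The sketch below uses Python's built-in `sqlite3` module and an invented example: the business process is order sales, the grain is one row per order line, the dimensions are date and product, and the facts are quantity and amount. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables first, then the fact table that references them.
conn.executescript("""
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- e.g. 20240101
    full_date TEXT    NOT NULL,
    month     INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_order_line (       -- grain: one row per order line
    order_id    INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,    -- fact
    amount      REAL    NOT NULL,    -- fact
    PRIMARY KEY (order_id, line_no)
);
""")

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", 1), (20240102, "2024-01-02", 1)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Keyboard", "Peripherals"), (2, "Monitor", "Displays")],
)
conn.executemany(
    "INSERT INTO fact_order_line VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1001, 1, 20240101, 1, 2, 100.0),
        (1001, 2, 20240101, 2, 1, 300.0),
        (1002, 1, 20240102, 1, 1, 50.0),
    ],
)

# A typical analytical query: aggregate facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_order_line AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

Declaring the grain in the fact table's primary key makes it harder to load rows at a different level of detail by accident.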

Governance includes temporary table management (naming conventions, lifecycle), code standards (script headers, naming), and process standards to reduce errors.
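Temporary table rules are easiest to enforce when the name itself carries the owner and creation date. The sketch below assumes a hypothetical convention, `tmp_<owner>_<purpose>_<yyyymmdd>`, and a seven-day lifecycle; neither detail comes from the article, and both would be replaced by whatever a team actually agrees on.

```python
import re
from datetime import date, datetime

# Assumed convention: tmp_<owner>_<purpose>_<yyyymmdd>
TMP_NAME = re.compile(
    r"^tmp_(?P<owner>[a-z]+)_(?P<purpose>[a-z0-9_]+)_(?P<created>\d{8})$"
)


def parse_tmp_table(name):
    """Return (owner, purpose, created_date), or None if the name does not conform."""
    match = TMP_NAME.match(name)
    if match is None:
        return None
    try:
        created = datetime.strptime(match.group("created"), "%Y%m%d").date()
    except ValueError:  # eight digits that do not form a valid date
        return None
    return match.group("owner"), match.group("purpose"), created


def expired_tables(names, today, max_age_days=7):
    """List conforming temporary tables older than max_age_days."""
    expired = []
    for name in names:
        parsed = parse_tmp_table(name)
        if parsed is not None and (today - parsed[2]).days > max_age_days:
            expired.append(name)
    return expired


names = [
    "tmp_alice_order_dedup_20240101",
    "tmp_bob_scratch_20240110",
    "temp_orders",  # does not follow the convention
]
today = date(2024, 1, 12)

nonconforming = [n for n in names if parse_tmp_table(n) is None]
expired = expired_tables(names, today)
print("nonconforming:", nonconforming)
print("expired:", expired)
```

A scheduled job could report the non-conforming names to their owners and drop the expired tables, so cleanup does not depend on anyone remembering to do it.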

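Key discussion points in the original article: building the data warehouse, tools, problems faced, the motivation for layering, layering design, why build a model, how to build a model, clarifying the workflow, implementation steps, modeling methods and implementation, temporary table management, coding standards, and process standards.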

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Big Data, data modeling, Data Warehouse, layered architecture, ETL, Data Governance
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
