Mafengwo’s Data Warehouse & Middle Platform: Architecture, Modeling, Toolchain
This article details Mafengwo’s journey in constructing a data warehouse and data middle platform, covering the core three‑layer architecture, hybrid modeling approaches, the supporting toolchain for data synchronization, scheduling, and metadata management, and the design of an indicator platform for business analytics.
Part 1: Data Warehouse and Data Middle Platform
The data middle platform has been a popular concept in recent years, and Mafengwo has been building its own since 2018. A data middle platform is essentially a traditional data warehouse combined with a big‑data platform. It integrates and unifies data through reusable components, which enables flexible data processing and fast responses to business needs.
The core architecture is illustrated below:
Before building the middle platform, Mafengwo had already established a big‑data platform and accumulated reusable, componentized tools that made it possible to build the middle platform quickly. The data warehouse, the other core component, focuses on data unification: a unified data model and a unified metric system.
Part 2: Data Warehouse Core Architecture
Mafengwo’s data warehouse follows a standard three‑layer architecture, using dimensional modeling for data layering. The overall structure is shown below:
The three layers are:
Business Data Layer: Consists of the STG (staging) and ODS (operational data store) layers, which mirror the structure of the source data.
Public Data Layer: Includes the DWD (detail), DWS (summary), and DIM (dimension) layers, which provide integrated business‑process data and shared dimensions.
Application Data Layer: The DWA layer, which processes data for specific products such as commercialization, search and recommendation, and risk control.
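One practical consequence of a layered warehouse is that dependencies should only flow upward from source‑facing layers to application layers. The sketch below checks that rule. It assumes, hypothetically, that every table name starts with its layer prefix (for example `dwd_order_detail`) and that DWD and DIM sit at the same level; neither convention is stated in the original.

```python
# Hypothetical layer ranks: lower rank = closer to the source systems.
LAYER_RANK = {
    "stg": 0, "ods": 1,              # business data layer
    "dwd": 2, "dim": 2, "dws": 3,    # public data layer
    "dwa": 4,                        # application data layer
}


def layer_of(table: str) -> str:
    """Return the layer prefix of a table name such as 'dws_order_daily'."""
    prefix = table.split("_", 1)[0].lower()
    if prefix not in LAYER_RANK:
        raise ValueError(f"unknown layer prefix in table name: {table!r}")
    return prefix


def check_dependencies(target: str, sources: list[str]) -> list[str]:
    """Return the source tables that sit in a higher layer than `target`.

    A non-empty result means the target would read "upward", which breaks
    the assumed rule that data only flows from lower to higher layers.
    """
    target_rank = LAYER_RANK[layer_of(target)]
    return [s for s in sources if LAYER_RANK[layer_of(s)] > target_rank]
```

For example, `check_dependencies("dwd_order_detail", ["dws_order_daily", "ods_order"])` flags only `dws_order_daily`, because a detail table should not be built from a summary table.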
Part 3: Data Model Design
The data model abstracts real‑world data characteristics. Two classic methodologies are Inmon’s normalized (entity‑relationship) modeling and Kimball’s dimensional modeling. Mafengwo adopts a hybrid, demand‑driven approach, selecting models per layer based on four criteria:
Topic‑oriented: Classify business data by themes using normalized concepts.
Consistency: Use conformed dimensions and fact tables under a bus architecture to keep definitions uniform across subject areas.
Data quality: Combine both methodologies to guarantee data accuracy.
Efficiency: Apply degenerate dimensions, slowly changing dimensions, and deliberate redundancy to improve query performance.
The ODS layer retains a normalized model and preserves change history through slowly changing (history‑chain) tables. The DWD and DWS layers adopt dimensional and wide‑table models, extending dimensions and embedding metrics directly in the tables for ease of use and faster queries.
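History‑preserving ODS tables of this kind are commonly implemented as Type 2 slowly changing dimensions: each version of a record carries a validity interval, and a change closes the old version and opens a new one. The sketch below merges one day's full snapshot into such a table. The row layout, the `9999-12-31` open‑end sentinel, and the function names are illustrative assumptions, not Mafengwo's actual schema.

```python
from dataclasses import dataclass, replace
from datetime import date

# Assumed sentinel end date marking the currently valid version of a record.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class HistoryRow:
    key: int       # business key, e.g. an order id
    attrs: tuple   # tracked attributes, e.g. (status,)
    start: date    # first day this version is valid (inclusive)
    end: date      # day this version stopped being valid (exclusive)


def apply_snapshot(history: list[HistoryRow], snapshot: dict[int, tuple],
                   as_of: date) -> list[HistoryRow]:
    """Merge one day's full snapshot into a Type 2 history table."""
    result: list[HistoryRow] = []
    open_rows: dict[int, HistoryRow] = {}
    for row in history:
        if row.end == OPEN_END:
            open_rows[row.key] = row
        else:
            result.append(row)  # closed versions are never modified
    for key, row in open_rows.items():
        if snapshot.get(key) == row.attrs:
            result.append(row)                      # unchanged: keep it open
        else:
            result.append(replace(row, end=as_of))  # changed or deleted: close it
    for key, attrs in snapshot.items():
        current = open_rows.get(key)
        if current is None or current.attrs != attrs:
            result.append(HistoryRow(key, attrs, as_of, OPEN_END))  # new version
    return sorted(result, key=lambda r: (r.key, r.start))


def version_at(history: list[HistoryRow], key: int, day: date):
    """Return the version of `key` that was valid on `day`, or None."""
    return next((r for r in history
                 if r.key == key and r.start <= day < r.end), None)
```

Storing intervals instead of daily full copies keeps the table small while still allowing point‑in‑time queries through `version_at`.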
Horizontal integration merges data from multiple sources into a single model, while vertical integration consolidates data across business process stages into a comprehensive wide table, exemplified by the order transaction model.
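To make the two kinds of integration concrete, the sketch below builds a simplified order wide table. Vertical integration folds the create, pay, and refund stages of one order into a single row. Horizontal integration is approximated by a `source` field that lets orders from different business systems share the same model. The event format, stage names, and fields are hypothetical; the real order transaction model is not described in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderWide:
    order_id: str
    source: str                        # business system the order came from
    created_at: Optional[str] = None
    paid_at: Optional[str] = None
    pay_amount: float = 0.0
    refunded_at: Optional[str] = None
    refund_amount: float = 0.0

    @property
    def net_amount(self) -> float:
        # Metric embedded in the wide table so consumers need no extra joins
        return self.pay_amount - self.refund_amount


def build_order_wide(events) -> list[OrderWide]:
    """Fold (source, order_id, stage, timestamp, amount) events into one row per order."""
    rows: dict[tuple, OrderWide] = {}
    for source, order_id, stage, ts, amount in events:
        key = (source, order_id)
        if key not in rows:
            rows[key] = OrderWide(order_id, source)
        row = rows[key]
        if stage == "create":
            row.created_at = ts
        elif stage == "pay":
            row.paid_at = ts
            row.pay_amount += amount
        elif stage == "refund":
            row.refunded_at = ts
            row.refund_amount += amount
        else:
            raise ValueError(f"unknown stage: {stage}")
    return list(rows.values())
```

Keying rows by `(source, order_id)` keeps identical order numbers from different business lines apart after they are merged into one model.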
Part 4: Data Warehouse Toolchain
To boost data productivity, Mafengwo built a toolchain consisting of three major tools:
Data Synchronization Tool: Extracts data from source systems (incrementally or in full), transforms it, and loads it into the ODS layer, storing it either as raw copies or as history‑preserving (slowly changing) tables.
Task Scheduling Platform: Combines Airflow with a custom scheduler to manage regular jobs, data re‑runs, and historical backfills. Its "one‑click re‑run" feature can either clear and recompute all downstream tasks or dry‑run them, marking them complete without executing them.
Metadata Management Tool: Manages technical, business, and governance metadata, providing data lineage tracking and documentation for tables, columns, and metrics.
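A one‑click re‑run depends on the same dependency information that lineage tracking records: the scheduler has to find every task downstream of the one being re‑run and process them in dependency order. The sketch below shows that step with Python's standard `graphlib`. The `upstreams` mapping, the two mode names, and the action labels are hypothetical; they illustrate the idea and are not the Airflow API or Mafengwo's scheduler.

```python
from collections import deque
from graphlib import TopologicalSorter


def downstream_of(upstreams: dict[str, set[str]], root: str) -> set[str]:
    """All tasks that transitively depend on `root`, excluding `root` itself."""
    children: dict[str, set[str]] = {}
    for task, ups in upstreams.items():
        for up in ups:
            children.setdefault(up, set()).add(task)
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rerun_plan(upstreams: dict[str, set[str]], root: str, mode: str = "rerun"):
    """Return (task, action) pairs in dependency order.

    mode "rerun": execute root and every downstream task again.
    mode "dry":   execute root only; mark downstream tasks successful without running them.
    """
    if mode not in ("rerun", "dry"):
        raise ValueError(f"unknown mode: {mode}")
    affected = downstream_of(upstreams, root) | {root}
    # Restrict the graph to affected tasks so unrelated upstreams are ignored.
    sub = {t: upstreams.get(t, set()) & affected for t in affected}
    plan = []
    for task in TopologicalSorter(sub).static_order():
        action = "execute" if task == root or mode == "rerun" else "mark_success"
        plan.append((task, action))
    return plan
```

A dry run is useful when an upstream fix does not change downstream results: the dependency chain is marked complete without spending compute on recalculation.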
Part 5: Data Warehouse Application – Indicator Platform
The indicator platform consumes data from the warehouse to provide business‑ready metrics. Its design follows four principles: clear definitions, transparent production, fast query response, and flexible permission control. The platform comprises data source, indicator management, data dictionary, data service, multi‑dimensional query, and permission modules.
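Three of the four principles translate directly into data structures: a metric carries its business definition, the table it is produced from, the dimensions it may be sliced by, and the roles allowed to query it. The sketch below is a hypothetical, in‑memory version of a metric definition plus a multi‑dimensional query with a permission check. The field names, the role model, and the sum‑only aggregation are assumptions; the platform's real modules are not described in that much detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str                  # unique metric identifier
    description: str           # business definition shown in the data dictionary
    source_table: str          # warehouse table the metric is produced from
    measure: str               # column summed to compute the metric
    dimensions: frozenset      # dimensions users may group by
    allowed_roles: frozenset   # roles permitted to query this metric


def query_metric(metric: Metric, rows: list[dict], group_by: list[str],
                 role: str) -> dict[tuple, float]:
    """Sum `metric.measure` over `rows`, grouped by `group_by`, after access checks."""
    if role not in metric.allowed_roles:
        raise PermissionError(f"role {role!r} may not query {metric.name}")
    unsupported = set(group_by) - metric.dimensions
    if unsupported:
        raise ValueError(f"unsupported dimensions for {metric.name}: {sorted(unsupported)}")
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in group_by)] += row[metric.measure]
    return dict(totals)
```

Declaring allowed dimensions on the metric itself keeps each definition explicit: a query can only slice a metric in ways its owner has approved.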
Part 6: Summary
Enterprise data construction typically progresses through three stages: business dataization, data intelligence, and data‑driven business. Most companies are still at the intelligence stage, requiring solid foundations before achieving data‑driven growth. Mafengwo’s data middle platform is in its early phase, emphasizing data standardization, componentization, and clear organization, with the unified data warehouse as a core focus.
Data originates from business processes and ultimately serves them; aligning data tightly with business needs maximizes its value. At Mafengwo, 75% of employees use data products, driving continuous improvement of the data platform.
Mafengwo Technology
External communication platform of the Mafengwo Technology team, regularly sharing articles on advanced tech practices, tech exchange events, and recruitment.
