Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges
The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.
1. Business Background
Internet products iterate quickly, producing massive data across many product lines. Traditional data warehouses adopt a classic layered model (ODS → DWD → DWS → ADS) to support BI analysis.
1.1 Current Data‑Modeling Situation
Multiple product lines generate continuous streams of data for analysts, product managers, operations, and data engineers. The classic layered approach separates data into ODS, DWD, DWS, and ADS layers.
1.2 Business Characteristics and Trends
Rapid product iteration, increasing cross‑business analysis, and a shift of data consumers from engineers to product personnel demand lower usage barriers.
2. Problems Faced
The layered model leads to:
2.1 Issues Under the Data‑Driven Trend
Redundant storage, unclear data definitions, and low query performance.
2.2 Reflections
After user interviews, three main ideas emerged:
Reduce overall warehouse storage by using a single wide table per business theme.
Standardize table usage and definitions to lower communication costs.
Accelerate queries to support data‑driven business.
3. Technical Solution
A one‑layer large wide‑table model replaces the classic dimensional model to address storage redundancy, table proliferation, and query latency.
3.1 Wide‑Table Model Replaces Classic Dimensional Model
3.1.1 Architecture
The wide table sits directly in the warehouse layer, eliminating the ODS→DWD→DWS→ADS chain. Reports and ad‑hoc queries can read the wide table directly.
3.1.2 Construction Plan
Logs are grouped by business theme; each theme receives a wide table that contains all required fields (detail, dimension, metric) to support reporting and ad‑hoc analysis.
3.1.3 Underlying Principles
• Parquet columnar storage supports hundreds of columns and provides high compression and I/O efficiency. • Fact tables are flattened and joined with dimension tables to produce the wide table, which contains three column groups: common attributes, business dimensions, and metrics.
3.1.4 Advantages and Performance
Storage Reduction : A single wide table replaces multiple layers, cutting total warehouse storage by ~30%.
Query Speed : Columnar storage and predicate push‑down make simple aggregations faster than classic layered tables.
Scenario 1 : With comparable storage, wide tables outperform classic tables on simple queries. Scenario 2 : Complex calculations that require explode() in classic tables can be handled by simple count/sum on the wide table because metrics are pre‑computed. Scenario 3 : When storage differs significantly, wide‑table performance may degrade slightly but remains acceptable.
3.1.5 Challenges
• Development Cost: Wide tables encapsulate extensive ETL logic, increasing code complexity. • Back‑tracking Cost: Re‑processing large wide tables for historical data consumes more resources. • Production Timeliness: As upstream sources become ready at different times, the wide table may suffer from “bucket effect”.
Potential mitigations include partitioned wide tables, selective back‑tracking of required fields, leveraging idle night‑time resources, upstream batch integration, and staged data production.
4. Summary and Outlook
• Wide‑table modeling suits fast‑iteration, data‑driven businesses and improves efficiency. • Compared with classic warehouses, wide tables offer better storage usage and query performance. • Construction introduces higher production and maintenance costs that need continuous optimization.
Future plans: Use wide tables to build self‑service analytics platforms, further boosting business analysis efficiency.
Baidu Geek Talk
Follow us to discover more Baidu tech insights.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.