Big Data 11 min read

Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.

Baidu Geek Talk
Baidu Geek Talk
Baidu Geek Talk
Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

1. Business Background

Internet products iterate quickly, producing massive data across many product lines. Traditional data warehouses adopt a classic layered model (ODS → DWD → DWS → ADS) to support BI analysis.

1.1 Current Data‑Modeling Situation

Multiple product lines generate continuous streams of data for analysts, product managers, operations, and data engineers. The classic layered approach separates data into ODS, DWD, DWS, and ADS layers.

1.2 Business Characteristics and Trends

Rapid product iteration, increasing cross‑business analysis, and a shift of data consumers from engineers to product personnel demand lower usage barriers.

2. Problems Faced

The layered model leads to:

2.1 Issues Under the Data‑Driven Trend

Redundant storage, unclear data definitions, and low query performance.

2.2 Reflections

After user interviews, three main ideas emerged:

Reduce overall warehouse storage by using a single wide table per business theme.

Standardize table usage and definitions to lower communication costs.

Accelerate queries to support data‑driven business.

3. Technical Solution

A one‑layer large wide‑table model replaces the classic dimensional model to address storage redundancy, table proliferation, and query latency.

3.1 Wide‑Table Model Replaces Classic Dimensional Model

3.1.1 Architecture

The wide table sits directly in the warehouse layer, eliminating the ODS→DWD→DWS→ADS chain. Reports and ad‑hoc queries can read the wide table directly.

3.1.2 Construction Plan

Logs are grouped by business theme; each theme receives a wide table that contains all required fields (detail, dimension, metric) to support reporting and ad‑hoc analysis.

3.1.3 Underlying Principles

• Parquet columnar storage supports hundreds of columns and provides high compression and I/O efficiency. • Fact tables are flattened and joined with dimension tables to produce the wide table, which contains three column groups: common attributes, business dimensions, and metrics.

3.1.4 Advantages and Performance

Storage Reduction : A single wide table replaces multiple layers, cutting total warehouse storage by ~30%.

Query Speed : Columnar storage and predicate push‑down make simple aggregations faster than classic layered tables.

Scenario 1 : With comparable storage, wide tables outperform classic tables on simple queries. Scenario 2 : Complex calculations that require explode() in classic tables can be handled by simple count/sum on the wide table because metrics are pre‑computed. Scenario 3 : When storage differs significantly, wide‑table performance may degrade slightly but remains acceptable.

3.1.5 Challenges

• Development Cost: Wide tables encapsulate extensive ETL logic, increasing code complexity. • Back‑tracking Cost: Re‑processing large wide tables for historical data consumes more resources. • Production Timeliness: As upstream sources become ready at different times, the wide table may suffer from “bucket effect”.

Potential mitigations include partitioned wide tables, selective back‑tracking of required fields, leveraging idle night‑time resources, upstream batch integration, and staged data production.

4. Summary and Outlook

• Wide‑table modeling suits fast‑iteration, data‑driven businesses and improves efficiency. • Compared with classic warehouses, wide tables offer better storage usage and query performance. • Construction introduces higher production and maintenance costs that need continuous optimization.

Future plans: Use wide tables to build self‑service analytics platforms, further boosting business analysis efficiency.

Big DataData WarehouseStorage Optimizationetlquery performancewide tableparquet
Baidu Geek Talk
Written by

Baidu Geek Talk

Follow us to discover more Baidu tech insights.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.