Snowball Data Warehouse Modeling and OneData System Implementation

This article outlines the background of Snowball's data warehouse, compares the major modeling approaches (ER, dimensional, Data Vault, and Anchor), describes the current challenges of Snowball's dimensional model, and details the OneData methodology, including OneModel, OneID, and OneService, along with its practical implementation, results, and future plans.

Snowball Engineer Team

Aug 5, 2022

1. Background

After years of development, the Snowball data warehouse now supports a wide range of business data needs. As the community, fund, and stock businesses have been integrated, requirements for data accuracy and timeliness have risen and the warehouse has grown. While optimizing historical data, the data team has repeatedly run into metric ambiguity, duplicate development, and inefficiency, problems that call for a comprehensive, long-term solution.

2. Data Warehouse Modeling

Modeling is the foundation of a data warehouse implementation. It affects development difficulty, delivery cycle, and ease of use. Common modeling methods include:

1. Entity-Relationship (ER) Model

The ER model abstracts business objects into entities, attributes, and relationships and adheres to third normal form (3NF), which gives low redundancy and high consistency. Queries, however, often require many joins and can be inefficient in big-data scenarios.

2. Dimensional Modeling

Advocated by Ralph Kimball, this model separates tables into fact and dimension tables, supporting star and snowflake schemas, and is oriented toward analytical scenarios.
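As a minimal sketch of the fact/dimension split, the following uses Python's built-in sqlite3 module with a hypothetical star schema. The table and column names (dim_fund, fact_fund_purchase, fund_type, amount) are illustrative assumptions, not Snowball's actual tables.

```python
import sqlite3

# Hypothetical star schema: one fact table that references one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_fund (
    fund_id   INTEGER PRIMARY KEY,
    fund_type TEXT
);
CREATE TABLE fact_fund_purchase (
    purchase_id INTEGER PRIMARY KEY,
    fund_id     INTEGER REFERENCES dim_fund(fund_id),
    amount      REAL
);
INSERT INTO dim_fund VALUES (1, 'equity'), (2, 'bond');
INSERT INTO fact_fund_purchase VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 30.0);
""")


def purchase_amount_by_fund_type(connection):
    """Aggregate the fact table's measure, grouped by a dimension attribute."""
    rows = connection.execute("""
        SELECT d.fund_type, SUM(f.amount)
        FROM fact_fund_purchase AS f
        JOIN dim_fund AS d ON d.fund_id = f.fund_id
        GROUP BY d.fund_type
        ORDER BY d.fund_type
    """).fetchall()
    return dict(rows)
```

The analytical query needs a single join from the fact table to the dimension table, which is the property that makes star schemas convenient for OLAP-style questions.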

3. Data Vault Modeling

Data Vault, created by Dan Linstedt, emphasizes an atomic, traceable historical data layer composed of Hubs, Links, and Satellites. It facilitates configurable ETL but is primarily designed for data integration rather than analytical decision‑making.
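The three Data Vault building blocks can be sketched as plain Python data classes. This is an illustrative sketch only: the field names, the SHA-256 hash key, and the helper latest_attributes are assumptions made for the example, not part of the article or of any specific Data Vault tooling.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys (assumed convention)."""
    return hashlib.sha256("|".join(business_keys).encode("utf-8")).hexdigest()


@dataclass
class Hub:
    """A core business entity, identified only by its business key."""
    hub_key: str
    business_key: str
    load_ts: datetime


@dataclass
class Link:
    """A relationship between two or more hubs."""
    link_key: str
    hub_keys: tuple
    load_ts: datetime


@dataclass
class Satellite:
    """Descriptive attributes of a hub, versioned by load time."""
    hub_key: str
    load_ts: datetime
    attributes: dict


def latest_attributes(satellites, hub_key):
    """Return the most recently loaded attributes for a hub; older rows stay as history."""
    rows = [s for s in satellites if s.hub_key == hub_key]
    return max(rows, key=lambda s: s.load_ts).attributes
```

Because satellites are only appended, every historical version of an entity's attributes remains traceable, which is the property the Data Vault layer is designed for.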

4. Anchor Modeling

Anchor modeling consists of Anchors, Attributes, Ties, and Knots and extends Data Vault concepts by normalizing to 6NF. This improves extensibility but increases the number of joins, and the approach is less common in practice.

The mainstream modeling methods are ER and dimensional modeling, each with distinct advantages and drawbacks.

| Modeling Method | Advantages | Disadvantages | Applicable Scenario |
| --- | --- | --- | --- |
| ER Entity Model | Low data redundancy; easy consistency | Requires comprehensive business analysis; long implementation cycle; high skill requirement for modelers | OLTP |
| Dimensional Modeling | No need for complete business process mapping; quick demo implementation; clear, business-friendly structure; facilitates OLAP analysis | Heavy ETL preprocessing; hard to adapt to changing business definitions; potential data source inconsistency; higher data redundancy; rigid granularity reduces reusability | OLAP |

3. Snowball Data Warehouse Current Situation

Snowball uses dimensional modeling, which enables rapid response and clear structure but suffers from metric ambiguity, inconsistent definitions, duplicate development, and high data redundancy, affecting both development efficiency and resource consumption.

Metric Ambiguity: Multiple interpretations of a metric lead to confusion.

Same Metric, Different Values: Identical metric names yield different results across tables.

Duplicate Metric Development: Redundant metric implementations waste storage and compute resources.

Unquantified Data Cost/Benefit: The warehouse acts as a cost center with benefits hard to measure.

4. OneData System

1. Concept

OneData builds on dimensional modeling to unify requirement analysis, model design, and data delivery. It comprises OneModel (standardized data layer), OneID (entity identification), and OneService (data service layer).

2. OneModel

OneModel defines unified data domains, processes, and metric standards before development, ensuring consistent naming, calculation, and metadata management.

Data model standardization covers metadata such as field names, types, and lengths.

Business metric standardization defines unified metric names, definitions, and calculations.
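One way to picture business metric standardization is a registry that allows exactly one definition per metric name. The sketch below is hypothetical: MetricDefinition, MetricRegistry, and the sample metric are assumptions for illustration and do not describe Snowball's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str          # unified metric name
    definition: str    # business meaning in plain language
    calculation: str   # agreed calculation, e.g. a SQL expression


class MetricRegistry:
    """Holds one agreed definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric: MetricDefinition) -> None:
        existing = self._metrics.get(metric.name)
        # Re-registering an identical definition is harmless; a conflicting one is rejected.
        if existing is not None and existing != metric:
            raise ValueError(f"metric {metric.name!r} is already defined differently")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> MetricDefinition:
        return self._metrics[name]
```

Rejecting a second, different definition at registration time is what prevents the "same metric, different values" problem from reaching downstream tables.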

3. OneID

OneID provides a unified identifier (UID) for users, devices, etc., breaking data silos and enabling cross‑domain data linkage.
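A common way to link identifiers that belong to the same entity is a union-find (disjoint-set) structure. The sketch below assumes that approach; the article does not say how Snowball's OneID is implemented, and the class name, identifier formats, and the use of a representative identifier as the UID are illustrative choices.

```python
class OneIdResolver:
    """Groups identifiers observed together and maps each group to one representative."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier: str) -> str:
        self._parent.setdefault(identifier, identifier)
        # Walk up to the root of this identifier's group.
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[identifier] != root:
            next_identifier = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_identifier
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so the result is deterministic.
            keep, merge = sorted((root_a, root_b))
            self._parent[merge] = keep

    def uid(self, identifier: str) -> str:
        """Return the representative identifier, used here as a stand-in for the UID."""
        return self._find(identifier)
```

For example, after linking "device:abc" to "phone:138" and "phone:138" to "user:42", all three identifiers resolve to the same UID, so records keyed by any of them can be joined across domains.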

4. OneService

OneService delivers data via unified interfaces, focusing on data reuse rather than duplication, addressing challenges such as heterogeneous data sources, duplicate services, lack of traceability, and complex mappings.

Key principle: Data reuse instead of copying.
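The reuse principle can be illustrated with a small facade: each logical dataset is registered once, and every consumer goes through the same entry point instead of building its own copy. This is a sketch under assumptions; the class, the logical name "fund.purchase_amt_1d", and the fetch callbacks are hypothetical and not Snowball's actual service API.

```python
class OneService:
    """Single entry point that routes logical dataset names to one registered source each."""

    def __init__(self):
        self._sources = {}

    def register(self, logical_name: str, fetch) -> None:
        # One source per logical name: a second registration would be a duplicate service.
        if logical_name in self._sources:
            raise ValueError(f"{logical_name!r} is already served")
        self._sources[logical_name] = fetch

    def query(self, logical_name: str, **params):
        """Fetch data by logical name; callers never see the physical storage."""
        return self._sources[logical_name](**params)


service = OneService()
# The lambda stands in for a real backend call (e.g. to a database or cache).
service.register("fund.purchase_amt_1d", lambda date: {"date": date, "value": 150.0})
```

A consumer would then call service.query("fund.purchase_amt_1d", date="2022-08-05"), and the mapping from the logical name to the heterogeneous physical source stays inside the service.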

5. OneData Implementation

The implementation follows four phases: requirement analysis & review, model design & review, ETL development & testing, and data release & usage.

1. Requirement Analysis & Review

Analysts gather business needs and first check whether existing data already satisfies them. For new metrics, they write a detailed requirement document and break each metric down into data domain, business process, time period, modifier type, modifier, and atomic metric.

Examples of data domains: Fund, Stock, Community, each with specific business processes.
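The metric breakdown described above can be sketched as a small data class. The field names follow the six components listed in the text; the sample values and the naming pattern in standard_name are assumptions for illustration, not Snowball's actual convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedMetric:
    data_domain: str       # e.g. Fund, Stock, Community
    business_process: str  # e.g. purchase
    time_period: str       # e.g. last 7 days
    modifier_type: str     # category of the modifier, e.g. user type
    modifier: str          # concrete restriction, e.g. new users
    atomic_metric: str     # the indivisible measure, e.g. purchase amount

    def standard_name(self) -> str:
        """Compose a name as modifier + atomic metric + time period (hypothetical pattern)."""
        return "_".join([self.modifier, self.atomic_metric, self.time_period])


metric = DerivedMetric(
    data_domain="fund",
    business_process="purchase",
    time_period="7d",
    modifier_type="user_type",
    modifier="new_user",
    atomic_metric="purchase_amt",
)
```

With these sample values, metric.standard_name() yields "new_user_purchase_amt_7d". Deriving the name from the components means two teams describing the same metric arrive at the same name.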

2. Model Design & Review

Snowball's warehouse follows a four‑layer architecture: ODS, DWD, DWS, and ADS, plus a DIM layer for dimension tables.

ODS: Raw data from source systems, mirroring source schemas.

DWD: Processed tables aligned with business processes.

DWS: Summarized data for analytics, supporting both light and heavy aggregations.

ADS: Application‑specific data for reports and dashboards.

3. ETL Development & Testing

Snowball's development platform "Luban" supports OneData by auto‑generating metadata, enforcing naming conventions, and recording metadata for downstream use.

Metadata is captured for easy discovery via the "Tianyan" system.
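Naming-convention enforcement of the kind described above could look like the following check. The rule used here (lowercase layer prefix followed by underscore-separated words) is an assumed convention for illustration; the article does not specify the rules that Luban applies.

```python
import re

# Assumed convention: <layer>_<word>[_<word>...], all lowercase letters and digits.
_TABLE_NAME = re.compile(r"(ods|dwd|dws|ads|dim)_[a-z0-9]+(?:_[a-z0-9]+)*")


def layer_of(table_name: str) -> str:
    """Return the warehouse layer encoded in a table name, or raise if the name is invalid."""
    match = _TABLE_NAME.fullmatch(table_name)
    if match is None:
        raise ValueError(f"table name {table_name!r} violates the naming convention")
    return match.group(1)
```

For instance, layer_of("dwd_fund_purchase_detail") returns "dwd", while a name such as "tmp_foo" is rejected before the table is created.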

4. Data Release & Usage

Upon release, lineage analysis adds dependencies, and the metadata system collects table attributes, partitions, and lineage for user access.
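Lineage can be represented as a graph from each table to its direct upstream tables. The sketch below walks that graph to find everything a table depends on; the dictionary representation and the table names are hypothetical and chosen only to illustrate the idea.

```python
def all_upstream(lineage: dict, table: str) -> set:
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = list(lineage.get(table, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(lineage.get(current, ()))
    return seen


# Hypothetical lineage: each key maps to its direct upstream tables.
lineage = {
    "ads_fund_dashboard": ["dws_fund_purchase_1d"],
    "dws_fund_purchase_1d": ["dwd_fund_purchase_detail", "dim_fund"],
    "dwd_fund_purchase_detail": ["ods_fund_purchase"],
}
```

Calling all_upstream(lineage, "ads_fund_dashboard") returns the four upstream tables across the DWS, DWD, DIM, and ODS layers, which is the information a user needs to trace where a report's numbers come from.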

6. Implementation Results

After adopting OneData, BI performance improved significantly: duplicate calculations were reduced, metric definitions unified, and query latency decreased by about 30%, leading to more stable and faster data delivery for analysts and management.

7. Conclusion

The dimensional model, while initially effective, faced issues such as metric duplication and unclear modeling, prompting the shift to OneData. Standardized development tools and processes now improve efficiency, enforce governance, and turn data into a valuable asset.

8. Future Plans

1. Data Map

Develop a comprehensive, visual data map to improve discoverability and reduce user effort.

2. Model Evaluation System

Establish a scoring mechanism to proactively assess and refine models before business changes demand re‑engineering.

3. Table Lifecycle Management

Implement automated identification of obsolete tables linked to retired business processes to manage storage and compute resources efficiently.
