Snowball Data Warehouse Modeling and OneData System Implementation
This article outlines the background of Snowball's data warehouse, compares the major modeling approaches (ER, dimensional, Data Vault, and Anchor), describes the problems with the current dimensional model, and details the OneData methodology (OneModel, OneID, and OneService) along with its implementation, results, and future plans.
1. Background
After years of development, the Snowball data warehouse now supports a wide range of business data needs. As the community, fund, and stock businesses were integrated, requirements for data accuracy and timeliness rose and the warehouse grew. While optimizing historical data, the data team ran into metric ambiguity, duplicate development, and inefficiency, problems that call for a comprehensive, long‑term solution.
2. Data Warehouse Modeling
Modeling is the foundation of data warehouse implementation, affecting development difficulty, cycle, and user convenience. Common modeling methods include:
1. ER Entity‑Relationship Model
The ER model abstracts transactions into entities, attributes, and relationships, adhering to 3NF for low redundancy and high consistency, but queries can be complex and less efficient in big‑data scenarios.
2. Dimensional Modeling
Advocated by Ralph Kimball, this model separates tables into fact and dimension tables, supporting star and snowflake schemas, and is oriented toward analytical scenarios.
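To make the fact/dimension split concrete, here is a minimal star-schema sketch using Python's built-in sqlite3 module. The table names (dim_fund, fact_fund_purchase), columns, and rows are hypothetical illustrations, not Snowball's actual schema.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_fund (fund_id INTEGER PRIMARY KEY, fund_type TEXT);
CREATE TABLE fact_fund_purchase (fund_id INTEGER, user_id INTEGER, amount REAL);
INSERT INTO dim_fund VALUES (1, 'equity'), (2, 'bond');
INSERT INTO fact_fund_purchase VALUES
    (1, 100, 500.0), (1, 101, 250.0), (2, 100, 1000.0);
""")


def purchase_amount_by_fund_type(connection):
    """Aggregate the fact table's measure by a dimension attribute."""
    rows = connection.execute("""
        SELECT d.fund_type, SUM(f.amount)
        FROM fact_fund_purchase AS f
        JOIN dim_fund AS d ON d.fund_id = f.fund_id
        GROUP BY d.fund_type
        ORDER BY d.fund_type
    """).fetchall()
    return dict(rows)


print(purchase_amount_by_fund_type(conn))
```

The fact table holds only keys and measures, while descriptive attributes such as fund_type live in the dimension table, so an analytical query is a single join plus an aggregation.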
3. Data Vault Modeling
Data Vault, created by Dan Linstedt, emphasizes an atomic, traceable historical data layer composed of Hubs, Links, and Satellites. It facilitates configurable ETL but is primarily designed for data integration rather than analytical decision‑making.
4. Anchor Modeling
Anchor modeling consists of Anchors, Attributes, Ties, and Knots, extending Data Vault concepts with 6NF normalization. While it improves extensibility, it increases join complexity and is less common in practice.
The mainstream modeling methods are ER and dimensional modeling, each with distinct advantages and drawbacks.
| Modeling Method | Advantages | Disadvantages | Applicable Scenario |
| --- | --- | --- | --- |
| ER Entity Model | Low data redundancy; easy consistency | Requires comprehensive business analysis; long implementation cycle; high skill requirement for modelers | OLTP |
| Dimensional Modeling | No need for complete business process mapping; quick demo implementation; clear, business‑friendly structure; facilitates OLAP analysis | Heavy ETL preprocessing; hard to adapt to changing business definitions; potential data source inconsistency; higher data redundancy; rigid granularity reduces reusability | OLAP |
3. Snowball Data Warehouse Current Situation
Snowball uses dimensional modeling, which enables rapid response and clear structure but suffers from metric ambiguity, inconsistent definitions, duplicate development, and high data redundancy, affecting both development efficiency and resource consumption.
Metric Ambiguity: Multiple interpretations of a metric lead to confusion.
Same Metric, Different Values: Identical metric names yield different results across tables.
Duplicate Metric Development: Redundant metric implementations waste storage and compute resources.
Unquantified Data Cost/Benefit: The warehouse acts as a cost center with benefits hard to measure.
4. OneData System
1. Concept
OneData builds on dimensional modeling to unify requirement analysis, model design, and data delivery. It comprises OneModel (standardized data layer), OneID (entity identification), and OneService (data service layer).
2. OneModel
OneModel defines unified data domains, processes, and metric standards before development, ensuring consistent naming, calculation, and metadata management.
Data model standardization covers metadata such as field names, types, and lengths.
Business metric standardization defines unified metric names, definitions, and calculations.
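One way to picture business metric standardization is a registry that accepts only one definition per metric name. The sketch below is an illustration under that assumption; MetricDefinition, MetricRegistry, and the sample metric are hypothetical names, not part of Snowball's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One standardized metric: unique name, business definition, calculation."""
    name: str
    definition: str
    calculation: str


class MetricRegistry:
    """Keeps exactly one definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        existing = self._metrics.get(metric.name)
        # Re-registering an identical definition is harmless; a conflicting
        # one is the "same metric, different values" problem, so reject it.
        if existing is not None and existing != metric:
            raise ValueError(f"metric {metric.name!r} is already defined differently")
        self._metrics[metric.name] = metric

    def lookup(self, name):
        return self._metrics[name]


registry = MetricRegistry()
registry.register(MetricDefinition("pay_amt", "Total amount paid", "SUM(amount)"))
```

Rejecting a second, different definition at registration time moves the conflict to the review stage instead of letting it surface later as two tables that disagree.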
3. OneID
OneID provides a unified identifier (UID) for users, devices, etc., breaking data silos and enabling cross‑domain data linkage.
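The article does not say how identifiers are merged into a UID. As an assumption, the sketch below uses a union-find structure: whenever two identifiers are observed together they are linked, and every identifier in a linked group resolves to the same canonical ID. The class name and the "type:value" identifier format are hypothetical.

```python
class IdentityGraph:
    """Groups identifiers that belong to the same entity (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path straight at the root.
        while self._parent[identifier] != root:
            self._parent[identifier], identifier = root, self._parent[identifier]
        return root

    def link(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Keep the smaller root so the canonical ID is deterministic.
            keep, merge = sorted((root_a, root_b))
            self._parent[merge] = keep

    def one_id(self, identifier):
        """Return the canonical identifier for the group containing identifier."""
        return self._find(identifier)


graph = IdentityGraph()
graph.link("device:d1", "phone:138")
graph.link("phone:138", "user:42")
```

Here a device, a phone number, and a user account end up in one group, so records keyed by any of the three can be joined through graph.one_id(...).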
4. OneService
OneService delivers data via unified interfaces, focusing on data reuse rather than duplication, addressing challenges such as heterogeneous data sources, duplicate services, lack of traceability, and complex mappings.
Key principle: Data reuse instead of copying.
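A rough sketch of the unified-interface idea: consumers call one entry point by API name, and the service routes the call to whichever handler owns that data, so the data is read in place rather than copied per consumer. DataService, the "fund.pay_amt" API name, and the sample rows are assumptions made for illustration only.

```python
class DataService:
    """Single entry point that routes named data APIs to their handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, api_name, handler):
        # One handler per API name keeps duplicate services from creeping in.
        if api_name in self._handlers:
            raise ValueError(f"data API {api_name!r} is already registered")
        self._handlers[api_name] = handler

    def query(self, api_name, **params):
        try:
            handler = self._handlers[api_name]
        except KeyError:
            raise KeyError(f"unknown data API: {api_name}") from None
        return handler(**params)


# Hypothetical backing data; a real handler would query a storage engine.
FUND_ROWS = [
    {"fund_id": 1, "pay_amt": 750.0},
    {"fund_id": 2, "pay_amt": 1000.0},
]


def fund_pay_amt(fund_id):
    return [row for row in FUND_ROWS if row["fund_id"] == fund_id]


service = DataService()
service.register("fund.pay_amt", fund_pay_amt)
print(service.query("fund.pay_amt", fund_id=2))
```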
5. OneData Implementation
The implementation follows four phases: requirement analysis & review, model design & review, ETL development & testing, and data release & usage.
1. Requirement Analysis & Review
Analysts gather business needs and first check whether existing data already covers them. For new metrics, they write a detailed requirement document that breaks each metric down into data domain, business process, time period, modifier type, modifier, and atomic metric.
Examples of data domains: Fund, Stock, Community, each with specific business processes.
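The metric breakdown can be sketched as a small record type whose fields mirror the six components named above. The naming rule used here (modifiers, then atomic metric, then time period, joined by underscores) and the sample values are assumptions, not Snowball's documented convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """A metric decomposed into the components collected during requirement analysis."""
    data_domain: str
    business_process: str
    atomic_metric: str
    time_period: str
    modifiers: tuple = ()  # (modifier_type, modifier) pairs

    def name(self):
        # Assumed naming rule: <modifiers>_<atomic metric>_<time period>.
        parts = [modifier for _, modifier in self.modifiers]
        parts += [self.atomic_metric, self.time_period]
        return "_".join(parts)


spec = MetricSpec(
    data_domain="fund",
    business_process="purchase",
    atomic_metric="pay_amt",
    time_period="7d",
    modifiers=(("channel", "app"),),
)
print(spec.name())
```

Because the name is derived from the components, two requests that decompose to the same components yield the same metric name, which makes an existing metric easy to spot before building a duplicate.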
2. Model Design & Review
Snowball's warehouse follows a four‑layer architecture: ODS, DWD, DWS, and ADS, plus a DIM layer for dimension tables.
ODS: Raw data from source systems, mirroring source schemas.
DWD: Processed tables aligned with business processes.
DWS: Summarized data for analytics, supporting both light and heavy aggregations.
ADS: Application‑specific data for reports and dashboards.
3. ETL Development & Testing
Snowball's development platform "Luban" supports OneData by auto‑generating metadata, enforcing naming conventions, and recording metadata for downstream use.
Metadata is captured for easy discovery via the "Tianyan" system.
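How Luban enforces naming conventions is not described here. As a hedged illustration, a check of this kind could parse a table name into layer, data domain, and subject, and reject anything that does not fit. The pattern layer_domain_subject and the function name are assumptions; only the layer names come from the architecture above.

```python
import re

# Layer prefixes taken from the warehouse architecture; the rest of the
# pattern (<layer>_<domain>_<subject>) is an assumed convention.
LAYERS = ("ods", "dim", "dwd", "dws", "ads")
TABLE_NAME = re.compile(
    r"^(?P<layer>" + "|".join(LAYERS) + r")_(?P<domain>[a-z]+)_(?P<subject>[a-z0-9_]+)$"
)


def parse_table_name(name):
    """Split a conforming table name into metadata fields, or raise ValueError."""
    match = TABLE_NAME.match(name)
    if match is None:
        raise ValueError(f"table name {name!r} violates the naming convention")
    return match.groupdict()


print(parse_table_name("dwd_fund_purchase_detail"))
```

Parsing the name also yields metadata for free: the layer and data domain can be recorded at creation time instead of being filled in by hand.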
4. Data Release & Usage
Upon release, lineage analysis records the table's dependencies, and the metadata system collects table attributes, partitions, and lineage so that users can look them up.
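Lineage can be pictured as a graph from each table to the tables it reads. The sketch below walks that graph to list everything a table depends on, directly or indirectly. The table names and the dictionary representation are hypothetical; the article does not describe how the metadata system stores lineage.

```python
def upstream_tables(lineage, table):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical lineage: each table maps to the tables it reads from.
LINEAGE = {
    "ads_fund_dashboard": {"dws_fund_purchase_1d"},
    "dws_fund_purchase_1d": {"dwd_fund_purchase_detail", "dim_fund"},
    "dwd_fund_purchase_detail": {"ods_fund_order"},
}

print(sorted(upstream_tables(LINEAGE, "ads_fund_dashboard")))
```

The same traversal run on the reversed graph answers the opposite question: which downstream reports are affected when a source table changes.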
6. Implementation Results
After adopting OneData, BI performance improved significantly: duplicate calculations were reduced, metric definitions unified, and query latency decreased by about 30%, leading to more stable and faster data delivery for analysts and management.
7. Conclusion
The dimensional model, while initially effective, faced issues such as metric duplication and unclear modeling, prompting the shift to OneData. Standardized development tools and processes now improve efficiency, enforce governance, and turn data into a valuable asset.
8. Future Plans
1. Data Map
Develop a comprehensive, visual data map to improve discoverability and reduce user effort.
2. Model Evaluation System
Establish a scoring mechanism to proactively assess and refine models before business changes demand re‑engineering.
3. Table Lifecycle Management
Implement automated identification of obsolete tables linked to retired business processes to manage storage and compute resources efficiently.
Snowball Engineer Team
Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.
