
OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

This article presents the OneData methodology for designing, standardizing, and governing a data warehouse, detailing background challenges, goals, industry references, core concepts, unified business and design consolidation, data modeling layers, naming conventions, data quality controls, and the resulting operational improvements and business value.

Big Data Technology & Architecture

Background

As the business expands, iterations become more frequent and vertical business units multiply. A lack of early planning has caused serious data quality issues in the data warehouse, posing challenges for data governance. The main problems identified are:

Lack of unified business and technical standards such as development specifications, metric definitions, and delivery standards.

Insufficient unified data quality monitoring, e.g., incomplete column values and unmet SLA timeliness.

Scattered business knowledge leading to divergent understanding among developers and higher product development costs.

Unreasonable data architecture, with unclear division between data layers, missing a consistent foundational layer, and lacking unified dimensions and metric management.

Goal

Based on the existing big data platform and borrowing the mature OneData methodology from the industry, we aim to construct a reasonable data architecture, data standards, model standards, and development patterns to ensure rapid data support for evolving business, drive business growth, and ultimately form our own OneData theory and practice system.

OneData Industry Experience

Alibaba introduced a OneData standard (see Figure 1).

OneData: Our Thinking

We reflected on Alibaba's OneData and our own practice:

1. Thoughts on Alibaba OneData

The OneData system covers a wide range, including data specification definition, data model design, ETL development, and a toolchain supporting the entire methodology.

Implementation cycles are long and require high manpower investment.

Adoption heavily depends on tooling for promotion and rollout.

2. Thoughts on Our Current Situation

Tooling support is weak and manpower is limited.

Existing development processes cannot be completely overhauled.

We concluded that wholesale reuse of external experience is impractical; we need to define our own OneData that meets our goals while avoiding the identified difficulties.

OneData: Our Idea

Combining industry experience, our current stage of practice, and past data‑warehouse experience, we pre‑define the core ideas and characteristics of OneData.

Core Idea: From design, development, deployment, and usage perspectives, avoid duplicate and redundant metric construction, ensure consistent metric definitions, and achieve full‑link data asset association, standardized data output, and a unified data public layer.

Core Characteristics: Three traits – uniformity, uniqueness, and standardization – and three effects – high scalability, strong reusability, and low cost.

OneData: Our Strategy

To realize the core idea while satisfying the core characteristics, we propose two unified strategies: unified intake (business) and unified output.

Based on these strategies, we began the OneData practice.

OneData Practice

Unified Business Intake

Data originates from business and supports its growth; therefore, data‑warehouse builders must be both technical and business experts. We had adopted a demand‑driven approach, which brought issues such as data lag and fragmented business knowledge. To address these, we propose a unified business intake and build a global knowledge base to ensure consistent business understanding.

Unified Design Intake

We address pain points by focusing on model and specification construction.

1. Model

Standardized model layering, data flow, and domain division reduce development cost, enhance metric reuse, and improve business support capability.

(1) Model Layering

We define four layers to ensure stable data layers while shielding downstream impact and avoiding overly long pipelines.

(2) Model Data Flow

Before refactoring, development was siloed, layer references were inconsistent, data lineage was chaotic, and SLAs were missed. After refactoring, stable business follows the standard flow ODS → DWD → DWT → DWA → APP, while exploratory needs may follow ODS → DWD → APP or ODS → DWD → DWT → APP.

Post‑refactor flow rules include:

Normal flow: ODS → DWD → DWT → DWA → APP. If ODS → DWD → DWA → APP appears, the domain is incomplete and DWD data should be moved to DWT.

Avoid using DWD in DWA wide tables together with tables belonging to another domain.

Minimize DWT‑generated tables within the same domain to preserve ETL efficiency.

DWT, DWA, and APP must not directly use ODS tables; ODS can only be referenced by DWD.

Prevent reverse dependencies such as DWT depending on DWA.
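The flow rules above can be sketched as a simple lineage check. The layer names come from the article; the rule encoding and function name are illustrative assumptions, not the team's actual tooling:

```python
# Illustrative encoding of the layer-flow rules; an assumption, not the
# article's implementation. Maps each layer to the layers it may read from.
ALLOWED_UPSTREAM = {
    "DWD": {"ODS"},                  # only DWD may reference ODS directly
    "DWT": {"DWD"},                  # no reverse dependency on DWA/APP
    "DWA": {"DWD", "DWT"},
    "APP": {"DWD", "DWT", "DWA"},    # exploratory ODS->DWD->APP is allowed
}

def check_dependency(downstream: str, upstream: str) -> bool:
    """Return True if a table in `downstream` may read from `upstream`."""
    return upstream in ALLOWED_UPSTREAM.get(downstream, set())

assert check_dependency("DWD", "ODS")        # normal flow
assert not check_dependency("DWA", "ODS")    # ODS only referenced by DWD
assert not check_dependency("DWT", "DWA")    # reverse dependency forbidden
```

A check like this could run in CI over job dependency graphs to flag violations before deployment.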

2. Domain Partitioning

Traditional industries use abstract domain partitions (e.g., BDWM, FS‑LDM). For internet businesses, we propose two practical partitions: business‑oriented and analysis‑oriented domains.

Business‑oriented: Focus on business processes, transform entity‑relationship models into entity and business‑process models, and define seven core business domains.

Analysis‑oriented: Focus on analytical subjects, creating analysis domains in DWA such as sales analysis, organization analysis, etc.

3. Specification

Modeling is the foundation; specifications guarantee quality. We adopt detailed, actionable specifications to avoid duplicate metrics and poor data quality.

(1) Root Words

Root words are the basis for dimension and metric names and fall into two categories:

Common root: Describes the smallest unit of a business entity, e.g., transaction → trade.

Proprietary root: Industry‑specific terms, e.g., USD.
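A root-word registry can make this distinction enforceable. Only "trade" and "USD" come from the article; the registry structure and function name are illustrative assumptions:

```python
# Minimal sketch of a root-word registry; "trade" and "usd" are from the
# article, everything else is an illustrative assumption.
COMMON_ROOTS = {"trade"}        # smallest business-entity units
PROPRIETARY_ROOTS = {"usd"}     # industry-specific terms

def is_known_root(word: str) -> bool:
    """A name part should come from the approved root-word lists."""
    w = word.lower()
    return w in COMMON_ROOTS or w in PROPRIETARY_ROOTS
```

New roots proposed during development would be added to these sets after periodic review, per the specification below.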

(2) Table Naming Specification

General rules:

Table and column names use an underscore to separate root words (e.g., clienttype → client_type).

All parts are lowercase English words; common fields must meet common field definitions.

Names start with a letter and are no longer than 64 characters.

Prefer existing root keywords; periodically review new names for reasonableness.

Avoid non‑standard abbreviations in custom parts.
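The general rules above lend themselves to an automated check. This is a minimal sketch; the regex and function name are assumptions rather than the team's published validator:

```python
import re

# Lowercase words separated by single underscores, starting with a letter.
# The exact pattern is an assumption based on the rules in the article.
NAME_RE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def is_valid_name(name: str) -> bool:
    """Check a table or column name: lowercase, underscore-separated,
    starts with a letter, at most 64 characters."""
    return len(name) <= 64 and bool(NAME_RE.match(name))

assert is_valid_name("client_type")      # clienttype -> client_type
assert not is_valid_name("ClientType")   # not lowercase
assert not is_valid_name("1_type")       # must start with a letter
```

Note that a regex cannot verify that each part is an approved root word; that step would consult the root-word registry during review.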

Table Naming Rule

table name = type + business subject + sub‑subject + table meaning + storage format + update frequency + suffix
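Composing a name from the rule above (type + business subject + sub‑subject + table meaning + storage format + update frequency + suffix) might look like the following sketch; all concrete part values here are hypothetical examples, not names from the article:

```python
# Hedged sketch of the table-naming rule; the helper name and example
# part values ("dwd", "trade", "order", ...) are illustrative assumptions.
def build_table_name(layer: str, subject: str, sub_subject: str,
                     meaning: str, storage: str, freq: str,
                     suffix: str) -> str:
    parts = [layer, subject, sub_subject, meaning, storage, freq, suffix]
    return "_".join(p for p in parts if p)  # skip empty optional parts

# e.g. a hypothetical daily full-snapshot trade-order detail table in DWD:
name = build_table_name("dwd", "trade", "order", "detail", "full", "d", "f")
# name == "dwd_trade_order_detail_full_d_f"
```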

(3) Metric Naming Specification

Metrics are structured using root words and modifiers:

Basic metric root: All metrics must contain a basic root.

Business modifier: Describes the business scenario (e.g., trade).

Date modifier: Indicates the time interval.

Aggregation modifier: Indicates aggregation operation.

Basic metric: Business modifier + basic root (e.g., trade_amt).

Derived metric: Multiple modifiers + basic root (e.g., install_poi_cnt).

General metric naming follows field naming rules.

Date‑type metric: Business modifier + basic root + date modifier.

Aggregation metric: Business modifier + basic root + aggregation type + date modifier.
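The metric structure above can be sketched as a small composer. The join order follows the rules in this section; the helper name and the "sum"/"1d" modifier values are assumptions:

```python
# Illustrative sketch of metric-name composition:
# business modifier + basic root [+ aggregation type] [+ date modifier].
# trade_amt is from the article; "sum" and "1d" are hypothetical modifiers.
def metric_name(biz: str, root: str, agg: str = "", date: str = "") -> str:
    return "_".join(p for p in (biz, root, agg, date) if p)

assert metric_name("trade", "amt") == "trade_amt"               # basic
assert metric_name("trade", "amt", date="1d") == "trade_amt_1d" # date-type
assert metric_name("trade", "amt", "sum", "1d") == "trade_amt_sum_1d"
```

Generating names this way, rather than hand-writing them, keeps every metric traceable back to its root and modifiers.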

(4) Cleansing Specification

Based on field and metric characteristics, we defined 24 predictable cleansing rules (see Figure 10).
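The 24 rules themselves are defined in Figure 10, which is not reproduced here; the sketch below only illustrates what rule-based cleansing might look like. The two sample rules (trim whitespace, fill missing values) are generic examples, not rules taken from the article:

```python
# Hedged sketch of rule-based cleansing; both rules shown are common
# illustrative examples, not the article's actual 24 rules.
def clean_record(record: dict, defaults: dict) -> dict:
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()          # rule: trim surrounding whitespace
        if value in (None, ""):
            value = defaults.get(field)    # rule: fill missing values
        cleaned[field] = value
    return cleaned

row = clean_record({"client_type": " app ", "trade_amt": None},
                   {"trade_amt": 0})
# row == {"client_type": "app", "trade_amt": 0}
```

Because the rules are "predictable" (keyed off field and metric characteristics), they can be applied automatically at the DWD layer rather than rewritten per table.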

Combining model and specification, we established responsibilities for model design and review (Figure 11).

Unified Application Intake

We discovered siloed application support processes causing documentation divergence, knowledge loss, and high maintenance cost. After refactoring, each application aligns with a single document and a unified knowledge base, reducing knowledge transfer and iteration costs.

The unified intake strategy ensures the three traits and three effects of OneData at the foundational level.

Unified Data Output

Beyond building data content, we focus on delivery quality and usability. We define a five‑aspect delivery standard (Figure 12) and implement data asset management via the internal "Origin Data Platform" (Figure 13).

Through standardized delivery and asset management, we ensure data quality, consistency, and ease of use, forming the core of OneData's metric management.

Practice Results

Process Improvement : Refined requirement analysis, metric management, model design, and data validation, aligning with OneData strategies to improve warehouse management.

Data Warehouse Panorama : Using business‑oriented and analysis‑oriented strategies, we built a comprehensive warehouse view (Figure 15).

Asset Management List : The Origin platform generates an asset management list (Figure 16).

Project Benefits: Comparing pre‑ and post‑OneData implementations demonstrates significant value gains (Figure 17).

Summary and Outlook

By integrating OneData's core ideas and traits, we built a stable, reliable foundational data warehouse that guarantees data quality and supports continuous business decision‑making. Future work includes introducing real‑time data warehouses for low‑latency needs, expanding to additional business domains, and evolving toward an enterprise‑level One Entity data platform (Data‑as‑a‑Service) to further strengthen data support and asset value.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Big Data · Data Modeling · Data Warehouse · Data Governance · OneData · Data Standards
Written by

Big Data Technology & Architecture

Wang Zhiwu is a big data expert dedicated to sharing big data technology.
