OneData: A Comprehensive Big Data Architecture and Governance Framework
This article presents the OneData methodology for building a robust big‑data platform, detailing background challenges, goals, unified input and output strategies, model design, naming conventions, data‑cleaning rules, and the resulting business benefits and future outlook.
Background
Rapid business growth and frequent cross‑departmental iterations have caused serious data‑quality problems in the data warehouse, including a lack of unified business and technical standards, insufficient quality monitoring, scattered business knowledge, and unclear data architecture.
Goal
Based on the existing big‑data platform and the industry‑proven OneData methodology, the team aims to construct a reasonable data‑system architecture, data specifications, model standards and development patterns to support fast‑changing business and to form a proprietary OneData theory and practice.
OneData Exploration
OneData, originally proposed by Alibaba, defines a comprehensive set of data standards, model‑design rules, ETL standards and supporting tools. The team adapts it into a tailored approach, weighing implementation cost and tool dependence.
Our Thinking
Alibaba’s OneData covers a wide range but requires long implementation cycles and high manpower.
Current tools are weak and the existing development process cannot be completely overhauled.
Core Ideas and Characteristics
Core idea: avoid duplicate construction and metric redundancy across design, development, deployment and usage, ensuring unified data definitions and a public data layer.
Core characteristics: three traits (uniformity, uniqueness, standardization) and three effects (high scalability, strong reusability, low cost).
Strategy: Unified Input and Output
Two unified strategies are proposed: unified intake (centralized business knowledge base, standardized model design) and unified output (standardized delivery, data‑asset management).
Unified Business Intake
Establish a global knowledge base to keep business understanding consistent, and design a four‑layer model (ODS → DWD → DWT/DWA → APP) with clear data flow and theme division (business‑oriented and analysis‑oriented).
Model Design
Define model layers, data flow, and theme classification. Enforce rules such as: ODS tables may be referenced only by DWD tables, circular dependencies are not allowed, and existing root words are preferred when naming.
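The two dependency rules above (ODS referenced only by DWD, no circular dependencies) lend themselves to an automated check. The following is a minimal sketch, not the team's actual tooling: it assumes each table name starts with its layer prefix followed by an underscore (for example ods_ or dwd_), and that dependencies are available as a plain mapping from a table to the tables it reads. The function names and the sample table names are hypothetical.

```python
from typing import Dict, List


def layer_of(table: str) -> str:
    """Return the layer prefix of a table name (assumed convention: 'dwd_xxx' -> 'dwd')."""
    return table.split("_", 1)[0].lower()


def check_dependencies(deps: Dict[str, List[str]]) -> List[str]:
    """Return human-readable rule violations.

    deps maps a table name to the list of upstream tables it reads from.
    """
    violations: List[str] = []

    # Rule 1: ODS tables may be referenced only by DWD tables.
    for table, upstreams in deps.items():
        for up in upstreams:
            if layer_of(up) == "ods" and layer_of(table) != "dwd":
                violations.append(
                    f"{table} reads {up}: ODS may only be referenced by DWD"
                )

    # Rule 2: no circular dependencies (depth-first search with three colors).
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(node: str, path: List[str]) -> None:
        color[node] = GRAY  # node is on the current DFS path
        for up in deps.get(node, []):
            state = color.get(up, WHITE)
            if state == GRAY:
                # Reached a table that is already on the path: a cycle.
                cycle = path[path.index(up):] + [up]
                violations.append("circular dependency: " + " -> ".join(cycle))
            elif state == WHITE:
                visit(up, path + [up])
        color[node] = BLACK  # fully explored

    for table in deps:
        if color.get(table, WHITE) == WHITE:
            visit(table, [table])

    return violations


if __name__ == "__main__":
    sample = {
        "dwd_order": ["ods_order"],
        "app_report": ["ods_order", "dwd_order"],  # violates rule 1
    }
    for problem in check_dependencies(sample):
        print(problem)
```

A check of this kind could run before a model is deployed, so that a violation blocks the release instead of surfacing later as a data-quality problem.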
Naming Conventions
Table naming rule:
TableName = Type + BusinessSubject + SubSubject + Meaning + StorageFormat + UpdateFrequency + Suffix.
Metric naming follows a structured pattern that combines root words, business modifiers, date modifiers, and aggregation modifiers.
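As an illustration of the table naming rule, the sketch below assembles a name from its seven components in the stated order. The source gives only the order of the components, so several details here are assumptions: the parts are joined with underscores, lowercased, restricted to letters and digits, and the suffix is treated as optional. The class, the function, and the sample values (dwd, trade, orc and so on) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TableNameParts:
    """Components of a table name, in the order given by the naming rule."""
    table_type: str        # Type, e.g. the layer (assumed)
    business_subject: str  # BusinessSubject
    sub_subject: str       # SubSubject
    meaning: str           # Meaning
    storage_format: str    # StorageFormat
    update_frequency: str  # UpdateFrequency
    suffix: str = ""       # Suffix (assumed optional)


def build_table_name(parts: TableNameParts) -> str:
    """Join the components with underscores (assumed separator), skipping empty ones."""
    ordered = [
        parts.table_type,
        parts.business_subject,
        parts.sub_subject,
        parts.meaning,
        parts.storage_format,
        parts.update_frequency,
        parts.suffix,
    ]
    tokens = [t.strip().lower() for t in ordered if t and t.strip()]
    for token in tokens:
        # Assumed constraint: each component is a single alphanumeric token.
        if not re.fullmatch(r"[a-z0-9]+", token):
            raise ValueError(f"invalid name component: {token!r}")
    return "_".join(tokens)


if __name__ == "__main__":
    parts = TableNameParts("dwd", "trade", "order", "detail", "orc", "d", "inc")
    print(build_table_name(parts))
```

Generating names from structured parts, instead of typing them by hand, keeps every table consistent with the rule and makes it possible to parse a name back into its components later.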
Data Cleaning Standards
Twenty‑four predefined cleaning rules are applied to ensure data quality.
Unified Output
Unified output combines standardized delivery (defined by five quality attributes) with data‑asset management through the internal “Origin” platform, which provides unified metric and dimension management and a single export point for data.
Results
The results include process improvements, a panoramic view of the data warehouse, asset‑management listings, and measurable gains in business value.
Conclusion and Outlook
The OneData framework delivers a stable, reliable data‑warehouse foundation; future plans include real‑time warehousing, expansion to other business domains, and an enterprise‑level One Entity data‑platform based on Data‑as‑a‑Service.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
