Unlocking Enterprise Value with a Data Middle Platform: Architecture & Indicators
This article traces the evolution from traditional data warehouses to modern data lakes and data middle platforms, explains why siloed data development hampers efficiency, and details the architecture and indicator‑library design used by Zhengcaiyun to achieve unified, reusable data services.
Background of Data Platforms
Data warehouses emerged in the 1990s to turn enterprise data into knowledge for business intelligence. Bill Inmon defined a data warehouse as a subject‑oriented, integrated, time‑variant, non‑volatile collection of data supporting management decisions. This classic definition still guides today’s large‑scale analytics.
Technological Shifts
Hadoop, which emerged in the mid‑2000s, built on ideas from Google's papers on the Google File System (2003), MapReduce (2004), and Bigtable (2006). These papers introduced a distributed, scalable approach to processing massive heterogeneous data. Compared with a traditional warehouse, Hadoop separates the storage layer from the processing engines that run on top of it and relaxes schema constraints, enabling flexible analysis of varied data sources.
Data Lake and Data Lakehouse
In 2010, James Dixon coined the term “Data Lake” to describe a repository that stores raw data in its original format. While a data lake can handle semi‑structured and unstructured data for deep analysis, it does not replace a warehouse, which excels at structured, high‑performance reporting. The emerging “Data Lakehouse” combines the strengths of both.
Rise of the Data Middle Platform
As big‑data platforms grew, data developers faced a cumbersome workflow: ingest data, develop transformations, validate results, publish jobs, and maintain daily operations. Without an efficient platform, development speed plummets, akin to coding with a plain text editor instead of an IDE. The data middle platform concept was introduced to streamline this end‑to‑end pipeline, lower development barriers, and enable rapid, large‑scale data processing.
Key challenges that motivated the middle platform include:
Isolated, “chimney‑style” development causing fragmented data across business lines.
Inconsistent metric calculations leading to mistrust in analytics.
Redundant metric implementations that waste storage and compute resources.
Zhengcaiyun’s Data Middle Platform Architecture
Zhengcaiyun follows a “One Data, One Service” philosophy. “iData” implements the “One Data” layer, while “Datapi” delivers the “One Service” capability. The platform sits atop Hadoop‑based infrastructure, covering compute, resource scheduling, and storage, and supports five core scenarios: data integration, development, testing, publishing, and operation.
iData Indicator Library
Before iData, analysts struggled with unclear metrics, frequent requirement changes, and high verification costs. Developers also faced chaotic metric naming, duplicated implementations, and inconsistent data exports. iData addresses these issues by standardizing dimensions, metric naming, and calculation logic.
The indicator model consists of several concepts:
Business Process: Indivisible events such as purchase, click, or view.
Data Domain: A tightly related collection of business subjects, analogous to folders on a desktop.
Dimension: Attributes describing a business entity (who, where, when, what), e.g., geographic or temporal dimensions.
Dimension Attribute: A specific attribute that belongs to a dimension, such as gender within a user dimension or country code within a geographic dimension.
Modifier & Modifier Type: Additional qualifiers (e.g., APP vs. PC) that refine a metric’s scope.
Indicator Types: Atomic, derived, and composite indicators.
Atomic Indicator: Represents a single, indivisible measurement (e.g., order count, transaction amount).
Derived Indicator: Combines a time period, optional modifiers, and one atomic metric. It splits into:
Transactional Indicator: Measures ongoing business actions (e.g., page views, order payment).
Stock Indicator: Counts static entities at a point in time (e.g., total registered users).
Composite Indicator: Aggregates multiple atomic or derived metrics using defined formulas (e.g., CTR, average view time).
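The three indicator types can be sketched as plain data structures. This is a minimal illustration, not iData's actual schema: the class names, field names, metric abbreviations, and sample values below are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AtomicIndicator:
    """A single, indivisible measurement, e.g. a click count."""
    name: str


@dataclass(frozen=True)
class DerivedIndicator:
    """One atomic indicator plus a time period and optional modifiers."""
    atomic: AtomicIndicator
    time_period: str                 # hypothetical code, e.g. "1d" = last day
    modifiers: Tuple[str, ...] = ()  # e.g. ("app",) restricts to APP traffic


@dataclass(frozen=True)
class CompositeIndicator:
    """Combines several indicators through a defined formula."""
    name: str
    inputs: Tuple[DerivedIndicator, ...]
    formula: Callable[..., float]

    def evaluate(self, values: Dict[DerivedIndicator, float]) -> float:
        # Pass the input values to the formula in declaration order.
        return self.formula(*(values[i] for i in self.inputs))


# Hypothetical derived indicators: APP clicks and APP views over one day.
clicks = DerivedIndicator(AtomicIndicator("click_cnt"), "1d", ("app",))
views = DerivedIndicator(AtomicIndicator("view_cnt"), "1d", ("app",))

# CTR as a composite indicator: clicks divided by views (0.0 if no views).
ctr = CompositeIndicator(
    name="ctr",
    inputs=(clicks, views),
    formula=lambda c, v: c / v if v else 0.0,
)

print(ctr.evaluate({clicks: 50.0, views: 1000.0}))  # 0.05
```

Making the classes frozen keeps definitions immutable and hashable, so a derived indicator can serve directly as a dictionary key when values are supplied.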
Naming conventions enforce clarity:
Atomic metrics use “action + measure” in Chinese and concise abbreviations in English.
Derived metrics follow “time period + granularity + modifier + atomic metric” in Chinese and “modifier_atomicMetric_timePeriod” in English.
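As an illustration of the English pattern for derived metrics, the helper below assembles a name from its parts. The source only gives the pattern “modifier_atomicMetric_timePeriod”; the function name, the choice to join multiple modifiers with underscores, and the sample abbreviations are assumptions for this sketch.

```python
from typing import Sequence


def derived_metric_name(atomic: str, time_period: str,
                        modifiers: Sequence[str] = ()) -> str:
    """Build a name following modifier_atomicMetric_timePeriod."""
    if not atomic or not time_period:
        raise ValueError("atomic metric and time period are required")
    # Modifiers are optional, so the name may start with the atomic metric.
    parts = [*modifiers, atomic, time_period]
    return "_".join(parts)


print(derived_metric_name("payAmt", "1d", ["app"]))  # app_payAmt_1d
print(derived_metric_name("payAmt", "1d"))           # payAmt_1d
```

Generating names from structured parts, rather than typing them by hand, is one way to keep naming consistent across teams.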
Complex calculations may require pseudo‑code or SQL snippets, which are stored alongside the metric definition for transparency. After establishing the indicator library, a web‑based portal allows users to browse, search, and understand each metric’s definition and calculation, eliminating ambiguity and boosting development efficiency.
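A minimal sketch of the kind of lookup such a portal could offer, assuming a simple in-memory registry. The `MetricDefinition` fields, the `MetricRegistry` class, and the sample SQL (including the table name `dwd_order`) are hypothetical and do not describe iData's real interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    logic: str  # SQL or pseudo-code stored alongside the definition


class MetricRegistry:
    def __init__(self) -> None:
        self._metrics: Dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        # Reject a second definition under the same name so one metric
        # cannot end up with two competing calculations.
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def search(self, keyword: str) -> List[MetricDefinition]:
        # Case-insensitive match on the metric name or its description.
        kw = keyword.lower()
        return [m for m in self._metrics.values()
                if kw in m.name.lower() or kw in m.description.lower()]


registry = MetricRegistry()
registry.register(MetricDefinition(
    name="app_payAmt_1d",
    description="Payment amount from the APP over the last day",
    logic="SELECT SUM(pay_amt) FROM dwd_order WHERE channel = 'app'",
))

for metric in registry.search("payment"):
    print(metric.name, "->", metric.logic)
```

Keeping the calculation logic in the same record as the name and description means a search result always shows how the metric is computed.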
Zhengcaiyun Technology (政采云技术)
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
