Understanding Data Middle Platform: Concepts, Architecture, and Real‑Time Implementation
The article explains the data middle platform concept, its distinction from traditional big‑data platforms, the architectural principles behind Alibaba's implementation, and how real‑time ingestion, processing, and service layers enable efficient, collaborative, and scalable data-driven applications.
The data middle platform is hailed as the next step for big data. It originated from Alibaba’s "big middle platform, small front end" strategy in 2015 and was revived by Tencent in 2018.
Although many people talk about it, the term is often misunderstood. It is not a platform or system that can be bought off the shelf, but a middle‑layer concept that bridges data development and application development.
Data Middle Platform Is Not a Big Data Platform!
It is not a product; it is a technical middle layer. Gartner’s Pace‑Layered model helps clarify its role: core data models change slowly while business demands evolve rapidly, and the middle platform aims to resolve that mismatch.
Key challenges addressed include:
Efficiency: Reducing the long lead time for adding reports or real‑time recommendations.
Collaboration: Avoiding duplicated data development across teams.
Capability: Providing specialized data engineering resources for data‑centric tasks.
The solution aggregates and governs cross‑domain data, exposing it as services (Data API) rather than raw databases, thus decoupling front‑end development speed from back‑end data changes.
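The "service instead of raw database" idea can be sketched as follows. This is a minimal illustration, not Alibaba's implementation: `UserOrderStats`, `OrderStatsService`, and the sample rows are hypothetical names and data introduced here. Callers depend only on the service method, so the storage behind it can change without touching front-end code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserOrderStats:
    """Stable contract returned to front-end callers (hypothetical)."""
    user_id: str
    order_count: int
    total_amount: float


class OrderStatsService:
    """Hypothetical Data API that hides how and where order data is stored."""

    def __init__(self, rows):
        # rows stand in for governed, cross-domain data held by the platform;
        # each row is (user_id, business_line, amount).
        self._rows = list(rows)

    def get_user_order_stats(self, user_id: str) -> UserOrderStats:
        amounts = [amount for uid, _line, amount in self._rows if uid == user_id]
        return UserOrderStats(user_id, len(amounts), sum(amounts))


service = OrderStatsService([
    ("u1", "taobao", 20.0),
    ("u1", "tmall", 30.0),
    ("u2", "hema", 5.0),
])
stats = service.get_user_order_stats("u1")
print(stats)
```

If the rows later move to a different store, only the body of `get_user_order_stats` needs to change; the `UserOrderStats` contract stays the same.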
Alibaba’s Data Middle Platform Details
Business‑wide Data Landscape
Data is collected from various business lines (e.g., Taobao, Tmall, Hema) into a unified "OneData" layer, forming public data centers for consumer, enterprise, and content domains, which are then processed and served via the "OneService" middleware.
Three Core Systems
Alibaba’s cloud data middle platform is built on three pillars:
OneData: Standardizes data as assets.
OneEntity: Unifies entities to eliminate data silos.
OneService: Provides reusable data services.
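The article does not describe how OneEntity unifies entities. One common technique for this kind of identity resolution is a union-find structure that merges identifiers known to belong together. The sketch below assumes that approach; `EntityResolver` and the sample identifiers are hypothetical.

```python
class EntityResolver:
    """Union-find over identifiers; linked identifiers share one entity."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        self._parent.setdefault(key, key)
        # Walk to the root, shortening the path as we go.
        while self._parent[key] != key:
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def link(self, a, b):
        """Record that identifiers a and b belong to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)


resolver = EntityResolver()
# Two accounts from different business lines share one phone number.
resolver.link("taobao:alice01", "phone:555-0100")
resolver.link("hema:a.smith", "phone:555-0100")
linked = resolver.same_entity("taobao:alice01", "hema:a.smith")
unrelated = resolver.same_entity("taobao:alice01", "tmall:bob")
```

Here the two accounts resolve to one entity because both are linked to the same phone identifier, while the unlinked account stays separate.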
Six Data‑Technology Domains
The platform originally defined six domains: data model, storage governance, data quality, security & permission, platform operation, and R&D engineering. Over time, these evolved into broader areas such as data asset management and data trust. Work continues in the model and quality domains, alongside emerging intelligent black‑box capabilities.
How to Build a Real‑Time Data Middle Platform
The logical architecture of a real‑time implementation is described below in five parts, with emphasis on the real‑time model layer.
1. Real‑Time Ingestion
Different data types use appropriate ingestion methods; Flume + Kafka is the default, alongside file and database connectors.
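The choice of ingestion path can be pictured as a lookup with a default. This is only a sketch: the source kinds (`"file"`, `"database"`, `"app_log"`) and the function name are assumptions, and the returned strings are labels, not real connector objects.

```python
# Hypothetical source kinds mapped to the ingestion paths the article names.
INGESTION_BY_KIND = {
    "file": "file connector",
    "database": "database connector",
}
DEFAULT_INGESTION = "flume+kafka"


def choose_ingestion(source_kind: str) -> str:
    """Return the ingestion path for a source, falling back to the default."""
    return INGESTION_BY_KIND.get(source_kind, DEFAULT_INGESTION)


database_path = choose_ingestion("database")
log_path = choose_ingestion("app_log")  # no special rule, so the default applies
```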
2. Computing Framework
A Kappa architecture unifies batch and stream processing, using Flink for its high throughput, low latency, and seamless batch‑stream integration.
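The core of the Kappa idea is a single processing path: historical results come from replaying the event log through the same logic that handles live data. The sketch below shows this in plain Python rather than the Flink API; `tumbling_counts` and the sample events are hypothetical, and events are assumed to arrive in time order.

```python
from collections import Counter


def tumbling_counts(events, window_seconds):
    """Yield (window_start, Counter) for each tumbling window.

    `events` is any iterable of (timestamp_seconds, key) pairs in time order.
    It may be a replayed log or a live feed; the logic is identical.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is not None and window_start != current_start:
            yield current_start, counts  # the previous window has closed
            counts = Counter()
        current_start = window_start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


replayed_log = [(0, "click"), (3, "view"), (7, "click"), (12, "click")]
results = list(tumbling_counts(replayed_log, window_seconds=10))
```

Because the function accepts any iterable, a generator that reads from a live queue could be passed in place of `replayed_log` without changing the windowing code.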
3. Real‑Time Model
Like data‑warehouse models, real‑time models are business‑oriented. They consist of a DWD layer (standardized, filtered data) and a DW layer; the latter includes dynamic, event, and time‑series models, each stored in a suitable system (Kafka/HBase, MQ/Redis, HBase/TSDB).
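A rough sketch of the two layers follows, with in-memory structures standing in for the storage systems. The article does not define the three models precisely, so the readings used here are assumptions: "dynamic" as the latest state per user, "event" as an append-only log, and "time series" as per-minute counts. The field names and `RealTimeModels` class are also hypothetical.

```python
from collections import defaultdict


def to_dwd(raw):
    """Standardize one raw record; return None to filter it out."""
    if not raw.get("user_id") or raw.get("ts") is None:
        return None
    return {
        "user_id": str(raw["user_id"]).strip(),
        "event": str(raw.get("event", "unknown")).lower(),
        "ts": int(raw["ts"]),
    }


class RealTimeModels:
    """In-memory stand-ins for the three DW-layer models."""

    def __init__(self):
        self.dynamic = {}                    # latest record per user
        self.events = []                     # append-only event log
        self.time_series = defaultdict(int)  # (event, minute) -> count

    def apply(self, record):
        self.dynamic[record["user_id"]] = record
        self.events.append(record)
        minute = record["ts"] // 60
        self.time_series[(record["event"], minute)] += 1


raw_stream = [
    {"user_id": " u1 ", "event": "Click", "ts": 5},
    {"user_id": "", "event": "click", "ts": 6},   # dropped: no user id
    {"user_id": "u1", "event": "PAY", "ts": 65},
]
models = RealTimeModels()
for raw in raw_stream:
    record = to_dwd(raw)
    if record is not None:
        models.apply(record)
```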
4. Real‑Time Service
A unified data‑development platform provides graphical, workflow‑driven tools to manage both offline and real‑time data, avoiding isolated stream‑processing scripts and reducing development overhead.
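Workflow-driven development amounts to declaring tasks and their dependencies, then letting the platform decide the run order. The sketch below uses Python's standard `graphlib` module to show the idea; the task names and the `run_workflow` helper are hypothetical, and the tasks themselves are no-ops.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
workflow = {
    "ingest_orders": set(),
    "clean_orders": {"ingest_orders"},
    "offline_daily_report": {"clean_orders"},
    "realtime_order_counts": {"clean_orders"},
    "publish_data_api": {"offline_daily_report", "realtime_order_counts"},
}


def run_workflow(graph, tasks):
    """Run every task once, only after all of its dependencies have run."""
    order = []
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# No-op callables stand in for real offline and real-time jobs.
tasks = {name: (lambda: None) for name in workflow}
execution_order = run_workflow(workflow, tasks)
```

Offline and real-time jobs sit in the same graph, which is the point of a unified platform: both kinds of work share one dependency model instead of living in separate scripts.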
5. Real‑Time Application
Rapid orchestration shrinks development cycles from weeks to days while delivering high‑impact real‑time services. Alibaba processes EB‑scale (exabyte‑scale) data and handles 94 million events per second at peak, with an end‑to‑end latency of 2.5 seconds.
Overall, the growing demand for real‑time data capabilities makes building a real‑time data middle platform essential for modern enterprises.
Author: 数据分析不是个事儿 https://www.jianshu.com/p/05a8db84e454