Comprehensive Overview of Data Middle Platform Architecture and Its Core Frameworks
This article gives a detailed overview of data middle platform concepts. It describes a decoupled six‑subsystem architecture (storage, collection, processing, governance, security, and operation frameworks) and illustrates typical enterprise implementations, industry‑specific solutions, and best‑practice considerations for building scalable, secure, and value‑driven data platforms.
Introduction – Modern enterprises are moving away from siloed pipelines, in which each application builds its own end‑to‑end data chain, toward centralized data collection and storage with layered application development on top. This shift enables faster deployment of new applications and unified data governance.
Why Data Middle Platforms Matter – Originating from Alibaba's "big middle platform, small front‑end" strategy, the data middle platform bridges the gap between fast‑moving application development and slower data development, improving responsiveness and the utilization of data assets.
General Architecture – The general‑purpose data middle platform architecture consists of six loosely coupled subsystems that can be built and evolved independently: data storage, data collection, data processing, data governance, data security, and data operation.
1. Data Storage Framework – Centralizes raw data, both structured and unstructured, in object storage, block storage, or databases, and manages metadata, tags, master data, and wide tables for downstream use.
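To make the wide-table idea concrete, here is a minimal sketch, assuming master data is a list of customer records and tags arrive as hypothetical (customer_id, tag_name, tag_value) rows. The names `Customer` and `build_wide_table` are illustrative only; a production platform would typically do this in SQL or a distributed engine rather than in plain Python.

```python
from dataclasses import dataclass

# Hypothetical master-data record: the single authoritative version of a customer.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    region: str

def build_wide_table(customers, tags):
    """Flatten master data plus (customer_id, tag_name, tag_value) rows into
    one row per customer, so downstream consumers avoid repeated joins."""
    # Collect every tag name so each wide-table row has the same columns.
    tag_names = sorted({name for _, name, _ in tags})
    tags_by_customer = {}
    for customer_id, name, value in tags:
        tags_by_customer.setdefault(customer_id, {})[name] = value
    rows = []
    for c in customers:
        row = {"customer_id": c.customer_id, "name": c.name, "region": c.region}
        customer_tags = tags_by_customer.get(c.customer_id, {})
        for tag in tag_names:
            # A missing tag becomes None instead of dropping the column.
            row[tag] = customer_tags.get(tag)
        rows.append(row)
    return rows

customers = [Customer("c1", "Alice", "north"), Customer("c2", "Bob", "south")]
tags = [("c1", "vip", True), ("c1", "churn_risk", "low"), ("c2", "churn_risk", "high")]
for row in build_wide_table(customers, tags):
    print(row)
```

Filling absent tags with `None` keeps the schema stable, which matters when many downstream reports read the same wide table.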
2. Data Collection Framework – Provides unified ingestion methods (FTP, database sync, API, streaming, web crawling) and pre‑processes source data to remove noise before handing it to downstream subsystems.
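The pre-processing step can be sketched as a filter that runs before data is handed to downstream subsystems. This is a minimal illustration, assuming a hypothetical order feed whose records need an `order_id` and a numeric `amount`; the field names and rejection rules are assumptions, not rules from the source.

```python
# Hypothetical required fields for one source system (an order feed).
REQUIRED_FIELDS = ("order_id", "amount")

def preprocess(records):
    """Split raw records into clean rows and rejected (record, reason) pairs."""
    seen_ids = set()
    clean, rejected = [], []
    for raw in records:
        # Normalize key case and strip whitespace from string values.
        rec = {k.strip().lower(): (v.strip() if isinstance(v, str) else v)
               for k, v in raw.items()}
        if any(rec.get(f) in (None, "") for f in REQUIRED_FIELDS):
            rejected.append((raw, "missing required field"))
            continue
        try:
            rec["amount"] = float(rec["amount"])
        except (TypeError, ValueError):
            rejected.append((raw, "invalid amount"))
            continue
        # Drop repeated deliveries of the same order (the first one wins).
        if rec["order_id"] in seen_ids:
            rejected.append((raw, "duplicate order_id"))
            continue
        seen_ids.add(rec["order_id"])
        clean.append(rec)
    return clean, rejected

batch = [
    {"Order_ID": " A1 ", "Amount": "19.90"},
    {"order_id": "A2", "amount": ""},
    {"order_id": "A3", "amount": "abc"},
    {"order_id": "A1", "amount": "5"},
]
clean, rejected = preprocess(batch)
print(clean)  # [{'order_id': 'A1', 'amount': 19.9}]
print([reason for _, reason in rejected])
```

Keeping the rejected rows with a reason, rather than silently discarding them, gives the governance side something to measure.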
3. Data Processing Framework – Implements ETL, batch and stream processing, AI analysis, data cleaning, and task scheduling, offering a centralized environment for data transformation and model building.
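Task scheduling in such a framework usually means running jobs in dependency order. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical daily pipeline; the task names are invented, and `run` is a placeholder for real ETL, batch, or stream work.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical daily pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "daily_sales_report": {"join_orders_customers"},
}

def run(task_name, log):
    # Placeholder for the actual transformation job.
    log.append(task_name)

def schedule(graph):
    """Run tasks in batches; every task in a batch has all of its
    dependencies finished, so a batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    log, batches = [], []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        for task in ready:
            run(task, log)
            sorter.done(task)
    return batches, log

batches, _ = schedule(pipeline)
for i, batch in enumerate(batches):
    print(i, batch)
```

Grouping ready tasks into batches is what lets a real scheduler fan out independent extractions while still waiting on shared upstream steps.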
4. Data Governance Framework – Covers the data catalog, data management, model management, and data quality control, while deliberately leaving security and data sharing to other subsystems to avoid conflicts of interest.
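Quality control is often expressed as declarative rules evaluated against a dataset. Here is a minimal sketch, assuming hypothetical completeness, range, and uniqueness rules; the `QualityRule` type and the pass-rate report format are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    name: str
    column: str
    check: Callable[[object], bool]  # returns True when a value passes

def run_quality_checks(rows, rules, unique_columns=()):
    """Return the pass rate per rule plus a uniqueness flag per column."""
    report = {}
    for rule in rules:
        passed = sum(1 for r in rows if rule.check(r.get(rule.column)))
        report[rule.name] = passed / len(rows) if rows else 1.0
    for col in unique_columns:
        values = [r.get(col) for r in rows]
        report[f"unique:{col}"] = len(set(values)) == len(values)
    return report

rules = [
    QualityRule("phone_not_null", "phone", lambda v: v not in (None, "")),
    QualityRule("age_in_range", "age", lambda v: isinstance(v, int) and 0 <= v <= 120),
]
rows = [
    {"id": 1, "phone": "555-0100", "age": 34},
    {"id": 2, "phone": "", "age": 150},
    {"id": 2, "phone": "555-0102", "age": 41},
    {"id": 4, "phone": None, "age": 29},
]
print(run_quality_checks(rows, rules, unique_columns=("id",)))
```

Reporting pass rates instead of a single pass/fail lets teams set thresholds per dataset and track quality trends over time.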
5. Data Security Framework – Overlays the entire platform with logging, authentication, authorization, and encryption modules to protect data at every stage.
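One way to picture the "overlay" is a wrapper that every data-access function passes through, checking authorization and writing an audit entry. This is a minimal sketch, assuming authentication has already happened upstream and using a hypothetical role-to-permission map; encryption is out of scope here, and `secured`, `ROLE_PERMISSIONS`, and `AUDIT_LOG` are invented names.

```python
import functools
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping maintained by the security framework.
ROLE_PERMISSIONS = {
    "analyst": {"read:sales"},
    "admin": {"read:sales", "write:sales"},
}
AUDIT_LOG = []  # stand-in for an append-only audit store

class AccessDenied(Exception):
    pass

def secured(permission):
    """Wrap a data-access function with an authorization check and an
    audit-log entry; the caller is assumed to be authenticated already."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            granted = any(permission in ROLE_PERMISSIONS.get(role, set())
                          for role in user["roles"])
            # Log every attempt, allowed or not, before acting on it.
            AUDIT_LOG.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user["name"],
                "action": func.__name__,
                "permission": permission,
                "granted": granted,
            })
            if not granted:
                raise AccessDenied(f"{user['name']} lacks {permission}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@secured("write:sales")
def update_sales(user, region, amount):
    return f"{region} updated to {amount}"

alice = {"name": "alice", "roles": ["admin"]}
bob = {"name": "bob", "roles": ["analyst"]}
print(update_sales(alice, "north", 100))
try:
    update_sales(bob, "north", 0)
except AccessDenied as exc:
    print("denied:", exc)
print([entry["granted"] for entry in AUDIT_LOG])
```

Because the check lives in one decorator rather than in each function, the security policy can change without touching the data-access code it protects.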
6. Data Operation Framework – Supplies portals, capability exposure (APIs and micro‑services), data opening (publishing data for external use), and operational monitoring, delivering data services to internal and external consumers while keeping them stable and secure.
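Capability exposure and stability often meet in the API layer: data is published as named services, and each consumer is rate-limited so one caller cannot overload the platform. The sketch below assumes a hypothetical `DataService` registry with a token-bucket limiter per API key; the status codes mimic HTTP conventions, and the injectable clock exists only to make the example deterministic.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""
    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class DataService:
    """Hypothetical capability-exposure layer: named data APIs guarded
    by one rate limiter per API key."""
    def __init__(self, capacity=2, rate=1.0, clock=time.monotonic):
        self.handlers = {}
        self.buckets = {}
        self.capacity, self.rate, self.clock = capacity, rate, clock

    def register(self, name, handler):
        self.handlers[name] = handler

    def call(self, api_key, name, **params):
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.capacity, self.rate, self.clock)
        if not bucket.allow():
            return {"status": 429, "error": "rate limit exceeded"}
        if name not in self.handlers:
            return {"status": 404, "error": f"unknown API {name}"}
        return {"status": 200, "data": self.handlers[name](**params)}

fake_now = [0.0]
service = DataService(capacity=2, rate=1.0, clock=lambda: fake_now[0])
service.register("sales_by_region", lambda region: {"region": region, "total": 1200})
statuses = [service.call("team-a", "sales_by_region", region="north")["status"] for _ in range(3)]
print(statuses)  # [200, 200, 429]
fake_now[0] = 1.0  # one second later, one token has been refilled
print(service.call("team-a", "sales_by_region", region="north")["status"])  # 200
```

A token bucket tolerates short bursts while capping sustained load, which suits dashboards that fire several queries at once and then go quiet.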
Core Functional Layers – The platform also defines six functional layers: data aggregation (ingesting disparate sources), data development (tools for developers and analysts), the data asset system (organizing data as reusable assets), asset management (making assets understandable to business users), the data service system (exposing assets as services), and a combined operation and security layer that keeps the platform healthy over the long term.
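Asset management is largely about attaching business-facing metadata to technical tables so non-specialists can find them. Here is a minimal sketch, assuming a hypothetical in-memory `AssetCatalog`; the table names follow a common layered naming style (`dws_`, `ads_`) purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    # Technical identity used by developers.
    table: str
    # Business-facing metadata so non-technical users can find the asset.
    business_name: str
    description: str
    owner: str
    tags: set = field(default_factory=set)

class AssetCatalog:
    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.table] = asset

    def search(self, term):
        """Case-insensitive match on business name, description, or tags."""
        term = term.lower()
        return sorted(
            a.table for a in self._assets.values()
            if term in a.business_name.lower()
            or term in a.description.lower()
            or term in {t.lower() for t in a.tags}
        )

catalog = AssetCatalog()
catalog.register(DataAsset("dws_customer_wide", "Customer 360 profile",
                           "One row per customer with master data and tags",
                           "crm-data-team", {"customer", "marketing"}))
catalog.register(DataAsset("ads_daily_sales", "Daily sales summary",
                           "Sales totals per region and day",
                           "finance-data-team", {"sales", "finance"}))
print(catalog.search("customer"))  # ['dws_customer_wide']
print(catalog.search("SALES"))     # ['ads_daily_sales']
```

The search deliberately ignores the technical table name, reflecting the idea that business users should find assets by meaning rather than by schema.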
Industry‑Specific Architectures – Examples include technical middle platforms, banking data architectures, retail middle platforms, business middle platforms, real‑time data middle platforms, and various sector‑specific solutions (e.g., real‑estate, securities, manufacturing, media, legal).
Conclusion – Building a data middle platform creates a unified data asset pool, enhances data‑driven decision making, and resolves the speed mismatch between data and application development, though detailed design of each subsystem (storage technology, security compliance, model design) remains essential.
Source: CDO Research Society (reproduced with permission).
This article has been distilled and summarized from source material, then republished for learning and reference.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
