Data Intelligence Expert Interview – Maturity, Trends, and Practices of Data Middle Platforms

The interview gathers insights from data‑platform experts on the maturity stages, technology trends, implementation methodologies, open‑source ecosystems, system architectures, governance, security, and assessment criteria of modern data middle platforms, offering a comprehensive guide for practitioners.

DataFunSummit

Jan 4, 2023

DataFun interviewed senior data‑platform engineers to discuss the current focus, challenges, and future directions of data middle platforms, helping readers grasp key technical priorities and improve their own implementations.

1. Technical Maturity Stages

Mature Phase: Offline and real‑time processing pipelines, dominated by Spark and Flink ecosystems.

Hot Phase: Lakehouse technologies (Iceberg, Hudi, Delta Lake) and OLAP engines such as Kylin, Druid, ClickHouse, and Doris.

Growth Phase: Data security and governance, still in early, rule‑based stages.

Forward‑looking Phase: Data observability, increasingly combined with machine‑learning capabilities.

2. Implementation Methodology

Fast‑growing companies need higher‑level toolchains and focus on data quality and timeliness.

Companies whose data growth has plateaued shift their attention to governance and security.

Large enterprises tend to build custom middle‑platforms and later migrate to their own cloud services; SMEs either self‑build or adopt cloud‑vendor solutions.

3. Open‑Source Ecosystem

Open source has long been led by organizations outside China, but Chinese companies have recently contributed projects such as Apache InLong, Apache SkyWalking, and Apache DolphinScheduler. Many commercial tools are also emerging, though their market maturity varies.

4. Technical System

Data Integration & Modeling: Emphasis on ETL and Reverse‑ETL, with tools such as Airbyte and Fivetran for integration, dbt for modeling, and Apache Airflow or DolphinScheduler for orchestration.
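As a rough illustration of the difference between ETL and Reverse‑ETL, the following sketch uses Python's built-in `sqlite3` as a stand-in warehouse and a plain dict as a stand-in for an operational tool such as a CRM. The table, column, and function names (`fact_orders`, `etl`, `reverse_etl`, `lifetime_value`) are hypothetical, and nothing here reflects the APIs of Airbyte, Fivetran, or dbt.

```python
import sqlite3

# Hypothetical source rows, e.g. extracted from an operational MySQL table.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.90"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00"},
    {"order_id": 3, "customer": "Alice", "amount": "10.10"},
    {"order_id": 4, "customer": "bob", "amount": None},  # invalid row, dropped by transform()
]


def transform(row):
    """Normalize names and parse amounts; return None for rows that fail validation."""
    if row["amount"] is None:
        return None
    return (row["order_id"], row["customer"].strip().lower(), float(row["amount"]))


def etl(conn, rows):
    """ETL: extract source rows, transform them, load them into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    cleaned = [t for t in (transform(r) for r in rows) if t is not None]
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleaned)
    return len(cleaned)


def reverse_etl(conn, crm):
    """Reverse-ETL: push a warehouse-derived metric back into an operational tool."""
    query = "SELECT customer, ROUND(SUM(amount), 2) FROM fact_orders GROUP BY customer"
    for customer, total in conn.execute(query):
        crm.setdefault(customer, {})["lifetime_value"] = total
    return crm


conn = sqlite3.connect(":memory:")
loaded = etl(conn, SOURCE_ORDERS)
crm = reverse_etl(conn, {})
print(loaded, crm)
```

In a real platform the ETL step would usually be a scheduled sync job and the reverse step a connector writing to a SaaS tool's API; the sketch only shows the two directions of data flow.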

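Offline Development: Common stacks include MySQL, MongoDB, and Redis as data stores; DataX and BitSail for data synchronization; Kafka and RocketMQ for messaging; Airflow and Azkaban for scheduling; Git/SVN for code management; and Apache Griffin for data quality, although Griffin's adoption remains limited.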
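Schedulers such as Airflow and Azkaban model an offline pipeline as a directed acyclic graph (DAG) of jobs. A minimal sketch of that idea, assuming Python 3.9+ for the standard-library `graphlib` module: the task names are hypothetical, and the code only computes an execution order, without the retries, calendars, or backfills a real scheduler provides.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly pipeline: each task maps to the set of tasks it depends on.
DAG = {
    "sync_mysql_orders": set(),  # e.g. a DataX or BitSail sync job
    "sync_mongo_users": set(),
    "build_order_detail": {"sync_mysql_orders"},
    "build_user_order_summary": {"build_order_detail", "sync_mongo_users"},
    "check_quality": {"build_user_order_summary"},
    "publish_report": {"check_quality"},
}


def run_batches(dag):
    """Group tasks into batches; tasks in the same batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # assume every task in the batch succeeded
    return batches


batches = run_batches(DAG)
for i, batch in enumerate(batches):
    print(i, batch)
```

Tasks within one batch could be dispatched in parallel, which is how a scheduler shortens the critical path of a nightly run.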

Real‑time Development: Flink, Spark Streaming, Storm for compute; Kafka for messaging; ClickHouse, Doris/StarRocks, Druid, HBase, Kudu, data lakes for storage; Impala/Presto for querying.
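Stream engines such as Flink group unbounded events into windows before aggregating them. The following Python sketch shows only the assignment logic of a tumbling (fixed-size, non-overlapping) window over a finite list of hypothetical click events; it ignores watermarks, late data, and state management, which are the hard parts a real engine handles.

```python
from collections import defaultdict

# Hypothetical click events: (event_time_in_seconds, page).
EVENTS = [
    (0, "home"), (12, "home"), (59, "cart"),
    (60, "home"), (61, "cart"), (119, "cart"),
    (125, "home"),
]


def tumbling_window_counts(events, window_size=60):
    """Count events per (window_start, key); each event falls into exactly one window."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - ts % window_size  # align the timestamp to the start of its window
        counts[(window_start, key)] += 1
    return dict(counts)


result = tumbling_window_counts(EVENTS)
print(result)
```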

5. Data System

Real‑time storage options include ClickHouse (fast single‑table queries, typically over wide tables), Doris/StarRocks (upsert support and multi‑table joins), and Druid (time‑series aggregation). Data lakes built on Iceberg, Hudi, or Delta Lake add a table‑format abstraction that provides upsert, partitioning, and schema evolution.
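Upsert semantics, where a new record with an existing key replaces the old one, are what set Doris/StarRocks and the lakehouse table formats apart from append-only storage. Each engine exposes upsert through its own table models or SQL statements, which are not shown here; as a minimal sketch, assuming SQLite 3.24 or later as a stand-in, Python's `sqlite3` can demonstrate the behavior with `INSERT ... ON CONFLICT DO UPDATE`. The `user_profile` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key plays the role of the unique key that upsert-capable engines merge on.
conn.execute(
    "CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, city TEXT, level INTEGER)"
)

UPSERT = """
INSERT INTO user_profile (user_id, city, level) VALUES (?, ?, ?)
ON CONFLICT(user_id) DO UPDATE SET city = excluded.city, level = excluded.level
"""

# A change stream: user 1 appears twice, so the later record replaces the earlier one.
changes = [(1, "Beijing", 1), (2, "Shanghai", 3), (1, "Hangzhou", 2)]
conn.executemany(UPSERT, changes)

rows = conn.execute(
    "SELECT user_id, city, level FROM user_profile ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 'Hangzhou', 2), (2, 'Shanghai', 3)]
```

An append-only store would instead hold three rows and leave deduplication to query time.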

6. Service System

BI dashboards and reports.

OLAP ad‑hoc queries, through front ends such as HUE and Zeppelin over engines such as Impala, Presto, ClickHouse, and Doris.

Data products such as A/B‑testing platforms, user‑profile systems, DMPs (data management platforms), and recommendation platforms.

Data‑as‑a‑service APIs built with Spring Boot and backed by MySQL, MongoDB, HBase, or Redis.
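The source names the storage back ends but not how the services combine them. One common pattern for pairing a cache such as Redis with a database such as MySQL is cache-aside, sketched here in Python rather than Spring Boot: a dict stands in for Redis, in-memory SQLite stands in for MySQL, and the `user_tags` table and `CacheAsideStore` class are hypothetical.

```python
import sqlite3
import time


class CacheAsideStore:
    """Serve reads from the cache first and fall back to the database on a miss."""

    def __init__(self, conn, ttl_seconds=300.0):
        self.conn = conn
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (expires_at, value); stands in for Redis
        self.db_reads = 0

    def get_user_tags(self, user_id):
        now = time.monotonic()
        entry = self.cache.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, no database access
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT tags FROM user_tags WHERE user_id = ?", (user_id,)
        ).fetchone()
        value = row[0] if row else None
        # Misses are cached too, so repeated lookups of unknown users skip the database.
        self.cache[user_id] = (now + self.ttl, value)
        return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_tags (user_id INTEGER PRIMARY KEY, tags TEXT)")
conn.execute("INSERT INTO user_tags VALUES (42, 'new_user,high_value')")

store = CacheAsideStore(conn)
first = store.get_user_tags(42)
second = store.get_user_tags(42)
print(first, second, store.db_reads)
```

Caching negative results protects the database from repeated lookups of missing keys; the trade-off is that a newly inserted row stays invisible until the cached entry expires.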

7. Operation System

Focuses on data availability (accuracy, completeness, consistency, timeliness), usability (clear data definitions, metadata, data maps, indicator systems), and security (data classification, permission approval, audit trails).
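Several of these availability dimensions can be turned into automated checks. A minimal sketch, assuming a hypothetical daily partition held in memory and a known upstream row count: completeness is the share of rows with a required field filled, consistency is approximated by reconciling the row count with the source, and timeliness compares the latest load time with a deadline. Accuracy usually needs a trusted reference dataset and is left out. Tools such as Apache Griffin express comparable rules declaratively; this is not their API.

```python
from datetime import datetime

# Hypothetical daily partition: a primary key, a required field, and each row's load time.
ROWS = [
    {"id": 1, "user_id": "u1", "loaded_at": datetime(2023, 1, 4, 1, 30)},
    {"id": 2, "user_id": None, "loaded_at": datetime(2023, 1, 4, 1, 35)},
    {"id": 3, "user_id": "u3", "loaded_at": datetime(2023, 1, 4, 1, 40)},
    {"id": 4, "user_id": "u4", "loaded_at": datetime(2023, 1, 4, 1, 45)},
]


def quality_report(rows, required_field, deadline, source_count):
    """Return simple completeness, consistency, and timeliness checks for one partition."""
    if not rows:
        raise ValueError("empty partition: nothing to check")
    total = len(rows)
    filled = sum(1 for r in rows if r[required_field] is not None)
    latest_load = max(r["loaded_at"] for r in rows)
    return {
        # completeness: share of rows where the required field is present
        "completeness": filled / total,
        # consistency (approximated): row count reconciles with the upstream source
        "count_matches_source": total == source_count,
        # timeliness: the last row landed before the agreed deadline
        "on_time": latest_load <= deadline,
    }


report = quality_report(
    ROWS, "user_id", deadline=datetime(2023, 1, 4, 2, 0), source_count=5
)
print(report)  # {'completeness': 0.75, 'count_matches_source': False, 'on_time': True}
```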

8. Security Management

Data classification and tiered access control.

Permission approval workflows.

Audit logging and compliance with regulations such as China's Data Security Law.
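The three controls listed above can be combined in a small access guard. In this hypothetical sketch, a classification step assigns each column a sensitivity tier, the approval workflow is reduced to a per-user clearance table, and every read decision is appended to an audit log. The tier names, column names, and the deny-by-default rule for unclassified columns are illustrative assumptions, not requirements from the source or from the Data Security Law.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sensitivity tiers, lowest to highest.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical output of a data-classification step.
COLUMN_LEVELS = {
    "orders.amount": "internal",
    "users.city": "internal",
    "users.phone": "restricted",
}


@dataclass
class AccessGuard:
    clearances: dict  # user -> highest tier granted through the approval workflow
    audit_log: list = field(default_factory=list)

    def can_read(self, user, column):
        # Unclassified columns default to the strictest tier (deny by default).
        required = LEVELS[COLUMN_LEVELS.get(column, "restricted")]
        granted = LEVELS[self.clearances.get(user, "public")]
        allowed = granted >= required
        # Every decision, allowed or denied, is recorded for later audit.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "column": column,
            "allowed": allowed,
        })
        return allowed


guard = AccessGuard(clearances={"analyst": "internal", "dpo": "restricted"})
print(guard.can_read("analyst", "users.city"))   # True
print(guard.can_read("analyst", "users.phone"))  # False
print(guard.can_read("dpo", "users.phone"))      # True
print(len(guard.audit_log))                      # 3
```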

9. Maturity Assessment

Evaluation criteria include breadth (number of business lines using the platform and variety of services) and depth (extent to which services support business needs, from simple reporting to real‑time strategy optimization and intelligent analytics).

Overall, the interview provides a detailed roadmap for building, operating, and evolving a data middle platform in today’s rapidly changing big‑data landscape.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
