Core Technologies, Performance Metrics, Challenges, and Future Trends of Cloud‑Native Big Data – Expert Interview

In this expert interview, a chief big‑data architect from NetEase explains the core technology layers, key performance indicators, major challenges and mitigation strategies, the business value, and emerging trends of cloud‑native big data platforms, highlighting scheduling, storage, and mixed‑deployment considerations.

DataFunSummit

Mar 28, 2023

Cloud‑native big data represents a new generation of data platforms that are deployed, scheduled, and stored in a cloud‑native manner, supporting multiple compute workloads with elastic scheduling and higher storage efficiency, fundamentally changing how big data is used and operated.

The interview is structured around five questions: the most critical technology components, core performance metrics, the biggest challenges and their solutions, the value of combining cloud‑native and big‑data components, and future directions or trends.

The expert identifies three main technology blocks: (1) the application layer, covering containers, micro-services, service mesh, and API gateways; (2) the storage layer, where storage is pooled and exposed as block and object storage; and (3) the compute layer, where compute runs in containers. Combining containerized compute with pooled storage separates compute from storage, and existing big-data systems must be adapted to both changes.

Key performance indicators include the scheduling layer's ability to handle high concurrency, how well big-data systems (e.g., Spark, Flink) adapt to cloud-native environments, and whether a storage-compute-separated architecture can sustain short bursts of high-throughput data reads.

Major challenges are: (a) redesigning the scheduler to cope with high‑concurrency demands; (b) compatibility issues for big‑data systems in cloud‑native settings, such as log handling and the need for a remote shuffle service (RSS) to replace local shuffle mechanisms; (c) the impact of storage‑compute separation on component performance, addressed through caching solutions like Alluxio and optimized object‑storage interfaces (e.g., EMRFS, JindoFS).
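To make the caching idea in (c) concrete, the following is a minimal sketch of a read-through cache with LRU eviction placed in front of a slow remote store. It is an illustration only, not the API of Alluxio, EMRFS, or JindoFS: the class `ReadThroughCache`, the function `fetch_from_object_store`, and the block keys are all hypothetical names invented for this example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical LRU read-through cache in front of a slow remote store."""

    def __init__(self, fetch, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch            # callable that reads a block remotely
        self._capacity = capacity      # maximum number of cached blocks
        self._blocks = OrderedDict()   # key -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            self.hits += 1
            return self._blocks[key]
        # Cache miss: go to the remote store, then keep a local copy.
        self.misses += 1
        data = self._fetch(key)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data


remote_calls = []


def fetch_from_object_store(key):
    """Stand-in for a remote object-storage read (assumption for the demo)."""
    remote_calls.append(key)
    return f"data:{key}".encode()


cache = ReadThroughCache(fetch_from_object_store, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)

print(f"hits={cache.hits} misses={cache.misses} remote_calls={remote_calls}")
```

Only repeated reads of data that is still cached avoid the remote round trip, so the benefit depends on how much of the hot data fits in the cache.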

The value of cloud‑native big data lies in its elasticity and the ability to mix online (real‑time) and offline (batch) workloads, improving resource utilization by allowing workloads with opposite peak times to share the same infrastructure, thereby reducing hardware costs.
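The utilization argument can be checked with simple arithmetic. The sketch below compares the capacity needed when online and offline workloads run on separate clusters (each sized for its own peak) with the capacity needed when they share one cluster (sized for the peak of the combined load). The hourly demand figures are made-up numbers chosen only to show opposite peak times; they do not come from the interview.

```python
def required_capacity(online, offline):
    """Return (separate, shared) peak capacity for two hourly load profiles."""
    if len(online) != len(offline):
        raise ValueError("profiles must cover the same hours")
    # Separate clusters: each one is provisioned for its own peak.
    separate = max(online) + max(offline)
    # Shared cluster: provisioned for the peak of the summed load.
    shared = max(a + b for a, b in zip(online, offline))
    return separate, shared


# Hypothetical 24-hour CPU-core demand (illustrative numbers only).
online = [20] * 8 + [80] * 12 + [40] * 4              # peaks during the day
offline = [90] * 6 + [30] * 2 + [10] * 12 + [50] * 4  # peaks at night

separate, shared = required_capacity(online, offline)
savings = 1 - shared / separate

print(f"separate clusters: {separate} cores")
print(f"shared cluster:    {shared} cores")
print(f"capacity saved:    {savings:.0%}")
```

The saving shrinks as the two peaks move closer together; if both workloads peak in the same hour, the shared cluster needs as much capacity as the two separate ones.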

Future trends highlighted include mixed deployment of online and offline tasks, development of specialized high‑performance schedulers, and smarter resource allocation that leverages intelligent operations to balance competing demands from different business types.
