How Kubernetes Powers Cloud‑Native Big Data with EMR on ACK
This article explains the shift of big data and machine‑learning workloads toward storage‑compute separation and cloud‑native architectures, outlines the technical challenges of running Spark on Kubernetes, and details the EMR on ACK solution with its architecture, performance gains, and real‑world adoption.
Overview of Cloud‑Native Big Data on Kubernetes
Big‑data and machine‑learning workloads are moving toward a storage‑compute separation model and adopting cloud‑native platforms. Spark, for example, can run on traditional Hadoop schedulers in on‑premise environments, while on public clouds it must exploit elastic resources, centralized operations, and object‑storage services. This shift has driven many Spark‑on‑Kubernetes deployments.
Technical Challenges of Cloud‑Native Big Data
Building an HDFS‑compatible file system on Alibaba Cloud Object Storage (OSS) that matches HDFS performance while lowering cost.
Separating shuffle data from compute nodes and supporting heterogeneous Alibaba Cloud Container Service (ACK) node types.
Enabling Spark dynamic resource allocation in a cloud‑native context (see SPARK‑25299, the upstream Spark issue on persisting shuffle data in remote storage).
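As a rough illustration of the dynamic-allocation challenge, the sketch below assembles a `spark-submit` command that enables dynamic allocation on Kubernetes. It uses stock Spark 3.x shuffle tracking, which is an assumption made here for illustration and is not the remote-shuffle approach that EMR on ACK describes. The helper names, the master URL, the image name, and the application path are all hypothetical.

```python
from typing import Dict, List


def dynamic_allocation_conf(min_executors: int, max_executors: int) -> Dict[str, str]:
    """Return Spark properties that enable dynamic allocation without an
    external shuffle service (stock Spark 3.x shuffle tracking)."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Shuffle tracking keeps executors that still hold shuffle data alive,
        # because Kubernetes has no node-local external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def spark_submit_args(master_url: str, image: str, app: str,
                      conf: Dict[str, str]) -> List[str]:
    """Build the argument list for a cluster-mode spark-submit on Kubernetes."""
    args = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
    ]
    # Sort keys so the generated command is deterministic.
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Placeholder endpoint, image, and application path (all hypothetical).
    conf = dynamic_allocation_conf(min_executors=2, max_executors=50)
    print(" ".join(spark_submit_args(
        "k8s://https://example.invalid:6443",
        "example/spark:latest",
        "local:///opt/app/job.py",
        conf,
    )))
```

With shuffle tracking, an executor cannot be released while its shuffle output is still needed, which is one reason a remote shuffle service is attractive: once shuffle data lives off the executor, idle executors can be reclaimed sooner.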
Kubernetes‑Based Scheduling Optimizations
Once workloads run on Kubernetes, the focus shifts to three things: eliminating scheduling bottlenecks to reach YARN‑level throughput, implementing multi‑level queue management, and using peak‑valley scheduling, which moves deferrable workloads to off‑peak periods to raise cluster utilization.
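The peak-valley idea can be sketched as a small time-window check: a deferrable batch job runs immediately if the cluster is in its off-peak window, and otherwise waits for the window to open. This is a minimal sketch under assumed rules, not the scheduler EMR on ACK ships. The function names and the 22:00 to 06:00 window in the usage example are hypothetical.

```python
from datetime import datetime, time, timedelta


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def next_start(now: datetime, start: time, end: time) -> datetime:
    """Earliest moment at or after `now` that falls inside the off-peak window."""
    if start == end:
        raise ValueError("off-peak window is empty")
    if in_window(now.time(), start, end):
        return now  # already off-peak: run immediately
    candidate = datetime.combine(now.date(), start)
    if candidate <= now:
        # Today's window has already closed; wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    off_peak_start, off_peak_end = time(22, 0), time(6, 0)  # hypothetical window
    submitted = datetime(2024, 1, 1, 14, 30)
    print("job may start at", next_start(submitted, off_peak_start, off_peak_end))
```

A production scheduler would derive the window from observed load rather than a fixed clock range, but the decision it makes for each deferrable job has the same shape.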
EMR 2.0 on ACK Architecture
In December, Alibaba Cloud released EMR 2.0, which can be deployed directly on the ACK platform. This decouples big‑data job execution from underlying cluster management, allowing users to concentrate on data processing logic. Open‑source engines such as Spark, Presto, and Flink run on ACK with full compatibility and performance that exceeds upstream versions.
Key Architectural Features
Lightweight control plane that integrates with existing data platforms.
Job submission from data‑development or scheduling clusters to multiple execution back‑ends.
Off‑peak (peak‑valley) scheduling based on business load patterns.
Cloud‑native data‑lake architecture leveraging ACK’s elastic scaling.
ACK manages heterogeneous node types, providing flexible resource mixes.
Performance‑Focused Advantages
Remote Shuffle Service: Provides storage‑compute separation for intermediate shuffle data, so compute nodes can run without local or cloud disks.
Spark Dynamic Resource Allocation: Fully supports SPARK‑25299 for on‑the‑fly executor scaling.
JindoFS Acceleration: Optimizes OSS access; Block mode delivers a performance gain of more than 15% on a 1 TB TPC‑DS benchmark.
Scheduler Framework V2: Improves scheduling throughput by more than 3× compared with the community scheduler and adds multi‑level queue management.
Engine Enhancements: EMR Spark achieves 3× higher throughput than the open‑source version on a 10 TB TPC‑DS benchmark; Hudi and DeltaLake receive functional and performance upgrades.
Comprehensive Off‑Peak Scheduling: Lets coordinated batch and streaming jobs share the same ACK cluster, increasing overall machine utilization.
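To make multi-level queue management concrete, here is a minimal sketch of hierarchical admission: a job is admitted only if its own queue and every parent queue up to the root still have headroom. This is a generic illustration under assumed rules, not the Scheduler Framework V2 implementation. The `Queue` class, the queue names, and the core counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Queue:
    name: str
    capacity: int                      # max CPU cores this queue may hold at once
    parent: Optional["Queue"] = None
    used: int = 0

    def path_to_root(self) -> Iterator["Queue"]:
        """Yield this queue, then each ancestor up to the root."""
        node: Optional["Queue"] = self
        while node is not None:
            yield node
            node = node.parent


def try_admit(queue: Queue, cores: int) -> bool:
    """Admit a job only if every level of the hierarchy has room for it."""
    chain = list(queue.path_to_root())
    if any(q.used + cores > q.capacity for q in chain):
        return False
    for q in chain:
        q.used += cores
    return True


def release(queue: Queue, cores: int) -> None:
    """Return a finished job's cores to every level of the hierarchy."""
    for q in queue.path_to_root():
        q.used -= cores


if __name__ == "__main__":
    root = Queue("root", capacity=100)
    team = Queue("team-a", capacity=60, parent=root)
    adhoc = Queue("adhoc", capacity=20, parent=team)
    print(try_admit(adhoc, 16))   # fits at all three levels
    print(try_admit(adhoc, 8))    # would exceed the adhoc queue's capacity
```

Checking the whole chain before updating any counter keeps the accounting consistent: a job rejected by a parent queue never leaves a partial reservation in a child queue.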
Real‑World Deployment Example
The advertising technology provider Mobvista (汇量科技) has operated EMR for four years. After upgrading to EMR 2.0, the company observed severalfold improvements in data synchronization and query latency for its material platform and heat‑engine services, along with higher system stability and the elimination of previous CPU, memory, and I/O bottlenecks.
Reference
For detailed documentation, see
https://help.aliyun.com/document_detail/280450.html
Alibaba Cloud Native
