Big Data 13 min read

Why Spark on Kubernetes Needs a Remote Shuffle Service—and How It Boosts Performance

This article examines the challenges of running Spark on Kubernetes, introduces the Remote Shuffle Service architecture to overcome shuffle bottlenecks, details EMR on ACK integration, showcases performance gains with Terasort benchmarks, and outlines future cloud‑native big‑data strategies such as mixed‑cluster and serverless deployments.

Alibaba Cloud Developer

Sep 27, 2020

Why Spark on Kubernetes Needs a Remote Shuffle Service—and How It Boosts Performance

Cloud‑Native Background and Motivation

In the era of big data, traditional database middleware such as Oracle can no longer meet the demands of digital transformation. Spark has become a preferred batch‑processing engine, and many enterprises are moving online services to Kubernetes, seeking a unified, complete big‑data infrastructure.

EMR Architecture Overview

The classic EMR solution built on ECS consists of multiple layers:

ECS physical resource layer (IaaS).

Data ingestion layer (e.g., Kafka for streaming, Sqoop for batch).

Storage layer (HDFS, OSS, and EMR’s JindoFS cache).

Compute engine layer (Spark, Presto, Flink, etc.).

Data application layer (DataWorks, PAI, Zeppelin, Jupyter).

These layers form a comprehensive open‑source big‑data stack.

Why Move to Kubernetes

Customers increasingly deploy online services on Kubernetes (ACK in Alibaba Cloud) and want a unified, cloud‑native big‑data platform. Desired characteristics include elastic scaling, storage‑compute separation, and mixed workloads (online, batch, AI) on a single platform.

EMR Compute Engine on ACK

EMR on ACK (Alibaba’s Kubernetes service) packages Spark, Presto, and other engines as CRD + Operator components. Service components (e.g., Hive Metastore) and batch components (e.g., Spark) run as native Kubernetes workloads, integrated with Alibaba Cloud services for logging, monitoring, and governance.

Challenges of Spark on Kubernetes

Shuffle bottleneck: traditional shuffle relies on local disks, preventing dynamic resource scaling and causing heavy I/O.

Scheduling and queue management: ensuring high‑throughput job launches without performance bottlenecks.

Object‑store I/O: reading/writing OSS/HDFS incurs high rename/list overhead and bandwidth limits.

Remote Shuffle Service Solution

The Remote Shuffle Service (RSS) decouples shuffle data from executor disks. Executors write shuffle output over the network to RSS, which merges partitions and stores them in a distributed file system. Reducers then read sequential files, eliminating random I/O and improving stability.

Key benefits:

Network‑based shuffle with storage‑compute separation.

DFS replication (2×) removes fetch‑failed retries, stabilizing heavy‑shuffle jobs.

Sequential reads in the reduce phase avoid random I/O, dramatically boosting performance.

Performance Evaluation

Terasort benchmarks (2 TB, 4 TB, 10 TB) show RSS reduces total runtime, especially in the reduce stage. For 10 TB, reduce time drops from 1.6 h to 1 h, because RSS replaces M×N random I/O with N sequential reads.

Additional Optimizations

Scheduler improvements: moving Spark‑Operator logic into the Spark core to avoid webhook overhead.

OSS performance: JindoFS cache and Jindo Job Committer reduce rename/list costs.

Integration of Delta Lake and TPC‑DS optimizations for better Spark performance.

Future Outlook

Upcoming directions include hybrid clusters (fixed + serverless), spot‑instance scheduling for cost reduction, deeper storage‑compute separation via RSS, and stronger queue‑management capabilities to eliminate perceived performance limits.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance Spark Remote Shuffle Service EMR

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.