
JD Remote Shuffle Service: Design, Implementation, and Performance Evaluation

This article presents JD's self‑developed Remote Shuffle Service for Spark, detailing its architecture, goals, implementation details, performance benchmarks, and real‑world production case studies that demonstrate its impact on shuffle efficiency and system stability in large‑scale data processing.

JD Tech

The JD Spark engine team introduced the Remote Shuffle Service (RSS) to address performance and stability challenges of the shuffle phase in large‑scale data processing, aiming to improve the Spark engine for high‑traffic scenarios.

The existing External Shuffle Service suffers from an unclear architecture, low resource utilization, a lack of centralized management, and poor stability, prompting the need for a better solution.

RSS objectives include compute‑storage separation, elasticity and high availability, optimizations for specific features such as Adaptive Execution and data‑skew handling, a fallback mechanism, comprehensive monitoring, and cloud‑native support for Kubernetes deployments.
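Spark exposes a pluggable `ShuffleManager`, which remote shuffle services typically hook into at submit time. The sketch below is illustrative only: `spark.shuffle.manager` is a real Spark setting, but the RSS class name and the `spark.rss.*` keys are hypothetical placeholders, not JD's actual configuration.

```properties
# Hypothetical RSS wiring via Spark's pluggable shuffle manager.
# The class and rss.* keys are assumptions for illustration.
spark.shuffle.manager           com.example.rss.RssShuffleManager
spark.rss.storage.dir           hdfs:///rss/shuffle      # aggregated shuffle files on HDFS/CFS
spark.rss.fallback.enabled      true                     # fall back to native shuffle on RSS failure
spark.rss.etcd.endpoints        http://etcd-1:2379,http://etcd-2:2379
```

A fallback flag like the one sketched here matches the article's stated goal: if RSS nodes are unhealthy, jobs can degrade to Spark's built-in shuffle rather than fail outright.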

In the RSS architecture, Map tasks send shuffle data directly to RSS nodes, which aggregate and store the data in distributed file systems like HDFS or CFS; Reduce tasks then read the aggregated files. The design incorporates data backup via distributed storage, deduplication using block‑ID headers, health checks and load balancing through ETCD, and monitoring via Spark's Metrics System, Prometheus, and Grafana dashboards.
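The block‑ID deduplication the article describes can be sketched as follows. Because map‑task retries and speculative execution can write the same logical block to an RSS node more than once, the reduce side keeps only the first copy of each (map, block) pair. All names here (`BlockHeader`, `dedup_blocks`) are illustrative assumptions, not JD's actual API.

```python
# Hypothetical sketch of block-ID deduplication on the reduce side.
# BlockHeader and dedup_blocks are illustrative names, not JD's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHeader:
    map_id: int      # which map task produced this block
    attempt_id: int  # task attempt; retries/speculation create duplicate attempts
    block_id: int    # per-map sequence number carried in the block header


def dedup_blocks(blocks):
    """Yield each logical block's payload exactly once, skipping duplicate
    copies written by retried or speculative map attempts."""
    seen = set()
    for header, payload in blocks:
        # attempt_id is deliberately excluded: any attempt's copy is valid.
        key = (header.map_id, header.block_id)
        if key in seen:
            continue
        seen.add(key)
        yield payload


blocks = [
    (BlockHeader(map_id=0, attempt_id=0, block_id=0), b"a"),
    (BlockHeader(map_id=0, attempt_id=1, block_id=0), b"a"),  # duplicate from retry
    (BlockHeader(map_id=1, attempt_id=0, block_id=0), b"b"),
]
print(list(dedup_blocks(blocks)))  # the retried attempt's copy is dropped
```

Keying on (map, block) rather than on the attempt means any attempt's copy satisfies the read, which is what makes speculative execution safe under aggregation.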

Performance tests on TPC‑DS show RSS is roughly 4% slower than the External Shuffle Service on average, comparable to Uber's RSS implementation, but the gap narrows in large‑scale workloads where RSS significantly reduces FetchFailedException rates and improves overall stability.

A production case during JD's major promotion demonstrates RSS eliminating FetchFailedException failures, cutting task runtime from over five hours to about two hours, stabilizing execution, and saving compute resources.

In summary, RSS has been deployed at scale, processing over 1000 TB of shuffle data daily, and future plans involve integrating emerging research, expanding deployment, supporting Spark 3.0 and Kubernetes, extending to multi‑engine scenarios, and pursuing open‑source and commercial opportunities.

Tags: distributed systems, performance, Big Data, Spark, Remote Shuffle Service, etcd, Shuffle Optimization
Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
