Tagged articles

Shuffle Service

6 articles

Dec 31, 2023 · Big Data

Apache Celeborn (Incubating): Addressing Traditional Shuffle Limitations in Big Data Processing

Apache Celeborn (Incubating) is a remote shuffle service designed to overcome the inefficiency, high storage demands, network overhead, and limited fault tolerance of traditional Spark shuffle implementations. It introduces push‑based shuffle, partition splitting, columnar shuffle, and multi‑layer storage within an elastic, stable, and scalable architecture.

Apache Spark · Big Data · Performance Optimization

15 min read

DataFunTalk · Dec 2, 2023 · Big Data

Apache Celeborn: Overview, Architecture, Community, and Future Roadmap

This article introduces Apache Celeborn and the challenges that intermediate (shuffle) data poses for large‑scale compute engines. It details Celeborn's core architecture and design, including the Master, Worker, LifecycleManager, and ShuffleClient components; covers the project's community history and version releases; compares its performance with Spark's External Shuffle Service (ESS); describes real‑world deployment scenarios; and outlines future development plans.

Apache Celeborn · Big Data · Flink

14 min read

DataFunTalk · Aug 5, 2023 · Big Data

Apache Celeborn (Incubating): Design, Performance, Stability, and Elasticity of a Remote Shuffle Service

This article reviews the limitations of traditional Spark shuffle, introduces Apache Celeborn (Incubating) as a remote shuffle service, and details its design for performance, stability, and elasticity, including push shuffle, partition splitting, columnar shuffle, multi‑layer storage, and congestion control. It closes with a real‑world evaluation.

Apache Spark · Big Data · Performance

19 min read

DataFunTalk · May 3, 2023 · Big Data

Shuttle2.0: Enhancing Spark and Flink Shuffle with Distributed Sorting and Adaptive Broadcast

Shuttle2.0 extends OPPO's open‑source, high‑availability Spark Remote Shuffle Service to support Flink. It introduces a unified stream‑batch data model, pipelines the shuffle together with distributed sorting, and adds an Adaptive BroadcastJoin solution, substantially improving performance and stability for large‑scale workloads.

Adaptive Broadcast · Big Data · Distributed Sorting

11 min read

ByteDance Cloud Native · Sep 2, 2022 · Big Data

How ByteDance’s Cloud Shuffle Service Boosts Big Data Job Stability and Performance

ByteDance’s Cloud Shuffle Service (CSS) replaces the traditional pull‑based sort shuffle in Spark, Flink batch, and MapReduce with a push‑based remote shuffle. It improves stability, performance, and elasticity, supports compute‑storage separation, and delivers significant speedups on large‑scale TPC‑DS benchmarks.

Performance Optimization · Remote Shuffle · Shuffle Service

11 min read

Alibaba Cloud Developer · Jan 2, 2020 · Big Data

How Alibaba’s MaxCompute Tackled Double‑11’s EB‑Scale Data with Fuxi 2.0 and StreamlineX

During the 2019 Double 11 shopping festival, Alibaba’s MaxCompute processed close to an exabyte of data per day. The newly released Fuxi 2.0 scheduler, StreamlineX + Shuffle Service, and the upgraded DAG 2.0 engine helped it overcome the resulting throughput, resource‑allocation, and fault‑tolerance challenges while delivering significant performance and stability gains.

DAG 2.0 · Fuxi · MaxCompute

28 min read