Tag: Sort Shuffle

IT Services Circle
Mar 21, 2022 · Big Data

Understanding Spark Shuffle: Hash, Sort, and Tungsten Sort Mechanisms

This article explains the evolution and inner workings of Spark's shuffle phase. It compares the original Hash‑based shuffle, the default Sort‑based shuffle, and the optimized Tungsten‑Sort shuffle, and covers the configuration options that affect performance and file handling in large‑scale data processing.

Big Data · Hash Shuffle · Shuffle
17 min read
Big Data Technology Architecture
Nov 15, 2021 · Big Data

Flink Sort‑Shuffle: Design, Implementation, and Performance Evaluation

This article explains how Flink's new sort‑shuffle mechanism improves large‑scale batch processing by reducing file counts, optimizing I/O, and lowering memory usage, delivering up to tenfold speedups. It also covers configuration tips and planned enhancements.

Batch Processing · Big Data · Data Shuffle
16 min read
Big Data Technology Architecture
Apr 28, 2020 · Big Data

Understanding Shuffle in Hadoop MapReduce and Spark

This article explains the concept and workflow of shuffle in Hadoop MapReduce and Spark. It covers map‑side buffering, spilling and merging, the reduce‑side copy, merge, and reduce stages, and why sorting and file merging are needed. It then compares the Hash‑Shuffle and Sort‑Shuffle implementations and their performance trade‑offs.

Big Data · Hash Shuffle · MapReduce
16 min read