Tagged articles

Sort-Shuffle

4 articles

Mar 21, 2022 · Big Data

Understanding Spark Shuffle: Hash, Sort, and Tungsten Sort Mechanisms

This article traces the evolution and inner workings of Spark's shuffle phase, comparing the original Hash‑based shuffle, the default Sort‑based shuffle, and the optimized Tungsten‑Sort shuffle. It also covers the configuration options that affect performance and file handling in large‑scale data processing.

Hash Shuffle · Shuffle · Sort-Shuffle

17 min read

Alibaba Cloud Developer

Nov 22, 2021 · Big Data

How Flink’s Sort‑Shuffle Boosts Large‑Scale Batch Processing Performance

This article explains how Flink’s new Sort‑Shuffle mechanism improves stability and performance for massive batch jobs by reducing file counts, optimizing I/O, and minimizing memory usage. It also covers implementation details, test results, tuning tips, and future enhancements.

Batch Processing · Data Shuffle · Flink

17 min read

Big Data Technology Architecture

Nov 15, 2021 · Big Data

Flink Sort‑Shuffle: Design, Implementation, and Performance Evaluation

This article explains how Flink's new sort‑shuffle mechanism improves large‑scale batch processing by reducing file counts, optimizing I/O, and lowering memory usage, which together deliver up to tenfold speedups. It also details configuration tips and future enhancements.

Batch Processing · Data Shuffle · Flink

16 min read

Big Data Technology Architecture

Apr 28, 2020 · Big Data

Understanding Shuffle in Hadoop MapReduce and Spark

This article explains the concept and workflow of shuffle in Hadoop MapReduce and Spark, covering map‑side buffering, spill, and merge; the reduce‑side copy, merge, and reduce phases; and the reasons for sorting and file merging. It also compares the Hash‑Shuffle and Sort‑Shuffle implementations and their performance considerations.

Hash Shuffle · Shuffle · Sort-Shuffle

16 min read
