Tag

ShuffleManager

Architecture Digest
May 25, 2016 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article explains how to tackle Spark performance bottlenecks: diagnosing data skew, locating the offending stages and operators, and applying practical solutions (Hive pre-processing, key filtering, higher shuffle parallelism, two-stage aggregation, map join, and combinations of these). It then covers the evolution of the shuffle manager and the key configuration parameters for tuning it.

Data Skew · Performance Tuning · Shuffle Optimization