Tagged articles
4 articles
dbaplus Community
Mar 23, 2020 · Big Data

How to Detect and Resolve Data Skew in Spark and Hadoop

This article explains what data skew is in distributed big‑data systems like Spark and Hadoop, why it hurts performance, how to spot it using the Web UI or key statistics, and presents eight practical mitigation techniques ranging from filtering and shuffle parallelism to custom partitioners and broadcast joins.

Broadcast Join · Data Skew · Hadoop
19 min read
dbaplus Community
Aug 21, 2017 · Big Data

How to Tackle Spark Data Skew: Practical Solutions and Real‑World Examples

This article explains what Spark data skew is and why it hurts performance, then presents six practical mitigation techniques, including adjusting parallelism, using custom partitioners, performing map‑side joins, and adding random prefixes to keys. Each is backed by detailed experiments, code snippets, and performance comparisons.

Data Skew · Map-side Join · Partitioner
18 min read