Tag: MapJoin


Big Data Technology Architecture
Mar 19, 2020 · Big Data

Handling Data Skew in Hive: Join, Group By, and COUNT(DISTINCT) Optimizations

Data skew in Hive MapReduce jobs happens when keys are unevenly distributed during joins, GROUP BY, or COUNT(DISTINCT) operations. A few reducers then receive most of the data and hold up the whole job. The article walks through the common skew scenarios and practical remedies: using MapJoin, enabling map-side aggregation, load balancing, and rewriting queries.
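
The simulation below shows why the two map-side remedies help. It is plain Python, not Hive code and not the article's own example. The input is hypothetical: four mapper splits in which 90% of rows share the key k0. The function names are illustrative. The code counts how many records reach the reducer for each key, first without and then with map-side partial aggregation, which is the behavior Hive's hive.map.aggr setting enables. It also sketches a map-side join, where every mapper holds the small table in memory instead of shuffling rows by join key, as a MapJoin does.

```python
from collections import Counter


def make_splits(num_splits=4, rows_per_split=1000):
    """Hypothetical input: each inner list is one mapper's split of (key, value) rows.
    Key "k0" is deliberately skewed (90% of rows); the rest spread over k1..k7."""
    splits = []
    for _ in range(num_splits):
        rows = []
        for i in range(rows_per_split):
            key = "k0" if i % 10 < 9 else f"k{1 + (i % 7)}"
            rows.append((key, 1))
        splits.append(rows)
    return splits


def shuffle_without_map_aggr(splits):
    """Every row is shuffled, so the reducer for a key receives one record per row."""
    received = Counter()
    for rows in splits:
        for key, _ in rows:
            received[key] += 1
    return received


def shuffle_with_map_aggr(splits):
    """Each mapper pre-aggregates its own split (like a combiner) and emits one
    partial sum per key; the reducer only merges those partial sums."""
    received = Counter()  # records arriving at each key's reducer
    totals = Counter()    # final aggregate computed by the reducers
    for rows in splits:
        partial = Counter()
        for key, value in rows:
            partial[key] += value
        for key, subtotal in partial.items():
            received[key] += 1
            totals[key] += subtotal
    return received, totals


def map_join(big_rows, small_table):
    """Map-side inner join: the small table is a dict every mapper holds in memory,
    so big-table rows are joined where they are read and never shuffled by key."""
    return [(key, value, small_table[key]) for key, value in big_rows if key in small_table]


if __name__ == "__main__":
    splits = make_splits()
    plain = shuffle_without_map_aggr(splits)
    received, totals = shuffle_with_map_aggr(splits)
    print("records reaching k0's reducer without map-side aggregation:", plain["k0"])
    print("records reaching k0's reducer with map-side aggregation:", received["k0"])
    print("final total for k0:", totals["k0"])
    print(map_join([("a", 1), ("b", 2), ("c", 3)], {"a": "x", "c": "z"}))
```

With these inputs, the reducer for k0 receives 3600 records without map-side aggregation and only 4 with it (one partial sum per mapper), while the final total for k0 is still 3600. Map-side aggregation does not remove the skew in the data. It shrinks what the overloaded reducer has to receive.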

Tags: Big Data, Hive, MapJoin