Tagged articles

MapJoin

4 articles

Feb 22, 2024 · Big Data

Tackling Data Skew in Large-Scale SQL Joins with MapJoin, DistMapJoin & SkewJoin

This article explores practical techniques for mitigating data skew in massive SQL join operations, detailing MapJoin, handling special/empty values, hotspot dispersion, SkewJoin, and the novel DistMapJoin approach, complete with code snippets and performance results from Alibaba's payment data pipeline.

DistMapJoin · MapJoin · SQL

15 min read

dbaplus Community

Apr 2, 2023 · Big Data

Unlock Faster ODPS SQL: Proven UNION, COUNT DISTINCT, and Join Optimizations

This article walks through common ODPS SQL scenarios—union, count distinct, large‑table joins, mapjoin, and predicate placement—explains why naïve implementations can be inefficient, shows how to read and interpret execution plans, and provides concrete rewritten queries that dramatically improve performance and resource usage.

Big Data · COUNT DISTINCT · MapJoin

17 min read

ITPUB

Mar 25, 2023 · Big Data

Mastering Efficient SQL in ODPS: Union, Count‑Distinct, and Join Optimizations

This article walks through common SQL development scenarios on ODPS, examining why naïve UNION and COUNT DISTINCT can be slow, how to rewrite queries with GROUP BY, UNION ALL, JSON aggregation, and map‑join techniques, and shows the resulting execution‑plan improvements with concrete code and performance numbers.

Big Data · CountDistinct · MapJoin

17 min read

Big Data Technology Architecture

Mar 19, 2020 · Big Data

Handling Data Skew in Hive: Join, Group By, and COUNT(DISTINCT) Optimizations

Data skew in Hive MapReduce jobs arises from uneven key distribution during joins, GROUP BY, or COUNT(DISTINCT) operations, and it can severely slow tasks. The article explains common skew scenarios and practical mitigations: using MapJoin, enabling map‑side aggregation, load balancing, and rewriting queries.

Data Skew · Hive · MapJoin

7 min read
