Why Hash Join Beats Nested Loop Join and When It Fails
This article explains why hash joins usually outperform nested‑loop joins, how to force a hash join in SQL, which data‑type combinations prevent hash joins, and why the same query can behave differently in TD (Teradata) and Oracle compatibility modes.
1. Hash join usually outperforms nested‑loop join
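A nested‑loop join compares every row of one input with every row of the other, so its cost grows as O(N × M), or O(N²) when both inputs are about the same size. A hash join builds a hash table on one input and probes it once per row of the other, so its cost grows roughly as O(N + M). For large inputs, hash joins are therefore generally preferred.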
During SQL tuning you can force a hash join in two ways:
Disable nested‑loop joins at the session level: set enable_nestloop to off;
Use a hint in the query: /*+ hashjoin(a b) */ makes tables a and b use a hash join.
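As a minimal sketch of both approaches, the statements below use two hypothetical tables, t1 and t2, aliased as a and b to match the hint above. The hint is assumed to go directly after the select keyword; check the plan with explain to confirm that it took effect.

```sql
-- Hypothetical tables for illustration only; they are not part of the original example.
create table t1(id int, val int);
create table t2(id int, val int);

-- Approach 1: disable nested-loop joins for the current session.
set enable_nestloop to off;
explain select * from t1 a join t2 b on a.id = b.id;
-- Restore the default afterwards so later queries are not affected.
reset enable_nestloop;

-- Approach 2: request a hash join for this statement only, via a hint on aliases a and b.
explain select /*+ hashjoin(a b) */ *
from t1 a join t2 b on a.id = b.id;
```

The session setting applies to every later query in the session, while the hint is scoped to a single statement, so the hint is usually the narrower of the two tools.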
The following example, created in a TD‑compatible database, still ends up with a nested‑loop join:
CREATE DATABASE test_td WITH DBCOMPATIBILITY='td';
create table dim_day(day_code char(8));
create table dwr_rpo as select current_date - 1 as day_code; -- returns date type
explain select *
from dwr_rpo a
left join dim_day c on c.day_code = a.day_code;
-- Sample execution plan (simplified)
1 | Streaming (type: GATHER) | 1310148 rows
2 | Nested Loop Left Join (3, 4) | 1310148 rows, 1 MB memory
3 | Seq Scan on dwr_rpo a | 1310148 rows, 1 MB memory
4 | Materialize | 109575 rows, 16 MB memory
5 | Streaming (type: BROADCAST) | 109575 rows, 2 MB memory
6 | Seq Scan on dim_day c | 36525 rows, 1 MB memory
Even with these settings, the query may still not use a hash join because the data types on both sides must support hash comparison.
Why hash join sometimes cannot be used
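A hash join only works if equal values always produce equal hash values. Different data types compute their hash functions differently, so two values that compare as equal across types can land in different hash buckets. The optimizer therefore considers a hash join only when the = operator for that pair of types is marked as hashable (oprcanhash in pg_operator).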
Performance gap illustration
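Nested‑loop join cost: about 1.31 million × 110 thousand ≈ 1.4 × 10^11 (roughly 140 billion) row comparisons, using the row counts in the plan above.
Hash join cost: roughly 1.31 million probes, plus about 110 thousand rows to build the hash table.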
The difference explains why a hash join can finish in seconds while a nested‑loop join may take hours.
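The size of the gap can be checked with simple arithmetic. The sketch below plugs in the row counts shown in the sample plan above (1310148 rows from dwr_rpo and 109575 rows in the materialized dim_day input) and treats one row comparison or one row visit as a unit of work, which is a simplification that ignores hashing cost and memory effects.

```sql
select 1310148::bigint * 109575 as nested_loop_comparisons,  -- 143559467100
       1310148::bigint + 109575 as hash_join_row_visits;     -- 1419723
```

Under these assumptions the nested‑loop join does roughly 100,000 times more work than the hash join.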
Why type conversion may still prevent hash join
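Even when two types look similar, as date, timestamp, and timestamptz do, they differ in precision, internal format, or time‑zone handling, so the same point in time is not guaranteed to hash to the same value on both sides. Such pairs are treated as incompatible for hash comparison.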
Data types that do not support hash joins
select oprname, oprkind, oprcanhash,
(select typname from pg_type where oid=oprleft) as oprleft,
(select typname from pg_type where oid=oprright) as oprright
from pg_operator
where oprname='=' and oprcanhash='f';
-- Sample result (partial)
oprname | oprkind | oprcanhash | oprleft | oprright
---------------------------------------------------
= | b | f | xid | int8
= | b | f | xid32 | int4
= | b | f | date | timestamp
= | b | f | date | timestamptz
= | b | f | timestamp | date
= | b | f | timestamptz | date
= | b | f | timestamp | timestamptz
= | b | f | timestamptz | timestamp
In practice, joins between timestamp, timestamptz, and date cannot use hash joins; other types are rarely encountered.
Development tip: Keep the data types on both sides of a join as consistent and compatible as possible.
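One way to apply this tip to the example above is to convert the date column explicitly to the text format of the dimension key, so that both sides of the join condition are character data. This is a sketch under the assumption that dim_day.day_code stores days as 'yyyymmdd' strings, which the char(8) definition suggests but the source does not state. Whether the plan actually switches to a hash join should be confirmed with explain.

```sql
explain select *
from dwr_rpo a
left join dim_day c
  on c.day_code = to_char(a.day_code, 'yyyymmdd');  -- assumed key format
```

An alternative with the same goal is to store day_code with the same type in both tables when they are created, which avoids converting on every query.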
Why Oracle compatibility mode works but TD compatibility does not
In TD compatibility mode, current_date is of type date, while in Oracle compatibility mode it is of type timestamp, leading to the incompatibility described above.
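To see the difference directly, the type of the expression can be inspected in each database. pg_typeof is the standard PostgreSQL function for this and is assumed here to be available in both compatibility modes. The second query reuses the pg_operator catalog from the section above to list which right‑hand types can be hash‑compared with date.

```sql
-- Run in a TD-compatible and in an Oracle-compatible database and compare the results.
select pg_typeof(current_date)     as current_date_type,
       pg_typeof(current_date - 1) as day_code_type;

-- For the '=' operator with date on the left, show whether each pairing is hashable.
select oprleft::regtype  as left_type,
       oprright::regtype as right_type,
       oprcanhash
from pg_operator
where oprname = '=' and oprleft = 'date'::regtype;
```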
(Copyright belongs to the original author, please delete if infringed.)
MaGe Linux Operations
