Databases 11 min read

Why the 3‑Table JOIN Ban Exists and How to Overcome It

This article explores the origins of the "no more than three‑table JOIN" rule, explains MySQL's join algorithm limitations, shows real‑world performance pain points in distributed architectures, and presents practical solutions such as step‑wise queries, denormalization, and materialized views to safely break the restriction.

Su San Talks Tech

Jul 12, 2025

Why the 3‑Table JOIN Ban Exists and How to Overcome It

Introduction

In 2017 the Alibaba Java Development Manual prohibited JOIN operations involving more than three tables, a rule that many companies still follow today. This article examines the architectural reasons behind the rule, the execution principles of MySQL joins, and practical ways to handle multi‑table queries.

1. The Performance Nightmare of Multi‑Table JOINs

1.1 Real‑World Case: A Painful Incident

A typical e‑commerce order query that joins four tables (orders, users, products, warehouses) suffers from:

Single query latency > 800 ms

Database CPU spikes to 90% during peak

Frequent slow‑query alerts

The cause is the MySQL optimizer choosing a wrong driving table, leading to full‑table scans on millions of rows.

2. MySQL's JOIN Limitations

2.1 Inherent Defects of the Execution Engine

MySQL only supports three join algorithms:

Simple Nested‑Loop Join : brute‑force double loop, O(m·n)

Block Nested‑Loop Join : loads batches into join_buffer, still O(m·n)

Index Nested‑Loop Join : relies on indexes, O(m·log n)

Critical missing features:

No Hash Join (before 8.0.18)

No Sort‑Merge Join

Optimizer easily picks the wrong driving table when joining many tables.

2.2 Optimizer Limitations

As the number of tables grows, possible join orders increase factorially (4 tables → 24 orders, 5 tables → 120 orders). MySQL’s optimizer uses a greedy algorithm instead of exhaustive search, often producing sub‑optimal plans, especially when statistics are inaccurate.

3. Distributed Architecture Challenges

3.1 JOIN Pain Points After Sharding

In sharded environments (e.g., Alibaba’s split‑by‑database‑and‑table strategy), multi‑table JOINs encounter three major issues:

Cross‑node data association must be handled in the application layer.

Network transfer becomes a performance bottleneck.

Transactional consistency is hard to guarantee.

3.2 Performance Comparison After Sharding

Test results (order table split into 16 databases, each with 64 tables):

Single‑shard query : 25 ms response, 5% CPU, 5 KB traffic.

Cross‑shard JOIN : 1200 ms response, 85% CPU, 120 MB traffic.

In‑memory merge : 800 ms response, 70% CPU, 80 MB traffic.

4. Breaking the Ban: Alibaba‑Recommended Solutions

4.1 Solution 1 – Step‑wise Query + In‑Memory Computation

// 1. Query basic order info
List<Order> orders = orderDao.query("SELECT * FROM orders WHERE status=1");

// 2. Extract unique user IDs
Set<Long> userIds = orders.stream().map(Order::getUserId).collect(Collectors.toSet());

// 3. Batch query user info
Map<Long, User> userMap = userDao.queryByIds(userIds).stream()
    .collect(Collectors.toMap(User::getId, Function.identity()));

// 4. Assemble data in memory
orders.forEach(order -> {
    order.setUserName(userMap.get(order.getUserId()).getName());
});

Advantages: avoids complex JOINs, leverages caching, simplifies pagination.

4.2 Solution 2 – Denormalization

Before optimization:

SELECT o.*, p.name FROM orders o JOIN products p ON o.product_id = p.id; -- needs JOIN

After optimization, store the product name directly in the order table:

CREATE TABLE orders (
    id BIGINT,
    product_id BIGINT,
    product_name VARCHAR(100) -- redundant product name
);

Guidelines for denormalization:

Redundant fields should be high‑frequency query columns.

Redundant fields should change rarely.

Low QPS business scenarios are suitable for redundancy.

4.3 Solution 3 – Asynchronous Materialized Views

-- Create pre‑computed view
CREATE MATERIALIZED VIEW order_detail_view AS
SELECT o.*, u.name, u.phone, p.product_name
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 1;

-- Query the view directly
SELECT * FROM order_detail_view WHERE user_id = 1001;

Applicable scenarios: reports with low real‑time requirements and heavy aggregation queries.

5. When Can the Ban Be Lifted?

5.1 Scenario 1 – NewSQL Databases (e.g., TiDB)

TiDB provides a distributed Hash Join implementation with multi‑threaded hash table building, intelligent small‑table selection, and memory‑plus‑disk spill control.

5.2 Scenario 2 – OLAP Engines (e.g., ClickHouse)

SELECT a.*, b.extra_data
FROM big_table a
JOIN small_table b ON a.id = b.id
SETTINGS join_algorithm = 'hash', max_bytes_in_join = '10G';

Suitable for large‑scale low‑latency analytics where the fact table is much larger than dimension tables.

6. Golden Practice Rules

6.1 Four Principles for JOIN Optimization

Drive the large table with a small table.

Ensure the driven table has appropriate indexes (unless the dimension table < 100 rows).

Avoid JOINs involving more than three tables; consider business decomposition.

Never perform cross‑DB‑instance JOINs.

6.2 Boundaries of the Rule

OLTP high‑frequency transactions – prohibited (response time sensitive).

OLAP analysis systems – allowed (throughput prioritized).

Sharded architectures – prohibited (cross‑node JOIN performance poor).

Small tables (<100 rows) – allowed (negligible impact).

Conclusion

The "no more than three‑table JOIN" guideline reflects a shift in architectural thinking: databases should focus on storage and transaction guarantees, while business logic and data aggregation move to the application layer or specialized engines. Following the outlined principles and using modern solutions such as step‑wise queries, denormalization, or materialized views enables safe and performant handling of complex data relationships.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Database Performance distributed databases join optimization Denormalization

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.