Why Alibaba’s Java Handbook Limits Joins and How to Write Efficient SQL

The article explains why the Alibaba Java Development Manual restricts joins to three tables, discusses MySQL’s join algorithm limitations, and offers practical alternatives such as query decomposition, denormalization, and using IN or hash joins to improve performance.

Programmer DD

Nov 21, 2019

The author encountered a question on Zhihu about the Alibaba Java Development Manual’s rule that joins involving more than three tables are prohibited, and wonders why this restriction exists and how to write SQL under it.

To address such issues, the author suggests consulting official documentation or the book High Performance MySQL (chapter 6.3), which advises evaluating whether a complex query should be split into simpler ones rather than offloading all work to the database.

Reasons for the Join Limitation

The limitation stems from weaknesses in MySQL's optimizer and execution engine. The optimizer often cannot produce an efficient query plan for multi-table joins, and the executor has traditionally supported only three variants of the nested-loop join:

Nested loop join: for each row of the outer table, scans every row of the inner table, so two tables of n rows each cost O(n²) comparisons.

Block nested loop join: buffers a batch of outer rows in memory and scans the inner table once per batch rather than once per row. The comparison count is still O(n²), but the inner table is read far fewer times.

Index nested loop join: reads a row from the outer table and looks up matching rows in the inner table's B+-tree index, roughly O(n log n), which is much faster provided the join columns are indexed.
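The three variants above can be sketched in a few lines of Python. This is an illustration of the access patterns only, not MySQL's implementation: the sample rows, the function names, and the buffer_rows parameter are all assumptions, and a sorted list searched with bisect stands in for the B+-tree index.

```python
from bisect import bisect_left

# Hypothetical sample rows: (id, name) and (student_id, class_id).
students = [(1, "Ann"), (2, "Bob"), (3, "Cid")]
enrollments = [(1, 10), (3, 20), (3, 30)]


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row: len(outer) * len(inner) checks."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def block_nested_loop_join(outer, inner, buffer_rows=2):
    """Buffer a batch of outer rows, then scan the inner table once per batch."""
    result = []
    for start in range(0, len(outer), buffer_rows):
        block = outer[start:start + buffer_rows]
        for i in inner:                # one inner scan per block, not per outer row
            for o in block:
                if o[0] == i[0]:
                    result.append((o, i))
    return result


def index_nested_loop_join(outer, inner):
    """Look up each outer row in a sorted index on the inner join column."""
    index = sorted(inner)              # stand-in for a B+-tree on student_id
    keys = [row[0] for row in index]
    result = []
    for o in outer:
        pos = bisect_left(keys, o[0])  # O(log m) search instead of a full scan
        while pos < len(keys) and keys[pos] == o[0]:
            result.append((o, index[pos]))
            pos += 1
    return result
```

All three functions return the same matches for the sample data; what differs is how often the inner table is scanned and whether a lookup replaces the scan.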

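With a hash join, the restriction could be relaxed: the smaller table is loaded into an in-memory hash table and the larger table probes it, for roughly O(n + m) work on tables of n and m rows, at the cost of more memory. MySQL added hash joins for equi-joins in version 8.0.18; earlier versions offer only the nested-loop variants.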
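The build and probe phases can be sketched as follows. This is a minimal in-memory illustration, assuming the smaller input fits in memory; the function name, the key-position parameters, and the sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(small, large, small_key=0, large_key=0):
    """Join two lists of tuples on equality of the given key positions."""
    # Build phase: one pass over the smaller input.
    table = defaultdict(list)
    for row in small:
        table[row[small_key]].append(row)

    # Probe phase: one pass over the larger input, one hash lookup per row.
    result = []
    for row in large:
        for match in table.get(row[large_key], ()):
            result.append((match, row))
    return result


# Hypothetical sample rows: (id, description) and (student_id, class_id).
classes = [(10, "Math"), (20, "Art")]
enrollments = [(1, 10), (3, 20), (3, 30), (2, 10)]

# Join class.id (position 0) to enrollment.class_id (position 1).
pairs = hash_join(classes, enrollments, small_key=0, large_key=1)
```

Each input is read once, which is where the linear cost comes from; the price is the memory held by the hash table.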

Practical Workarounds

One common approach is to denormalize data, creating a wider table that contains columns from multiple related tables, thus eliminating the need for joins. For example, instead of three normalized tables student(id, name), class(id, description), and student_class(student_id, class_id), one could use a single table student_class_full(student_id, class_id, name, description). This introduces redundancy but can dramatically improve query performance.
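The trade-off can be tried out with the table names from the example above. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for convenience, which is an assumption on our part: the article is about MySQL, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE student_class (student_id INTEGER, class_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class VALUES (10, 'Math'), (20, 'Art');
    INSERT INTO student_class VALUES (1, 10), (1, 20), (2, 10);

    -- Wide table built once from the normalized tables.
    CREATE TABLE student_class_full AS
    SELECT sc.student_id, sc.class_id, s.name, c.description
    FROM student_class sc
    JOIN student s ON s.id = sc.student_id
    JOIN class c ON c.id = sc.class_id;
""")

# Normalized schema: every read pays for a three-table join.
joined = conn.execute("""
    SELECT s.name, c.description
    FROM student_class sc
    JOIN student s ON s.id = sc.student_id
    JOIN class c ON c.id = sc.class_id
    ORDER BY s.name, c.description
""").fetchall()

# Denormalized schema: the same answer from a single-table read.
wide = conn.execute("""
    SELECT name, description
    FROM student_class_full
    ORDER BY name, description
""").fetchall()
```

Both queries return the same rows. The cost moves to the write path: whenever a student's name or a class description changes, every copy in student_class_full has to be updated as well.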

The author also notes that the IN keyword can be used to replace some joins, as illustrated in the referenced book.
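One way to read this advice is to run a chain of single-table queries and pass the keys from one query into the IN list of the next. The sketch below assumes that reading; it uses sqlite3 in place of MySQL, and the select_in helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE student_class (student_id INTEGER, class_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO class VALUES (10, 'Math'), (20, 'Art');
    INSERT INTO student_class VALUES (1, 10), (2, 10), (3, 20);
""")


def select_in(conn, sql_template, values):
    """Run a query whose template has one {placeholders} slot for an IN list."""
    if not values:                     # IN () is a syntax error in MySQL; skip the query
        return []
    placeholders = ", ".join("?" for _ in values)
    sql = sql_template.format(placeholders=placeholders)
    return conn.execute(sql, list(values)).fetchall()


# Step 1: single-table lookup of the class ids we care about.
class_ids = [row[0] for row in conn.execute(
    "SELECT id FROM class WHERE description = ?", ("Math",))]

# Step 2: enrollments for those classes, still a single-table query.
student_ids = [row[0] for row in select_in(
    conn,
    "SELECT student_id FROM student_class WHERE class_id IN ({placeholders})",
    class_ids)]

# Step 3: the student names, again without a join.
names = [row[0] for row in select_in(
    conn,
    "SELECT name FROM student WHERE id IN ({placeholders}) ORDER BY name",
    student_ids)]
```

Only the placeholder list is formatted into the SQL text; the values themselves are bound as parameters. Each step touches one table, so it can use that table's primary key or index, and the intermediate id lists can be cached by the application.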

In summary, the join limit is a compromise dictated by MySQL’s performance characteristics; understanding the underlying join algorithms and applying space‑for‑time trade‑offs can help write efficient SQL.

Finally, the author strongly recommends reading High Performance MySQL as a reference guide.

MySQL Database Design SQL optimization Denormalization join strategies

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
