Why Alibaba’s Java Handbook Bans Joins Over Three Tables – An Empirical Test
This article investigates Alibaba's recommendation against joining more than three tables by designing a MySQL experiment with large synthetic datasets, analyzing performance results, comparing with Oracle, and providing full SQL and data‑generation scripts to illustrate the underlying reasons.
1. Problem Statement
Alibaba's Java Development Manual advises not to join more than three tables; this article investigates the reason by designing SQL experiments.
2. Experiment Environment
VMware 10 + CentOS 7.4 + MySQL 5.7.22, 4.5 GB RAM, 4‑core CPU, 50 GB SSD.
3. Data Generation
Test machine: 4 GB memory, 4 cores, 50 GB disk. MySQL was configured with a 2 GB buffer pool on SSD storage.
4. Experiment Procedure
Four tables are used: student, teacher, course, and sc (the student-course relationship table). The goal is to find the student with the highest score in the courses taught by teacher “tname553”. For clarity, the original SQL is also split into three simpler statements.
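The following minimal Java sketch performs the same lookup step by step in the application, with in-memory lists standing in for the tables. It is not the author's three SQL statements. It only illustrates the same idea of replacing one four-table join with simple single-table lookups. The record and field names (`tId`, `cId`, `score`, and so on) are assumptions, because this article shows only the student DDL.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical row types; field names are assumptions, not the article's schema.
record Teacher(int tId, String tname) {}
record Course(int cId, int tId) {}
record Score(int sId, int cId, int score) {}
record Student(int sId, String sname) {}

final class TopStudentQuery {
    /** Returns the student holding the highest score in any course taught by tname, if one exists. */
    static Optional<Student> topStudent(String tname, List<Teacher> teachers, List<Course> courses,
                                        List<Score> scores, Map<Integer, Student> studentsById) {
        // Step 1: teacher name -> teacher ids (akin to a single-table query on teacher)
        Set<Integer> teacherIds = teachers.stream()
                .filter(t -> t.tname().equals(tname))
                .map(Teacher::tId)
                .collect(Collectors.toSet());
        // Step 2: teacher ids -> ids of the courses they teach
        Set<Integer> courseIds = courses.stream()
                .filter(c -> teacherIds.contains(c.tId()))
                .map(Course::cId)
                .collect(Collectors.toSet());
        // Step 3: highest score among those courses (akin to a single-table query on sc)
        Optional<Score> best = scores.stream()
                .filter(s -> courseIds.contains(s.cId()))
                .max(Comparator.comparingInt(Score::score));
        // Step 4: primary-key lookup of the winning student (missing key -> empty Optional)
        return best.map(s -> studentsById.get(s.sId()));
    }
}
```

Each step touches only one table and passes a small key set to the next, which is the property that makes the split statements cheaper for MySQL to plan and run.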
Data-generation scripts populate each table at scales of up to 100 million records, for example 10 million students, 5 million teachers, 1,000 courses, and the corresponding relationship (sc) rows.
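The article generates its data with MySQL stored procedures, which are listed in Section 8. The hypothetical Java sketch below shows a different approach: it builds multi-row INSERT statements for the student table using the column names from the student DDL. The value scheme is an illustrative assumption: ages 18 to 27, `sname<sno>` names, and random parent ids. The batch size is also an assumption.

```java
import java.util.*;

final class StudentInsertGenerator {
    /**
     * Builds INSERT statements for `rows` synthetic students, at most `batchSize` rows per statement.
     * Output is deterministic for a given seed. The naming and value scheme is hypothetical.
     */
    static List<String> generate(int rows, int batchSize, long seed) {
        if (rows < 0 || batchSize <= 0) throw new IllegalArgumentException("rows >= 0 and batchSize > 0 required");
        Random rnd = new Random(seed);
        List<String> statements = new ArrayList<>();
        StringBuilder sb = null;
        for (int sno = 1; sno <= rows; sno++) {
            if ((sno - 1) % batchSize == 0) {
                // Start a new statement; flush the previous one if there is one
                if (sb != null) statements.add(sb.append(';').toString());
                sb = new StringBuilder("insert into student (sno, sname, sage, ssex, father_id, mather_id, note) values ");
            } else {
                sb.append(", ");
            }
            int age = 18 + rnd.nextInt(10);                 // 18..27
            String sex = rnd.nextBoolean() ? "male" : "female";
            sb.append('(').append(sno).append(", 'sname").append(sno).append("', ")
              .append(age).append(", '").append(sex).append("', ")
              .append(rnd.nextInt(rows) + 1).append(", ")   // father_id in 1..rows
              .append(rnd.nextInt(rows) + 1)                // mather_id in 1..rows
              .append(", 'note").append(sno).append("')");
        }
        if (sb != null) statements.add(sb.append(';').toString());
        return statements;
    }
}
```

Sending many rows per statement reduces client-server round trips compared with one INSERT per row. For very large loads, MySQL's `LOAD DATA INFILE` is another option.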
5. Test Results
Key observations from the test tables:
Step 3.1 lacks an index on the join key, so its queries are very slow; this highlights the need to index join columns.
Steps 6.1‑6.3 (simple SQL) remain tolerable even with data volumes exceeding 100 million rows, showing MySQL can handle large scans but with noticeable load.
Step 5.1 (four‑table join) fails on the author’s MySQL instance when data reaches 150 million rows, despite proper indexes and execution plans.
Comparing steps 1.1 and 5.1 reveals a performance “waterline” around 15 million rows for four‑table joins on the test machine.
Comparing step 5.1 with steps 6.1-6.3 shows that a multi-table join puts a much heavier load on MySQL than the same query split into simple statements.
The “no join over three tables” rule is specific to MySQL. On the same hardware, Oracle completes the four-table join over 150 million rows in 26 seconds, which points to MySQL's relative weakness with large joins.
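Step 3.1 and the Oracle comparison both depend on how the join is executed. MySQL 5.7 joins tables with nested-loop algorithms; hash join only arrived in 8.0.18. When the join column has no index, each outer row is compared against the inner table. Block Nested-Loop buffering reduces the number of inner scans, but the comparison count still grows with the product of the table sizes. The Java sketch below is a toy model, not MySQL's executor. It counts row comparisons for a plain nested-loop join and index lookups for an index nested-loop join, using a `TreeMap` as a stand-in for a B+tree index.

```java
import java.util.*;

final class JoinCost {
    /** Matched (outerRow, innerRow) index pairs plus a count of the work done. */
    record Result(List<int[]> pairs, long probes) {}

    /** Plain nested-loop join: every outer row is compared with every inner row (no usable index). */
    static Result nestedLoop(int[] outer, int[] inner) {
        List<int[]> pairs = new ArrayList<>();
        long probes = 0;
        for (int o = 0; o < outer.length; o++) {
            for (int i = 0; i < inner.length; i++) {
                probes++;                                   // one row comparison
                if (outer[o] == inner[i]) pairs.add(new int[] {o, i});
            }
        }
        return new Result(pairs, probes);
    }

    /** Index nested-loop join: a sorted map stands in for an index on the inner join column. */
    static Result indexed(int[] outer, int[] inner) {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < inner.length; i++) index.computeIfAbsent(inner[i], k -> new ArrayList<>()).add(i);
        List<int[]> pairs = new ArrayList<>();
        long probes = 0;
        for (int o = 0; o < outer.length; o++) {
            probes++;                                       // one index lookup per outer row, O(log n) each
            for (int i : index.getOrDefault(outer[o], List.of())) pairs.add(new int[] {o, i});
        }
        return new Result(pairs, probes);
    }
}
```

Both methods return the same matches. The work, however, grows as outer × inner for the unindexed join and only as outer × log(inner) for the indexed one. Across four joined tables, an unindexed step multiplies that gap.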
6. Conclusion
The rule “no join over three tables” originates from MySQL’s inability to efficiently process very large multi‑table joins; for smaller datasets the restriction is unnecessary, but in high‑concurrency, large‑scale systems it is safer to move complex logic to the application layer.
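One common way to move join logic into the application is to fetch related rows by primary key in bounded batches of `IN (...)` lists. This is an assumption, not the article's method. The hypothetical `InListBatcher` below deduplicates ids, splits them into chunks, and builds a parameterized query for each chunk, which could then be bound with a JDBC `PreparedStatement`. The chunk size is a tuning parameter, not a value from the article.

```java
import java.util.*;

final class InListBatcher {
    /** Splits ids into chunks of at most maxPerQuery, keeping first-seen order and dropping duplicates. */
    static List<List<Long>> chunk(Collection<Long> ids, int maxPerQuery) {
        if (maxPerQuery <= 0) throw new IllegalArgumentException("maxPerQuery must be positive");
        List<Long> distinct = new ArrayList<>(new LinkedHashSet<>(ids));
        List<List<Long>> chunks = new ArrayList<>();
        for (int from = 0; from < distinct.size(); from += maxPerQuery) {
            chunks.add(new ArrayList<>(distinct.subList(from, Math.min(from + maxPerQuery, distinct.size()))));
        }
        return chunks;
    }

    /**
     * Builds a query with one '?' placeholder per id. Table and column names are concatenated
     * directly, so they must be trusted identifiers, never user input.
     */
    static String selectByIds(String table, String keyColumn, int count) {
        return "select * from " + table + " where " + keyColumn + " in ("
                + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }
}
```

The rows returned for each chunk can then be matched to the driving result set through an in-memory map keyed on the id, so MySQL only ever executes simple primary-key lookups.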
7. Oracle Comparison
Oracle executes the same four‑table join on 150 million rows within 26 seconds, demonstrating superior handling of large joins compared to MySQL.
8. Full SQL and Data‑Generation Scripts
```sql
use stu;
drop table if exists student;
create table student (
  s_id int(11) not null auto_increment,
  sno int(11),
  sname varchar(50),
  sage int(11),
  ssex varchar(8),
  father_id int(11),
  mather_id int(11),
  note varchar(500),
  primary key (s_id),
  unique key uk_sno (sno)
) engine=innodb default charset=utf8mb4;
-- ... (remaining DDL and stored procedures for student, course, sc, teacher as in the source) ...
```
Java Backend Technology
Focuses on Java-related technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, and multithreading. Occasionally covers DevOps tools such as Jenkins, Nexus, Docker, and ELK, and shares technical insights from time to time, with a commitment to Java full-stack development.
