Why Does Alibaba’s Java Handbook Ban Joins Over Three Tables? A Deep MySQL & Oracle Performance Test
This article investigates the claim from Alibaba's Java Development Manual that joining more than three tables should be avoided, by setting up a MySQL 5.7 environment, generating massive synthetic data, executing multi‑table join queries, analyzing execution times, and comparing the results with Oracle, ultimately revealing the practical limits of MySQL joins on large data sets.
Problem Statement and Environment Setup
The Alibaba Java Development Manual states that joining more than three tables is prohibited. To verify this claim, I designed an experiment using MySQL 5.7.22 on a VMware‑10 + CentOS 7.4 VM (4 GB RAM, 4 CPU, 50 GB SSD).
Test Data Model
Four tables are created:
Student (student information)
Teacher (teacher information)
Course (course information)
SC (student‑course relationship, storing scores)
The target query retrieves the name and score of the highest‑scoring student for courses taught by teacher tname553:
select Student.Sname, course.cname, score
from Student, SC, Course, Teacher
where Student.s_id = SC.s_id
and SC.c_id = Course.c_id
and SC.t_id = Teacher.t_id
and Teacher.Tname = 'tname553'
and SC.score = (select max(score) from SC where sc.t_id = Teacher.t_Id);Query Decomposition
The above statement is split into three simpler queries:
select max(score) from SC, Teacher where sc.t_id = teacher.t_Id and Teacher.Tname = 'tname553';
select sc.t_id, sc.s_id, score from SC, Teacher where sc.t_id = teacher.t_Id and score = 590 and Teacher.Tname = 'tname553';
select Student.Sname, course.cname, score from Student, SC, course where Student.s_id = SC.s_id and sc.s_id in (20769800,48525000,26280200) and course.c_id = sc.c_id;Data Generation
Data is generated with scripts that create:
10 million SC records (each student selects 2 courses)
5 million students
1 million teachers (each teacher supervises 5 students)
1 000 courses
Indexes are added on the join keys of the SC table for comparison.
Test Results (MySQL)
Images below show execution times for different data volumes and index configurations.
Key Observations
When step 3.1 lacks an index on the join key, the query is extremely slow, confirming that indexed join fields are essential for multi‑table queries.
Steps 6.1‑6.3 (simple SQL) remain tolerable even with data exceeding 100 million rows, showing MySQL can still handle large volumes with optimized queries.
Step 5.1 (four‑table join) fails on the test machine once data reaches ~150 million rows, despite proper indexes and an optimized execution plan.
Comparing steps 1.1 and 5.1 reveals a performance “waterline” around 15 million rows for this hardware; beyond that, four‑table joins become impractical.
Overall, multi‑table joins impose a noticeable load on MySQL as data scales.
The “no join over three tables” rule originates from MySQL’s inability to handle very large joins efficiently; for massive datasets, business logic should be moved to the application layer.
Conclusion
MySQL struggles to execute joins involving more than three tables when the data volume is very large, which explains the guideline in Alibaba’s handbook. For smaller datasets, the restriction is unnecessary, but for high‑concurrency, large‑scale systems, keeping SQL simple and handling complex logic in code yields better performance.
Oracle Performance Comparison
Running the same queries on Oracle with the same hardware and data size shows that, even without indexes, Oracle completes the four‑table join in about 26 seconds, highlighting MySQL’s relative weakness in handling large joins.
Supplementary Scripts
Below are the essential DDL and data‑generation scripts used in the experiment.
use stu;
drop table if exists student;
create table student (
s_id int(11) not null auto_increment,
sno int(11),
sname varchar(50),
sage int(11),
ssex varchar(8),
father_id int(11),
mather_id int(11),
note varchar(500),
primary key (s_id),
unique key uk_sno (sno)
) engine=innodb default charset=utf8mb4;
... (additional DDL and PL/SQL procedures for student, course, sc, and teacher data generation) ...These scripts create the tables, generate synthetic data, and add the necessary indexes for the performance tests.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
