Evaluating the ‘No Join Over Three Tables’ Rule from Alibaba Java Development Manual with MySQL and Oracle Experiments
This article investigates why the Alibaba Java Development Manual advises against joining more than three tables by designing and executing large‑scale MySQL and Oracle experiments, analyzing query performance, indexing effects, and data‑generation scripts to determine the practical limits of multi‑table joins.
The author questions the Alibaba Java Development Manual’s rule that joins involving more than three tables should be avoided, and designs a series of experiments to verify this claim using MySQL 5.7.22 on a VMware‑CentOS 7.4 environment.
Four tables (student, teacher, course, and sc) are created, and a query is written to find the student with the highest score for courses taught by a specific teacher. The original SQL query and its decomposition into three simpler statements are shown:
select Student.Sname,course.cname,score
from Student,SC,Course ,Teacher
where Student.s_id=SC.s_id and SC.c_id=Course.c_id and sc.t_id=teacher.t_id
and Teacher.Tname='tname553'
and SC.score=(select max(score) from SC where sc.t_id=teacher.t_Id);Data generation scripts are provided to create massive datasets (up to 50 million students, 10 million teachers, 1 billion sc records) in MySQL, with and without indexes, to test query performance at different scales. Sample MySQL data‑generation functions are included:
create function insert_student_data() returns int deterministic
begin
declare i int;
set i=1;
while i<50000000 do
insert into student values(i,i,concat('name',i),i,
case when floor(rand()*10)%2=0 then 'f' else 'm' end,
floor(rand()*100000),floor(rand()*1000000),concat('note',i));
set i=i+1;
end while;
return 1;
end;Similar scripts generate data for the course, sc, and teacher tables, and indexes are added on join keys (s_id, t_id, c_id). The experiments compare query times for 1 million, 10 million, and 100 million rows, showing that lack of indexes dramatically slows multi‑table joins, while indexed joins remain acceptable up to tens of millions of rows.
Oracle experiments are also performed, creating equivalent tablespaces, users, and data‑generation procedures. The Oracle results demonstrate that, even without indexes, Oracle can complete the same four‑table join in about 26 seconds for large datasets, highlighting MySQL’s relative weakness in handling complex joins.
From the results, the author concludes that the “no join over three tables” rule is primarily a MySQL‑specific performance guideline; with proper indexing and reasonable data volumes, joins of more than three tables are feasible, and complex logic can be safely placed in the application layer when needed.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.