Why Alibaba's Java Development Manual Prohibits Joins Over Three Tables – MySQL and Oracle Performance Experiments
The article investigates the Alibaba Java Development Manual's rule against joining more than three tables by designing and executing extensive MySQL and Oracle experiments, generating massive test data, measuring query performance, and concluding that the restriction stems from join scalability limits on large datasets.
The author questions the Alibaba Java Development Manual's recommendation to avoid joins involving more than three tables and designs a series of experiments to validate this claim.
Problem Statement: Is the rule justified, and under what conditions does multi‑table joining become problematic?
Experiment Environment: VMware 10 + CentOS 7.4 with MySQL 5.7.22 (4 GB RAM, 4 CPU, SSD) and later Oracle on comparable hardware.
Test Data: Four tables – student , teacher , course , and sc (student‑course relationship) – are populated with up to 500 million rows using custom data‑generation functions.
Key Query: Retrieve the student with the highest score for courses taught by teacher "tname553":
select Student.Sname, course.cname, score
from Student, SC, Course, Teacher
where Student.s_id = SC.s_id
and SC.c_id = Course.c_id
and SC.t_id = Teacher.t_id
and Teacher.Tname = 'tname553'
and SC.score = (select max(score) from SC where sc.t_id = teacher.t_id);The query is broken down into three simpler statements to isolate the max‑score subquery, the join filtering, and the final projection.
Data Generation Scripts (MySQL):
use stu;
drop table if exists student;
create table student (
s_id int(11) not null auto_increment,
sno int(11),
sname varchar(50),
sage int(11),
ssex varchar(8),
father_id int(11),
mather_id int(11),
note varchar(500),
primary key (s_id),
unique key uk_sno (sno)
) engine=innodb default charset=utf8mb4;
... (additional DDL and insert_student_data function) ...Similar DDL and data‑generation functions are provided for course , sc , and teacher tables, each inserting millions of rows in batches.
Oracle Setup: Tablespaces, users, and procedures are created to mirror the MySQL schema, with PL/SQL loops generating comparable volumes of test data.
create tablespace scott_data datafile '/home/oracle/oracle_space/sitpay1/scott_data.dbf' size 1024m autoextend on;
create user scott identified by tiger default tablespace scott_data temporary tablespace scott_temp;
... (procedure insert_student_data, insert_course_data, insert_sc_data, insert_teacher_data) ...Results: The author presents screenshots of query execution times for varying data sizes. Findings include:
Missing indexes on join keys cause severe slowdown, confirming the need for indexed join columns.
With simple queries and data up to ~100 million rows, MySQL can still return results, though performance degrades.
Beyond ~150 million rows, MySQL fails to complete the four‑table join, while Oracle completes the same query in ~26 seconds, highlighting MySQL's weaker join handling.
Even with indexed joins, large‑scale multi‑table queries remain resource‑intensive.
Conclusion: The "no join over three tables" rule is primarily a MySQL‑specific guideline for very large datasets; Oracle handles such joins more efficiently. For high‑concurrency systems, developers should keep SQL simple and push complex logic to the application layer.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.