Why Does Alibaba Recommend Joining No More Than Three Tables? MySQL vs. Oracle Performance Test
The article examines the Alibaba Java Development Manual rule that prohibits joins across more than three tables. It designs large-scale MySQL and Oracle experiments, generates large test data sets, measures query times, and analyzes the effect of indexing. It concludes that the rule is MySQL-specific and is driven by performance limits at large data volumes.
Problem Statement
The Alibaba Java Development Manual states that joining more than three tables is prohibited; the article questions why this rule exists and sets out to verify it experimentally.
Analysis
The author treats MySQL as a black box, designs SQL queries to test the rule, and plans to observe performance differences when joining four tables versus simpler queries.
Experimental Environment
VMware 10 + CentOS 7.4 + MySQL 5.7.22; 4 CPUs, 4.5 GB RAM, 50 GB SSD; InnoDB buffer pool (innodb_buffer_pool_size) set to 2 GB.
Experiment Overview
Four tables are created: student, teacher, course, and sc (the student-course relationship). The target query finds the student with the highest score among the courses taught by teacher tname553. The original SQL equi-joins all four tables and uses a subquery. It is then broken down into three simpler statements.
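The article does not reproduce the SQL itself, so the following is only a sketch of the idea. It runs against SQLite through Python's sqlite3 module rather than MySQL. The columns of teacher, course, and sc (t_id, tname, c_id, score) and the helper names are hypothetical. The sketch runs the single four-table equi-join with a MAX() subquery and an equivalent three-statement decomposition on a tiny data set.

```python
import sqlite3

# Hypothetical schema: the article names the four tables, but the columns
# of teacher, course and sc below are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE student (s_id INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE teacher (t_id INTEGER PRIMARY KEY, tname TEXT);
CREATE TABLE course  (c_id INTEGER PRIMARY KEY, cname TEXT, t_id INTEGER);
CREATE TABLE sc      (s_id INTEGER, c_id INTEGER, score INTEGER);
"""

def build_demo_db():
    """Tiny in-memory data set: 6 teachers, 12 courses, 40 students."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO teacher VALUES (?, ?)",
                     [(t, f"tname{t}") for t in range(550, 556)])
    # Course c is taught by teacher 550 + c % 6, so each teacher has 2 courses.
    conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                     [(c, f"course{c}", 550 + c % 6) for c in range(12)])
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(s, f"name{s}") for s in range(1, 41)])
    # Each student selects 2 courses with a deterministic pseudo-random score.
    conn.executemany("INSERT INTO sc VALUES (?, ?, ?)",
                     [(s, (s + k) % 12, (s * 37 + k * 11) % 101)
                      for s in range(1, 41) for k in range(2)])
    return conn

# Single statement: four-table equi-join plus a MAX() subquery.
JOINED = """
SELECT st.s_id, st.sname, sc.score
FROM student st
JOIN sc        ON sc.s_id = st.s_id
JOIN course c  ON c.c_id  = sc.c_id
JOIN teacher t ON t.t_id  = c.t_id
WHERE t.tname = :tname
  AND sc.score = (SELECT MAX(sc2.score)
                  FROM sc sc2
                  JOIN course c2  ON c2.c_id = sc2.c_id
                  JOIN teacher t2 ON t2.t_id = c2.t_id
                  WHERE t2.tname = :tname)
"""

def top_students_join(conn, tname):
    return sorted(conn.execute(JOINED, {"tname": tname}).fetchall())

def top_students_split(conn, tname):
    # Statement 1: ids of the courses taught by the teacher.
    c_ids = [r[0] for r in conn.execute(
        "SELECT c.c_id FROM course c JOIN teacher t ON t.t_id = c.t_id "
        "WHERE t.tname = ?", (tname,))]
    if not c_ids:
        return []
    marks = ",".join("?" * len(c_ids))
    # Statement 2: enrollment rows holding the top score in those courses.
    top = conn.execute(
        f"SELECT s_id, score FROM sc WHERE c_id IN ({marks}) "
        f"AND score = (SELECT MAX(score) FROM sc WHERE c_id IN ({marks}))",
        c_ids + c_ids).fetchall()
    if not top:
        return []
    s_ids = sorted({s for s, _ in top})
    # Statement 3: student names, merged with the scores in application code.
    names = dict(conn.execute(
        f"SELECT s_id, sname FROM student WHERE s_id IN "
        f"({','.join('?' * len(s_ids))})", s_ids))
    return sorted((s, names[s], score) for s, score in top)
```

For example, `top_students_split(build_demo_db(), "tname553")` returns the top-scoring enrollment rows for that teacher's courses. The decomposition replaces one complex plan with several simple statements, and the application code combines their intermediate results.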
Data Generation
Data-generation scripts populate the tables at scale:
10 million enrollment records (each student selects 2 courses)
5 million students
1 million teachers (each teacher handles 5 students)
1,000 courses
MySQL functions insert_student_data(), insert_course_data(), insert_sc_data(), and insert_teacher_data() use loops and INSERT … SELECT to generate rows efficiently. Equivalent PL/SQL scripts for Oracle are also included.
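You can reproduce the student generator at a small scale without a MySQL server by mirroring its loop in Python against SQLite. This is a hypothetical sketch: `create_student_table` and `insert_student_rows` are illustrative names, and the SQLite column types only approximate the MySQL DDL. It also writes rows in batches with `executemany`, which differs from the article's one INSERT per loop iteration.

```python
import random
import sqlite3

def create_student_table(conn):
    # SQLite approximation of the MySQL DDL (column names kept as in the article).
    conn.execute("""
        CREATE TABLE student (
            s_id INTEGER PRIMARY KEY, sno INTEGER UNIQUE, sname TEXT,
            sage INTEGER, ssex TEXT, father_id INTEGER, mather_id INTEGER,
            note TEXT)""")

def _flush(conn, batch):
    # Write one batch inside a single transaction, then empty the buffer.
    if batch:
        with conn:
            conn.executemany(
                "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?, ?)", batch)
        batch.clear()

def insert_student_rows(conn, limit, batch_size=10_000, seed=0):
    """Mirror of the loop `WHILE i < limit`: rows for i = 1 .. limit - 1."""
    rng = random.Random(seed)  # seeded so the random columns are reproducible
    batch, inserted = [], 0
    for i in range(1, limit):
        batch.append((i, i, f"name{i}", i,
                      "f" if rng.randrange(10) % 2 == 0 else "m",  # ssex
                      rng.randrange(100_000),                      # father_id
                      rng.randrange(1_000_000),                    # mather_id
                      f"note{i}"))
        inserted += 1
        if len(batch) >= batch_size:
            _flush(conn, batch)
    _flush(conn, batch)
    return inserted
```

For example, `insert_student_rows(conn, 100_001)` writes 100,000 rows in ten batches. Batching reduces per-statement overhead, and `randrange(n)` plays the role of `FLOOR(RAND()*n)`.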
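CREATE TABLE student (
  s_id INT NOT NULL AUTO_INCREMENT,
  sno INT,
  sname VARCHAR(50),
  sage INT,
  ssex VARCHAR(8),
  father_id INT,
  mather_id INT,
  note VARCHAR(500),
  PRIMARY KEY (s_id),
  UNIQUE KEY uk_sno (sno)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Generator: inserts one row per loop iteration (i = 1 .. 49,999,999).
-- DETERMINISTIC satisfies MySQL's check on stored functions when binary
-- logging is enabled, although RAND() makes the generated values random.
DELIMITER $$
CREATE FUNCTION insert_student_data() RETURNS INT DETERMINISTIC
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i < 50000000 DO
    INSERT INTO student VALUES (i, i, CONCAT('name', i), i,
      IF(FLOOR(RAND()*10)%2=0, 'f', 'm'),   -- ssex: random 'f' or 'm'
      FLOOR(RAND()*100000),                 -- father_id
      FLOOR(RAND()*1000000),                -- mather_id
      CONCAT('note', i));
    SET i = i + 1;
  END WHILE;
  RETURN 1;
END$$
DELIMITER ;

-- Usage: SELECT insert_student_data();
Results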
Performance tables (shown as images) reveal several key findings:
Without indexes on the join keys (step 3.1), the query is extremely slow, confirming that indexed join columns are essential.
When the complex four‑table join is simplified to three separate queries, MySQL can still handle data volumes up to about 10 million rows with acceptable response times.
At around 15 million rows, the four‑table join exceeds MySQL’s practical limit (the query fails or takes excessively long).
Oracle, on the same hardware, completes a similar four‑table join in roughly 26 seconds, demonstrating superior join performance for large data sets.
These observations are illustrated by screenshots of query timings and execution plans.
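The effect of indexing a join column can be seen on a small scale by comparing query plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, and the table and index names (`sc`, `idx_sc_c_id`) are assumptions. The lookup `c_id = ?` is the probe that the inner side of a join on `sc.c_id` performs once per outer row. Without a usable index, the engine has to scan sc to answer it.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sc (s_id INTEGER, c_id INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO sc VALUES (?, ?, ?)",
                     [(s, s % 1000, s % 100) for s in range(10_000)])
    # The per-row probe that a join on sc.c_id would issue.
    probe = "SELECT s_id, score FROM sc WHERE c_id = ?"
    before = plan_details(conn, probe, (42,))   # no index: full table scan
    conn.execute("CREATE INDEX idx_sc_c_id ON sc (c_id)")
    after = plan_details(conn, probe, (42,))    # index lookup on c_id
    conn.close()
    return before, after
```

Printing the two lists returned by `compare_plans()` typically shows a full `SCAN` of sc turning into a `SEARCH ... USING INDEX idx_sc_c_id`. The exact wording varies by SQLite version.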
Conclusions
The “no joins of more than three tables” rule stems from MySQL’s performance characteristics on very large data sets; it is not a universal SQL limitation. With proper indexing and moderate data volumes, MySQL can join more than three tables efficiently. For massive workloads, moving complex join logic to the application layer or using a database with stronger join capabilities (e.g., Oracle) is advisable.
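One common way to move join logic into the application layer is an in-memory hash join. Each table's filtered rows are fetched with simple single-table queries, a hash table is built on one side, and the other side probes it. The generic Python sketch below is not the article's method. It assumes both inputs fit in memory and represents rows as dicts, and `hash_join` is a hypothetical helper.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dict rows; hashes the `right` input."""
    # Build phase: group right-hand rows by their join-key value.
    buckets = defaultdict(list)
    for row in right:
        buckets[row[right_key]].append(row)
    # Probe phase: look up each left-hand row's key in the hash table.
    joined = []
    for lrow in left:
        for rrow in buckets.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})  # right-hand values win on clashes
    return joined
```

For example, joining student rows with sc rows on `"s_id"` yields one merged dict per matching enrollment. Building the hash table on the smaller input keeps memory use down.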
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
