Why Alibaba Bans Joins Over Three Tables: Real‑World MySQL & Oracle Benchmarks
This article investigates Alibaba's rule against joining more than three tables by designing MySQL and Oracle experiments that generate massive student‑teacher‑course data, run multi‑table join queries, compare indexed versus non‑indexed scenarios, and reveal the performance limits that drive the guideline.
Problem Statement and Environment
Alibaba’s Java development handbook forbids joining more than three tables. The article questions this rule and evaluates its impact with data‑driven experiments.
Test environment: VMware 10, CentOS 7.4, MySQL 5.7.22 (4 GB RAM, 4 CPU, SSD). Four tables are created – student , teacher , course , and sc (student‑course relation).
SQL Query Under Test
select Student.Sname, course.cname, score
from Student, SC, Course, Teacher
where Student.s_id = SC.s_id
and SC.c_id = Course.c_id
and SC.t_id = Teacher.t_id
and Teacher.Tname = 'tname553'
and SC.score = (select max(score) from SC where sc.t_id = Teacher.t_Id);The query is broken into three simpler statements for analysis.
select max(score) from SC, Teacher where sc.t_id = teacher.t_Id and Teacher.Tname='tname553';
select sc.t_id, sc.s_id, score from SC, Teacher where sc.t_id = teacher.t_Id and score=590 and Teacher.Tname='tname553';
select Student.Sname, course.cname, score from Student, SC, course where Student.s_id = SC.s_id and sc.s_id in (20769800,48525000,26280200) and course.c_id = sc.c_id;Data Generation Scripts
Scripts create massive datasets:
1 × 10⁷ enrollment records (2 courses per student), 5 × 10⁶ students, 1 × 10⁶ teachers, 1000 courses (MySQL).
1 × 10⁸ enrollment records, 5 × 10⁷ students, 1 × 10⁷ teachers, 1000 courses (MySQL).
Similar schema on Oracle.
Data‑generation functions use loops and INSERT … SELECT … FROM dual with random values.
Test Results (MySQL)
Key observations:
When the join key lacks an index (step 3.1), the query becomes extremely slow, confirming that indexed join columns are essential.
For data sizes above 100 million rows, simple SQL (steps 6.1‑6.3) still finishes in acceptable time, showing MySQL can handle large joins but with noticeable overhead.
Four‑table joins fail on the author’s machine when data reaches ~150 million rows (step 5.1), indicating a practical limit for MySQL without further tuning.
MySQL’s performance degrades noticeably when the number of joined tables exceeds three, especially with large datasets.
Oracle Benchmark
Even without indexes, Oracle completes the same four‑table join in about 26 seconds, demonstrating superior join performance compared to MySQL.
Conclusion
The “no more than three tables” rule originates from MySQL’s limitations at very high data volumes; with smaller tables, joins of many tables work fine. For high‑concurrency, large‑scale systems, it is safer to keep SQL simple and push complex logic to the application layer.
Additional scripts for rapid data generation are provided at the end of the article.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
