Databases 11 min read

Why Does Alibaba's Java Manual Ban Joins Over Three Tables? A Deep Performance Dive

This article investigates Alibaba's recommendation against joining more than three MySQL tables by designing experiments with synthetic data, measuring query performance on MySQL and Oracle, analyzing index impact, and providing full DDL and data‑generation scripts to explain the rule's practical limits.

ITPUB
ITPUB
ITPUB
Why Does Alibaba's Java Manual Ban Joins Over Three Tables? A Deep Performance Dive

Problem Statement

Alibaba's Java Development Manual states that joining more than three tables in MySQL should be avoided. The article investigates the reason behind this rule.

Analysis

The author questions the rule and designs an experiment to test it by creating four tables (student, teacher, course, sc) and a query that finds the student with the highest score for courses taught by a specific teacher.

Experimental Environment

VMware 10, CentOS 7.4, MySQL 5.7.22 with 4.5 GB RAM, 4 CPU cores, 50 GB SSD. MySQL buffer pool is configured to 2 GB.

Experiment Procedure

Data‑generation scripts create varying data volumes: 10 million, 50 million, up to 100 million rows in the join tables. The original query is split into three simpler statements to isolate each step. Indexes are added or omitted to compare performance.

Results on MySQL

MySQL test results
MySQL test results

Observations: without indexes the join is very slow; with simple queries the execution time remains acceptable up to about 150 million rows; beyond that MySQL fails to return results for the four‑table join.

Conclusion on MySQL

The “no join over three tables” rule originates from MySQL’s inability to handle very large multi‑table joins efficiently. For moderate data sizes the rule is unnecessary, but for massive datasets the recommendation helps avoid performance bottlenecks.

Oracle Comparison

Running the same query on Oracle shows a 26‑second execution even without indexes, demonstrating Oracle’s stronger join optimizer.

Data Generation Scripts (MySQL)

use stu;
DROP TABLE IF EXISTS student;
CREATE TABLE student (
  s_id INT(11) NOT NULL AUTO_INCREMENT,
  sno INT(11),
  sname VARCHAR(50),
  sage INT(11),
  ssex VARCHAR(8),
  father_id INT(11),
  mather_id INT(11),
  note VARCHAR(500),
  PRIMARY KEY (s_id),
  UNIQUE KEY uk_sno (sno)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- similar DDL for course, sc, teacher omitted for brevity

DELIMITER $$
CREATE FUNCTION insert_student_data() RETURNS INT DETERMINISTIC
BEGIN
  DECLARE i INT;
  SET i = 1;
  WHILE i <= 50000000 DO
    INSERT INTO student VALUES (i,i,CONCAT('name',i),i,'f',i,i,CONCAT('note',i));
    SET i = i + 1;
  END WHILE;
  RETURN 1;
END$$
DELIMITER ;

CALL insert_student_data();

Data Generation Scripts (Oracle)

CREATE TABLESPACE scott_data DATAFILE '/home/oracle/oracle_space/sitpay1/scott_data.dbf' SIZE 1024M AUTOEXTEND ON;
CREATE TABLESPACE scott_index DATAFILE '/home/oracle/oracle_space/sitpay1/scott_index.dbf' SIZE 64M AUTOEXTEND ON;
CREATE TABLE student (
  s_id NUMBER(11),
  sno NUMBER(11),
  sname VARCHAR2(50),
  sage NUMBER(11),
  ssex VARCHAR2(8),
  father_id NUMBER(11),
  mather_id NUMBER(11),
  note VARCHAR2(500)
) NOPARTITION;

CREATE OR REPLACE PROCEDURE insert_student_data IS
  q NUMBER(11) := 0;
BEGIN
  FOR i IN 1..50 LOOP
    INSERT INTO student SELECT
      ROW_NUMBER() OVER (ORDER BY NULL)+q AS s_id,
      ROW_NUMBER() OVER (ORDER BY NULL)+q AS sno,
      'student' || (ROW_NUMBER() OVER (ORDER BY NULL)+q) AS sname,
      FLOOR(DBMS_RANDOM.VALUE(1,100)) AS sage,
      'f' AS ssex,
      ROW_NUMBER() OVER (ORDER BY NULL)+q AS father_id,
      ROW_NUMBER() OVER (ORDER BY NULL)+q AS mather_id,
      'note' || (ROW_NUMBER() OVER (ORDER BY NULL)+q) AS note
    FROM dual CONNECT BY LEVEL <= 1000000;
    q := q + 1000000;
    COMMIT;
  END LOOP;
END;
/
CALL insert_student_data();

Key Takeaways

MySQL struggles with large four‑table joins without proper indexing.

Oracle handles the same workload more gracefully.

When data volume is moderate, multi‑table joins are feasible; otherwise, consider moving logic to the application layer.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLData GenerationmysqlJOINOracle
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.