Databases 14 min read

Understanding How Indexes Influence MySQL JOIN Queries and the Mechanics of Index and Block Nested-Loop Joins

This article explains how MySQL uses indexes during JOIN operations, compares indexed and non‑indexed join execution, introduces Index Nested‑Loop Join and Block Nested‑Loop Join algorithms, discusses the role of the join buffer, and provides practical guidance on choosing the optimal driving table for efficient query performance.

政采云技术

Nov 1, 2022

Understanding How Indexes Influence MySQL JOIN Queries and the Mechanics of Index and Block Nested-Loop Joins

Preface

Most of us have been using MySQL for a long time and writing various JOIN queries daily, but few understand exactly how JOINs are executed and what the most correct way to write them is; this article explores those details.

Impact of Indexes on JOIN Queries

Data Preparation

Assume two tables t1 and t2, each with a primary key id and an indexed column a; columns b have no index. Insert 100 rows into t1 and 1,000 rows into t2 for the experiment.

CREATE TABLE `t2` (
  `id` int NOT NULL,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `t2_a_index` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE PROCEDURE **idata**()
BEGIN
  DECLARE i INT;
  SET i = 1;
  WHILE (i <= 1000) do
    INSERT INTO t2 VALUES (i,i,i);
    SET i = i + 1;
  END WHILE;
END;
CALL **idata**();
CREATE TABLE t1 LIKE t2;
INSERT INTO t1 (SELECT * FROM t2 WHERE id <= 100);

Indexed Query Process

We run SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a=t2.a); using STRAIGHT_JOIN to force the optimizer to keep t1 as the driving table and t2 as the driven table, making the execution plan easier to observe.

The EXPLAIN output shows that the query uses the index on t2.a. The execution proceeds as follows:

Read one row r from t1.

Extract the value of a from r and look it up in t2.

Retrieve matching rows from t2 and combine them with r to form part of the result set.

Repeat steps 1‑3 until all rows of t1 have been processed.

This is an Index Nested‑Loop Join. Because t1 contains 100 rows, the driver table scans 100 rows. For each driver row, t2 is searched via its index, scanning only one row each time, resulting in a total of 200 row scans.

Non‑Indexed Query Process

SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a = t2.b);

Since column b in t2 has no index, each driver row forces a full scan of t2. With 100 rows in t1 and 1,000 rows in t2, the query scans 100 × 1,000 = 100,000 rows, which would explode to 10 billion rows for tables of 100,000 rows each.

Understanding Block Nested‑Loop Join

Block Nested‑Loop Join Query Process

When the driven table lacks a usable index, MySQL loads the driver table into the join_buffer and then scans the driven table, comparing each row against the buffer.

Figure 3

The whole process scans both tables once (100 + 1,000 = 1,100 rows). The buffer holds the driver rows as an unordered array, so each of the 1,000 rows in t2 is compared against all 100 driver rows, resulting in 100,000 in‑memory comparisons, which are much faster than disk‑based scans.

Join_buffer

The memory used for this buffer is limited by the join_buffer_size system variable (default 256 KB).

SHOW VARIABLES LIKE '%join_buffer_size%';

If the data cannot fit into the buffer at once (e.g., only 80 rows fit), MySQL repeats the load‑scan‑clear cycle:

Read driver rows into join_buffer until it is full.

Scan the driven table, comparing each row with the buffer and returning matches.

Clear the buffer.

Continue loading the next chunk of driver rows and repeat.

This chunked processing explains the “Block” name; although the driver table is loaded in multiple chunks, the total number of equality checks remains the same (e.g., 80 + 20 = 100 driver rows × 1,000 driven rows = 100,000 checks).

How to Write Correct JOIN Queries

Choosing the Driving Table

When indexes are present, the driver table performs a full scan while the driven table uses index lookups. The overall complexity is roughly N + N·2·log₂M (where N is driver rows, M is driven rows). Because N dominates, the smaller table should be the driver.

When no indexes are available, the total number of in‑memory comparisons is fixed (e.g., 100,000), but the number of scanned rows is N + λ·N·M (with λ representing the fraction of driver rows loaded per chunk). Again, using the smaller table as driver minimizes scanned rows.

Conclusion: Regardless of index availability, the smaller table should be chosen as the driving table.

What Is a Small Table?

Using the earlier example, both tables have 100 rows after applying WHERE t2.id <= 50. Although neither table is indexed on b, the query

SELECT * FROM t2 STRAIGHT_JOIN t1 ON t1.b = t2.b WHERE t2.id <= 50;

loads only 50 rows from t2 into the buffer, making t2 the “small table”.

In another scenario where both tables have 100 rows, the amount of data that must be stored in the buffer differs: loading t1 only requires the b column, while loading t2 requires three columns ( id, a, b). Therefore, t1 should be the driver because it contributes less data to the buffer.

When deciding the driver, filter each table according to its own conditions, calculate the total data volume of the columns participating in the join, and choose the table with the smaller volume as the driver.

Recruitment Notice

The Zero technology team in Hangzhou, with over 500 engineers from companies like Alibaba, Huawei, NetEase and universities such as Zhejiang and USTC, works on cloud‑native, blockchain, AI, low‑code platforms, middleware, big data, and more. We contribute to many open‑source projects and are looking for passionate engineers to join us. Contact: zcy‑tc@cai‑inc.com

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL mysql database optimization JOIN index Block Nested-Loop Join

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.