Databases 18 min read

Why MySQL Indexes Matter: From Pages to B+ Trees Explained

This article walks through MySQL's InnoDB storage engine, explaining how pages, locality, skip‑list concepts and B+‑tree structures work together to make indexes efficient, and shows how clustered, non‑clustered and covering indexes affect query performance and I/O behavior.

Su San Talks Tech

May 12, 2022

Why MySQL Indexes Matter: From Pages to B+ Trees Explained

Indexes are a fundamental skill for every engineer; understanding their principles is crucial for writing high‑quality SQL. This article builds the concept of indexes from scratch, using MySQL InnoDB pages and B+‑tree structures as the core examples.

From Practical Needs

Assume a simple user table:

CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` int(11) DEFAULT NULL COMMENT '姓名',
  `age` tinyint(3) unsigned DEFAULT NULL COMMENT '年龄',
  `height` int(11) DEFAULT NULL COMMENT '身高',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='用户表';

Typical queries on this table include:

1. SELECT * FROM user WHERE id = xxx;
2. SELECT * FROM user ORDER BY id ASC/DESC;
3. SELECT * FROM user WHERE age = xxx;
4. SELECT age FROM user WHERE age = xxx;
5. SELECT age FROM user ORDER BY age ASC/DESC;

We first insert some data (note that id is omitted; InnoDB auto‑generates an incrementing primary key):

INSERT INTO user (`name`,`age`,`height`) VALUES ('张三',20,170);
INSERT INTO user (`name`,`age`,`height`) VALUES ('李四',21,171);
INSERT INTO user (`name`,`age`,`height`) VALUES ('王五',22,172);
INSERT INTO user (`name`,`age`,`height`) VALUES ('赵六',23,173);
INSERT INTO user (`name`,`age`,`height`) VALUES ('钱七',24,174);

InnoDB assigns sequential id values, linking records in a singly linked list ordered by id. To find id = 3, the engine reads records sequentially, causing three I/O operations. This highlights the I/O cost of linear scans.

Applying the principle of locality, the engine reads a whole page (default 16 KB) at once, typically containing about 100 rows. This reduces the number of I/O operations dramatically—for example, locating id = 100 can be done with a single I/O instead of 100.

Page Directory and Skip‑List Idea

Within a page, a small directory (or skip‑list) allows faster location of a target record. By grouping every four rows into a slot and storing the maximum id of each slot, the engine can first locate the appropriate slot via binary search, then scan only a few rows inside the slot.

The Birth of the B+ Tree

When many pages exist, a multi‑level index is needed. Pages themselves are organized into a B+‑tree: the top level is the root node, intermediate levels are internal nodes (directory pages), and the bottom level stores full records (data pages). Each level adds one more I/O for a lookup.

Capacity grows exponentially with tree height: a 3‑level B+‑tree can hold up to 100 000 000 rows (assuming 100 rows per data page and 1 000 entries per directory page).

Clustered vs. Non‑Clustered Indexes

Primary‑key indexes store the full row in leaf nodes and are therefore clustered indexes. Secondary (non‑clustered) indexes store only the indexed column(s) plus the primary key. To retrieve full rows from a secondary index, MySQL performs a "back‑table" lookup using the primary key, which can cause random I/O.

Covering indexes avoid back‑table lookups by ensuring the query requests only columns present in the index (e.g., SELECT age FROM user WHERE age = xxx).

Disk Prefetch and OS Page Management

Operating systems manage memory in pages (typically 4 KB). Disks read/write data in sectors, grouped into tracks. I/O consists of seek time, rotational latency, and data transfer, with the first two dominating latency. Sequential reads (adjacent sectors) are much faster than random reads.

InnoDB pages (16 KB) align with the OS page size (4 KB), allowing a single I/O to fetch an entire InnoDB page via disk prefetch, which is why reading one page counts as one I/O operation.

Summary

After reading this article you should understand why indexes exist, how pages and disk prefetch improve performance, and the evolution from simple page scans to B+‑tree indexes. For deeper insight into MySQL's internal page structure, refer to the book "How MySQL Works".

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.