Understanding MySQL Index Structures: From Simple Tables to B+ Trees
This article explains how MySQL stores data using pages, page directories, and multi‑page structures, demonstrates why MySQL silently sorts rows on insert, and shows how these mechanisms combine into the B+‑tree index that optimizes query performance and supports clustering, non‑clustering, and composite indexes.
The author starts by creating a simple user table with columns id , age , height , weight and name , then inserts several rows and verifies the data with SELECT * FROM user; . Although no explicit ORDER BY clause is used, MySQL returns the rows ordered by id , prompting the question of when and why this implicit sorting occurs.
To answer this, the article introduces the concept of a database page —a 16 KB logical storage unit in InnoDB. By loading an entire page into memory, MySQL reduces disk I/O compared to row‑by‑row reads. The author illustrates how a single page can hold multiple rows and how the page’s internal linked‑list structure still requires sequential scans.
Next, a page directory is presented as an index within a page that stores the smallest key of each sub‑range, allowing the engine to jump directly to the relevant page instead of scanning every row. This mirrors a book’s table of contents and explains why sorting on insert is essential for the directory to work.
When data exceeds a single page, InnoDB creates additional pages linked together, forming a multi‑page structure. To avoid linear scans across many linked pages, a higher‑level directory page stores pointers to the first row of each page, effectively creating a two‑level index.
Combining page directories and directory pages yields the classic B+‑tree index used by MySQL. All leaf nodes contain the actual row data (or primary‑key pointers for secondary indexes), while internal nodes store only key values and child pointers, enabling logarithmic search, range scans, and stable three‑level depth even for tens of millions of rows.
The article then discusses clustered vs. non‑clustered indexes (InnoDB has a single clustered index on the primary key; secondary indexes are non‑clustered and require a “back‑table” lookup). It explains the left‑most prefix rule for composite indexes, showing how MySQL compares columns from left to right and why a query that omits the leading column cannot use the index.
Examples of queries that fully, partially, or do not use the composite index idx_obj(age, height, weight) illustrate the optimizer’s decisions, including cases where range predicates or missing leading columns force a full table scan.
Finally, the concept of a covering index is introduced: if a query can be satisfied entirely from the index (e.g., selecting only indexed columns), MySQL avoids the back‑table lookup, further improving performance. The article concludes by summarizing how sorting, pages, page directories, multi‑page structures, and B+‑trees work together to optimize MySQL query execution.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.