
Understanding MySQL Indexes, Pages, and B+ Tree Structure

This article explains the fundamentals of MySQL indexing, including the structure of InnoDB pages, the evolution from simple page reads to B+ tree indexes, and how clustered and non‑clustered indexes affect query performance and I/O efficiency.

Full-Stack Internet Architecture

Working with indexes is a core skill for engineers, and understanding how they work is essential for writing high‑quality SQL. The article starts with a simple CREATE TABLE `user` (...) definition using the InnoDB storage engine and lists typical queries, such as SELECT * FROM user WHERE id = xxx and queries that order results by id or age.

InnoDB chains records in primary‑key order; with an auto‑incrementing id, that order matches insertion order. If each row were fetched from disk individually, a naïve lookup of id = 3 would walk the chain from the first row and cost three I/O operations. To reduce I/O, the article introduces the principle of locality of reference: data near a requested row is likely to be needed soon, so a block of contiguous rows (a page) is read at once.
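The difference between the two read strategies can be sketched with a small simulation. This is only a toy model: the function names, the list of ids standing in for the record chain, and the figure of 100 rows per page are illustrative assumptions, and each loop iteration stands in for one disk read.

```python
from typing import List

ROWS_PER_PAGE = 100  # illustrative figure; real capacity depends on row size


def reads_row_by_row(ids: List[int], target_id: int) -> int:
    """Walk the record chain from the start, paying one read per row."""
    reads = 0
    for row_id in ids:
        reads += 1  # one simulated disk read for this single row
        if row_id == target_id:
            break
    return reads


def reads_page_by_page(ids: List[int], target_id: int,
                       rows_per_page: int = ROWS_PER_PAGE) -> int:
    """Walk the same chain, but load a whole page of rows per read."""
    reads = 0
    for start in range(0, len(ids), rows_per_page):
        reads += 1  # one simulated disk read loads the entire page
        page = ids[start:start + rows_per_page]
        if target_id in page:  # searching inside the page happens in memory
            break
    return reads
```

With ids 1 to 1000, finding id = 3 costs three reads row by row but only one read page by page, because the first page already contains it.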

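An InnoDB page is 16 KB by default; the article assumes it holds about 100 rows, though the real number depends on row size. The page is the smallest unit InnoDB reads from or writes to disk. Inside a page, rows are split into small groups, and a page directory made up of slots records the maximum id in each group. A lookup binary‑searches the slots to find the right group, then scans only that group instead of the whole page.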
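A minimal sketch of the in-page lookup, assuming rows are represented only by their ids. The names `build_page` and `find_in_page` and the group size of 4 are hypothetical choices for illustration, not InnoDB's actual on-disk layout.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def build_page(ids: List[int], group_size: int = 4) -> Tuple[List[List[int]], List[int]]:
    """Split sorted ids into groups; each slot records its group's largest id."""
    groups = [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
    slots = [group[-1] for group in groups]
    return groups, slots


def find_in_page(groups: List[List[int]], slots: List[int],
                 target: int) -> Optional[Tuple[int, int]]:
    """Return (group index, rows scanned in that group), or None if absent."""
    g = bisect_left(slots, target)  # first slot whose maximum id is >= target
    if g == len(slots):
        return None  # target is larger than every id in the page
    for scanned, row_id in enumerate(groups[g], start=1):
        if row_id == target:
            return g, scanned
        if row_id > target:
            break  # ids are sorted, so the target cannot appear later
    return None
```

For a page holding ids 1 to 100, looking up id = 55 lands in the group 53 to 56 and finds the row on the third comparison, instead of scanning 55 rows from the start of the page.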

When many pages exist, a directory over the pages themselves is needed. The article describes how pages are organized under a hierarchy of directory pages, forming a B+ tree: the root page points to intermediate directory pages, which in turn point to leaf pages that store the actual rows. In the article's example, searching for id = 55 reads the root, then an intermediate directory page, and finally the leaf page, for three I/O operations. The number of reads equals the height of the tree, which grows only logarithmically with the number of rows, so even very large tables need just a few page reads per lookup.
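The traversal can be sketched as a read-only tree built bottom-up from sorted ids. This is a simplified model, not InnoDB's implementation: the `Page` class, `build_tree`, `lookup`, and the `fanout` parameter are hypothetical names, directory pages store the maximum id of each child (following the article's description), and every visited page counts as one I/O.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    keys: List[int]  # leaf page: row ids; directory page: max id of each child
    children: List["Page"] = field(default_factory=list)  # empty for leaf pages

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_tree(ids: List[int], fanout: int) -> Page:
    """Bulk-build the tree from sorted ids, with `fanout` entries per page."""
    if not ids:
        raise ValueError("ids must not be empty")
    level = [Page(keys=ids[i:i + fanout]) for i in range(0, len(ids), fanout)]
    while len(level) > 1:  # keep adding directory levels until one root remains
        parents = []
        for i in range(0, len(level), fanout):
            kids = level[i:i + fanout]
            parents.append(Page(keys=[k.keys[-1] for k in kids], children=kids))
        level = parents
    return level[0]


def lookup(root: Page, target: int) -> Tuple[bool, int]:
    """Return (found, pages_read) for a point lookup starting at the root."""
    page, reads = root, 1
    while not page.is_leaf:
        i = bisect_left(page.keys, target)  # first child whose max id >= target
        if i == len(page.keys):
            return False, reads  # target exceeds every id in the tree
        page = page.children[i]
        reads += 1
    i = bisect_left(page.keys, target)
    return i < len(page.keys) and page.keys[i] == target, reads
```

With 1,000 ids and a fanout of 10, the tree has three levels and `lookup(root, 55)` reads three pages. Raising the row count to 10,000 at the same fanout adds one level, while a larger fanout keeps the tree shallow, which is why wide pages matter.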

The article then distinguishes clustered (primary key) indexes, where leaf nodes store the full row, from non‑clustered (secondary) indexes, where leaf nodes store only the indexed column and the primary key. A query that needs columns not present in the secondary index must take each primary key it finds and perform a “back‑lookup” (a return trip to the table) in the clustered index. Because the matching rows can sit on different pages, this can cause one random I/O operation per row.
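A rough sketch of the two index shapes and the back-lookup, using in-memory structures in place of B+ trees. The `name` column, the function names, and the use of a dict and a sorted list are assumptions made for illustration only.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Row = Dict[str, object]


def build_indexes(rows: List[Row]) -> Tuple[Dict[int, Row], List[Tuple[int, int]]]:
    """Clustered index maps id -> full row; secondary index holds (age, id) pairs."""
    clustered = {row["id"]: row for row in rows}
    secondary = sorted((row["age"], row["id"]) for row in rows)
    return clustered, secondary


def select_all_by_age(clustered: Dict[int, Row],
                      secondary: List[Tuple[int, int]],
                      age: int) -> Tuple[List[Row], int]:
    """Model SELECT * ... WHERE age = ?; returns (rows, back-lookups performed)."""
    result, back_lookups = [], 0
    i = bisect_left(secondary, (age,))  # (age,) sorts before every (age, id)
    while i < len(secondary) and secondary[i][0] == age:
        primary_key = secondary[i][1]
        result.append(clustered[primary_key])  # back-lookup into the clustered index
        back_lookups += 1
        i += 1
    return result, back_lookups
```

Each matching secondary-index entry triggers one clustered-index lookup, so a predicate that matches many rows pays for many back-lookups.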

To avoid back‑lookups, the article introduces covering indexes: if a query requests only columns that are stored in the index (e.g., SELECT age FROM user WHERE age = xxx), the engine can answer the query directly from the secondary index without visiting the clustered index at all.
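The covering condition reduces to a subset check, sketched below. The helper `is_covered` and the `name` column in the usage note are hypothetical; the check relies on the point made earlier that a secondary index's leaf entries also carry the primary key.

```python
from typing import Iterable


def is_covered(requested_columns: Iterable[str],
               index_columns: Iterable[str],
               primary_key: Iterable[str] = ("id",)) -> bool:
    """True if every requested column is stored in the secondary index itself."""
    # Leaf entries hold the indexed columns plus the primary key columns.
    available = set(index_columns) | set(primary_key)
    return set(requested_columns) <= available
```

Under this model, `is_covered(["age"], ["age"])` and `is_covered(["age", "id"], ["age"])` are both true, while `is_covered(["age", "name"], ["age"])` is false and would require a back-lookup. In MySQL itself, EXPLAIN reports "Using index" in the Extra column when a query is answered from the index alone.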

Finally, the article explains disk I/O fundamentals (seek time, rotational latency, and data transfer) and why sequential I/O is much faster than random I/O: a sequential read pays the seek and rotational delay once, while random reads pay them on every access. It clarifies that reading a 16 KB InnoDB page counts as a single I/O operation because the page’s sectors are contiguous, and that modern operating systems can pre‑fetch multiple adjacent pages in one request.
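A back-of-the-envelope model makes the gap concrete. The default figures here (5 ms average seek, 7200 RPM, 100 MB/s transfer) are illustrative assumptions for a spinning disk, not measurements from the article, and the function names are hypothetical.

```python
PAGE_BYTES = 16 * 1024  # one 16 KB InnoDB page


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter must turn half a revolution to reach the sector."""
    return 0.5 * 60_000 / rpm


def transfer_ms(num_bytes: int, mb_per_s: float) -> float:
    """Time to stream num_bytes at a sustained rate given in MB/s (10^6 bytes)."""
    return num_bytes / (mb_per_s * 1_000_000) * 1000


def random_read_ms(pages: int, seek_ms: float = 5.0, rpm: float = 7200,
                   mb_per_s: float = 100.0) -> float:
    """Every page pays seek + rotational latency + its own transfer time."""
    per_page = seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms(PAGE_BYTES, mb_per_s)
    return pages * per_page


def sequential_read_ms(pages: int, seek_ms: float = 5.0, rpm: float = 7200,
                       mb_per_s: float = 100.0) -> float:
    """Seek and rotational latency are paid once, then pages stream contiguously."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms(pages * PAGE_BYTES, mb_per_s)
```

Under these assumed figures, positioning the head costs roughly 9 ms while transferring one page costs well under 1 ms, so reading 100 scattered pages is dominated by positioning time, whereas 100 contiguous pages cost little more than a single read.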

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

InnoDB, MySQL, index, Database Performance, B+ Tree, Page I/O