Understanding MySQL Indexes: From Binary Search Trees to B+ Trees and Their Use in InnoDB
This article explains the fundamentals of MySQL indexing, tracing the evolution from binary search trees and balanced AVL trees to B‑trees and B+‑trees, and details how InnoDB implements clustered and non‑clustered indexes, including search processes and performance considerations.
Introduction
Indexes are data structures that help locate rows quickly in large tables. In MySQL, three main index types exist: B+‑tree, hash, and full‑text. This article focuses on the B+‑tree index used by the InnoDB storage engine.
Binary Search Tree
A binary search tree (BST) stores a key and its row data in each node, with left children holding smaller keys and right children larger keys. Searching for a key (e.g., id=12) requires at most three node visits, compared with six linear scans.
Balanced Binary Tree (AVL Tree)
When a BST becomes unbalanced, its height grows and search performance degrades. An AVL tree maintains a height difference of at most one between left and right sub‑trees, ensuring more stable and faster lookups.
B‑Tree
Because disk I/O is orders of magnitude slower than memory access, a tree that stores many keys per node reduces the number of disk reads. A B‑tree node can hold multiple keys and pointers, resulting in a shallow, wide tree that minimizes I/O.
B+‑Tree
B+‑trees improve on B‑trees by storing only keys in internal nodes (no row data) and keeping all row data in the leaf nodes, which are linked together. This design allows range queries, ordered scans, and grouping operations to be performed efficiently.
Clustered vs. Non‑Clustered Indexes
In InnoDB, the primary key forms a clustered index: the table’s rows are stored in the leaf nodes of a B+‑tree keyed by the primary key. A non‑clustered index uses a secondary column as the key; its leaf nodes store the secondary key and the primary key, requiring an additional lookup (the “row lookup” or “back‑track”) to retrieve the full row.
Searching with a Clustered Index
To find rows where id ≥ 18 and id < 40, the engine starts at the root (often cached in memory), follows pointers to the appropriate leaf pages, and then scans sequentially through the linked leaf nodes until the upper bound is reached, retrieving all matching rows in order.
Searching with a Non‑Clustered Index
A non‑clustered index on a column such as luckyNum stores pairs like (luckyNum, primaryKey) . The engine first locates the matching leaf entry, obtains the primary key, and then uses the clustered index to fetch the full row.
Conclusion
The article walks through the evolution from binary search trees to B+‑trees, explains why MySQL adopts B+‑trees for indexing, and demonstrates how InnoDB stores and retrieves data using clustered and non‑clustered indexes. Remember: "Data is index, index is data."
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.