How Indexes Accelerate MySQL Queries: From Binary Trees to B+Trees
This article explains the purpose and principles of database indexes, compares various tree structures such as binary search trees, AVL trees, B‑Trees and B+Trees, and shows how MySQL uses B+Tree indexes to minimize disk I/O and boost query performance.
Definition of Index
MySQL defines an index as a data structure that helps MySQL retrieve data efficiently.
Essentially, an index improves query efficiency by repeatedly narrowing the data range, turning random access into sequential access, allowing a uniform lookup method to locate data.
Think of a bank vault: without an index you would try each vault one by one; with an index you can locate the correct vault area, row, and position directly, dramatically speeding up the search.
Principle of Index
Similar to dictionaries, library catalogs, or train seat charts, indexes continuously shrink the data range and convert random data into ordered data.
Databases must handle more complex queries (range, fuzzy, union, intersection, etc.), requiring sophisticated indexing structures.
The simplest search algorithm is linear search with O(n) complexity, which is inefficient for large data sets.
Computer science provides better algorithms such as binary search and binary‑tree search, but they require specific ordered data structures.
Since data cannot always be stored in a way that satisfies all these structures, database systems maintain separate data structures that reference the actual data, enabling advanced search algorithms. These structures are the indexes.
Below is an illustration of one indexing method.
The figure shows a table with two columns and seven rows. Column 1 stores the physical address of each record. To speed up lookups on Column 2, a binary search tree is maintained on the left, where each node contains an index key and a pointer to the corresponding record address, enabling O(log₂n) search complexity.
In practice, most database systems use B+Tree for indexing. B+Tree evolved from balanced binary trees, so we first review binary search trees, AVL (balanced binary) trees, and multi‑way search trees (B‑Tree).
Binary Search Tree
A binary tree satisfies: left‑subtree keys < root key < right‑subtree keys, resulting in an in‑order (ascending) traversal.
For the shown tree, the average search cost is 2.4 comparisons.
If the tree is unbalanced, the average cost rises to 3.8 comparisons, so a balanced tree is preferred, leading to the AVL tree.
Balanced Binary Tree (AVL Tree)
An AVL tree is a binary search tree where the height difference between any node's two subtrees is at most 1.
When insertions or deletions unbalance an AVL tree, four cases arise: LL, RR, LR, RL. Each case is restored by specific rotations.
Rotations
LL Rotation: Move the left child up as the new root, adjust pointers accordingly.
RR Rotation: Symmetric to LL; the right child becomes the new root.
LR Rotation: Perform an RR rotation on the left child, then an LL rotation on the root.
RL Rotation: Perform an LL rotation on the right child, then an RR rotation on the root.
Balanced Multi‑Way Search Tree (B‑Tree)
Disk storage works with blocks; B‑Tree is designed for block‑oriented access. InnoDB pages are 16 KB by default (configurable via innodb_page_size ).
1 mysql> show variables like 'innodb_page_size';
2 +----------------+-------+
3 | Variable_name | Value |
4 +----------------+-------+
5 | innodb_page_size | 16384 |
6 +----------------+-------+
7 1 row in setA B‑Tree node occupies one disk block and contains multiple sorted keys and child pointers. Example of a 3‑order B‑Tree is shown below.
Searching for key 55 involves three disk I/O operations and three in‑memory searches, demonstrating that disk I/O dominates the cost.
B‑Tree reduces node count compared to AVL, improving I/O efficiency.
B+Tree
B+Tree optimizes B‑Tree for external storage; only leaf nodes store the actual data records, while internal nodes store keys only. This increases the fan‑out, reduces tree height, and further lowers I/O.
All leaf nodes are linked, enabling efficient range and pagination queries.
In InnoDB, a 16 KB page can hold roughly 1 000 keys; a three‑level B+Tree can index about one billion rows. Typically the tree height is 2–4, and the root node stays in memory, so a key lookup requires only 1–3 disk I/O operations.
MySQL uses B+Tree for both clustered (primary) and secondary indexes. Clustered indexes store full rows in leaf nodes, while secondary indexes store only the primary key in leaf nodes and use it to locate the full row.
Conclusion
Although binary search trees, red‑black trees, etc., can implement indexes, MySQL (both MyISAM and InnoDB) adopts B+Tree because its design aligns with disk block storage, minimizing disk I/O during lookups—the most critical performance metric for index structures.
Architecture & Thinking
🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.