Why MySQL Uses B+ Trees: A Step‑by‑Step Dive into Index Data Structures
This article walks through the evolution of MySQL index implementations—from hash tables and binary search trees to AVL, red‑black, B‑trees and finally B+ trees—explaining their performance trade‑offs, disk‑I/O considerations, and how InnoDB and MyISAM store data and indexes differently.
1. Underlying Data Structure Choices
MySQL’s performance heavily depends on its storage engine and index design; the choice of data structure determines how quickly rows can be retrieved.
1.1 Hash Table
Hash tables provide O(1) lookup by converting a key into a fixed‑length address. Example query: select * from user where id=7; The hash of 7 yields an address (e.g., 0x77) that points directly to the row. However, hash tables suffer from collisions, requiring chaining, and they cannot efficiently support range queries such as id > 3.
1.2 Binary Search Tree (BST)
BSTs offer O(log n) lookup and naturally support range scans because leaf nodes are ordered. An unbalanced BST can degenerate into a linked list, causing O(n) worst‑case performance, especially problematic for auto‑increment primary keys.
1.3 AVL and Red‑Black Trees
Self‑balancing trees (AVL, Red‑Black) keep height logarithmic, avoiding the degeneration of plain BSTs. AVL trees provide slightly better lookup counts but require more rotations; Red‑Black trees are slightly less balanced but cheaper to maintain. Both still store a single key per node, leading to many disk I/O operations.
1.4 B‑Tree
B‑trees increase the branching factor, storing multiple keys per node to reduce tree height and disk I/O. With a node capacity of 2 keys, a B‑tree with 7 rows needs only two node reads; with 16 rows, four node reads are required.
1.5 B+‑Tree
B+‑trees store only keys in internal nodes and keep full row data in leaf nodes linked together, enabling efficient range scans. Because leaf nodes are linked, sequential reads are fast, and the higher fan‑out dramatically lowers tree height, minimizing disk I/O.
2. InnoDB and MyISAM Engine Implementations
MySQL supports pluggable storage engines; the two most common are InnoDB (clustered index) and MyISAM (non‑clustered index).
2.1 MyISAM
MyISAM stores data and indexes in separate files ( .MYD for data, .MYI for indexes). The primary key index is a B+‑tree whose leaf nodes contain the physical address of each row, allowing a single I/O to fetch the record.
2.2 InnoDB
InnoDB uses a clustered index: the primary key B+‑tree stores the full row in its leaf nodes, while secondary indexes store only the primary key value. To retrieve a row via a secondary index, InnoDB first finds the primary key in the secondary B+‑tree, then performs a second lookup in the primary B+‑tree.
This design reduces storage redundancy but adds an extra lookup step compared to MyISAM.
3. Practical Indexing Guidelines
Create indexes on columns frequently used in query predicates.
Avoid indexing columns with low cardinality, even if they appear often in conditions.
Do not index columns that are updated very frequently, as index maintenance overhead can outweigh benefits.
Overall, B+ trees provide the best balance of fast point lookups, efficient range scans, and minimal disk I/O, which is why MySQL’s primary indexes (InnoDB) and MyISAM secondary indexes are implemented with B+ trees.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
