Databases 21 min read

Why MySQL Uses B+ Trees: A Step‑by‑Step Dive into Index Data Structures

This article walks through the evolution of MySQL index implementations—from hash tables and binary search trees to AVL, red‑black, B‑trees and finally B+ trees—explaining their performance trade‑offs, disk‑I/O considerations, and how InnoDB and MyISAM store data and indexes differently.

dbaplus Community

Jul 8, 2020

Why MySQL Uses B+ Trees: A Step‑by‑Step Dive into Index Data Structures

1. Underlying Data Structure Choices

MySQL’s performance heavily depends on its storage engine and index design; the choice of data structure determines how quickly rows can be retrieved.

1.1 Hash Table

Hash tables provide O(1) lookup by converting a key into a fixed‑length address. Example query: select * from user where id=7; The hash of 7 yields an address (e.g., 0x77) that points directly to the row. However, hash tables suffer from collisions, requiring chaining, and they cannot efficiently support range queries such as id > 3.

1.2 Binary Search Tree (BST)

BSTs offer O(log n) lookup and naturally support range scans because leaf nodes are ordered. An unbalanced BST can degenerate into a linked list, causing O(n) worst‑case performance, especially problematic for auto‑increment primary keys.

1.3 AVL and Red‑Black Trees

Self‑balancing trees (AVL, Red‑Black) keep height logarithmic, avoiding the degeneration of plain BSTs. AVL trees provide slightly better lookup counts but require more rotations; Red‑Black trees are slightly less balanced but cheaper to maintain. Both still store a single key per node, leading to many disk I/O operations.

1.4 B‑Tree

B‑trees increase the branching factor, storing multiple keys per node to reduce tree height and disk I/O. With a node capacity of 2 keys, a B‑tree with 7 rows needs only two node reads; with 16 rows, four node reads are required.

1.5 B+‑Tree

B+‑trees store only keys in internal nodes and keep full row data in leaf nodes linked together, enabling efficient range scans. Because leaf nodes are linked, sequential reads are fast, and the higher fan‑out dramatically lowers tree height, minimizing disk I/O.

2. InnoDB and MyISAM Engine Implementations

MySQL supports pluggable storage engines; the two most common are InnoDB (clustered index) and MyISAM (non‑clustered index).

2.1 MyISAM

MyISAM stores data and indexes in separate files ( .MYD for data, .MYI for indexes). The primary key index is a B+‑tree whose leaf nodes contain the physical address of each row, allowing a single I/O to fetch the record.

2.2 InnoDB

InnoDB uses a clustered index: the primary key B+‑tree stores the full row in its leaf nodes, while secondary indexes store only the primary key value. To retrieve a row via a secondary index, InnoDB first finds the primary key in the secondary B+‑tree, then performs a second lookup in the primary B+‑tree.

This design reduces storage redundancy but adds an extra lookup step compared to MyISAM.

3. Practical Indexing Guidelines

Create indexes on columns frequently used in query predicates.

Avoid indexing columns with low cardinality, even if they appear often in conditions.

Do not index columns that are updated very frequently, as index maintenance overhead can outweigh benefits.

Overall, B+ trees provide the best balance of fast point lookups, efficient range scans, and minimal disk I/O, which is why MySQL’s primary indexes (InnoDB) and MyISAM secondary indexes are implemented with B+ trees.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database InnoDB mysql DataStructure index MyISAM B+Tree

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.