Databases 18 min read

Why MySQL Chooses B+ Trees for Indexes: Disk I/O and Data Structure Insights

This article explains why MySQL adopts B+ trees for its indexes, examining the impact of disk I/O latency, comparing B‑tree variants, and showing how B+ trees reduce I/O operations while supporting efficient range queries and balanced performance for large datasets.

Liangxu Linux
Liangxu Linux
Liangxu Linux
Why MySQL Chooses B+ Trees for Indexes: Disk I/O and Data Structure Insights

Introduction

MySQL stores data and indexes on disk, so the index data structure must minimise disk I/O. This summary explains why InnoDB uses B+ trees, covering disk characteristics, binary search, tree variants, and the concrete InnoDB implementation.

Disk Characteristics and Index Requirements

Disk access latency is measured in milliseconds, while memory access is in nanoseconds – a difference of tens of thousands. The smallest disk unit is a 512 B sector; operating systems read/write in blocks, typically 4 KB (8 sectors). Each node of an on‑disk index therefore incurs at least one disk I/O.

An ideal index should:

Answer queries with as few disk I/O operations as possible.

Support efficient point lookups and range scans.

Binary Search and Binary Search Trees

Binary search on a sorted array halves the search space each step, giving O(log n) time, but inserting requires shifting many elements – impractical on disk.

A binary search tree (BST) avoids shifting by using pointers, yet a naïve BST can degenerate into a linked list when keys are inserted in sorted order, degrading to O(n). Because each node access maps to a disk I/O, tree height directly determines I/O cost.

Self‑Balancing Trees

AVL and Red‑Black trees enforce balance constraints, guaranteeing O(log n) search. However, they still have only two children per node, so height (and I/O) remains relatively high for large key sets.

M‑ary Trees and B‑Tree

Increasing the branching factor to M (>2) reduces height. A B‑tree of order M stores up to M‑1 keys and M child pointers per node, lowering the number of disk reads needed for a lookup.

Example: In a 3‑order B‑tree, searching for key 9 requires three node accesses (three I/Os).

B+ Tree

B+ trees extend B‑trees with two key differences:

Only leaf nodes store the actual records (index + data); internal nodes store keys only.

All leaf nodes are linked together in a doubly‑linked list, enabling fast sequential and range scans.

Because internal nodes contain only keys, they can hold more pointers, making the tree “short and wide” and reducing I/O for deep lookups.

Performance Comparison

Point Queries

B‑tree point lookups can be slightly faster in the best case because a record may be found in an internal node. B+ trees provide a predictable I/O cost: every lookup traverses the same number of levels.

Insert / Delete

Redundant leaf nodes in B+ trees allow deletions to occur directly at the leaf without affecting internal nodes, resulting in faster updates. Insertions may cause node splits but affect only a single path.

Range Queries

The leaf‑node linked list in B+ trees enables range scans by walking the list, avoiding repeated tree traversals and reducing I/O compared to B‑trees, which must descend the tree for each step.

MySQL InnoDB Implementation

InnoDB, MySQL’s default storage engine, uses B+ trees for both primary (clustered) and secondary indexes.

Leaf nodes of a clustered index store the full row data; secondary index leaf nodes store only the primary‑key value.

Each leaf node occupies a 16 KB data page.

Leaf nodes are linked with a doubly‑linked list, enabling efficient range scans.

Conclusion

Choosing a B+ tree for MySQL indexes balances low disk I/O, fast point and range queries, and simple, efficient insert/delete operations. Its multi‑way branching reduces tree height, and the leaf‑node linked list optimises range scans, making it well‑suited for large, disk‑resident datasets.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

indexingdatabasemysqlB+Tree
Liangxu Linux
Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.