Why B+Tree Beats B-Tree: Unlocking MySQL InnoDB Performance
This article explains how B+Tree improves disk I/O efficiency in MySQL InnoDB by detailing disk storage fundamentals, sector/block/page concepts, the differences between B‑Tree and B+Tree, and practical search examples that illustrate reduced I/O operations and faster queries.
Disk Access Principles
A disk consists of multiple concentric platters divided into tracks (cylinders) and sectors, where each sector is the smallest storage unit (typically 512 B). The operating system reads and writes data in blocks (usually 4 KB), which are composed of several sectors, while memory works with pages.
In MySQL InnoDB, the smallest storage unit is a page, defaulting to 16 KB (configurable to 4 KB, 8 KB, 16 KB, 32 KB, or 64 KB). Understanding the relationship between sector, block, and page is essential for estimating how many rows a B+Tree node can hold.
B-Tree
A B‑Tree is a balanced multi‑way search tree where each node contains up to m children and at least ⌈m/2⌉ children (except the root). All leaf nodes reside at the same depth, and keys within a node are sorted.
Typical B‑Tree properties:
Maximum m children per node.
Non‑root, non‑leaf nodes have at least ⌈m/2⌉ children.
Root has at least two children if it is not a leaf.
All leaves are on the same level.
Each internal node stores n keys where ⌈m/2⌉‑1 ≤ n ≤ m‑1.
Example search for key 32 requires three disk I/O reads: root → middle child → leaf, plus in‑memory binary search.
B+Tree
B+Tree refines B‑Tree by storing only keys in internal nodes and keeping all data records in leaf nodes, which are linked together for fast range scans. This structure reduces tree height and improves I/O performance.
Key differences:
Internal nodes contain only key values.
Leaf nodes are linked via a chain pointer.
All actual data rows reside in leaf nodes.
In MySQL, the InnoDB engine uses B+Tree for both clustered (primary) and secondary indexes. A clustered index stores the full row in leaf nodes, while a secondary index stores only the primary key, requiring an additional lookup ("回表查询") to retrieve the full row.
Typical page size (16 KB) can hold roughly 1 000 key‑pointer pairs (assuming 8 B key + 8 B pointer), allowing a three‑level B+Tree to index about one billion rows.
During a query, InnoDB keeps the root node in memory, so locating a row typically requires only 1–3 disk I/O operations, even for large tables.
对数据结构感兴趣的朋友,这里推荐一个可视化网站,里面包含各种数据结构:https://www.cs.usfca.edu/~galles/visualization/Algorithms.html 参见官网 https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page_sizeUnderstanding these fundamentals helps developers design efficient schemas, choose appropriate primary key types (INT or BIGINT), and anticipate the performance impact of index depth and disk I/O.
Architect's Alchemy Furnace
A comprehensive platform that combines Java development and architecture design, guaranteeing 100% original content. We explore the essence and philosophy of architecture and provide professional technical articles for aspiring architects.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
