Databases 14 min read

Understanding B+ Tree Indexes in MySQL

This article explains why B+ trees are the dominant data structure for MySQL indexes, compares them with hash tables, linked lists, and skip lists, and details page splits, merges, and how index values map to row records, helping readers master high‑frequency interview questions.

Full-Stack Internet Architecture
Full-Stack Internet Architecture
Full-Stack Internet Architecture
Understanding B+ Tree Indexes in MySQL

When a SQL query runs slowly, developers often wonder why adding an index speeds up data retrieval and what underlying structure stores that index. The answer is the B+ tree, which many storage engines adopt because it satisfies fast exact lookups, range queries, and ordered scans.

Definition of the problem – Using a sample CREATE TABLE `user` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(20) DEFAULT NULL COMMENT '姓名', `idcard` varchar(20) DEFAULT NULL COMMENT '身份证号码', `age` tinyint(10) DEFAULT NULL COMMENT '年龄', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='用户信息'; we often need queries such as SELECT * FROM user WHERE id = 123; , SELECT * FROM user WHERE id > 123 AND id < 234; , and SELECT * FROM user WHERE id < 1234 ORDER BY id DESC LIMIT 10; . An index must support fast exact lookup, fast range lookup, and ordered (both forward and reverse) traversal.

Comparison of common data structures – Hash tables provide O(1) exact lookups but cannot handle range queries or ordering. Linked lists support ordered traversal but lack fast search. Skip lists add multi‑level pointers to linked lists, achieving O(log n) search, similar to B+ trees. Balanced binary search trees also give O(log n) search, and by converting them into a B+ tree we obtain leaf nodes linked together for efficient range scans.

From a balanced binary search tree to a B+ tree – By moving all data values to the leaf level and linking leaves with a doubly‑linked list, the internal nodes store only keys, reducing index size. Searching for a range (e.g., 15–27) requires locating the start leaf (≈log n steps) and then scanning sequentially.

Page split and page merge – Inserting a random primary key such as an ID card number can cause a leaf node to exceed its capacity, triggering a page split and extra I/O. Using an auto‑increment integer primary key keeps inserts at the end, avoiding splits. Deleting rows can cause under‑filled pages, leading to page merges when a node’s occupancy falls below N/2.

How the index finds the row record – The actual row data resides only in the leaf nodes (clustered index). Once the index value is located, the corresponding row is retrieved directly, while non‑leaf nodes store only keys, keeping the index compact.

Summary of B+ tree characteristics – Each node has between N/2 and N children (root is an exception), only leaf nodes store full rows, leaves are linked for fast range scans, and the tree height is minimized by choosing a node fan‑out that matches the storage page size (typically 16 KB). This design balances memory usage, disk I/O, and query performance.

performanceSQLMySQLdata structuresB-TreeDatabase Index
Full-Stack Internet Architecture
Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.