Understanding B+ Tree Indexes and Their Impact on Database Performance
This article explains how database indexes are implemented using B+ trees, describing their structure, advantages such as shallow depth and ordered leaf nodes, and how they reduce disk I/O, illustrating calculations of node capacity and emphasizing the importance of index size for performance.
Indexes are a crucial part of databases, used to speed up access to specific data items; therefore understanding how they work behind the scenes and the data structures that store them is essential for fully leveraging their capabilities.
B+ Tree Indexes
Indexes are stored on disk using the B+ tree data structure. While B+ trees share many similarities with binary search trees—each node’s left subtree contains smaller values and the right subtree larger values—there are important differences.
B+ trees can hold many keys in a single node; a typical node may contain thousands of keys, giving the tree a large branching factor and making it much shallower than a binary tree.
All actual values are stored in leaf nodes; internal nodes contain only index entries.
Leaf nodes are linked together in a left‑to‑right linked list, keeping the values ordered and enabling very efficient range queries.
A Typical B+ Tree
Below is an illustration of a typical B+ tree:
Why Use B+ Trees
The main reason to use B+ trees is speed. Because most data resides on disk, which is orders of magnitude slower than memory, a database without a tree structure would have to scan all records sequentially. For billions of rows, such a scan would be prohibitively slow.
With a B+ tree, billions of keys (pointers to rows) can be stored at a height of only three to five levels, so each lookup requires only three to five disk accesses, dramatically reducing I/O operations.
Choosing a B+ tree over other tree structures is advantageous because its shallow depth means each lookup translates to a small, predictable number of disk reads, and the number of reads is directly proportional to the tree height.
How B+ Trees Are Organized
Node size is chosen based on the database page size, because disk I/O reads whole pages rather than partial data, minimizing cost.
For example, InnoDB uses a 16 KB page size. Assuming an integer column indexed with a 4‑byte key, a node can hold:
16 * 1024 / 4 = 4096 keys, and up to 4097 child pointers.
Thus, a height‑1 B+ tree (root + leaf level) can store:
4096 keys in the root and 4096 * 4097 = 16,781,312 keys in the leaf level, allowing over 16 million keys to be retrieved with just two lookups.
The Importance of Index Size
The size of index keys directly affects performance: longer keys reduce the number of keys that fit in a node, increasing tree height; a taller tree requires more disk accesses, which lowers performance. Keeping index keys short helps maintain a shallow B+ tree and minimizes I/O.
Understanding how B+ trees work and how to design indexes appropriately is essential for optimizing query performance in relational databases.
Original article © Ovais Tariq. Please credit the source when reproducing.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.