Databases 6 min read

Understanding B+ Tree Indexes and Their Impact on Database Performance

This article explains how database indexes are implemented using B+ trees, describing their structure, advantages such as shallow depth and ordered leaf nodes, and how they reduce disk I/O, illustrating calculations of node capacity and emphasizing the importance of index size for performance.

Full-Stack Internet Architecture

Jul 7, 2018

Indexes are a crucial part of databases, used to speed up access to specific data items; therefore understanding how they work behind the scenes and the data structures that store them is essential for fully leveraging their capabilities.

B+ Tree Indexes

Indexes are stored on disk using the B+ tree data structure. While B+ trees share many similarities with binary search trees—each node’s left subtree contains smaller values and the right subtree larger values—there are important differences.

B+ trees can hold many keys in a single node; a typical node may contain thousands of keys, giving the tree a large branching factor and making it much shallower than a binary tree.

All actual values are stored in leaf nodes; internal nodes contain only index entries.

Leaf nodes are linked together in a left‑to‑right linked list, keeping the values ordered and enabling very efficient range queries.

A Typical B+ Tree

Below is an illustration of a typical B+ tree:

Why Use B+ Trees

The main reason to use B+ trees is speed. Because most data resides on disk, which is orders of magnitude slower than memory, a database without a tree structure would have to scan all records sequentially. For billions of rows, such a scan would be prohibitively slow.

With a B+ tree, billions of keys (pointers to rows) can be stored at a height of only three to five levels, so each lookup requires only three to five disk accesses, dramatically reducing I/O operations.

Choosing a B+ tree over other tree structures is advantageous because its shallow depth means each lookup translates to a small, predictable number of disk reads, and the number of reads is directly proportional to the tree height.

How B+ Trees Are Organized

Node size is chosen based on the database page size, because disk I/O reads whole pages rather than partial data, minimizing cost.

For example, InnoDB uses a 16 KB page size. Assuming an integer column indexed with a 4‑byte key, a node can hold: 16 * 1024 / 4 = 4096 keys, and up to 4097 child pointers.

Thus, a height‑1 B+ tree (root + leaf level) can store: 4096 keys in the root and 4096 * 4097 = 16,781,312 keys in the leaf level, allowing over 16 million keys to be retrieved with just two lookups.

The Importance of Index Size

The size of index keys directly affects performance: longer keys reduce the number of keys that fit in a node, increasing tree height; a taller tree requires more disk accesses, which lowers performance. Keeping index keys short helps maintain a shallow B+ tree and minimizes I/O.

Understanding how B+ trees work and how to design indexes appropriately is essential for optimizing query performance in relational databases.

Original article © Ovais Tariq. Please credit the source when reproducing.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance InnoDB Data Structures B+Tree disk I/O database index

Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.