Databases 10 min read

Understanding B‑Tree and B+‑Tree: Structure, Height, and I/O Efficiency

This article explains the motivation behind B‑trees, their definition, node constraints, height analysis, and compares B‑trees with B+‑trees, highlighting how multi‑level storage and batch disk I/O reduce lookup costs in large‑scale data systems.

Selected Java Interview Questions

Jun 17, 2020

Understanding B‑Tree and B+‑Tree: Structure, Height, and I/O Efficiency

Background

Before discussing B‑trees, it is important to understand two facts: storage devices have vastly different access speeds (disk access is on the order of milliseconds, memory access on the order of nanoseconds), and modern storage systems are organized in hierarchical layers.

The most frequently used data should reside in the higher, faster, smaller storage; only when it is not found there do we search lower, larger storage. Consequently, when processing massive data that cannot fit entirely in memory, the actual runtime is often dominated by the number of I/O operations between storage levels, making I/O reduction crucial for performance.

Fact 2

Disk reads occur in blocks (or pages); all data within the same block can be retrieved in a single I/O operation, making the cost of reading 1 B almost the same as reading 1 KB. This leads to the principle of locality: when a piece of data is accessed, nearby data is likely to be accessed soon.

B‑Tree

Assume a dataset of one billion records. Using a balanced binary search tree (BBST) would require up to 30 I/O operations, each reading only one key. In contrast, a B‑tree reads a whole node (a "super‑node") containing many keys in a single I/O.

A B‑tree is a multi‑way balanced search tree designed for disks or other auxiliary storage. Each level reads a super‑node, whose size is determined by the disk block size. For example, with a 1 MiB block and 4‑byte keys, a node can hold 256 keys (m≈256), yielding a tree height of about log 256 (10⁹) ≈ 4, so a lookup requires at most four I/O operations.

Typically the root node stays in memory. Searching a B‑tree involves a sequential or binary search within a node; if the key is not found, the appropriate child pointer triggers another disk I/O to load the next node, and the process repeats.

Definition of B‑Tree

A B‑tree is a balanced multi‑way search tree. An m‑order B‑tree must satisfy:

Each node has at most m children (m ≥ 2).

Every non‑leaf node (except the root) has at least ⌈m/2⌉ children.

If the root is not a leaf, it has at least two children.

A non‑leaf node with k children stores k‑1 keys in ascending order.

All leaf nodes appear at the same depth.

Thus an m‑order B‑tree can also be described as a (⌈m/2⌉, m) tree. For example, an m = 4 B‑tree is a (2, 4) tree, which is closely related to red‑black trees.

Height of a B‑Tree

Let n be the number of keys, h the height (root at level 1), and m the order. The maximum height occurs when nodes contain the minimum number of keys. Using p = ⌈m/2⌉, the total number of nodes satisfies n ≥ 2·p^(h‑1) − 1, leading to h ≤ log p ((n+1)/2) + 1.

The minimum height occurs when nodes are as full as possible (each node has m children). Then n ≤ m^h − 1, giving h ≥ log m (n+1).

B+‑Tree

Definition of B+‑Tree

A B+‑tree is a variant of the B‑tree with two main differences:

Leaf nodes contain all keys and pointers to the actual records, and leaf keys are sorted with sibling leaf nodes linked together.

Internal nodes store only the maximum (or minimum) key of each subtree, acting as an index.

Because only internal nodes hold indexes, a B+‑tree can store more keys per node, increasing the amount of data retrieved per disk I/O and reducing the number of I/Os compared to a B‑tree. Additionally, the linked leaf nodes enable efficient range queries, which is why MySQL and many relational databases use B+‑trees for indexing.

Source: cnblogs.com/kkbill/p/11381783.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

I/O optimization Data Structures Algorithms database indexing B-Tree

Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.