Unlocking Fast Disk Indexing: How B‑Trees, B+‑Trees & R‑Trees Work
This article explains why multi‑way search trees such as B‑trees, B+‑trees, B*‑trees and R‑trees are crucial for reducing disk I/O in large‑scale storage systems, covering their hardware background, structural definitions, height analysis, insertion and deletion algorithms, and practical examples with code and diagrams.
First Section – B‑Tree, B+‑Tree, B*‑Tree
Dynamic search trees like binary search trees, balanced binary trees, red‑black trees and the multi‑way B‑tree family (B‑tree, B+‑tree, B*‑tree) have search time O(log₂N), but their height directly influences disk I/O cost. When many keys are stored on disk, a binary tree’s depth causes frequent seeks; a multi‑way tree reduces height and thus I/O.
External Memory – Disk
Computers use main memory (fast, small, volatile) and external memory (disk) which offers large capacity but slower access. A disk consists of platters, tracks (cylinders), and sectors. Read/write involves moving the arm to the correct cylinder (seek time), waiting for rotation (latency), and transferring data (transfer time). Typical seek time ≈0.1 s, latency ≈0.0083 s, transfer ≈0.02 µs per byte.
Because I/O cost is dominated by the seek component, storing many keys per disk block and keeping the tree shallow is essential.
B‑Tree
What is a B‑tree? It is a balanced multi‑way search tree designed for external storage. Each internal node can have up to m children (m ≥ 2) and at least ⌈m/2⌉ children (except the root). All leaves reside at the same depth. Keys in a node are stored in sorted order, and each node contains between ⌈m/2⌉‑1 and m‑1 keys.
Key properties (order‑based definition):
Each node has at most m children.
Non‑root nodes have at least ⌈m/2⌉ children.
All leaves are at the same level.
Root may have as few as two children.
Height analysis shows that for N keys the maximum height is log⌈m/2⌉((N+1)/2) + 1, which is much smaller than a binary tree’s height.
B+‑Tree
B+‑tree stores all actual records in leaf nodes; internal nodes contain only separator keys. This makes internal nodes smaller, allowing more keys per disk block, reducing I/O. All leaves are linked for efficient range queries.
B*‑Tree
B*‑tree is a variant of B+‑tree that requires internal nodes to be at least 2/3 full, improving space utilization. Splitting rules differ: when a node overflows, it first tries to redistribute with a sibling; if that fails, it splits into three nodes.
Insertion and Deletion in B‑Trees
Insertion proceeds by locating the appropriate leaf. If the leaf has space, the key is inserted; otherwise the leaf splits, the middle key moves up, and the split may propagate to the root, increasing tree height.
typedef struct {
int Count; // number of keys in the node
ItemType Key[4]; // keys (max 4 for a 5‑order tree)
long Branch[5]; // child pointers (or pseudo‑pointers)
} NodeType;Deletion removes a key from its leaf. If the leaf underflows (fewer than ⌈m/2⌉‑1 keys), it borrows from a sibling or merges with a sibling, possibly causing underflow to propagate upward. The article provides step‑by‑step examples with letters A‑Z illustrating splits, merges, and height adjustments.
Second Section – R‑Tree: Handling Spatial Data
R‑tree extends B‑tree concepts to multi‑dimensional data (e.g., geographic coordinates). Each leaf stores bounding rectangles (MBRs) that minimally enclose the actual objects; internal nodes store MBRs that enclose their children’s rectangles.
Structure
Leaf entry: (I, tuple‑identifier) where I is the minimal bounding rectangle. Non‑leaf entry: (I, child‑pointer). All leaves are at the same depth, making the tree balanced.
Operations
Search – Given a query rectangle S, recursively visit any node whose MBR intersects S. At leaf level, return all records whose rectangles intersect S.
Insert – Choose the leaf whose MBR needs the smallest enlargement to accommodate the new rectangle (break ties by smallest area). Insert; if the leaf overflows, split it and adjust ancestors, possibly creating a new root.
Delete – Locate the leaf containing the record, remove it, then invoke CondenseTree to handle underflow: remove underfull nodes, re‑insert their entries, and adjust MBRs up the tree. If the root ends up with a single child, that child becomes the new root.
Limitations
R‑trees perform best for 2‑6 dimensions; higher dimensions suffer from increased overlap and reduced efficiency. Variants such as R*‑tree improve performance.
Overall, B‑tree family structures minimize disk I/O for one‑dimensional indexing, while R‑trees provide efficient indexing for multi‑dimensional spatial queries.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
