B‑Tree, B+Tree, B*Tree and R‑Tree: Structures, Operations, and Applications in External Memory Indexing
This article provides a comprehensive overview of B‑tree, B+‑tree, B*‑tree and R‑tree data structures, explaining their definitions, node layouts, height analysis, insertion and deletion algorithms with examples, and their roles in external‑memory indexing for databases and file systems.
The article begins by motivating multi‑way search trees for external storage, explaining that disk I/O dominates query cost and that reducing tree height via balanced multi‑branch structures such as B‑trees improves performance.
It then defines the B‑tree (also called a balanced multi‑way search tree), describing its node capacity, order, height formula, and the distinction between internal and leaf nodes. The text includes a C‑style node definition wrapped in typedef struct { int file_num; char *file_name[max_file_num]; BTNode *BTptr[max_file_num+1]; FILE_HARD_ADDR offset[max_file_num]; } BTNode; and discusses how B‑trees store keys and pointers to disk blocks.
Variations B+‑tree and B*‑tree are introduced: B+‑tree stores all keys in leaf nodes and uses internal nodes only as indexes, enabling efficient range queries; B*‑tree further reduces split frequency by allowing sibling pointers and higher minimum fill factor. Their structural differences are illustrated with diagrams.
Insertion and deletion operations for B‑trees are detailed step‑by‑step, including handling of node overflow (splits) and underflow (borrowing or merging), with concrete examples using alphabetic keys and a 5‑order tree. Pseudocode for these operations is provided.
The second major section covers R‑trees, a high‑dimensional extension of B‑trees for spatial indexing. It explains the concept of Minimal Bounding Rectangles (MBR), leaf and non‑leaf node formats, and the properties that make R‑trees balanced.
Algorithms for R‑tree search, insertion, and deletion are presented in pseudocode form (functions Search, Insert, ChooseLeaf, AdjustTree, Delete, FindLeaf, CondenseTree). The article shows how to choose the leaf with minimal enlargement, adjust bounding rectangles upward, and handle node splits or underflows by reinserting orphaned entries.
Finally, the article summarizes that B‑tree families and R‑trees are essential for large‑scale storage systems, database indexes (e.g., MySQL, Oracle), and spatial queries, and points readers to further reading on variants such as R*‑tree.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.