Understanding B‑Tree and B+Tree: Principles, Definitions, and Insertion Algorithms
This article explains the fundamentals of B‑Tree and B+Tree data structures, covering their definitions, differences from red‑black trees, node properties, and detailed insertion procedures, while highlighting why B+Tree is preferred for database and file indexing.
In this article the author shares a concise yet comprehensive explanation of B‑Tree and B+Tree, motivated by interview questions and personal study.
B‑Tree Basics
Dynamic search trees include binary search trees, balanced binary trees, red‑black trees, and B‑Trees, all with time complexity O(log₂N). Reducing tree height improves lookup efficiency, which is crucial when massive data is stored on external disks that require fast location of records.
A B‑Tree is a balanced search tree specifically designed for storage devices or disks.
Distinguishing B‑Tree from Red‑Black Tree
B‑Tree nodes may have many children, whereas a red‑black tree is a binary search tree with at most two children per node. Both structures have similar height O(log N) for N nodes.
Definition of a B‑Tree
For an M‑order B‑Tree:
Each node can have at most m children.
Except for the root and leaf nodes, every node must have at least ⌈m/2⌉ children.
If the root is not a leaf, it must have at least two children.
All leaf nodes reside on the same level and contain no key information themselves.
Node structure example:
struct BTNode {
int keyNum; // actual number of keys
PBTNode parent; // pointer to parent node
PBTNode *ptr; // array of child pointers
keyType *key; // array of keys
}B‑Tree Insertion
Insertion Rules
If the key already exists, replace the old value with the new one.
If the key does not exist, perform insertion at a leaf node.
Insertion steps for an m‑order B‑Tree of height h:
If the target node has fewer than m‑1 keys, insert directly.
If the node already contains m‑1 keys, split the node: the middle key moves up to the parent, and the node is divided into two nodes.
Splitting may propagate upward; in the worst case the root splits, causing the tree to gain an additional level.
B+Tree Insertion
Insertion Rules
1. Empty tree: Insert the key directly.
2. Leaf node: Locate the leaf by key and insert. If the leaf’s key count does not exceed m‑1, insertion finishes. Otherwise, split the leaf into two leaves: the left leaf keeps the first ⌊m/2⌋ records, the right leaf keeps the remaining records, and the middle key is promoted to the parent (which must be an index node). The promoted key’s left child points to the left leaf, and its right child points to the right leaf.
3. Index node: If the index node has ≤ m‑1 keys, insertion finishes. Otherwise, split the index node into two index nodes: the left index keeps the first ⌊(m‑1)/2⌋ keys, the right index keeps the remaining keys, and the middle key is promoted to the parent with appropriate child pointers.
Why B+Tree Is Preferred for Database and File Indexes
1) Lower disk I/O cost: Internal nodes of a B+Tree store only keys, not full records, making them smaller and reducing read/write overhead.
2) Stable query performance: All keys are stored in leaf nodes, so every search follows a path of identical length from root to leaf, guaranteeing consistent lookup time.
Thank you for reading; hope this helps you. Source: blog.csdn.net/yu876876/article/details/84896789
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.