Databases 19 min read

Understanding Database Index Structures: From Binary Trees to B‑Tree and B+Tree

This article explains how library indexing inspires database indexing, introduces binary search trees, AVL trees, B‑Tree and B+Tree structures, and details InnoDB and MyISAM storage mechanisms, page organization, clustered versus non‑clustered indexes, and practical index‑optimization advice.

Java Captain
Java Captain
Java Captain
Understanding Database Index Structures: From Binary Trees to B‑Tree and B+Tree

Binary Search Trees

Binary search trees (BST) are tree structures where each node has at most two children, with the left subtree containing smaller values and the right subtree larger values. Inserting the sequence [35, 27, 48, 12, 29, 38, 55] creates a balanced BST that orders the data.

1. Each node contains an element and up to two sub‑trees. 2. Left subtree values are less than the parent; right subtree values are greater.

Inserting the same numbers in sorted order (ascending) causes the tree to degenerate into a linear chain, illustrating the need for balanced trees.

Balanced Binary Trees (AVL)

AVL trees are self‑balancing BSTs where the height difference between left and right sub‑trees of any node never exceeds one, and both sub‑trees are themselves AVL trees. The earlier unsorted insertion example would be rebalanced to maintain efficient search performance.

B‑Tree

B‑Tree is a multi‑way tree designed for disk‑based storage. An m‑order B‑Tree satisfies several properties: each node has at most m children, at least ⌈m/2⌉ children (except the root), all leaves are on the same level, and each node stores between ⌈m/2⌉ and m‑1 keys.

Example insertion of the array [0,1,2,3,4,5,6,7] into a 3‑order B‑Tree demonstrates node splits and key promotions.

B+Tree

B+Tree refines B‑Tree for external storage by storing only keys in internal nodes, keeping all actual records in leaf nodes, and linking leaf nodes together for fast range scans.

1. Internal nodes store only keys. 2. All data resides in leaf nodes. 3. Leaf nodes contain the full set of elements. 4. Leaf nodes are linked via pointers.

B‑Tree or B+Tree?

Although B‑Tree can find a key directly in internal nodes, B+Tree’s leaf‑only data layout reduces tree height and disk I/O, making it the preferred index structure in most relational databases such as MySQL InnoDB.

InnoDB Data Storage

InnoDB pages are 16 KB blocks. New rows are appended to the current page; when a page fills, a new page is created. Page splits copy the original page, insert the new row into a fresh page, and turn the original page into a non‑leaf index node.

1. Create Page2 as a copy of Page1. 2. Create Page3 for the new row. 3. Page1 becomes the root index node pointing to Page2 and Page3.

Primary‑key auto‑increment values keep inserts sequential, minimizing page splits; random keys cause frequent splits and lower utilization.

InnoDB Data Lookup

Lookup follows the B+Tree path from root to leaf, then uses the page’s internal sparse index (Page Directory) to locate a slot via binary search, followed by a sequential scan within the slot’s block to find the exact record.

Clustered vs. Non‑Clustered Indexes

Clustered indexes store the actual row data in leaf nodes; non‑clustered indexes store only the clustered key in leaf nodes, requiring a “table lookup” (back‑table) to fetch the full row.

InnoDB vs. MyISAM

InnoDB stores rows physically in primary‑key order, while MyISAM stores rows in insertion order. MyISAM’s primary‑key index points to row offsets, allowing direct access without a back‑table, which can be faster for certain workloads.

Index Optimization Tips

Common advice includes avoiding leading‑wildcard LIKE patterns, limiting the number of indexes per table, using covering indexes, and avoiding indexes on columns with high duplicate values; understanding the underlying B+Tree mechanics explains why these recommendations improve performance.

InnoDBdata structuresMyISAMB-TreeB-TreeDatabase Index
Java Captain
Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.