Why Indexes Speed Up Database Queries: From Binary Trees to B+ Trees
This article explains how database indexes improve query performance by exploring binary trees, binary search, balanced trees, B‑trees, and B+‑trees, illustrating their structures, advantages, disadvantages, and the impact of disk I/O on overall efficiency.
Preface
A girl asked how to instantly improve database query performance, and the answer was simply “add an index”. She then wondered why an index actually speeds up queries, prompting a deeper exploration of tree data structures used in indexing.
Binary Tree Basics
Binary Tree
A binary tree is a hierarchical structure with each node having at most two children. It is the simplest member of the tree family but by itself does not directly improve query speed.
Binary Search
Binary Search
Binary search works on a sorted list by repeatedly halving the search range. Using a metaphor of parrots ordered by height, two methods are compared:
Method 1: Scan
Measure each item sequentially; worst‑case time complexity O(n).
Method 2: Binary Search
After sorting, locate the target in O(log₂ n) steps; for 13 parrots the worst case is 4 comparisons.
Binary search reduces the number of I/O operations when data is read from disk, but only if the data remains sorted.
Problem: Compare the performance of scanning versus binary search for a specific height.
Because the data is already ordered, each step halves the search range, giving a complexity of log2(n). For n=13, log₂ 13 ≈ 4, so the worst case requires only four comparisons.
Balanced Binary Trees
Balanced Binary Tree
When a binary search tree is frequently updated, the root may drift away from the true middle, causing imbalance. Balanced trees (e.g., red‑black trees) enforce height constraints so that left and right sub‑trees differ by at most one, preserving efficient search.
Multi‑Way Trees: B‑Tree
B‑Tree
A B‑tree is a multi‑way tree where each node can hold multiple keys and point to up to three children (in a 3‑order example). Searching for a value proceeds by comparing the target with the keys in the node and descending to the appropriate child.
Compare the target with the node’s keys to choose the left, middle, or right child.
Repeat until the leaf node is reached.
Searching 10 in a 3‑order B‑tree of 26 items required only about 3 comparisons (log₃ 26 ≈ 3), versus 5 comparisons for a balanced binary tree (log₂ 26 ≈ 5).
Real‑World Example: Age‑Indexed B‑Tree
An example shows a 3‑order tree storing user ages as index keys while business data resides in leaf nodes. This structure is essentially a B‑tree.
Disk I/O Considerations
Disk I/O
Data must be stored on disk and later retrieved, making disk I/O a major performance bottleneck. Disk architecture includes platters, arms, read/write heads, and sectors. Locating data involves seeking to the correct track and waiting for the platter to rotate to the desired sector.
Because mechanical movement is orders of magnitude slower than CPU operations, reducing the number of I/O operations and increasing the amount of useful data per I/O are key optimization goals.
Cache frequently accessed data to cut I/O count.
Read larger blocks to maximize useful data per I/O.
Avoid loading unnecessary business data during index lookups.
B+‑Tree: Separating Index and Data
B+‑Tree
A B+‑tree stores only index keys in internal nodes, while leaf nodes hold both keys and pointers to the actual business records. This design reduces the size of internal nodes, allowing more keys per node and thus fewer disk reads.
Even when index and data are stored separately, the leaf nodes can contain pointers (addresses) to the real data, preserving the benefit of reduced I/O.
Summary
Data resides on disk, whose I/O speed is far slower than CPU.
Improving disk performance focuses on fewer I/O operations and more useful data per I/O.
Indexes use multi‑way trees (B‑trees, B+‑trees) to make the tree shallower, reducing I/O.
B+‑trees separate index keys from business data, increasing the effective data per I/O.
Ordered structures and binary‑search‑like lookups dramatically shrink the search range.
Even a simple index scan is much faster than scanning the full table because the index contains far fewer columns.
Knowledge Extension
Tree structures excel at query performance and are used in many systems, such as:
HashMap converting to red‑black trees on hash collisions.
Database indexes implemented with B+‑trees.
Search engine inverted indexes using trie structures.
The article provides a high‑level overview of why database indexes based on B+‑trees improve query speed, without delving into low‑level implementation details.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
