Databases 17 min read

Why Indexes Speed Up Database Queries: From Binary Trees to B+ Trees

This article explains how database indexes improve query performance by exploring binary trees, binary search, balanced trees, B‑trees, and B+‑trees, illustrating their structures, advantages, disadvantages, and the impact of disk I/O on overall efficiency.

ITPUB

Jan 19, 2021

Why Indexes Speed Up Database Queries: From Binary Trees to B+ Trees

Preface

A girl asked how to instantly improve database query performance, and the answer was simply “add an index”. She then wondered why an index actually speeds up queries, prompting a deeper exploration of tree data structures used in indexing.

Binary Tree Basics

Binary Tree

A binary tree is a hierarchical structure with each node having at most two children. It is the simplest member of the tree family but by itself does not directly improve query speed.

Binary Search

Binary Search

Binary search works on a sorted list by repeatedly halving the search range. Using a metaphor of parrots ordered by height, two methods are compared:

Method 1: Scan

Measure each item sequentially; worst‑case time complexity O(n).

Method 2: Binary Search

After sorting, locate the target in O(log₂ n) steps; for 13 parrots the worst case is 4 comparisons.

Binary search reduces the number of I/O operations when data is read from disk, but only if the data remains sorted.

Problem: Compare the performance of scanning versus binary search for a specific height.

Because the data is already ordered, each step halves the search range, giving a complexity of log2(n). For n=13, log₂ 13 ≈ 4, so the worst case requires only four comparisons.

Balanced Binary Trees

Balanced Binary Tree

When a binary search tree is frequently updated, the root may drift away from the true middle, causing imbalance. Balanced trees (e.g., red‑black trees) enforce height constraints so that left and right sub‑trees differ by at most one, preserving efficient search.

Multi‑Way Trees: B‑Tree

B‑Tree

A B‑tree is a multi‑way tree where each node can hold multiple keys and point to up to three children (in a 3‑order example). Searching for a value proceeds by comparing the target with the keys in the node and descending to the appropriate child.

Compare the target with the node’s keys to choose the left, middle, or right child.

Repeat until the leaf node is reached.

Searching 10 in a 3‑order B‑tree of 26 items required only about 3 comparisons (log₃ 26 ≈ 3), versus 5 comparisons for a balanced binary tree (log₂ 26 ≈ 5).

Real‑World Example: Age‑Indexed B‑Tree

An example shows a 3‑order tree storing user ages as index keys while business data resides in leaf nodes. This structure is essentially a B‑tree.

Disk I/O Considerations

Disk I/O

Data must be stored on disk and later retrieved, making disk I/O a major performance bottleneck. Disk architecture includes platters, arms, read/write heads, and sectors. Locating data involves seeking to the correct track and waiting for the platter to rotate to the desired sector.

Because mechanical movement is orders of magnitude slower than CPU operations, reducing the number of I/O operations and increasing the amount of useful data per I/O are key optimization goals.

Cache frequently accessed data to cut I/O count.

Read larger blocks to maximize useful data per I/O.

Avoid loading unnecessary business data during index lookups.

B+‑Tree: Separating Index and Data

B+‑Tree

A B+‑tree stores only index keys in internal nodes, while leaf nodes hold both keys and pointers to the actual business records. This design reduces the size of internal nodes, allowing more keys per node and thus fewer disk reads.

Even when index and data are stored separately, the leaf nodes can contain pointers (addresses) to the real data, preserving the benefit of reduced I/O.

Summary

Data resides on disk, whose I/O speed is far slower than CPU.

Improving disk performance focuses on fewer I/O operations and more useful data per I/O.

Indexes use multi‑way trees (B‑trees, B+‑trees) to make the tree shallower, reducing I/O.

B+‑trees separate index keys from business data, increasing the effective data per I/O.

Ordered structures and binary‑search‑like lookups dramatically shrink the search range.

Even a simple index scan is much faster than scanning the full table because the index contains far fewer columns.

Knowledge Extension

Tree structures excel at query performance and are used in many systems, such as:

HashMap converting to red‑black trees on hash collisions.

Database indexes implemented with B+‑trees.

Search engine inverted indexes using trie structures.

The article provides a high‑level overview of why database indexes based on B+‑trees improve query speed, without delving into low‑level implementation details.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

query-performance Binary Search B+Tree Disk I/O Database Index balanced tree B-tree

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.