Why Do Database Indexes Speed Up Queries? A Deep Dive into Storage and Index Mechanics
This article explains how databases store data on various storage media, how indexes—especially clustered indexes—leverage sorted structures and binary search to dramatically reduce query time, and outlines best practices and pitfalls of index usage and SQL optimization.
Overview
Human information storage has evolved from primitive methods to modern databases, which hold vast amounts of data. A key advantage of databases is fast data access, largely thanks to indexes that accelerate queries.
Computer Storage Principles
Data persisted in a database resides on a computer's storage devices. Faster storage (e.g., RAM) is more expensive, offers lower capacity, and loses its contents when power is removed, while slower devices (e.g., hard disks) provide large capacity and retain data after power loss.
Hard disks consist of rotating platters, tracks, and sectors. Accessing data involves seeking to the correct track, rotating the platter to position the sector under the read/write head, and transferring the data. Mechanical movement introduces latency, which is why operating systems first move data from disk to faster storage (RAM) before applications use it.
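The three steps above (seek, rotate, transfer) can be turned into a rough cost estimate. The sketch below is only illustrative: the function name average_access_ms and the seek and transfer figures are assumptions chosen for the example, not measurements of any particular drive. The rotational term follows from the fact that, on average, the platter must turn half a revolution before the wanted sector arrives.

```python
def average_access_ms(seek_ms: float, rpm: int, transfer_ms: float) -> float:
    """Estimate one random disk access: seek + average rotational latency + transfer."""
    # One revolution takes 60_000 / rpm milliseconds; on average we wait half of it.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return seek_ms + rotational_latency_ms + transfer_ms


# Assumed, illustrative figures for a 7200 rpm drive (not measurements).
access_ms = average_access_ms(seek_ms=9.0, rpm=7200, transfer_ms=0.1)
print(f"estimated time per random block read: {access_ms:.2f} ms")
```

Under these assumptions every random block read costs milliseconds, which is why reducing the number of blocks touched matters far more than saving a few CPU comparisons.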
How Indexes Work
An index functions like a book's table of contents: it provides a shortcut to locate data without scanning the entire table. Because the index keeps its keys in sorted order, the database can use binary search on them, which sharply reduces the number of blocks it has to read.
Binary Search Method
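Assume fixed-length records of 204 bytes and a block size of 1024 bytes, which gives 5 records per block. A table with 100 000 records then occupies 20 000 blocks. A linear scan is O(N): it reads 10 000 blocks on average and all 20 000 in the worst case. If the records are sorted on the search key, binary search over the blocks needs about log₂ 20 000 ≈ 14.3, that is, at most 15 block reads, a speedup of several hundred times on average and more than a thousand times in the worst case.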
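The arithmetic in this example can be checked with a few lines of Python. The constant and variable names are made up for the sketch; the sizes are the ones assumed in the text, and the worst case for binary search over n sorted blocks is taken as floor(log₂ n) + 1 reads.

```python
import math

RECORD_SIZE = 204       # bytes per fixed-length record (from the example)
BLOCK_SIZE = 1024       # bytes per block (from the example)
RECORD_COUNT = 100_000  # rows in the table (from the example)

# Records do not span blocks, so only whole records fit.
records_per_block = BLOCK_SIZE // RECORD_SIZE
block_count = math.ceil(RECORD_COUNT / records_per_block)

# Linear scan: half the blocks on average, all of them in the worst case.
average_scan_reads = block_count / 2
worst_scan_reads = block_count

# Binary search over sorted blocks: floor(log2(n)) + 1 reads in the worst case.
worst_binary_reads = math.floor(math.log2(block_count)) + 1

print(records_per_block, block_count, worst_binary_reads)
print(f"average speedup: about {average_scan_reads / worst_binary_reads:.0f}x")
print(f"worst-case speedup: about {worst_scan_reads / worst_binary_reads:.0f}x")
```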
Why Indexes Accelerate Queries
Indexes store rows in a sorted order (or maintain a sorted pointer structure), allowing binary search to locate rows quickly. Primary keys are ideal candidates because they are unique and naturally ordered.
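A minimal sketch of the "sorted pointer structure" idea, using Python's standard bisect module. The table contents and the names rows, index, and lookup are hypothetical; a real database engine typically uses a B-tree variant rather than a flat sorted list, but the principle of searching sorted keys instead of scanning rows is the same.

```python
from bisect import bisect_left

# Hypothetical table: rows are stored in arrival order and identified by position.
rows = [
    {"id": 42, "name": "Mia"},
    {"id": 7, "name": "Noah"},
    {"id": 19, "name": "Ava"},
    {"id": 3, "name": "Liam"},
]

# The "index": (key, row position) pairs kept sorted by key.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the sorted keys, then follow the pointer to the row."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[index[i][1]]
    return None  # key is not in the table


print(lookup(19))
print(lookup(5))
```

The table itself stays unsorted; only the small index has to be kept in order, which is exactly the extra work that write operations pay for.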
Why Not Too Many Indexes?
Each index adds storage overhead and maintenance cost. Excessive indexes can degrade performance, similar to an overly detailed dictionary where the index itself becomes a bottleneck.
Drawbacks of Indexes
Write operations become slower because both the row and its index entries must be updated.
Indexes consume disk space.
For these reasons, index only the columns that are frequently used in WHERE filters, joins, GROUP BY, or ORDER BY.
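The storage cost is easy to observe. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in for any relational database; the table, the index name, and the helper function are invented for the example. It compares the number of database pages before and after one index is created.

```python
import os
import sqlite3
import tempfile


def page_counts_before_and_after_index(row_count: int = 10_000):
    """Return the database page count before and after adding one index."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        try:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
            conn.executemany(
                "INSERT INTO users (email) VALUES (?)",
                ((f"user{i}@example.com",) for i in range(row_count)),
            )
            conn.commit()
            before = conn.execute("PRAGMA page_count").fetchone()[0]

            # The index stores its own sorted copy of every email value.
            conn.execute("CREATE INDEX idx_users_email ON users (email)")
            conn.commit()
            after = conn.execute("PRAGMA page_count").fetchone()[0]
        finally:
            conn.close()
    return before, after


before, after = page_counts_before_and_after_index()
print(f"pages without index: {before}, with index: {after}")
```

Every additional index repeats this cost, and each INSERT, UPDATE, or DELETE must also keep all of those extra structures in order.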
Clustered Index
A clustered index stores table rows physically in the same order as the index key, typically the primary key. Because the rows can be stored in only one order, a table can have only one clustered index.
In a clustered index, leaf nodes contain the actual data rows, whereas in a non‑clustered index leaf nodes hold pointers to the data.
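The difference between the two leaf layouts can be modeled in a few lines. This is a simplified sketch, not how any engine lays out pages: the data and function names are hypothetical, and the "pointer" in the non-clustered index is assumed to be the primary key (some engines store a physical row locator instead).

```python
from bisect import bisect_left

# Clustered index: the leaf level IS the table, rows sorted by primary key.
clustered_leaves = [
    (1, {"id": 1, "email": "zoe@example.com"}),
    (2, {"id": 2, "email": "adam@example.com"}),
    (3, {"id": 3, "email": "mike@example.com"}),
]

# Non-clustered index on email: leaves hold (key, pointer) pairs only.
# Assumption for this sketch: the pointer is the row's primary key.
secondary_leaves = sorted((row["email"], pk) for pk, row in clustered_leaves)


def find(leaves, key):
    """Binary-search sorted (key, value) leaves and return the value, or None."""
    keys = [k for k, _ in leaves]
    i = bisect_left(keys, key)
    if i < len(leaves) and keys[i] == key:
        return leaves[i][1]
    return None


def find_by_id(pk):
    # One search: the leaf already contains the row.
    return find(clustered_leaves, pk)


def find_by_email(email):
    # Two steps: search the secondary index, then follow the pointer.
    pk = find(secondary_leaves, email)
    return None if pk is None else find_by_id(pk)


print(find_by_id(2))
print(find_by_email("mike@example.com"))
```

The extra hop in find_by_email is the cost that a covering index avoids when the query needs only columns already present in the index.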
Primary Key and Clustered Index
Primary keys are often implemented as clustered indexes because they provide a unique, ordered access path.
Common SQL Optimization Techniques
Avoid full table scans by ensuring WHERE and JOIN columns are indexed.
Avoid applying functions or implicit type conversions to indexed columns in a predicate, because the optimizer can then no longer use the index on that column.
Use covering indexes to satisfy queries without accessing the table.
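In MySQL, be careful with != and <>, negated conditions such as IS NOT NULL, and leading-wildcard LIKE patterns (for example LIKE '%abc'): they usually match too many rows or offer no sorted prefix to search, so the optimizer often falls back to a scan. IS NULL on an indexed column can normally still use the index; check the actual plan with EXPLAIN.
Where possible, let ORDER BY follow the order of an existing index so the database can return rows in index order instead of performing a separate sort.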
Select only necessary columns and avoid creating temporary tables when not needed.
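Several of these techniques can be verified by reading the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module purely as a convenient stand-in; MySQL's EXPLAIN output looks different and its optimizer may decide differently. The table, index, and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Plain comparison on the indexed column: the index can be searched.
indexed = plan("SELECT name FROM users WHERE email = ?", ("a@example.com",))

# Function wrapped around the indexed column: the index on email no longer applies.
wrapped = plan("SELECT name FROM users WHERE lower(email) = ?", ("a@example.com",))

# Only indexed columns are selected: the index alone answers the query.
covering = plan("SELECT email FROM users WHERE email = ?", ("a@example.com",))

for label, text in [("indexed", indexed), ("wrapped", wrapped), ("covering", covering)]:
    print(f"{label:9s} {text}")
```

The exact wording of the plan varies between SQLite versions, but the pattern to look for is a search on idx_users_email in the first case, a full scan in the second, and a covering-index search in the third.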
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
