Why Do Database Indexes Speed Up Queries? A Deep Dive into Storage and Optimization
This article explains how databases store data on physical devices and how indexes, like a book's table of contents, accelerate queries. It covers storage fundamentals, binary search, clustered vs non-clustered indexes, and practical SQL optimization tips for avoiding full-table scans and index pitfalls.
Overview
Human information storage has evolved over time, and today most corporate data resides in databases. As the latest data-storage medium, databases offer many advantages, most notably fast data access, largely thanks to indexes.
Understanding where a database stores its data requires basic knowledge of computer storage.
Data lives on storage devices such as RAM (fast but volatile) and hard disks (slow but persistent). Persistent data is kept on disk, and the operating system first copies it into RAM before applications can work with it.
A typical hard‑disk drive contains multiple platters, each divided into tracks and sectors. Data retrieval involves seeking the correct track, rotating the platter, and reading the sector, which adds mechanical latency.
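To put rough numbers on that mechanical latency, here is a small estimate of one random block read. The drive figures (7200 RPM, 9 ms average seek, 0.1 ms transfer per block) are assumptions chosen for illustration, not values from the article.

```python
# Rough cost of one random read on a spinning disk.
# ASSUMED drive specs (illustrative only): 7200 RPM, 9 ms average seek, 0.1 ms transfer.

def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the platter must turn half a revolution to bring the sector under the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_read_ms(rpm: int, avg_seek_ms: float, transfer_ms: float) -> float:
    """Seek to the track, wait for the sector to rotate into place, then transfer the block."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


print(f"rotational latency: {avg_rotational_latency_ms(7200):.2f} ms")     # 4.17 ms
print(f"one random block read: {random_read_ms(7200, 9.0, 0.1):.2f} ms")   # 13.27 ms
```

Even under these assumptions a single random read costs milliseconds, whereas a RAM access takes well under a microsecond.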
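Because of this mechanical overhead, a disk access is orders of magnitude slower than an access to RAM, which has no moving parts. Databases therefore keep frequently used disk blocks cached in RAM and, above all, try to minimize the number of blocks they must read from disk.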
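The caching idea can be sketched as a tiny page cache, assuming a least-recently-used (LRU) eviction policy. `BufferPool` and the `read_block_from_disk` callback are hypothetical names, not the API of any particular database; real buffer pools add concurrency control, dirty-page write-back, and smarter eviction.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU cache of disk blocks held in RAM (hypothetical, for illustration)."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # stand-in for real disk I/O
        self.pages = OrderedDict()  # block number -> contents, least recently used first
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.pages:
            # Cache hit: no disk I/O, just mark the page as most recently used.
            self.pages.move_to_end(block_no)
            return self.pages[block_no]
        # Cache miss: pay for a disk read, then keep the page in RAM.
        self.disk_reads += 1
        data = self.read_block_from_disk(block_no)
        self.pages[block_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


pool = BufferPool(capacity=2, read_block_from_disk=lambda n: f"block-{n}")
for n in [1, 2, 1, 3, 1]:
    pool.get(n)
print(pool.disk_reads)  # 3: the second and third accesses of block 1 are cache hits
```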
How Indexes Work
Indexes act like a book’s table of contents, allowing fast lookup without scanning the entire table.
Consider a table with 100,000 rows; scanning each row is slow, but binary search on sorted data dramatically speeds up retrieval.
Binary Search
Binary search requires sorted data and reduces the number of comparisons from linear to logarithmic.
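A minimal sketch contrasting linear and binary search on the 100,000-key example. It counts probes (elements examined) instead of timing anything; the function names are illustrative.

```python
def linear_search(items, target):
    """Examine every element in turn. Returns (index or -1, probes made)."""
    probes = 0
    for i, value in enumerate(items):
        probes += 1
        if value == target:
            return i, probes
    return -1, probes


def binary_search(sorted_items, target):
    """Halve the search range on each probe. Requires sorted input."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, probes


keys = list(range(100_000))          # 100,000 sorted keys, as in the example table
print(linear_search(keys, 99_999))   # (99999, 100000)
print(binary_search(keys, 99_999))   # found in at most 17 probes
```

For 100,000 sorted keys, binary search never needs more than 17 probes (⌊log₂ 100,000⌋ + 1), while the linear scan may touch every key.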
Assuming fixed‑length records of 204 bytes and a block size of 1024 bytes, each block holds 5 records, resulting in 20,000 blocks for 100,000 rows.
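With a linear scan, the worst case reads all 20,000 blocks (10,000 on average). If the rows are sorted on the search column, binary search needs at most ⌈log₂ 20,000⌉ = 15 block reads (log₂ 20,000 ≈ 14.3), roughly 1,300 times fewer than the worst-case scan.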
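The block arithmetic can be checked with a few lines of Python; the constants come straight from the example, and the binary-search worst case is taken as ⌈log₂ blocks⌉ block reads.

```python
import math

RECORD_SIZE = 204   # bytes per fixed-length record (from the example)
BLOCK_SIZE = 1024   # bytes per disk block
ROWS = 100_000

records_per_block = BLOCK_SIZE // RECORD_SIZE   # 5; a record never spans two blocks
blocks = math.ceil(ROWS / records_per_block)    # 20,000 blocks

linear_worst = blocks                           # scan every block
linear_avg = blocks / 2                         # on average the row is found halfway
binary_worst = math.ceil(math.log2(blocks))     # 15 block reads on sorted data

print(records_per_block, blocks, linear_worst, binary_worst)  # 5 20000 20000 15
print(f"speed-up: ~{linear_worst / binary_worst:.0f}x")       # speed-up: ~1333x
```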
Why Indexes Speed Up Queries
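An index stores the values of the indexed column in sorted order, each paired with a pointer to its row. Because the entries are sorted, they can be binary-searched even when the table itself is not, and because each entry is much smaller than a full row, more entries fit in each block, so a lookup reads far fewer blocks. This is also why lookups on indexed primary-key columns, whose values are unique, tend to perform best. (In practice, most relational databases organize indexes as B+ trees, a multi-level generalization of this sorted-search idea.)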
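A minimal sketch of a single-column index, assuming a toy in-memory table whose rows are stored in insertion order; `rows`, `id_index`, and `find_by_id` are hypothetical names. The second half reuses the article's 1024-byte blocks and assumes 16-byte index entries (an 8-byte key plus an 8-byte row pointer), an illustrative size not given in the source.

```python
import bisect
import math

# Hypothetical heap table: rows sit in insertion order, NOT sorted by id.
rows = [
    {"id": 42, "name": "carol"},
    {"id": 7, "name": "alice"},
    {"id": 19, "name": "bob"},
]

# The index holds only (key, row position) pairs, kept sorted by key.
id_index = sorted((row["id"], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in id_index]


def find_by_id(key):
    """Binary-search the small index entries, then follow the pointer to the full row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return rows[id_index[i][1]]
    return None


print(find_by_id(19))  # {'id': 19, 'name': 'bob'}

# Block reads for the 100,000-row example, ASSUMING 16-byte index entries
# (8-byte key + 8-byte row pointer) in 1024-byte blocks.
entries_per_block = 1024 // 16                          # 64 entries per block
index_blocks = math.ceil(100_000 / entries_per_block)   # 1,563 index blocks
reads = math.ceil(math.log2(index_blocks)) + 1          # binary search + 1 data block
print(entries_per_block, index_blocks, reads)           # 64 1563 12
```

Under that assumption, an indexed lookup on 100,000 rows costs about 12 block reads, and unlike binary search over the data file it does not require the rows themselves to be sorted.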
Why Too Many Indexes Hurt Performance
Every index takes storage and must be updated on every write, so too many indexes add overhead without a matching gain, much like a book index so detailed that it is nearly as long as the book itself.
Drawbacks of Indexes
Indexes improve read performance but slow writes because each insert or update must also modify the index.
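Each additional index adds write overhead.
Prefer indexing columns with many distinct values, such as unique columns.
Foreign-key columns should be indexed for join performance.
Indexes consume disk space, so index only the columns your queries actually filter, join, or sort on.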
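To make the write cost concrete, here is a toy model, assuming each index is a sorted Python list of (value, row position) pairs; real engines use B+ trees rather than shifting a list, but the point carries over: every insert must also update every index. The `Table` class is hypothetical.

```python
import bisect


class Table:
    """Toy table keeping one sorted index, a list of (value, row_position), per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # index entries that inserts have had to maintain

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)                      # write the row itself
        for col, index in self.indexes.items():
            bisect.insort(index, (row[col], pos))  # one extra sorted insert per index
            self.index_writes += 1


narrow = Table(["id"])
wide = Table(["id", "email", "city"])
for i in range(1000):
    row = {"id": i, "email": f"user{i}@example.com", "city": f"city{i % 10}"}
    narrow.insert(row)
    wide.insert(row)

print(narrow.index_writes, wide.index_writes)  # 1000 3000
```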
What Is a Clustered Index
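A clustered index stores the table's rows in the same physical order as the index key, usually the primary key. Because rows can be physically ordered only one way, a table can have only one clustered index.
In a clustered index, the leaf nodes contain the actual data rows; in a non-clustered (secondary) index, the leaf nodes contain row locators that point to the data (in MySQL's InnoDB engine, the locator is the row's primary-key value).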
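The difference in leaf contents can be sketched with sorted lists standing in for B+ tree leaves. The sketch assumes the InnoDB-style design in which the secondary index stores the primary key as its row locator; all names are hypothetical.

```python
import bisect

# Clustered index: leaf entries hold the whole row, sorted by primary key.
clustered = [
    (1, {"id": 1, "email": "alice@example.com"}),
    (2, {"id": 2, "email": "bob@example.com"}),
    (3, {"id": 3, "email": "carol@example.com"}),
]
pk_keys = [pk for pk, _ in clustered]

# Secondary (non-clustered) index: leaf entries hold only (email, primary key).
secondary = sorted((row["email"], pk) for pk, row in clustered)
sec_keys = [email for email, _ in secondary]


def get_by_pk(pk):
    """One search: the clustered leaf already contains the row."""
    i = bisect.bisect_left(pk_keys, pk)
    if i < len(pk_keys) and pk_keys[i] == pk:
        return clustered[i][1]
    return None


def get_by_email(email):
    """Two searches: secondary index -> primary key -> clustered index."""
    i = bisect.bisect_left(sec_keys, email)
    if i < len(sec_keys) and sec_keys[i] == email:
        return get_by_pk(secondary[i][1])
    return None


print(get_by_email("carol@example.com"))  # {'id': 3, 'email': 'carol@example.com'}
```

A lookup by primary key finishes in the clustered index, whereas a lookup through the secondary index needs a second search of the clustered index to fetch the row.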
Primary Key Usually Creates a Clustered Index
Before creating a clustered index, consider how the table is accessed. Good candidates are columns that have many distinct values, are used in range queries, appear frequently in joins, or are used in ORDER BY or GROUP BY. Avoid clustering on columns whose values change frequently, because changing the key forces the row to move to a new physical position, which can be costly.
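A typical case where an index goes unused is an OR condition that spans different columns when not all of them are indexed. When matching several values of the same column, prefer IN, which optimizers handle reliably.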
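The effect can be observed with SQLite, used here as a runnable stand-in for MySQL (the two optimizers differ in details). The `users` table and its index are hypothetical, and the exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_city ON users (city)")  # note: age has no index


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


or_query = "SELECT * FROM users WHERE city = 'Paris' OR age = 30"
in_query = "SELECT * FROM users WHERE city IN ('Paris', 'Rome')"

print(plan(or_query))  # a full-table SCAN: the age branch cannot use any index
print(plan(in_query))  # a SEARCH using idx_users_city
```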
Common SQL Optimization Techniques
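1. Avoid full-table scans by indexing the columns used in WHERE clauses and JOIN ON conditions.
2. Keep indexes usable by not applying functions or implicit type conversions to indexed columns; for example, in MySQL, WHERE YEAR(created_at) = 2023 cannot use a plain index on created_at, while the equivalent range created_at >= '2023-01-01' AND created_at < '2024-01-01' can.
3. Use covering indexes so that a query can be answered from the index alone, without accessing the table.
4. In MySQL, a LIKE pattern with a leading wildcard ('%abc') cannot use a B-tree index to narrow the search, and != or <> conditions often match so many rows that the optimizer prefers a full scan; IS NULL, by contrast, can use an index.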
5. Prefer index‑based sorting, select only needed fields, and minimize temporary tables.
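Several of these tips can be checked with SQLite's `EXPLAIN QUERY PLAN`, used as a runnable stand-in for MySQL's `EXPLAIN`; the `orders` schema and index names are hypothetical, and the plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.execute("CREATE INDEX idx_orders_amount ON orders (amount)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")


def plan(sql):
    """Join the detail lines of SQLite's EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


queries = {
    # Tip 2: a plain comparison on the indexed column can use the index...
    "sargable": "SELECT * FROM orders WHERE amount = 100",
    # ...but wrapping the column in a function hides it from the index.
    "function": "SELECT * FROM orders WHERE abs(amount) = 100",
    # Tip 4: a leading wildcard cannot be answered by searching a sorted index.
    "leading_wildcard": "SELECT * FROM orders WHERE customer LIKE '%son'",
    # Tip 3: selecting only indexed columns lets SQLite answer from the index alone.
    "covering": "SELECT amount FROM orders WHERE amount > 50",
}

for name, sql in queries.items():
    print(f"{name:17} {plan(sql)}")
```

The plain comparison produces a SEARCH on the index, the function-wrapped and leading-wildcard queries fall back to a SCAN of the table, and the covering query is answered from the index alone.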
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
