Understanding Database Indexes: Storage Principles, Index Types, and SQL Optimization Techniques
This article explains how data is physically stored and why database indexes dramatically speed up queries through sorted structures and binary search. It then discusses clustered versus non‑clustered indexes and their trade‑offs, and closes with practical SQL optimization tips for avoiding common pitfalls.
Overview
Human information storage has evolved from simple media to modern databases, which now hold the data of most enterprises. Databases offer fast data access, largely thanks to the use of indexes.
Computer Storage Principles
Data in a database ultimately lives on computer storage hardware: RAM, hard‑disk drives (HDDs) and solid‑state drives (SSDs). RAM is fast but volatile, so persisted data resides on disk; HDDs provide large, cheap, non‑volatile storage. The operating system copies data from the slower disk into faster memory before an application can use it. An HDD consists of rotating platters divided into tracks and sectors, plus a moving read/write head. Accessing data means seeking to the correct track, waiting for the platter to rotate the sector under the head, and then reading the sector.
How Indexes Work
An index functions like a book's table of contents: it lets the database locate rows without scanning the entire table. The index keeps its keys in a sorted structure (typically a B‑tree), so the database can search it the way binary search works on a sorted list, discarding most of the remaining candidates at each step instead of reading every row.
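The idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular engine is implemented: it uses a flat sorted list in place of a B‑tree, and the table contents and the names `rows`, `name_index` and `find_by_name` are hypothetical.

```python
import bisect

# Hypothetical table: rows sit in insertion order, not sorted by name.
rows = [
    (3, "carol"),
    (1, "alice"),
    (4, "dave"),
    (2, "bob"),
]

# The index is a separate, sorted list of (key, row_position) pairs.
name_index = sorted((name, pos) for pos, (_id, name) in enumerate(rows))
index_keys = [key for key, _pos in name_index]


def find_by_name(name):
    """Binary-search the index, then follow the stored position to the row."""
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return rows[name_index[i][1]]
    return None


print(find_by_name("dave"))  # (4, 'dave')
print(find_by_name("erin"))  # None
```

The table itself is never sorted or scanned; only the small index is searched, and it points back to the row's position.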
Binary Search
Binary search requires the data to be sorted on the search key. Consider a table with 100,000 rows of 205 bytes each, stored in fixed‑length blocks of 1,024 bytes. At roughly five records per block, the table occupies about 20,000 blocks. A linear scan for a unique value reads 10,000 blocks on average and all 20,000 in the worst case. A binary search needs about log₂(20,000) ≈ 14.3 block accesses, so at most 15.
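The arithmetic can be checked with a short script. It assumes five records per block, which is the figure implied by the "about 20,000 blocks" estimate; the variable names are illustrative only.

```python
import math

rows = 100_000
records_per_block = 5  # assumption implied by the estimate of about 20,000 blocks
blocks = math.ceil(rows / records_per_block)

# Worst case for binary search over sorted blocks: ceil(log2(blocks)) reads.
binary_search_reads = math.ceil(math.log2(blocks))

# A linear scan for a unique key reads half of the blocks on average.
average_scan_reads = blocks // 2

print(blocks, binary_search_reads, average_scan_reads)  # 20000 15 10000
```

Rounding log₂(20,000) up gives 15 reads in the worst case, against 10,000 on average for the scan.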
Why Indexes Speed Queries
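Because the index is pre‑sorted, the database can jump directly to the matching key or range of keys and follow the stored pointers to the rows. A lookup on a unique key, such as the primary key, can also stop at the first match. Both effects replace a linear scan with a handful of page reads and so reduce I/O.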
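One way to see the difference is to ask the database for its query plan before and after creating an index. The sketch below uses SQLite through Python's built‑in `sqlite3` module purely as a convenient stand‑in; the `users` table, the index name and the sample data are hypothetical, and other engines report plans in their own formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM users WHERE email = 'user500@example.com'"

plan_without_index = plan(query)  # expected to describe a full table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan(query)     # expected to describe a SEARCH using the index

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should begin with SCAN and the second with SEARCH and the index name.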
Too Many Indexes
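Creating an index on every column defeats the purpose. Taken together, the indexes can grow as large as the table itself or larger, every write has to update all of them, and the optimizer has more candidates to weigh, so the overhead can cancel out the benefit.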
Drawbacks of Indexes
Indexes improve read performance but degrade write performance, as each INSERT, UPDATE or DELETE must also modify the index. They also consume additional disk space, so only essential columns, typically unique or frequently queried ones, should be indexed.
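The space cost is easy to observe. The following sketch, again using SQLite as a stand‑in with a hypothetical `orders` table, compares the number of database pages before and after adding an index. The absolute numbers depend on the SQLite build and page size, so only the direction of the change is meaningful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, note) VALUES (?, ?)",
    [(f"customer-{i % 500}", "x" * 50) for i in range(5000)],
)
conn.commit()


def page_count():
    """Number of pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


pages_before = page_count()
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
conn.commit()
pages_after = page_count()

# The index stores a copy of every customer value plus a row reference,
# so the database grows even though no rows were added.
print(pages_before, pages_after)
```

Every later INSERT, UPDATE or DELETE on `orders` has to keep `idx_orders_customer` in step as well, which is where the write overhead comes from.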
Clustered Index
A clustered index stores the table rows themselves in the order of the index key (usually the primary key). Because the rows can only be stored in one order, a table can have only one clustered index. It provides fast range scans because rows with neighboring keys sit on the same or adjacent data pages.
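A minimal Python sketch of why range scans are cheap when rows are stored in key order. The list `clustered_rows` stands in for the data pages, and all names and values are hypothetical.

```python
import bisect

# Rows kept in primary-key order, as a clustered index would store them.
clustered_rows = [(order_id, f"order-{order_id}") for order_id in range(10, 101, 10)]
keys = [order_id for order_id, _payload in clustered_rows]


def range_scan(low, high):
    """Return rows with low <= key <= high as one contiguous slice."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return clustered_rows[start:end]


print(range_scan(30, 60))
# [(30, 'order-30'), (40, 'order-40'), (50, 'order-50'), (60, 'order-60')]
```

Two binary searches find the boundaries, and everything between them is read sequentially, with no per‑row lookups back into a separate table.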
Primary Key and Clustered Index
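In some widely used engines, such as MySQL's InnoDB and SQL Server (by default), defining a primary key also creates the clustered index; other systems store rows in an unordered heap and treat the primary key index as an ordinary index. A clustered index works best on columns with many distinct values, on columns used in range queries (BETWEEN, >, <), and on columns frequently used in JOIN, GROUP BY or ORDER BY clauses.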
SQL Optimization Techniques
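1. Avoid full‑table scans: make sure the columns used in WHERE and ON clauses are indexed.
2. Keep indexes usable: do not wrap indexed columns in functions or rely on implicit type conversions, since either can stop the optimizer from using the index.
3. Use covering indexes, which contain every column a query needs, so the query is answered from the index without touching the table.
4. Be careful with not‑equal comparisons (<>, !=), IS NULL, IS NOT NULL, and LIKE patterns with a leading wildcard (for example '%abc'). Depending on the engine and the data distribution, these predicates often cannot use an index efficiently.
5. Prefer results that already come back in index order over an explicit sort step.
6. Select only the columns you need and avoid unnecessary temporary tables.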
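Tips 2 and 3 can be checked against a query planner. The sketch below uses SQLite via Python's `sqlite3` module as an example engine; the `users` table and index name are hypothetical, and other databases may plan these queries differently (some support function‑based indexes, for instance).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql):
    """Join the detail column of each EXPLAIN QUERY PLAN row into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Tip 2: a function around the indexed column hides it from the index.
plan_function = plan("SELECT name FROM users WHERE lower(email) = 'a@example.com'")

# Comparing the bare column lets the planner seek into the index.
plan_bare = plan("SELECT name FROM users WHERE email = 'a@example.com'")

# Tip 3: only the indexed column is requested, so the index alone answers the query.
plan_covering = plan("SELECT email FROM users WHERE email = 'a@example.com'")

print(plan_function)
print(plan_bare)
print(plan_covering)
```

In SQLite the first plan should report a SCAN, the second a SEARCH using `idx_users_email`, and the third should mention a COVERING INDEX.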