Understanding Database Indexes: Storage Principles, Types, and Optimization Techniques

This article explains how computer storage works, why database indexes dramatically speed up queries, the mechanics of binary search, the differences between clustered and non‑clustered indexes, common pitfalls of over‑indexing, and practical SQL optimization strategies to avoid full table scans and index invalidation.

Architect's Guide

Overview

Human information storage has evolved from physical media to modern databases, which store data on disk but achieve fast access largely thanks to indexes that act like a book's table of contents.

Computer Storage Principles

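Data persisted in a database resides on non-volatile storage devices such as hard disks, while RAM serves as fast but volatile working memory. A hard disk consists of rotating platters divided into tracks and sectors; reading data means moving the head to the correct track (the seek), waiting for the platter to rotate the sector under the head, and then transferring the data, and each step adds latency. Because RAM is far faster, operating systems and database engines cache recently used disk blocks in RAM, and any block an application needs must be read into memory before it can be processed.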

How Indexes Work

An index is analogous to a dictionary's index: it allows the database engine to locate rows without scanning the entire table. By maintaining a sorted structure, the engine can quickly navigate to the relevant data blocks.
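To make the idea concrete, here is a minimal Python sketch of a sorted index over a small in-memory table. The names `rows`, `build_index`, and `lookup` are illustrative assumptions, and real engines typically use on-disk tree structures rather than a Python list, so treat this as a model of the principle only.

```python
from bisect import bisect_left

# A toy "table": rows are stored in arrival order, not sorted by name.
rows = [
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 4, "name": "dave"},
    {"id": 2, "name": "bob"},
]

def build_index(table, column):
    """Return a sorted list of (key, row_position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(table))

def lookup(index, table, key):
    """Return the first row whose indexed column equals key, or None."""
    i = bisect_left(index, (key,))  # binary search over the sorted keys
    if i < len(index) and index[i][0] == key:
        return table[index[i][1]]   # follow the pointer back to the row
    return None

name_index = build_index(rows, "name")
print(lookup(name_index, rows, "dave"))
```

The table itself is never sorted or scanned; only the small sorted structure is searched, and the stored position leads straight to the row.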

Binary Search Method

Binary search requires sorted data and dramatically reduces the number of blocks examined. Consider a table of 100,000 records with the following layout:

Fixed record size = 204 bytes, block size = 1024 bytes

Each block holds ⌊1024 / 204⌋ = 5 records, so 100,000 records occupy 20,000 blocks. A full scan may have to read all 20,000 blocks, whereas binary search over the sorted blocks needs only about log₂(20,000) ≈ 14.3, that is, at most 15 block reads.
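The arithmetic in this example can be checked with a few lines of Python. The constant names are assumptions chosen for readability; the numbers come from the example. The worst-case probe count for binary search over n sorted blocks is floor(log₂ n) + 1, which is why the estimate of about 14 rounds up to 15 here.

```python
import math

RECORD_SIZE = 204        # bytes per fixed-size record (from the example)
BLOCK_SIZE = 1024        # bytes per disk block (from the example)
RECORD_COUNT = 100_000   # records in the table (from the example)

# Only whole records fit in a block.
records_per_block = BLOCK_SIZE // RECORD_SIZE
block_count = math.ceil(RECORD_COUNT / records_per_block)

# Worst case: a full scan reads every block,
# binary search halves the candidate range on each read.
full_scan_reads = block_count
binary_search_reads = math.floor(math.log2(block_count)) + 1

print(records_per_block, block_count, binary_search_reads)
```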

Why Indexes Speed Up Queries

Because an index keeps keys in sorted order, the engine can use binary-search-like lookups: a query on an indexed column (most commonly the primary key) locates a row in O(log N) block reads instead of O(N). In the example above, that is about 14 to 15 block reads instead of up to 20,000, a reduction of more than a thousandfold in I/O.

Why Too Many Indexes Hurt Performance

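When every column is indexed, the indexes together can take as much space as the table itself, or more. Every write must then update all of them, and less of each index fits in memory, so lookups that should be cheap start to incur extra disk I/O and the benefit over a full table scan shrinks.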

Drawbacks of Indexes

Each indexed column adds write overhead because inserts/updates must modify both the row and the index.

Indexes consume disk space.

Despite these costs, foreign-key columns should generally be indexed, because joins look rows up by those columns.
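The write overhead described above can be illustrated with a small Python model. `ToyTable` and its `structures_touched` counter are hypothetical names invented for this sketch; the counter is only a crude proxy for write cost, not a measurement of any real engine.

```python
from bisect import insort

class ToyTable:
    """Toy table with one sorted list per indexed column (hypothetical model)."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.structures_touched = 0  # crude proxy for write cost

    def insert(self, row):
        self.rows.append(row)              # write the row itself
        self.structures_touched += 1
        pos = len(self.rows) - 1
        for col, index in self.indexes.items():
            insort(index, (row[col], pos))  # keep every index sorted
            self.structures_touched += 1

lean = ToyTable(indexed_columns=["id"])
heavy = ToyTable(indexed_columns=["id", "name", "email", "city"])
for table in (lean, heavy):
    table.insert({"id": 1, "name": "alice", "email": "a@example.com", "city": "Oslo"})

print(lean.structures_touched, heavy.structures_touched)
```

In this model a single insert touches one structure per index plus the row store, so the table with four indexes does two and a half times the work of the table with one.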

Clustered Index

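A clustered index stores rows physically in the same order as the indexed column values, so a range query can read contiguous disk blocks. Because the rows can be laid out in only one order, a table can have only one clustered index, typically on the primary key. A non-clustered index, by contrast, is a separate sorted structure whose entries point to the rows, so a table can have several of them.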
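The effect on range queries can be sketched in Python by comparing the storage positions a range scan has to visit. Everything here (the key values, `clustered_positions`, `heap_positions`) is an assumed toy model; list positions stand in for disk locations.

```python
from bisect import bisect_left, bisect_right

# Clustered layout: rows are physically kept in key order.
clustered_keys = list(range(0, 100, 10))   # 0, 10, ..., 90

def clustered_positions(low, high):
    """Storage positions of rows with low <= key <= high (clustered layout)."""
    start = bisect_left(clustered_keys, low)
    stop = bisect_right(clustered_keys, high)
    return list(range(start, stop))

# Unordered layout plus a separate sorted index of (key, position) pairs.
heap_keys = [50, 0, 90, 20, 70, 30, 10, 80, 40, 60]
secondary = sorted((key, pos) for pos, key in enumerate(heap_keys))

def heap_positions(low, high):
    """Storage positions of rows with low <= key <= high (separate index)."""
    start = bisect_left(secondary, (low,))
    stop = bisect_right(secondary, (high, len(heap_keys)))
    return [pos for _, pos in secondary[start:stop]]

print(clustered_positions(20, 50))  # one contiguous run
print(heap_positions(20, 50))       # scattered positions
```

Both functions return the same four rows, but the clustered layout yields one contiguous run of positions while the separate index sends the reader to scattered locations.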

Typical Index Invalidations

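Wrapping an indexed column in a function or calculation, comparing it with a value of a different type (which forces an implicit conversion), or combining it with OR conditions on unindexed columns can prevent the optimizer from using the index, leading to a full scan.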
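One way to observe this is to ask the engine for its query plan. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users` table, the `idx_users_age` index, and the `plan` helper are assumptions made for illustration, and other engines word their plans differently and may optimize some of these cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", 20 + i % 50) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_users_age ON users (age)")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Arithmetic on the indexed column hides it from the index.
wrapped = plan("SELECT name FROM users WHERE age * 2 = 60")
# The equivalent predicate on the bare column can use the index.
bare = plan("SELECT name FROM users WHERE age = 30")

print(wrapped)
print(bare)
```

In SQLite the first plan is reported as a scan and the second as a search using `idx_users_age`, even though both queries return the same rows.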

Common SQL Optimization Techniques

1. Avoid Full Table Scans

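Ensure that WHERE and JOIN predicates reference indexed columns. Take table size into account as well: on a very small table a scan can be cheaper than an index lookup, so an index is not always worth adding.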
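A quick way to confirm that a predicate is served by an index is to compare the query plan before and after creating one. This sketch again uses Python's `sqlite3` module; the `orders` table, the `idx_orders_customer` index, and the `plan` helper are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(5000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)   # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY)    # same query, index now available

print(before)
print(after)
```

The plan changes from a scan of the whole table to a search through the new index without any change to the query text.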

2. Prevent Index Invalidations

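Avoid applying functions, calculations, or implicit type conversions to indexed columns. Where possible, use a covering index, that is, one containing every column the query reads, so the engine can answer the query from the index alone without visiting the table.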
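The covering-index case can also be seen in a query plan. In this sketch, built on Python's `sqlite3` module, the `users` table, the composite index `idx_users_age_name`, and the `plan` helper are assumptions for illustration; the exact plan wording is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, age, city) VALUES (?, ?, ?)",
    [(f"user{i}", 20 + i % 50, f"city{i % 10}") for i in range(1000)],
)
conn.execute("CREATE INDEX idx_users_age_name ON users (age, name)")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Every column the query reads (age, name) is stored in the index.
covered = plan("SELECT name FROM users WHERE age = 30")
# city is not in the index, so each match needs a lookup in the table.
not_covered = plan("SELECT name, city FROM users WHERE age = 30")

print(covered)
print(not_covered)
```

SQLite labels the first plan as using a covering index, while the second uses the same index but still has to fetch `city` from the table rows.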

3. Reduce Unnecessary Sorting

Prefer indexes that provide the required order instead of sorting results after retrieval.
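Whether a sort step is needed also shows up in the query plan. The following sketch uses Python's `sqlite3` module; the `events` table, the `idx_events_created` index, and the `plan` helper are hypothetical, and the plan text shown by other engines will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (created_at, payload) VALUES (?, ?)",
    [(f"2024-01-{1 + i % 28:02d}", f"event {i}") for i in range(1000)],
)

QUERY = "SELECT payload FROM events ORDER BY created_at"

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)   # rows must be sorted after retrieval
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")
after = plan(QUERY)    # the index already delivers rows in order

needs_sort_before = any("TEMP B-TREE" in step for step in before)
needs_sort_after = any("TEMP B-TREE" in step for step in after)
print(needs_sort_before, needs_sort_after)
```

Without the index SQLite reports a temporary B-tree for the ORDER BY; once the index exists it walks the index in order and that sort step disappears from the plan.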

4. Select Only Needed Columns

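List only the columns you need instead of using SELECT *. Fewer columns mean less I/O and less data sent to the client, and they make it more likely that an index can cover the query.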

5. Minimize Temporary Table Usage

Design queries to work without creating intermediate tables when possible.
