How Database Indexing Works: A Deep Dive into Performance Gains
This article explains why database indexes are essential, describes how they are structured and stored, walks through concrete calculations for a 5‑million‑row MyISAM table, compares linear and binary search costs, and outlines when and how to use indexes effectively.
Why Indexes Are Needed
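Data on disk is stored in blocks; each block contains a data section and a pointer to the next block, similar to a linked list. Because a set of records can be physically sorted on only one field, searching on any other field requires a linear search. For a key field, where the scan can stop at the first match, this costs N/2 block accesses on average; for a non-key field, which may contain duplicates, the whole table (N blocks) must be scanned. Here N is the number of blocks the table occupies.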
When a field is sorted, binary search can be used, reducing the number of block accesses to log₂N, which dramatically improves query speed.
What an Index Is
An index stores the values of one or more columns in a separate, sorted data structure, with each value pointing to the corresponding record. Because the index is sorted, it can be searched with binary search even when the table itself is not sorted on that column. The trade-off is additional disk space; MyISAM keeps all of a table's indexes together in one file, so that file can grow quickly if many columns are indexed.
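As a rough illustration of the idea, the sketch below models an index as a sorted list of (value, pointer) pairs kept apart from the rows. It is an in-memory toy, not how MyISAM lays out its index on disk; the sample rows and the names `build_index` and `lookup` are assumptions made for this example.

```python
from bisect import bisect_left

# Hypothetical unsorted "table"; the list position stands in for a record pointer.
rows = [
    {"id": 1, "firstName": "Maria"},
    {"id": 2, "firstName": "Alex"},
    {"id": 3, "firstName": "Zoe"},
    {"id": 4, "firstName": "Alex"},
    {"id": 5, "firstName": "Bo"},
]


def build_index(table, column):
    """Return (value, pointer) pairs sorted by value, stored apart from the rows."""
    return sorted((row[column], pos) for pos, row in enumerate(table))


def lookup(table, index, value):
    """Binary-search the index, then follow each pointer back to the full row."""
    # (value, -1) sorts before every real entry for this value,
    # so bisect_left lands on the first matching entry, if any.
    i = bisect_left(index, (value, -1))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])
        i += 1
    return matches


first_name_index = build_index(rows, "firstName")
alex_rows = lookup(rows, first_name_index, "Alex")
```

The rows never need to be reordered: only the small index is sorted, and each hit costs one extra step to follow the pointer to the row.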
Index Mechanics – Example Table Schema
Field Name       Data Type      Size on Disk
id (Primary)     Unsigned INT   4 bytes
firstName        Char(50)       50 bytes
lastName         Char(50)       50 bytes
emailAddress     Char(100)      100 bytes
We use char instead of varchar to precisely calculate storage. The example table holds 5,000,000 rows without any index.
Analysis Example 1 – Linear vs. Binary Search on the Primary Key
Assuming a MyISAM block size B = 1024 bytes and a fixed record size R = 204 bytes (4 + 50 + 50 + 100), the blocking factor is bfr = ⌊B/R⌋ = ⌊1024/204⌋ = 5 records per block. The table therefore occupies N = 5,000,000 / 5 = 1,000,000 blocks.
Linear search on the id field would examine N/2 = 500,000 blocks on average.
Because id is sorted, binary search reduces accesses to log₂1,000,000 ≈ 20 blocks, a massive speed‑up.
For the unsorted firstName field, linear search must scan all N = 1,000,000 blocks, illustrating the benefit of adding an index.
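The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the calculation using the figures quoted above; the function name `table_search_costs` is an assumption for illustration.

```python
import math


def table_search_costs(rows, block_size, record_size):
    """Block-access estimates for a table of fixed-size records."""
    bfr = block_size // record_size          # whole records that fit in one block
    n_blocks = math.ceil(rows / bfr)         # blocks needed to hold every record
    return {
        "bfr": bfr,
        "blocks": n_blocks,
        "linear_key_avg": n_blocks // 2,     # key field: stop at first match
        "linear_non_key": n_blocks,          # non-key field: scan everything
        "binary_sorted": math.ceil(math.log2(n_blocks)),
    }


# 5,000,000 rows, B = 1024 bytes, R = 4 + 50 + 50 + 100 = 204 bytes
costs = table_search_costs(5_000_000, 1024, 204)
print(costs)
```

With these inputs the function yields 5 records per block, 1,000,000 blocks, 500,000 accesses for an average linear search on the key, and 20 accesses for binary search.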
Index Record Layout
Field Name        Data Type   Size on Disk
firstName         Char(50)    50 bytes
(record pointer)  Special     4 bytes (size may vary 2‑5 bytes in MySQL)
The index record is much smaller than the full row, so fewer blocks need to be read.
Analysis Example 2 – Index on a Non‑Key Field
With the same 5,000,000-row table, each index entry occupies R = 54 bytes (50 for firstName plus 4 for the record pointer). Using the same block size, each block holds bfr = ⌊1024/54⌋ = 18 index entries, so the index requires N = 5,000,000 / 18 ≈ 277,778 blocks.
Searching firstName via the index uses binary search: log₂ 277,778 ≈ 18.08, rounded up to 19 block reads, plus one extra block to fetch the actual row. That totals about 20 block reads, versus the 1,000,000 block reads needed to find a firstName match in the table without an index.
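The index figures can be checked the same way. The sketch below assumes the 4-byte record pointer used in this example (the text notes it may vary from 2 to 5 bytes in MySQL); the function name `index_search_cost` is hypothetical.

```python
import math


def index_search_cost(rows, block_size, key_size, pointer_size=4):
    """Block reads to find one row through a sorted index of fixed-size entries."""
    entry_size = key_size + pointer_size             # indexed value + record pointer
    bfr = block_size // entry_size                   # index entries per block
    index_blocks = math.ceil(rows / bfr)             # blocks occupied by the index
    search_reads = math.ceil(math.log2(index_blocks))  # binary search on the index
    return {
        "entry_size": entry_size,
        "bfr": bfr,
        "index_blocks": index_blocks,
        "total_reads": search_reads + 1,             # +1 to fetch the actual row
    }


# Index on firstName Char(50) over 5,000,000 rows, B = 1024 bytes
index_cost = index_search_cost(5_000_000, 1024, 50)
print(index_cost)
```

The result is 18 entries per block, 277,778 index blocks, and 20 block reads in total, compared with 1,000,000 reads for a full scan of the unindexed table.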
When to Use an Index
Indexes consume extra disk space, and creating too many can exhaust storage. Indexes only speed up queries that search or filter on the indexed column, so indexing a column that is used only for output is wasteful. High-cardinality columns (many distinct values, ideally unique) benefit most. Low-cardinality columns (e.g., only two distinct values) barely narrow the search, and many query optimizers will disregard an index when its cardinality is below roughly 30% of the record count.
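One practical way to see whether an index is actually used is to ask the database for its query plan before and after creating the index. The sketch below uses Python's built-in sqlite3 module as a stand-in, since it needs no server; SQLite is not MyISAM, and the table name `people`, the index name `idx_first_name`, and the sample data are assumptions for illustration. MySQL offers a comparable EXPLAIN statement with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, emailAddress TEXT)"
)
conn.executemany(
    "INSERT INTO people (firstName, lastName, emailAddress) VALUES (?, ?, ?)",
    [(f"first{i}", f"last{i}", f"user{i}@example.com") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT lastName FROM people WHERE firstName = ?"

plan_before = plan(query, ("first42",))   # no index yet: full table scan
conn.execute("CREATE INDEX idx_first_name ON people (firstName)")
plan_after = plan(query, ("first42",))    # planner can now search the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should name `idx_first_name` while the first should not.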
Query Optimizer Basics
The optimizer estimates the cost of different query plans using a mathematical model that relies on cardinality estimates and predicate selectivity. Accurate statistics (e.g., histograms) are crucial; missing or outdated stats can cause the optimizer to choose sub‑optimal plans, especially when predicates are correlated.
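To make the role of selectivity concrete, here is a textbook-style simplification, not the model of any particular database: each equality predicate is assumed to select 1/(number of distinct values) of the rows, and predicates are assumed independent. The column names `city` and `zip` and their distinct-value counts are invented for the example.

```python
def estimate_rows(total_rows, distinct_counts):
    """Estimate rows matching a conjunction of equality predicates.

    Assumes values are uniformly distributed and predicates are independent,
    which is exactly the assumption that breaks for correlated columns.
    """
    selectivity = 1.0
    for distinct in distinct_counts:
        selectivity *= 1.0 / distinct
    return total_rows * selectivity


TOTAL = 5_000_000

# Hypothetical statistics: 100 distinct cities, 1,000 distinct zip codes.
independent_estimate = estimate_rows(TOTAL, [100, 1000])   # city = ? AND zip = ?

# If every zip code lies in exactly one city, the city predicate adds no
# filtering, so the zip predicate alone gives a better estimate.
correlated_estimate = estimate_rows(TOTAL, [1000])

print(round(independent_estimate), round(correlated_estimate))
```

Under the independence assumption the estimate is about 50 rows, while the correlated case is closer to 5,000, a hundredfold gap that could push an optimizer toward a poor plan.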
Source: Translated from a StackOverflow answer by Xenph Yan (https://stackoverflow.com/users/264/xenph-yan).
ITPUB
