How Database Indexing Works: A Deep Dive into Performance Gains

This article explains why database indexes are essential, describes how they are structured and stored, walks through concrete calculations for a 5‑million‑row MyISAM table, compares linear and binary search costs, and outlines when and how to use indexes effectively.

ITPUB

Oct 31, 2017

Why Indexes Are Needed

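Data on disk is stored in blocks; each block contains a data segment and a pointer to the next block, similar to a linked list. Records can be physically sorted on at most one field, so searching on any other field requires a linear scan. If the field is a key (its values are unique), the scan can stop at the first match and reads about N/2 blocks on average. If it is a non-key field, duplicates may appear anywhere, so all N blocks must be read. Either way, performance is poor on large tables.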

When a field is sorted, binary search can be used, reducing the number of block accesses to log₂N, which dramatically improves query speed.
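The difference between the two search strategies can be sketched in a few lines of Python. This is only an illustration of the N/2, N, and log₂N figures above; the function names are made up for this example and do not correspond to any database API.

```python
import math

def linear_search_accesses(n_blocks: int, unique: bool = True) -> int:
    """Expected block reads for a linear scan over n_blocks blocks.

    A unique (key) field lets the scan stop at the first match, so about
    half the blocks are read on average. A non-key field forces a full scan.
    """
    return n_blocks // 2 if unique else n_blocks

def binary_search_accesses(n_blocks: int) -> int:
    """Worst-case block reads for a binary search over sorted blocks."""
    return max(1, math.ceil(math.log2(n_blocks)))

if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, linear_search_accesses(n), binary_search_accesses(n))
```

Doubling the number of blocks doubles the linear cost but adds only one more block read to the binary search.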

What an Index Is

An index stores the values of one or more columns in a separate, sorted data structure, with each value pointing to the corresponding record. This enables binary search on the index. The trade‑off is additional disk space; in MyISAM each index file can grow quickly if many columns are indexed.
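As a rough mental model, an index can be pictured as a sorted list of (value, pointer) pairs kept apart from the table. The sketch below uses an in-memory Python list and the standard bisect module; the sample rows and the lookup function are hypothetical, and a real engine stores the index in disk blocks rather than a Python list.

```python
from bisect import bisect_left

# Hypothetical table: rows are stored in insertion order, not sorted by name.
rows = [
    {"id": 1, "first_name": "Maya"},
    {"id": 2, "first_name": "Alan"},
    {"id": 3, "first_name": "Zoe"},
    {"id": 4, "first_name": "Alan"},
]

# The "index": (value, row position) pairs kept in sorted order.
index = sorted((row["first_name"], pos) for pos, row in enumerate(rows))

def lookup(value):
    """Binary-search the index, then follow each pointer to its row."""
    # (value,) sorts before every (value, pos), so this lands on the
    # first index entry whose value equals `value`, if any exists.
    i = bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

if __name__ == "__main__":
    print(lookup("Alan"))
```

The table itself stays unsorted; only the small index structure has to be kept in order.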

Index Mechanics – Example Table Schema

Field Name         Data Type      Size on Disk
id (primary key)   Unsigned INT   4 bytes
firstName          Char(50)       50 bytes
lastName           Char(50)       50 bytes
emailAddress       Char(100)      100 bytes

The columns use char rather than varchar so that every record has a fixed size on disk that can be calculated exactly. The example table holds 5,000,000 rows and has no index.

Analysis Example 1 – Linear vs. Binary Search on the Primary Key

Assume a MyISAM block size of B = 1024 bytes. Each record has a fixed size of R = 4 + 50 + 50 + 100 = 204 bytes, so the blocking factor (records per block) is bfr = ⌊B/R⌋ = ⌊1024/204⌋ = 5. The table therefore occupies N = 5,000,000 / 5 = 1,000,000 blocks.

A linear search on the id field, which is a key, examines N/2 = 500,000 blocks on average.

Because id is sorted, binary search reduces accesses to log₂1,000,000 ≈ 20 blocks, a massive speed‑up.

The firstName field is neither sorted nor unique, so binary search is not possible and a linear search must scan all N = 1,000,000 blocks. This is the situation an index is meant to fix.
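The arithmetic in this example can be reproduced with a short Python sketch. The variable names follow the symbols used in the text (B, R, bfr, N); the rest is plain integer arithmetic and is not tied to any MySQL API.

```python
import math

B = 1024                   # block size in bytes
R = 4 + 50 + 50 + 100      # record size: id + firstName + lastName + emailAddress
ROWS = 5_000_000           # rows in the example table

bfr = B // R                       # whole records that fit in one block
N = math.ceil(ROWS / bfr)          # blocks needed to hold the table

linear_on_key = N // 2             # average scan cost on the unique id field
binary_on_key = math.ceil(math.log2(N))   # id is sorted, so binary search works
linear_on_non_key = N              # firstName: every block must be read

if __name__ == "__main__":
    print(f"R={R} bfr={bfr} N={N}")
    print(f"linear(id)={linear_on_key} binary(id)={binary_on_key}")
    print(f"linear(firstName)={linear_on_non_key}")
```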

Index Record Layout

Field Name         Data Type   Size on Disk
firstName          Char(50)    50 bytes
(record pointer)   Special     4 bytes (may vary from 2 to 5 bytes in MySQL)

The index record is much smaller than the full row, so fewer blocks need to be read.

Analysis Example 2 – Index on a Non‑Key Field

With the same 5,000,000‑row table, each index entry occupies R = 50 + 4 = 54 bytes. Using the same block size, the blocking factor is bfr = ⌊1024/54⌋ = 18 entries per block, so the index occupies N = 5,000,000 / 18 ≈ 277,778 blocks.

Searching firstName via the index uses binary search: log₂277,778 ≈ 18.08, which rounds up to 19 block reads. One more block read fetches the actual row, for a total of about 20 block reads, versus the 1,000,000 needed to find a firstName match in the table without an index.
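The same calculation for the index can be checked in Python. As before, B, R, bfr, and N mirror the symbols in the text; the 4-byte pointer size is the example's assumption and may differ in a real MySQL installation.

```python
import math

B = 1024              # block size in bytes
R_INDEX = 50 + 4      # index entry: firstName value + record pointer (assumed 4 bytes)
ROWS = 5_000_000      # one index entry per table row

bfr = B // R_INDEX                     # whole index entries per block
N = math.ceil(ROWS / bfr)              # blocks occupied by the index

index_reads = math.ceil(math.log2(N))  # binary search over the index blocks
total_reads = index_reads + 1          # plus one read to fetch the row itself

if __name__ == "__main__":
    print(f"bfr={bfr} N={N} index_reads={index_reads} total={total_reads}")
```

The total counts a single matching row; each additional match costs at least one more block read to fetch its record.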

When to Use an Index

Indexes consume extra disk space, so creating too many can exhaust storage. An index only speeds up queries that search or filter on the indexed column; indexing a column that is used only for output is wasteful. Cardinality matters as well. High‑cardinality columns (many distinct values, ideally unique) benefit most, whereas a column with only two distinct values merely splits the table in half and leaves the search close to a linear scan. As a rule of thumb, the optimizer may ignore an index when the number of distinct values is below roughly 30% of the row count.
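One way to apply the cardinality guideline is to compare the number of distinct values in a column with its row count. The sketch below is a hypothetical helper, not something a database exposes; the 0.30 threshold is simply the rule of thumb quoted above, and real optimizers use their own statistics and cost models.

```python
def cardinality_ratio(values):
    """Distinct values divided by total values (0.0 for an empty column)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

def worth_indexing(values, threshold=0.30):
    """Rule-of-thumb check: is the column selective enough for an index?

    `threshold` is the rough 30% guideline from the text, not a documented
    setting of any database engine.
    """
    return cardinality_ratio(values) >= threshold

if __name__ == "__main__":
    ids = range(1, 101)            # unique values: ratio 1.0
    flags = ["Y", "N"] * 50        # two distinct values: ratio 0.02
    print(worth_indexing(ids), worth_indexing(flags))
```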

Query Optimizer Basics

The optimizer estimates the cost of different query plans using a mathematical model that relies on cardinality estimates and predicate selectivity. Accurate statistics (e.g., histograms) are crucial; missing or outdated stats can cause the optimizer to choose sub‑optimal plans, especially when predicates are correlated.
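To make the idea of cost-based plan selection concrete, here is a deliberately simplified toy model in Python. It is an assumption-laden sketch, not how MySQL or any other engine actually computes costs: it counts only block reads, charges one block read per matching row fetched through the index, and takes the selectivity estimate as given.

```python
import math

def full_scan_cost(table_blocks):
    """Toy cost of reading every block of the table."""
    return table_blocks

def index_lookup_cost(index_blocks, table_rows, selectivity):
    """Toy cost of a binary search on the index plus one read per matching row."""
    estimated_rows = table_rows * selectivity
    return math.ceil(math.log2(index_blocks)) + estimated_rows

def choose_plan(table_blocks, index_blocks, table_rows, selectivity):
    """Pick the plan with the lower estimated block-read cost."""
    scan = full_scan_cost(table_blocks)
    lookup = index_lookup_cost(index_blocks, table_rows, selectivity)
    return "index" if lookup < scan else "full scan"

if __name__ == "__main__":
    # Figures from the 5,000,000-row example table and its firstName index.
    for sel in (1 / 5_000_000, 0.001, 0.5):
        print(sel, choose_plan(1_000_000, 277_778, 5_000_000, sel))
```

Even in this toy model the choice depends entirely on the selectivity estimate, which is why stale or missing statistics can push the optimizer toward the wrong plan.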

Source: Translated from a StackOverflow answer by Xenph Yan (https://stackoverflow.com/users/264/xenph-yan).

