Mastering Database Indexes: B‑Tree, B+‑Tree, Hash & Their Real‑World Trade‑offs
This article explains the fundamentals of database indexes, covering what indexes are, their underlying structures such as B‑Tree, B+‑Tree and hash tables, their advantages and drawbacks, and the various index types—including primary, secondary, clustered and non‑clustered—used in MySQL engines.
1. Fundamentals of Indexes
1.1 What is an Index
An index is a data structure used for fast query and retrieval, such as B‑Tree, B+‑Tree, and hash tables. It works like a table of contents, allowing quick location of entries, e.g., finding a word in a dictionary via its index.
Advantages and Disadvantages of Indexes Advantages : (1) Indexes can greatly accelerate query speed; (2) Uniqueness of indexes ensures each row is unique. Disadvantages : (1) Creating and maintaining indexes consumes time, and modifications (INSERT/UPDATE/DELETE) require index updates, potentially reducing SQL execution efficiency; (2) Indexes occupy physical storage space.
1.2 Underlying Data Structures of Indexes
1.2.1 Hash Table
A hash table uses a hash algorithm to quickly locate a value based on its key (index).
hash = hashfunc(key)
index = hash % array_sizeHash algorithms suffer from collisions, where different keys map to the same index. Common resolution methods include open addressing, rehashing, and chaining. JDK 1.8 HashMap uses a red‑black tree, converting a linked list to a tree when its length exceeds a threshold (default 8).
Why MySQL Does Not Use Hash Tables as Index Structures
The main reason is that hash tables do not support ordered or range queries.
// Example of a range query
SELECT * FROM tb1 WHERE id < 500;
// Hash would need to compute hash for every row to check the condition, which is inefficient.1.2.3 B‑Tree and B+‑Tree
B‑Tree is a multi‑way balanced search tree; B+‑Tree is a variant of B‑Tree. Both MyISAM and InnoDB engines use B+‑Tree as the index structure. From the primary‑key perspective, indexes are divided into primary key indexes and secondary indexes. From the perspective of whether index and data are stored together, indexes are divided into clustered indexes and non‑clustered indexes; MyISAM uses non‑clustered indexes, while InnoDB uses clustered indexes.
Differences between B‑Tree and B+‑Tree 1) B‑Tree stores both keys and values in all nodes, while B+‑Tree stores keys and values only in leaf nodes; internal nodes store only keys. 2) B‑Tree leaf nodes are independent, whereas B+‑Tree leaf nodes are linked together via a pointer chain. 3) B‑Tree query process may finish before reaching leaf nodes by binary searching within a range, while B+‑Tree always traverses from root to leaf, providing stable query efficiency.
2. Types of Indexes and Their Principles
2.1 Classification by Index‑Data Separation
2.1.1 Clustered Index
A clustered index stores the index structure together with the data. Primary key indexes are clustered.
In InnoDB, the .ibd file contains both index and data. For InnoDB tables, the index (B+‑Tree) stores index entries in non‑leaf nodes, while leaf nodes store both the index and the corresponding row data.
2.1.2 Non‑Clustered Index
A non‑clustered index stores the index structure separately from the data.
MyISAM's .MYI file contains only the index. Each leaf and non‑leaf node of the B+‑Tree stores index entries; leaf nodes store the index and a pointer to the data in the .MYD file. Note that secondary indexes are non‑clustered, so their leaf nodes may store the primary key.
2.2 Classification by Primary Key Status
2.2.1 Primary Key Index
A table's primary key uses a primary key index, which is a clustered index.
If InnoDB is used and no explicit primary key is defined, InnoDB first checks for a unique index to use as the primary key; if none exists, it automatically creates a 6‑byte auto‑increment primary key.
Example: create table pl_ranking with column id as the primary key, then run the following query.
SELECT id, plname, ranking FROM pl_ranking WHERE id = 16;2.2.2 Secondary Index
A secondary index is a non‑clustered index whose leaf nodes store the primary key, allowing the index to locate the row via the primary key.
Secondary Index Categories 1) Unique Index : Guarantees uniqueness of the indexed column (NULLs allowed); multiple unique indexes can exist per table. Primarily used to enforce data uniqueness rather than performance. 2) Normal Index : Sole purpose is to speed up queries; multiple normal indexes can exist, and they allow duplicate and NULL values. 3) Prefix Index : Applies to string columns; indexes only the first few characters, reducing index size compared to a full column index. 4) Full‑Text Index : Designed for searching keywords within large text fields; commonly used by search engines.
Example: create table pl_ranking with column plname as a secondary index, then run the query below.
SELECT id, plname, ranking FROM pl_ranking WHERE plname = 'Java';Architect's Must-Have
Professional architects sharing high‑quality architecture insights. Covers high‑availability, high‑performance, high‑stability designs, big data, machine learning, Java, system, distributed and AI architectures, plus internet‑driven architectural adjustments and large‑scale practice. Open to idea‑driven, sharing architects for exchange and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
