Understanding Database Indexes: Structures, Types, and Trade‑offs
This article explains the fundamentals of database indexes, covering their purpose, underlying structures such as hash tables, B‑Tree and B+‑Tree, the advantages and drawbacks, and the various index types in MySQL including clustered, non‑clustered, primary, secondary, unique, prefix, and full‑text indexes.
1. Basics of Indexes
1.1 What is an Index
Index is a data structure for fast query and retrieval, such as B‑Tree, B+‑Tree, and hash table. It works like a catalog, allowing quick location of entries.
Advantages : Indexes can greatly accelerate query speed; uniqueness ensures each row is unique. Disadvantages : Creating and maintaining indexes consumes time; updates require index modifications, reducing SQL performance; indexes occupy physical files and space.
1.2 Underlying Data Structures
1.2.1 Hash Table
Hash tables use a hash algorithm to map a key to a value quickly.
hash = hashfunc(key)
index = hash % array_sizeHash collisions occur when different keys map to the same index. Common solutions include open addressing, rehashing, and chaining. JDK 1.8 HashMap switches to a red‑black tree when a bucket’s chain length exceeds 8.
Why MySQL does not use hash tables for indexes
The main reason is that hash tables do not support ordered or range queries.
// Example of a range query
SELECT * FROM tb1 WHERE id < 500
// Hash would need to hash every row, which cannot efficiently satisfy id < 5001.2.3 B‑Tree and B+‑Tree
B‑Tree is a multi‑way balanced search tree; B+‑Tree is a variant. MyISAM and InnoDB use B+‑Tree for indexes. From the primary‑key perspective, indexes are primary (clustered) or secondary (non‑clustered). From storage perspective, indexes are clustered or non‑clustered; MyISAM uses non‑clustered, InnoDB uses clustered.
Differences between B‑Tree and B+‑Tree: ① B‑Tree nodes store both keys and values; B+‑Tree stores keys and values only in leaf nodes, internal nodes store only keys. ② B‑Tree leaf nodes are independent; B+‑Tree leaf nodes are linked together. ③ B‑Tree search may stop before reaching leaf nodes; B+‑Tree always traverses from root to leaf, providing stable query efficiency.
2. Types and Principles of Indexes
2.1 Classification by Data Separation
2.1.1 Clustered Index
Clustered index stores the index structure together with the data. Primary key indexes are clustered.
InnoDB’s .ibd file contains both index (B+‑Tree) and data; non‑leaf nodes store index, leaf nodes store index and corresponding data.
2.1.2 Non‑clustered Index
Non‑clustered index stores the index structure separately from the data.
MYISAM’s .MYI file holds only the index (B+‑Tree). Leaf nodes store index and pointers to data in the .MYD file. Secondary indexes are non‑clustered, so leaf nodes may store primary keys.
2.2 Classification by Primary Key
2.2.1 Primary Key Index
The table’s primary key uses a primary key index, which is clustered.
If InnoDB is used and no explicit primary key is defined, InnoDB first looks for a unique column to use as the primary key; otherwise it creates a hidden 6‑byte auto‑increment key.
Example: create table pl_ranking with id as primary key, then query:
select id, plname, ranking from pl_ranking where id=16;2.2.2 Secondary Index
Secondary index is a non‑clustered index; its leaf nodes store the primary key, allowing lookup of the row.
Secondary index categories: ① Unique index – enforces uniqueness but allows NULL; a table can have multiple unique indexes. ② Normal index – improves query speed; duplicates and NULL are allowed. ③ Prefix index – suitable for string columns; indexes only the leading characters, reducing size. ④ Full‑text index – used to search keywords in large text fields, common in search engines.
Example: create table pl_ranking with plname as a secondary index, then query:
select id, plname, ranking from pl_ranking where plname='Java';Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect's Must-Have
Professional architects sharing high‑quality architecture insights. Covers high‑availability, high‑performance, high‑stability designs, big data, machine learning, Java, system, distributed and AI architectures, plus internet‑driven architectural adjustments and large‑scale practice. Open to idea‑driven, sharing architects for exchange and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
