Why Indexes Matter: Understanding B+Tree, Hash, and Clustered Indexes in MySQL
This article explains what database indexes are, classifies them by data structure, storage engine, field characteristics and column count, compares B+Tree with B‑Tree, hash and red‑black trees, and demonstrates how InnoDB and MyISAM handle clustered and secondary indexes with practical SQL examples.
What is an Index?
Index is a data structure that helps the storage engine retrieve data efficiently.
Many describe an index as a directory of data, allowing the storage engine to quickly locate data.
Index Classification
Indexes can be classified from several perspectives.
From data‑structure perspective:
B+tree
Hash
Full‑text index
From physical‑storage perspective:
Clustered index
Secondary (auxiliary) index
From index‑field characteristics:
Primary‑key index
Unique index
Normal index
Prefix index
From the number of indexed columns:
Single‑column index
Composite (multi‑column) index
Data‑Structure View of Indexes
MySQL storage engines support different index types. InnoDB, MyISAM and Memory all support B+tree; only InnoDB and Memory support Hash; InnoDB and MyISAM support Full‑text indexes.
B+tree and B‑tree
1970, R. Bayer and E. McCreight proposed the B‑tree, a balanced multi‑way tree widely used for disk‑based indexing.
B+tree is a variant of B‑tree. B+tree stores data only in leaf nodes, while B‑tree also stores data in internal nodes.
Because leaf nodes contain only keys, a B+tree node holds fewer entries, allowing more keys per disk I/O and reducing tree height.
Leaf nodes are linked in a singly‑linked list, which makes range scans efficient—something B‑tree cannot do.
B+tree and Red‑Black Tree
For a B+tree with N leaf nodes, the search complexity is O(log d N), where d (the degree) is usually >100, so the tree height stays around 3‑4 even for millions of rows, requiring only 3‑4 disk I/Os.
A red‑black tree is a binary tree (degree 2) with search complexity O(log N), which typically incurs more disk I/Os than a B+tree.
B+tree Index vs Hash Table
Range queries are common in MySQL; hash tables are suitable only for equality queries and suffer from hash‑function selection and collisions.
Therefore B+tree indexes have a broader applicability than hash indexes.
Physical‑Storage View of Indexes
Different MySQL storage engines handle indexes differently.
InnoDB Indexes
InnoDB indexes are divided into clustered indexes and secondary indexes based on whether leaf nodes store the full row data.
The full table data is stored in the clustered index.
All other indexes are secondary indexes.
create table workers (
id int(11) not null auto_increment comment '员工工号',
name varchar(16) not null comment '员工名字',
sales int(11) default null comment '员工销售业绩',
primary key (id)
) engine=InnoDB AUTO_INCREMENT=10 default charset=utf8;Insert sample rows into the table.
insert into workers(id, name, sales) values (1, '江南', 12744);
insert into workers(id, name, sales) values (3, '今何在', 14082);
insert into workers(id, name, sales) values (7, '路明非', 14738);
insert into workers(id, name, sales) values (8, '吕归尘', 7087);
insert into workers(id, name, sales) values (11, '姬野', 8565);
insert into workers(id, name, sales) values (15, '凯撒', 8501);
insert into workers(id, name, sales) values (20, '绘梨衣', 7890);The clustered index of this table is built on the primary‑key column id.
Secondary Index Example
Create a secondary index on the name column.
alter table workers add index index_name(name);The leaf nodes of this secondary index store the primary‑key values ( id) rather than full rows.
Covering Index and “Back‑Table” Lookup
When a query can be satisfied using only the columns stored in a secondary index, MySQL can avoid the extra lookup to the clustered index; this is called a covering index. select * from workers where name='吕归尘'; This query uses the secondary index to find id=8, then looks up the full row in the clustered index.
select id, name from workers where name='吕归尘';Because the required columns ( id and name) are present in the secondary index, the query is covered and no back‑table lookup occurs.
explain select id, name from workers where name='吕归尘';The execution plan shows Using index and Using where, indicating an index‑only scan.
MyISAM Indexes
MyISAM tables do not have clustered indexes; leaf nodes store pointers to the row data.
Both primary and non‑primary indexes have the same structure, storing row addresses, so a MyISAM table can exist without a primary key.
Index Field Characteristics
Primary‑Key Index
Built on the primary‑key column.
Only one per table.
Column values cannot be NULL.
Usually created when the table is defined.
Unique Index
Built on columns declared UNIQUE.
A table can have multiple unique indexes; column values may be NULL.
Normal Index
An index on a regular column that is neither primary nor unique.
Prefix Index
A prefix index is created on the first N characters (or bytes) of a character or binary column, reducing storage space and improving query speed.
create index index_name on persons(name(5)) comment '前缀索引';Index Column Count
Single‑column index: built on one column.
Composite (multi‑column) index: built on multiple columns.
Example of a composite index on id and name:
create index index_id_name on workers(id, name) comment '组合索引';The composite index stores both id and name in the leaf nodes; the non‑leaf nodes contain combined keys for ordering.
Java Interview Crash Guide
Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
