Mastering MySQL Indexes: Advanced Concepts and Practical Tips

This article provides a comprehensive deep‑dive into MySQL indexing, covering the definition, pros and cons, differences between clustered and non‑clustered indexes, InnoDB vs MyISAM implementations, B+Tree structure, capacity calculations, adaptive hash indexes, index pushdown optimization, and how to interpret EXPLAIN output to verify index usage.

Shepherd Advanced Notes

Database Index Overview

1. What is a database index? Advantages and disadvantages

An index is an ordered on‑disk data structure that speeds up query lookup, analogous to a book's table of contents. It occupies physical storage.

Advantages:

Faster data retrieval. This is the main reason indexes are created when fixing slow SQL.

Unique indexes enforce data uniqueness, which can support idempotent business logic.

Disadvantages:

Creating and maintaining indexes adds overhead: every INSERT, UPDATE, and DELETE must also keep the affected indexes up to date.

Indexes consume disk space, so every additional index increases storage consumption.
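
As a minimal sketch of the uniqueness point, the statements below use a hypothetical payment_request table (the table, its columns, and the index name uk_request_no are assumptions for illustration, not part of the original article). The unique index makes a retried insert fail instead of creating a duplicate row.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE payment_request (
  id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  request_no VARCHAR(64)    NOT NULL,
  amount     DECIMAL(10, 2) NOT NULL,
  UNIQUE KEY uk_request_no (request_no)  -- one row per request number
) ENGINE = InnoDB;

-- First attempt succeeds.
INSERT INTO payment_request (request_no, amount) VALUES ('REQ-1001', 25.00);

-- A retry with the same request_no is rejected with a duplicate-key
-- error (error 1062), so the request is not recorded twice.
INSERT INTO payment_request (request_no, amount) VALUES ('REQ-1001', 25.00);

-- INSERT IGNORE turns that error into a warning and skips the row.
INSERT IGNORE INTO payment_request (request_no, amount) VALUES ('REQ-1001', 25.00);
```

Each of these inserts also has to update uk_request_no, which is the write overhead described above.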

2. Clustered vs. non‑clustered indexes

In MySQL InnoDB, the clustered index stores the full row in its leaf nodes. A non‑clustered (secondary) index stores only the indexed column values plus the primary‑key value in its leaf nodes, so fetching any other column requires a second lookup in the clustered index (a back‑table lookup). A covering index contains all columns needed by the query, eliminating the back‑table lookup.

Only one clustered index per table (typically the primary key).

Multiple non‑clustered indexes can exist on a table.

Clustered indexes generally provide higher query efficiency because they avoid back‑table lookups.

3. Differences between InnoDB and MyISAM index implementations

MyISAM indexes are all non‑clustered; InnoDB contains one clustered index.

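With the default file‑per‑table setting, InnoDB stores data and indexes together in a single .ibd file; MyISAM separates data (.MYD) and indexes (.MYI), and keeps the table definition in a .sdi file (MySQL 8.0) or a .frm file (earlier versions).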

InnoDB non‑clustered index leaf nodes store the primary‑key value; MyISAM leaf nodes store the physical file offset of the record.

MyISAM back‑table lookups are fast because they use direct file offsets, whereas InnoDB must retrieve the primary key first and then locate the row in the clustered index.

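InnoDB always organizes a table around a clustered index. If no primary key is defined, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; if there is none, it generates a hidden clustered index on a 6‑byte row ID.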
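
One way to observe the hidden clustered index is sketched below. It assumes MySQL 8.0 with default settings, a schema named test, and a hypothetical table t_no_pk; reading these information_schema views also requires the PROCESS privilege.

```sql
-- Hypothetical table with no primary key and no unique index.
CREATE TABLE t_no_pk (
  a INT,
  b INT
) ENGINE = InnoDB;

-- List the indexes InnoDB maintains for the table.
-- 'test' is the assumed schema name; adjust it to your own.
SELECT i.NAME AS index_name
FROM information_schema.INNODB_INDEXES AS i
JOIN information_schema.INNODB_TABLES  AS t ON t.TABLE_ID = i.TABLE_ID
WHERE t.NAME = 'test/t_no_pk';
```

Under these assumptions the query should list an index named GEN_CLUST_INDEX, which is the clustered index InnoDB generated on the hidden row ID.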

4. B+Tree index implementation

Both InnoDB and MyISAM use B+Tree as the index structure. Example table:

CREATE TABLE index_demo(
  c1 INT,
  c2 INT,
  c3 CHAR(1),
  PRIMARY KEY(c1)
);

Each record's row format includes:

record_type: 0 = ordinary record, 1 = B+Tree internal (non‑leaf) node record, 2 = minimum record, 3 = maximum record.

next_record: relative position of the next record.

Values of columns c1, c2, c3.

Hidden columns and extra information (omitted here).

Records are placed into pages (disk blocks). InnoDB’s default page size is 16 KB, so a table may span many pages. Two rules enable fast location:

The first user record in a page must have a primary‑key value greater than the last record of the previous page.

A directory entry is created for each page.

Locating the record with primary key 20 proceeds as:

Binary‑search the directory entries; key 20 belongs to directory entry 3, which points to page 9 (because 12 ≤ 20 < 209).

Binary‑search within page 9 to find the record with key 20.

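This two‑level directory is the index. When the directory entries themselves outgrow a page, a higher‑level directory page is built over them. Each entry there stores the smallest key of a child page together with that page's number, forming a multi‑level B+Tree.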
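
The lookup described above is what InnoDB performs for a primary‑key equality query. A small sketch, assuming the index_demo table defined earlier and a few made‑up rows:

```sql
-- Sample rows are hypothetical; only the table definition comes from the article.
INSERT INTO index_demo (c1, c2, c3) VALUES
  (1, 4, 'u'),
  (3, 9, 'd'),
  (5, 3, 'y'),
  (20, 2, 'e');

-- Point lookup on the primary key: the B+Tree is descended from the
-- root directory page down to the leaf page that holds c1 = 20.
EXPLAIN SELECT * FROM index_demo WHERE c1 = 20;
```

For a query like this, EXPLAIN typically reports key = PRIMARY and type = const, meaning at most one row is read through the clustered index.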

5. Clustered index B+Tree characteristics

5.1 Clustered index

Index and data share the same B+Tree.

Records within a page are ordered by primary key and form a singly linked list.

Pages are linked in primary‑key order, forming a doubly linked list.

Non‑leaf nodes store primary‑key + page‑number.

Leaf nodes store the complete user record.

Advantages:

Faster data access because index and data reside in the same B+Tree.

Very fast sorted and range queries on the primary key.

Rows with adjacent primary keys are stored in the same or neighboring pages, so a range read needs far fewer disk reads.

Disadvantages:

Insert speed depends on insert order; out‑of‑order inserts cause page splits and degrade performance. An auto‑increment primary key mitigates this.

Updating the primary key is costly because the row must be moved; primary keys are usually immutable.

Limitations:

Of the two engines compared here, only InnoDB supports clustered indexes; MyISAM does not.

Each table can have only one clustered index because the physical storage order can be defined only once.

If a table lacks an explicit primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; if none exists, it creates a hidden 6‑byte row ID as the clustering key.

To fully exploit clustering, the primary key should be an ordered identifier (avoid UUID, MD5, hash, or random strings).
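
A common way to follow this advice while still exposing a random identifier is sketched below. The order_record table and all of its column and index names are hypothetical.

```sql
-- Hypothetical table: ordered surrogate key for clustering,
-- random identifier kept in a secondary unique index.
CREATE TABLE order_record (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_uuid CHAR(36)        NOT NULL,
  created_at DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),                       -- monotonically increasing
  UNIQUE KEY uk_order_uuid (order_uuid)   -- random values live here
) ENGINE = InnoDB;

-- New rows are appended at the right edge of the clustered index.
INSERT INTO order_record (order_uuid) VALUES (UUID());
```

With this layout the random values still cause scattered inserts, but only in the smaller secondary index, not in the pages that hold the full rows.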

5.2 Non‑clustered index (secondary index)

When the search condition uses a column other than the primary key, a non‑clustered index on that column lets the query avoid a full table scan. Example: indexing column c2 builds a separate B+Tree keyed on c2:

Records inside a page are ordered by c2 and form a singly linked list.

Pages are linked in c2 order, forming a doubly linked list.

Non‑leaf nodes store c2 + page‑number.

Leaf nodes store only c2 and the primary key, not the full user record.
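
The difference between a covered query and one that needs a back‑table lookup can be checked with EXPLAIN. This sketch assumes the index_demo table from section 4; the index name idx_c2 is an illustrative choice.

```sql
-- Secondary index on c2; its leaf entries hold (c2, c1).
CREATE INDEX idx_c2 ON index_demo (c2);

-- Covered: both requested columns are present in idx_c2 itself,
-- so the clustered index does not need to be visited.
EXPLAIN SELECT c1, c2 FROM index_demo WHERE c2 = 4;

-- Not covered: c3 exists only in the clustered index, so each match
-- in idx_c2 is followed by a lookup by primary key c1.
EXPLAIN SELECT * FROM index_demo WHERE c2 = 4;
```

The first plan typically shows Using index in the Extra column, which marks a covering index; the second normally does not.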

6. Approximate capacity of a B+Tree

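Assume a 16 KB page, rows of about 1 KB each (16 rows per leaf page), and directory entries of about 10 bytes each (roughly 1,600 entries per non‑leaf page), ignoring page header overhead:

1‑level tree (a single leaf page): up to 16 records.

2‑level tree: up to 16 × 1600 = 25 600 records.

3‑level tree: up to 16 × 1600 × 1600 = 40 960 000 records.

Thus a three‑level tree can hold about 40 million rows of this size, and locating any row takes at most three page reads.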
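
The same estimate can be reproduced as plain arithmetic, taking 16 rows per leaf page and 1,600 directory entries per non‑leaf page as the assumed figures. Real capacities depend on actual row and key sizes.

```sql
-- Rows reachable for tree heights 1 to 3 under the assumed figures:
-- 16 rows per leaf page, 1600 directory entries per non-leaf page.
SELECT
  16               AS one_level_rows,    -- 16
  16 * 1600        AS two_level_rows,    -- 25600
  16 * 1600 * 1600 AS three_level_rows;  -- 40960000
```

Each extra level multiplies the capacity by the number of entries a non‑leaf page can hold, which is why narrow keys keep the tree shallow.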

Non‑leaf nodes store only directory entries, making B+Tree nodes more compact than B‑Tree nodes, resulting in a shallower tree and fewer I/O operations.

7. Adaptive hash index and index pushdown

Adaptive hash index is an InnoDB feature that automatically builds an in‑memory hash index over frequently accessed index pages, giving B+Tree lookups some of the speed of a hash lookup. InnoDB decides internally what to hash, so users cannot choose which values are included, but they can turn the feature on or off with the innodb_adaptive_hash_index system variable.
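
Although InnoDB decides on its own which values to hash, the feature as a whole is controlled by a system variable. A short sketch of inspecting and toggling it (changing a global variable requires administrative privileges):

```sql
-- Show the current adaptive hash index settings.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index%';

-- Disable the feature at runtime, for example when it causes contention.
SET GLOBAL innodb_adaptive_hash_index = OFF;

-- Re-enable it.
SET GLOBAL innodb_adaptive_hash_index = ON;

-- Usage statistics appear in the section titled
-- "INSERT BUFFER AND ADAPTIVE HASH INDEX" of this report.
SHOW ENGINE INNODB STATUS;
```

Whether the feature helps depends on the workload, so comparing throughput with it on and off is a reasonable way to decide.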

Index pushdown (introduced in MySQL 5.6) evaluates additional conditions while traversing the index. Example:

SELECT * FROM user WHERE name LIKE '小%' AND age = 18;

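If a composite index idx_name_age on (name, age) exists, MySQL can use the name prefix to scan the index and apply the age filter to each index entry during that scan, instead of fetching every row whose name starts with '小' and filtering afterwards. This reduces the number of back‑table lookups.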
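
The effect can be observed by toggling the optimizer switch. The table definition below is an assumption made for illustration; the article gives only the query and the index name idx_name_age.

```sql
-- Hypothetical definition of the user table.
CREATE TABLE user (
  id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  age  INT         NOT NULL,
  city VARCHAR(50),
  KEY idx_name_age (name, age)
) ENGINE = InnoDB;

-- With index condition pushdown enabled (the default since MySQL 5.6).
EXPLAIN SELECT * FROM user WHERE name LIKE '小%' AND age = 18;

-- Disable pushdown for this session and compare the plan.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM user WHERE name LIKE '小%' AND age = 18;

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```

When pushdown is used, the Extra column typically shows Using index condition; with the switch off, the age filter is applied after each row is fetched and Extra shows Using where.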

8. Determining whether a statement uses an index

Run EXPLAIN to view the execution plan. Key columns to examine:

type : join type, ordered from best to worst: system > const > eq_ref > ref > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL.

rows : estimated number of rows MySQL expects to read.

filtered : estimated percentage of the examined rows that remain after the table's conditions are applied.

Extra : additional information such as Using filesort, Using index (covering index), Using temporary, Using where, Using index condition (index pushdown).

key : the actual index used; compare with possible_keys to see if the optimizer chose the expected index.
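
A short sketch of the workflow, assuming the index_demo table from section 4 and a secondary index on c2 named idx_c2 (a hypothetical name that must already exist for the last statement to run):

```sql
-- Tabular plan: check type, possible_keys, key, rows, filtered, Extra.
EXPLAIN SELECT * FROM index_demo WHERE c2 = 4;

-- JSON plan: the same information with more detail.
EXPLAIN FORMAT=JSON SELECT * FROM index_demo WHERE c2 = 4;

-- If the expected index is listed in possible_keys but not chosen,
-- an index hint shows what the plan would look like with it.
EXPLAIN SELECT * FROM index_demo FORCE INDEX (idx_c2) WHERE c2 = 4;
```

Index hints are better suited to diagnosis than to permanent use, since they stop the optimizer from adapting as the data changes.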
