Databases 17 min read

Deep Dive into MySQL Indexes: Structures, Types, and Optimization Techniques

This article explains MySQL index fundamentals—including B‑tree, B+‑tree, hash, and ordered array structures—covers primary versus secondary indexes, describes index maintenance such as page splits and merges, examines the InnoDB change buffer, and discusses optimizer choices and cases where indexes become ineffective.

Shepherd Advanced Notes

Mar 19, 2022

Deep Dive into MySQL Indexes: Structures, Types, and Optimization Techniques

1. Index fundamentals

Index is a data structure that enables fast lookup of rows. Common structures used by MySQL are B‑tree, B+‑tree and hash.

Indexes act like a dictionary’s table of contents: they let the engine locate rows without scanning every page. Creating an index speeds up reads and can enforce uniqueness, but building and maintaining indexes consumes CPU, I/O and storage.

Underlying data structures

Hash table : stores key‑value pairs. A hash function maps a key to an array position; the value is placed there. Example in Java: HashMap. Collisions are resolved by chaining (linked list). Since JDK 1.8, long chains are converted to a red‑black tree. Hash indexes cannot serve ordered or range queries, making them unsuitable for most database workloads.

hash = hashfunc(key)
index = hash % array_size

Ordered array : provides excellent performance for equality and range queries, but inserting a new element requires shifting data, which is costly.

Binary search tree : left subtree values are smaller than the parent, right subtree values are larger. In degenerate cases it can become a linear list, degrading performance.

B‑tree is a multi‑way balanced search tree; B+‑tree is a variant whose leaf nodes are linked and store the actual row pointers.

Index types in InnoDB

Each logical index is stored as a B+‑tree. According to leaf‑node content, indexes are classified as:

Primary (clustered) index : leaf stores the whole row.

Secondary (non‑primary) index : leaf stores only the primary‑key value.

Querying via a secondary index requires an extra lookup: first locate the primary key in the secondary leaf, then follow it to the primary index to fetch the row. This extra step is called back‑lookup , so primary‑key lookups are generally faster.

Smaller primary‑key length reduces the size of secondary‑index leaves, saving space.

Index maintenance

When inserting a new value into a B+‑tree, the engine may need to split a full page. If the new key belongs at the end of a page, it is simply appended; otherwise a new page is allocated and part of the data is moved – a process called page split . Page splits lower space utilization by roughly 50 % and increase write cost.

When two adjacent pages become sparsely populated after deletions, they are merged, which is the inverse of a split.

Each index points to a data page; inside the page rows are stored in an ordered array searchable by binary search.

A covering index contains all columns required by a query, allowing the engine to satisfy the query directly from the index without a back‑lookup.

B+‑tree indexes obey the left‑most prefix rule : for a composite index the optimizer can use the leftmost N columns (or the leftmost M characters of a string column). Adjusting column order in a composite index can reduce the number of indexes that need to be maintained.

2. Unique vs. regular indexes

Unique indexes guarantee column uniqueness. When a matching row is found, the search stops because the result is unique. Regular indexes may continue scanning until the first non‑matching row is encountered.

InnoDB reads and writes data in 16 KB pages. Because the extra record check usually stays within the same page, the read‑performance difference between unique and regular indexes is negligible.

During updates InnoDB uses a change buffer for non‑unique indexes but not for unique indexes. The change buffer records modifications in memory and later merges them into the data pages. Its size is limited by the parameter innodb_change_buffer_max_size, which defaults to 50 % of the buffer pool.

Two update scenarios illustrate the impact:

The target page is already in memory: both unique and regular indexes update the row directly; the unique index performs an extra uniqueness check.

The target page is not in memory: for a unique index the engine must read the page, verify uniqueness, then update; for a regular index the modification is written to the change buffer and the statement finishes. The subsequent page read incurs a random I/O, one of the most expensive operations in a database.

Converting a regular index to a unique index can dramatically slow down bulk inserts, potentially blocking the whole system.

Change buffer is most beneficial for write‑heavy, read‑light workloads (e.g., billing or logging systems) where many updates accumulate on a page before it is read again. If writes are immediately followed by reads, the change buffer may trigger a merge right away, adding extra I/O and negating its benefit.

Compared with the redo log, which mainly reduces random writes by turning them into sequential writes, the change buffer primarily reduces random reads . Change‑buffer entries are also written to the redo log, so they survive a crash; the buffer is not lost.

3. Why the optimizer sometimes picks the wrong index

The optimizer seeks the lowest‑cost execution plan, considering factors such as estimated row count, temporary tables and sorting. Inaccurate statistics can mislead the optimizer; executing ANALYZE TABLE refreshes statistics. Developers can also use FORCE INDEX, rewrite queries, or add/drop indexes to guide the optimizer.

4. Situations where indexes become ineffective

The optimizer may abandon an index and perform a full scan if it estimates the scan to be cheaper. Common reasons include:

Applying functions or arithmetic on indexed columns, e.g.

SELECT * FROM `user` WHERE MONTH(create_time)=2;
SELECT * FROM `user` WHERE age+1=20;

Implicit type conversion, such as comparing a VARCHAR column to a numeric literal, which forces a CAST and prevents index usage, e.g.

SELECT * FROM `user` WHERE user_no=110717; -- actually CAST(user_no AS SIGNED) = 110717

Implicit character‑set conversion in joins (e.g., joining a UTF8 table with a UTF8MB4 table). The conversion adds a function call on the join column, causing the optimizer to ignore the index.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Query Optimization InnoDB MySQL Indexes Hash B+Tree Change Buffer

Written by

Shepherd Advanced Notes

Dedicated to sharing advanced Java technical insights, daily work snippets, and the power of persistent effort.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.