Databases 15 min read

MySQL Index Optimization: Fundamentals, Types, and Practical Cases

This article explains MySQL index fundamentals, B+‑tree implementation, the left‑most prefix rule, back‑to‑table issues, and provides three practical optimization cases with code examples to help developers design efficient composite indexes and avoid performance pitfalls.

Code Ape Tech Column

Sep 26, 2021

During a recent code review for a newcomer, I noticed several index‑related problems and decided to document the knowledge for the team.

MySQL Index

MySQL indexes are data structures that accelerate query speed, similar to a book's table of contents.

Internally, MySQL uses a B+‑tree implementation. Each data page is 16 KB; loading from disk occurs page‑by‑page, and queries only hit the disk when the required page is not already in memory.

Other index structures such as hash exist, but hash is unsuitable for range queries because it stores unordered key‑value pairs and can suffer from collisions.

Ordered arrays provide excellent performance for both equality and range queries, yet they are impractical for indexes because inserting a new row would require shifting many entries.

Binary trees also have drawbacks: they have limited fan‑out, leading to deep trees and excessive random I/O for large datasets.

B+‑trees are multi‑way trees; with a 16 KB page they can store billions of rows within 1–3 levels, meaning only 1–3 disk accesses are needed. Leaf nodes are linked, which makes range scans efficient.

Types of Indexes

From a data‑structure perspective: B+‑tree index, hash index, R‑Tree index, FULLTEXT index .

From a physical storage perspective: clustered index and non‑clustered index .

From a logical perspective: primary key index, ordinary index, unique index, composite (joint) index, and spatial index .

What Is "Back to Table" (回表)

Before understanding back‑to‑table, it helps to know InnoDB’s storage format. InnoDB’s primary‑key index (clustered) stores the full row in leaf nodes. Secondary indexes store the indexed columns plus the primary‑key value.

Because secondary index leaf nodes contain only the indexed columns, a query that needs additional columns must look up the primary‑key index to fetch the full row—this extra lookup is called "back to table".

Since each back‑to‑table operation incurs another index search, it should be minimized.

Solving Back‑to‑Table

Creating a covering composite index eliminates the need for a back‑to‑table lookup. For example, querying name and sex can be covered by an index on (name, sex).

create table user (
  id int primary key,
  name varchar(20),
  sex varchar(5),
  index(name, sex)
) engine = innodb;

With this index, the leaf nodes contain both name and sex, so the query can be satisfied without accessing the primary‑key index.

Left‑most Prefix Principle

The left‑most prefix rule states that a composite index can be used for queries that filter on the leftmost N columns, or for string indexes on the leftmost M characters.

Given a table with columns col3 and col2, a composite index on ( col3, col2) sorts leaf nodes first by col3, then by col2. A query like WHERE col3 LIKE 'Eri%' can quickly locate rows starting with "Eric".

Conversely, a pattern such as WHERE col3 LIKE '%se' cannot use the index because the leading characters are unknown, leading to a full‑table scan.

Optimization Case 1

A newcomer wrote a query with WHERE (STATUS = '1' AND shop_code = 'XXX') GROUP BY act_id and created an index idx_status_shop_code. The index placed the low‑selectivity status column first, which is inefficient.

Recommendation: store a special placeholder (e.g., 00000) instead of an empty string for shop_code, and create an index idx_shop_code_status_act_id with shop_code first because it has higher cardinality.

Adding act_id and store_code to the index also avoids a filesort, as the index itself maintains order.

Since the query only returns act_id, including it in the index eliminates back‑to‑table lookups.

Optimization Case 2

The second case involves a column storing MD5‑hashed phone numbers. The newcomer created a composite index idx_mobile_md5_type (mobile_md5, type).

I suggested measuring the distinct‑value ratio with:

select count(distinct left(mobile_md5, 5))/count(*) from XXX.users;

The result showed that the first five characters already provide about 80 % selectivity, so indexing only the prefix saves space without sacrificing performance.

Optimization Case 3

The third case deals with a 20‑character userId similar to an ID number where the first ~10 characters are identical and the trailing characters provide high selectivity.

By reversing the string and indexing the leftmost 7 characters, we achieve high discrimination while reducing index size:

select count(distinct left(REVERSE(userId), 7))/count(*) as 'selectivity' from XXX.users;

Additionally, the query

select u.city_code from XXX.city_role_user u where role_key = 'XXX' and uc_id = 'XXX' and status = 1;

places the low‑cardinality role_key first; swapping the order to put uc_id (high cardinality) first and adding it to a composite index eliminates both back‑to‑table and filesort operations.

Key take‑aways: put high‑selectivity columns first, shorten string indexes, and avoid back‑to‑table lookups.

Other Code Review Findings

Additional issues observed include violating the Single‑Responsibility Principle by using generic interfaces that return SELECT *, leading to unnecessary column reads and back‑to‑table overhead.

Design patterns such as Strategy and Builder were not applied, reducing code extensibility.

Heavy use of Java 8 streams was noted; while streams can be parallelized for performance, they may hurt readability. An example of a parallel stream demo:

public class StreamParallelDemo {
    public static void main(String[] args) {
        System.out.println(String.format("本计算机的核数：%d", Runtime.getRuntime().availableProcessors()));

        // 产生100w个随机数(1 ~ 100)，组成列表
        Random random = new Random();
        List<Integer> list = new ArrayList<>(1000_0000);

        for (int i = 0; i < 1000_0000; i++) {
            list.add(random.nextInt(100));
        }

        long prevTime = getCurrentTime();
        list.stream().reduce((a, b) -> a + b).ifPresent(System.out::println);
        System.out.println(String.format("单线程计算耗时：%d", getCurrentTime() - prevTime));

        prevTime = getCurrentTime();
        list.stream().parallel().reduce((a, b) -> a + b).ifPresent(System.out::println);
        System.out.println(String.format("多线程计算耗时：%d", getCurrentTime() - prevTime));
    }

    private static long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

Overall, the review highlighted many basic but important practices for writing efficient, maintainable database‑driven code.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Database Index Optimization

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.