Databases 14 min read

MySQL Index Optimization: Principles, Types, and Practical Case Studies

This article explains MySQL index fundamentals, different index structures, the concept of covering indexes and the left‑most prefix rule, and presents three real‑world optimization cases that illustrate how to design effective composite and partial indexes to avoid table scans and improve query performance.

Full-Stack Internet Architecture
Full-Stack Internet Architecture
Full-Stack Internet Architecture
MySQL Index Optimization: Principles, Types, and Practical Case Studies

Hello everyone, I'm Li Du. After a painful code‑review experience with a newcomer, I decided to record and share some MySQL index optimization knowledge.

MySQL Index

MySQL indexes are data structures that accelerate query speed, similar to a book's table of contents.

Indexes are implemented using B+ trees. Each data page is 16 KB; the engine loads pages from disk into memory as needed.

Other structures such as hash indexes provide O(1) lookups for equality queries but are unordered and unsuitable for range queries. Ordered arrays have good query performance but are costly to maintain on inserts. Binary trees suffer from high tree height and random I/O.

B+ trees are multi‑way trees; with a 16 KB page height of 1–3, they can store billions of rows, requiring only 1–3 disk accesses, and leaf nodes are linked for efficient range scans.

Index Types

From a data‑structure perspective: B+ tree, hash, R‑Tree, FULLTEXT .

From a storage perspective: clustered (primary) index and non‑clustered index .

From a logical perspective: primary key, ordinary, unique, composite, and spatial indexes .

What Is a “Covering Scan” (回表) and How to Avoid It

In InnoDB, the primary key index stores the full row in leaf nodes (clustered index). Secondary indexes store the indexed columns plus the primary key value.

When a query needs columns that are not present in the secondary index, the engine must look up the primary key index – this extra lookup is called a “covering scan” (回表). Reducing or eliminating covering scans improves performance.

Solving Covering Scans

Creating a composite index that includes all columns required by the query makes the index “cover” the query, eliminating the need for a covering scan. Example:

create table user (
  id int primary key,
  name varchar(20),
  sex varchar(5),
  index(name, sex)
) engine = innodb;

Now the secondary index leaf nodes contain both name and sex, so the query can be satisfied without a covering scan.

Left‑Most Prefix Rule

The left‑most prefix rule states that a composite index can be used for queries that filter on the leftmost N columns of the index, or on the leftmost M characters of a string column.

Example: a composite index on (col3, col2) allows fast lookup for WHERE col3 LIKE 'Eri%', but a pattern like WHERE col3 LIKE '%se' defeats the index and forces a full table scan.

Optimization Case 1

A newcomer wrote a query with WHERE (STATUS='1' AND shop_code='XXX') GROUP BY act_id and created an index idx_status_shop_code. The index placed the low‑selectivity status column first, which is inefficient.

Recommendation: put the high‑selectivity shop_code first and include act_id and store_code to avoid both covering scans and filesort:

CREATE INDEX idx_shop_code_status_act_id ON table_name (shop_code, status, act_id, store_code);

Optimization Case 2

For a table storing encrypted phone numbers (MD5), the newcomer created a full‑field index on (mobile_md5, type). By checking distinctness of the first five characters of the MD5 hash, we found that a prefix of length 5 already provides ~80% selectivity, so a prefix index on the first five characters saves space while retaining performance.

Optimization Case 3

When a userId column contains 20‑character strings where the first ~10 characters are identical, the high‑selectivity part is at the end. Creating an index on the reversed string’s leading characters (e.g., LEFT(REVERSE(userId), 7)) captures the discriminative part efficiently.

Additionally, for a query

SELECT u.city_code FROM XXX.city_role_user u WHERE role_key='XXX' AND uc_id='XXX' AND status=1

, the low‑selectivity role_key should not be placed first in the index; instead, put uc_id (high selectivity) first and include city_code to avoid covering scans.

Additional Code Review Observations

Common issues found during code review include: violating the single‑responsibility principle, using SELECT * unnecessarily, and not applying appropriate design patterns (Strategy, Builder). Overuse of Java 8 streams can hurt readability; however, parallel streams can improve performance for large data sets, as demonstrated by the following demo:

public class StreamParallelDemo {
    public static void main(String[] args) {
        System.out.println(String.format("CPU cores: %d", Runtime.getRuntime().availableProcessors()));
        Random random = new Random();
        List<Integer> list = new ArrayList<>(1000_0000);
        for (int i = 0; i < 1000_0000; i++) {
            list.add(random.nextInt(100));
        }
        long prev = getCurrentTime();
        list.stream().reduce((a, b) -> a + b).ifPresent(System.out::println);
        System.out.println(String.format("Single‑thread time: %d", getCurrentTime() - prev));
        prev = getCurrentTime();
        list.stream().parallel().reduce((a, b) -> a + b).ifPresent(System.out::println);
        System.out.println(String.format("Parallel time: %d", getCurrentTime() - prev));
    }
    private static long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

Overall, the article records practical MySQL index tuning tips, emphasizing three key points: place high‑selectivity columns first, use prefix indexes for long strings, and avoid covering scans.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLmysqlDatabase designIndex Optimizationquery-performance
Full-Stack Internet Architecture
Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.