
Master MySQL Query Optimization: Execution Flow, Caching, and Index Strategies

This article explains MySQL's logical architecture, query processing steps, and the inner workings of the optimizer, then provides practical performance‑tuning advice covering data types, index design, cache usage, and specific query patterns such as COUNT, JOIN, LIMIT pagination and UNION.

MaGe Linux Operations

Jan 26, 2019


MySQL Logical Architecture

MySQL consists of three logical layers. The top client layer handles connection, authentication and security. The middle service layer performs parsing, analysis, optimization, caching and built‑in functions, as well as cross‑engine features like stored procedures, triggers and views. The bottom storage‑engine layer stores and retrieves data, with a unified API that hides engine differences.

MySQL Query Process

When a client sends a query, MySQL follows a six‑step pipeline: client‑server protocol, optional query‑cache lookup, SQL parsing and preprocessing, optimizer generating an execution plan, execution engine invoking storage‑engine handlers, and finally returning results (and possibly caching them).

Client/Server Communication Protocol

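MySQL uses a half-duplex protocol: at any moment, either the client or the server is sending data, never both. The client sends the whole query as a single packet, so very long queries may require raising max_allowed_packet, and the server rejects packets larger than that limit. The server's response is usually split into many packets. The client must receive the entire result set before it can send another command; it cannot tell the server to stop partway through. This is why keeping queries simple and limiting result size (avoiding SELECT * and adding LIMIT where appropriate) is recommended.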

Query Cache

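Before parsing, MySQL checks the query cache (when it is enabled). On a cache hit, the server checks the user's privileges and returns the stored result, skipping parsing, optimization and execution. The cache is a hash table keyed by the exact query text together with the current database, the client protocol version and the default character set. Any byte-level difference, including whitespace or comments, causes a miss. Queries that use user-defined functions, stored functions, user variables, temporary tables, system tables in the mysql database, or nondeterministic functions such as NOW() are never cached. When any table referenced by a cached query changes, all cached results that use that table are invalidated, which adds overhead to writes. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.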

The cache also adds cost on the read path:
- Every query is checked for cache eligibility, even if it will never hit.
- If a query is cacheable but not yet cached, storing its result costs additional CPU and memory.
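The toy model below makes the keying and invalidation rules concrete. It is not MySQL's implementation. The class name ToyQueryCache and the database name "shop" are hypothetical, and the key uses only the database name and the raw query text, whereas MySQL also includes the protocol version and default character set.

```python
from collections import defaultdict


class ToyQueryCache:
    """Hypothetical query-cache model: exact-text keys, table-level invalidation.

    This is not MySQL's implementation; it only mimics the two rules above.
    """

    def __init__(self):
        self._entries = {}                 # (db, sql text) -> cached rows
        self._by_table = defaultdict(set)  # table name -> keys that read it

    def get(self, db, sql):
        # The raw text is the key: any byte difference (even a space) misses.
        return self._entries.get((db, sql))

    def put(self, db, sql, tables, rows):
        key = (db, sql)
        self._entries[key] = rows
        for table in tables:
            self._by_table[table].add(key)

    def invalidate(self, table):
        # A write to `table` drops every cached result that referenced it.
        for key in self._by_table.pop(table, set()):
            self._entries.pop(key, None)


cache = ToyQueryCache()
sql = "SELECT * FROM t_message LIMIT 10"
cache.put("shop", sql, ["t_message"], [("row1",)])

hit = cache.get("shop", sql)
miss = cache.get("shop", "SELECT *  FROM t_message LIMIT 10")  # two spaces
cache.invalidate("t_message")  # e.g. after an INSERT into t_message
after_write = cache.get("shop", sql)
print(hit, miss, after_write)
```

A single extra space is enough to miss. One write to t_message then discards the cached result.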

Syntax Parsing and Preprocessing

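The parser tokenizes the SQL, builds a parse tree, and validates the syntax. Preprocessing then applies further MySQL rules to check that the parse tree is valid, for example that the referenced tables and columns exist and that names and aliases are unambiguous.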

Query Optimization

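The optimizer turns the parse tree into an execution plan. It uses a cost-based model: it estimates the cost of candidate plans and picks the cheapest. After a query runs, the session status variable Last_query_cost shows the optimizer's estimated cost for it. In the example below, the value means the optimizer expected roughly 6,391 random data-page lookups.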

mysql> select * from t_message limit 10;
...result set omitted...
mysql> show status like 'last_query_cost';
+-----------------+-------------+
| Variable_name   | Value       |
+-----------------+-------------+
| Last_query_cost | 6391.799000 |
+-----------------+-------------+

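Common optimizer strategies include:
- Reordering joins.
- Optimizing MIN() and MAX(). On an indexed column, MIN() needs only the first entry of the B-Tree index and MAX() only the last.
- Stopping early once a LIMIT is satisfied.
- Efficient sorting. Newer versions use a single-pass sort that reads all needed columns at once instead of reading rows a second time after sorting.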
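The MIN()/MAX() and LIMIT strategies both rely on reading as little of an ordered structure as possible. Below is a rough sketch in which a sorted Python list stands in for an index's leaf level. The data and function names are illustrative only.

```python
import bisect
from itertools import islice

# Hypothetical index on an integer column: the leaf level keeps values sorted.
index = sorted([42, 7, 19, 88, 3, 56, 23])

# MIN()/MAX(): read one end of the index instead of scanning every row.
col_min, col_max = index[0], index[-1]

# MIN(col) WHERE col > 20: one binary search, then take the first entry found.
pos = bisect.bisect_right(index, 20)
min_above_20 = index[pos] if pos < len(index) else None


def scan(values, reads):
    """Yield index entries one by one, recording each entry actually read."""
    for v in values:
        reads.append(v)
        yield v


# LIMIT 3: stop pulling entries as soon as three rows have been produced.
reads = []
first_three = list(islice(scan(index, reads), 3))
print(col_min, col_max, min_above_20, first_three, len(reads))
```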

Execution Engine

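After optimization, the execution engine carries out the plan by calling the storage-engine handler API for each table. A handler instance is created for each table during optimization, and the optimizer uses it to obtain table metadata such as column names and index statistics.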

Returning Results

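The server sends rows to the client incrementally, packet by packet, as they are produced. This lets the client start processing before the whole result set is ready and spares the server from buffering the entire result in memory. If the query cache is enabled and the query is cacheable, the result is also stored in the cache.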

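In summary, a query goes through these steps:
1. The client sends the query to the server.
2. The server checks the query cache; on a hit, it returns the cached result.
3. The server parses, preprocesses, and optimizes the query into an execution plan.
4. The execution engine runs the plan through the storage-engine APIs.
5. Result rows are streamed to the client and, if applicable, stored in the query cache.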

Performance Optimization Recommendations

Schema Design & Data Types

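- Prefer NOT NULL for columns you plan to index. Nullable columns make indexes, index statistics and value comparisons more complex.
- Integer display width (e.g., INT(11)) has no effect on storage size or value range. An INT always takes 4 bytes.
- UNSIGNED forbids negative values and roughly doubles the maximum positive value. For example, TINYINT UNSIGNED stores 0 to 255 instead of -128 to 127.
- DECIMAL stores exact values but is more expensive to compute with. When the number of decimal places is fixed, store amounts as scaled integers in a BIGINT (e.g., cents instead of dollars).
- TIMESTAMP uses 4 bytes. DATETIME uses 8 bytes before MySQL 5.6.4 and 5 bytes since, plus extra bytes for fractional seconds. TIMESTAMP values are converted between the session time zone and UTC, and the type's range ends in 2038.
- Avoid very wide tables. The storage-engine API copies rows between the server and engine layers in a row-buffer format that the server then decodes column by column, so this cost grows with the number of columns.
- Many ALTER TABLE operations rebuild the table: MySQL creates a new table with the new structure, copies every row, and drops the old one. This can take a long time on large tables, so consider pt-online-schema-change or similar tools.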
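The scaled-integer approach to money can be sketched in Python as below. It assumes amounts never need more than two decimal places; CENTS_PER_UNIT is an assumption, so use 10**n for n places. In the database, the integer would live in a BIGINT column and be formatted only for display.

```python
from decimal import Decimal

CENTS_PER_UNIT = 100  # assumption: at most two decimal places are ever needed


def to_cents(amount: str) -> int:
    """Convert a decimal string such as '19.99' to an integer number of cents."""
    value = Decimal(amount) * CENTS_PER_UNIT
    if value != value.to_integral_value():
        raise ValueError(f"{amount!r} has more than two decimal places")
    return int(value)


def from_cents(cents: int) -> str:
    """Format integer cents as a two-decimal string, for display only."""
    return str((Decimal(cents) / CENTS_PER_UNIT).quantize(Decimal("0.01")))


prices = [to_cents("19.99"), to_cents("0.01"), to_cents("5")]
total = sum(prices)  # plain integer addition: no rounding error
print(total, from_cents(total))
```

Arithmetic stays in integers. The conversion rejects inputs with more precision than the chosen scale instead of silently rounding them.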

High‑Performance Index Creation

Indexes speed up lookups but excessive indexes increase disk and memory usage. Understanding B‑Tree structures helps design efficient indexes.

Index Data Structures & Algorithms

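MySQL primarily uses B+Tree indexes. In InnoDB, the leaf pages of the clustered (primary-key) index store the full rows. The leaf pages of secondary indexes store the primary-key values of the matching rows. Internal pages store only keys and pointers to child pages. (MyISAM leaf pages instead store pointers to rows in the data file.) Each node is sized to one storage-engine page (16 KB by default in InnoDB), so visiting a node costs at most one page read. Because each internal page holds many keys, the fan-out is large and the tree stays shallow: even large tables typically need only three or four levels.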
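A back-of-the-envelope calculation shows why the tree stays shallow. The figures below are assumptions, not measured values:
- a 16 KB page;
- an 8-byte BIGINT key plus a 6-byte child pointer per internal entry;
- 1 KB rows;
- no allowance for page headers or free space.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size, in bytes
KEY_BYTES = 8          # assumption: BIGINT primary key
POINTER_BYTES = 6      # assumption: size of a child-page pointer
ROW_BYTES = 1024       # assumption: average row size of 1 KB

# Internal pages hold (key, pointer) entries; clustered-index leaves hold rows.
# Page headers and free space are ignored, so these are upper bounds.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES


def max_rows(height: int) -> int:
    """Rows addressable by a tree with `height` levels (leaves included)."""
    return fanout ** (height - 1) * rows_per_leaf


for h in (1, 2, 3):
    print(h, max_rows(h))
```

Under these assumptions, a three-level tree can address roughly 21.9 million rows. A primary-key lookup therefore touches about three pages, and the upper levels are often already in memory.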

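Leaf pages hold keys in sorted order, while internal pages only guide the search. When an insert overflows a leaf, the leaf splits in two and a separator key is added to the parent. If the parent then overflows, it splits as well. Splits can propagate up to the root, increasing the tree's height. Some implementations first try to move entries into a sibling page that still has room, which avoids a split. This is sometimes called a rotation, although unlike an AVL rotation it only redistributes keys.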
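A minimal Python sketch of a leaf split follows, assuming a hypothetical capacity of four keys per leaf. It omits the parent update and the sibling-redistribution step. It only shows how an overflowing leaf divides and which separator key moves up.

```python
import bisect

LEAF_CAPACITY = 4  # hypothetical: real pages hold hundreds of entries


def insert_into_leaf(leaf, key):
    """Insert `key` into a sorted leaf, splitting the leaf if it overflows.

    Returns (left, right, separator). Without a split, right and separator
    are None. The separator (first key of the right half) is what a B+Tree
    would insert into the parent internal page.
    """
    bisect.insort(leaf, key)
    if len(leaf) <= LEAF_CAPACITY:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]


left, right, sep = insert_into_leaf([10, 20, 30, 40], 25)
print(left, right, sep)
```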

High‑Performance Strategies

Example table creation and composite index illustration:

CREATE TABLE People(
    last_name VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    dob DATE NOT NULL,
    gender ENUM('m','f') NOT NULL,
    KEY(last_name,first_name,dob)
);

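The index sorts entries by last_name, then by first_name within the same last_name, then by dob. Because of this ordering, MySQL can use the index only for a leftmost prefix of its columns (the “leftmost-prefix” rule). That means lookups on last_name, on last_name and first_name, or on all three columns. A condition on first_name or dob alone cannot use it.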
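The Python sketch below models the index as a sorted list of (last_name, first_name, dob) tuples with invented rows. A leftmost prefix maps to one contiguous run that binary search can find. A condition on first_name alone does not, so every entry has to be checked.

```python
import bisect

# Invented index entries on (last_name, first_name, dob), kept sorted.
index = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Astaire", "Angelina", "1980-03-04"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Allen", "Kim", "1930-07-12"),
    ("Allen", "Meryl", "1980-12-12"),
])


def prefix_range(entries, prefix):
    """Entries whose leading columns equal `prefix`, found by binary search.

    Works only because `prefix` covers the leftmost index columns, so the
    matches form one contiguous run. (Building `keys` is O(n); a real index
    seeks directly.)
    """
    keys = [e[:len(prefix)] for e in entries]
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_right(keys, prefix)
    return entries[lo:hi]


print(prefix_range(index, ("Allen",)))
print(prefix_range(index, ("Allen", "Kim")))

# first_name alone is not a leftmost prefix: matches can be anywhere,
# so every entry has to be examined.
julia = [e for e in index if e[1] == "Julia"]
print(julia)
```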

When Indexes Are Not Used

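MySQL cannot use an index when the indexed column appears inside an expression or function call. The indexed column must stand alone on one side of the comparison, so rewrite the condition instead:

select * from t where id + 1 = 5;   -- cannot use the index on id
select * from t where id = 4;       -- equivalent, and can use the index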

Prefix Indexes

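For long string columns, you can index only the leading characters to save space. Choose a prefix long enough to keep selectivity close to that of the full column. Note that MySQL cannot use a prefix index for ORDER BY, GROUP BY, or covering-index scans.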
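One way to pick the prefix length is to compute COUNT(DISTINCT LEFT(col, n)) / COUNT(*) for increasing n. Stop once the value approaches the full column's selectivity. The Python sketch below does the same calculation on a small, invented list of city names.

```python
def prefix_selectivity(values, length):
    """Python analogue of COUNT(DISTINCT LEFT(col, length)) / COUNT(*)."""
    return len({v[:length] for v in values}) / len(values)


# Invented contents of a city column.
cities = ["London", "Londrina", "Los Angeles", "Lisbon", "Lima",
          "Madrid", "Manila", "Malmo", "Paris", "Perth"]

full = prefix_selectivity(cities, max(len(c) for c in cities))
for n in range(1, 6):
    print(n, prefix_selectivity(cities, n), "full:", full)
```

In this data, a five-character prefix already distinguishes every value. Indexing more characters would add size without improving selectivity.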

Multi‑Column Index Order

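When sorting and grouping are not a concern, placing the most selective column first is usually a good choice. Selectivity is the number of distinct values divided by the total number of rows. Compare candidate columns with a query such as: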

SELECT count(distinct staff_id)/count(*) as staff_id_selectivity,
       count(distinct customer_id)/count(*) as customer_id_selectivity
FROM payment;

Avoid Multiple Range Conditions

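Within one composite index, MySQL can use only one range condition to narrow the scan. Once a column is matched by a range (>, <, BETWEEN), the index columns after it cannot be used for the lookup. With an index on (login_time, age), the query below seeks on login_time only. The age condition is then checked against each entry in that range. Equality conditions on the leading columns do not have this limitation.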

select * from user where login_time > '2017-04-01' and age between 18 and 30;
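The Python sketch below shows why the second range cannot narrow the scan. It assumes a composite index on (login_time, age), modeled as a sorted list of invented tuples. The login_time range is one contiguous slice, but the age values inside it are not in order, so the age condition can only filter.

```python
import bisect

# Invented entries of a composite index on (login_time, age), kept sorted.
index = sorted([
    ("2017-03-30", 25), ("2017-04-02", 40), ("2017-04-02", 19),
    ("2017-04-05", 22), ("2017-04-07", 65), ("2017-04-09", 29),
])

# login_time > '2017-04-01': one seek, then everything to the end qualifies.
login_times = [e[0] for e in index]
start = bisect.bisect_right(login_times, "2017-04-01")
scanned = index[start:]

# Inside that slice the ages are not sorted, so age BETWEEN 18 AND 30
# cannot be turned into a second seek; each entry is simply filtered.
print([age for _, age in scanned])
matches = [e for e in scanned if 18 <= e[1] <= 30]
print(len(scanned), matches)
```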

Covering Indexes

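A covering index contains every column a query needs, so MySQL can answer the query from the index alone without reading the table rows. EXPLAIN shows "Using index" in the Extra column. Because index entries are much smaller than full rows, this reduces I/O. In InnoDB it also avoids a second lookup into the clustered index, since secondary indexes already store the primary key.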

Using Index Scan for ORDER BY

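If the index's column order matches the ORDER BY clause, MySQL can read rows in index order and skip the separate sort step. Leading index columns fixed by equality conditions also count. For example, assume an index on (date, staff_id, customer_id). The query below can then use index order, because date is pinned to a single value.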

select staff_id,customer_id from demo where date='2015-06-01' order by staff_id,customer_id;
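The Python sketch below assumes an index on (date, staff_id, customer_id), with a sorted list of invented tuples standing in for the index. It shows why no sort is needed. The equality on date selects one contiguous run, and within that run the entries are already ordered by staff_id, then customer_id.

```python
import bisect

# Invented index entries on (date, staff_id, customer_id), kept sorted.
index = sorted([
    ("2015-06-01", 2, 130), ("2015-06-02", 1, 5), ("2015-06-01", 1, 77),
    ("2015-06-01", 1, 12), ("2015-05-31", 3, 9), ("2015-06-01", 2, 4),
])

# WHERE date = '2015-06-01': equality fixes the leading column, so the
# matching entries are one contiguous run of the index.
dates = [e[0] for e in index]
lo = bisect.bisect_left(dates, "2015-06-01")
hi = bisect.bisect_right(dates, "2015-06-01")

# Reading that run in index order already yields ORDER BY staff_id, customer_id.
rows = [(staff_id, customer_id) for _, staff_id, customer_id in index[lo:hi]]
print(rows)
```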

Redundant & Duplicate Indexes

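Duplicate indexes cover the same columns in the same order; drop the extras. An index is redundant when it is a leftmost prefix of another index, such as (A) when (A,B) exists, because (A,B) can serve the same lookups on A. Remove redundant indexes unless a specific workload justifies them, for example when widening an index would make it too large for queries that only need the leading column.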
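The Python sketch below shows the prefix check behind this rule. The index names are hypothetical. The check deliberately ignores things a real review must consider, such as UNIQUE constraints, index types, and the primary-key columns InnoDB appends to secondary indexes.

```python
def find_redundant(indexes):
    """Flag indexes whose columns equal, or are a leftmost prefix of, another's.

    `indexes` maps an index name to its column tuple. Returns a list of
    (redundant_index, covering_index) pairs; exact duplicates keep the
    alphabetically first name and flag the others.
    """
    found = []
    names = sorted(indexes)
    for a in names:
        for b in names:
            if a == b:
                continue
            cols_a, cols_b = indexes[a], indexes[b]
            if cols_b[:len(cols_a)] != cols_a:
                continue  # a is not a leftmost prefix of b
            if len(cols_a) == len(cols_b) and a < b:
                continue  # duplicate pair: keep a, let b be flagged instead
            found.append((a, b))
            break  # one covering index is enough to flag a
    return found


indexes = {
    "idx_a": ("a",),
    "idx_ab": ("a", "b"),
    "idx_ab_copy": ("a", "b"),
    "idx_b": ("b",),
}
print(find_redundant(indexes))
```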

Delete Unused Indexes

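Periodically find and drop indexes that are never used. In MySQL 5.7 and later, the sys.schema_unused_indexes view lists indexes with no recorded usage since the server started.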

Specific Query Optimizations

Optimizing COUNT()

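COUNT(*) counts rows, regardless of NULLs. COUNT(column) counts only rows where that column is not NULL, so use COUNT(*) when you want the number of rows. Counting a large table can still be slow. When an approximate figure is acceptable, use the row estimate from EXPLAIN or maintain the count in a separate summary table.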
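The NULL rule is easy to demonstrate in Python, with None standing in for SQL NULL in a hypothetical email column:

```python
# Invented rows; None plays the role of SQL NULL.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

count_star = len(rows)                                        # COUNT(*)
count_email = sum(1 for r in rows if r["email"] is not None)  # COUNT(email)
print(count_star, count_email)
```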

Optimizing JOINs

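Make sure the columns in ON and USING clauses are indexed, and take the join order into account. If the optimizer joins table A and then table B on column c, only B.c needs an index; an index on A.c would not help this join.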

Keep GROUP BY and ORDER BY expressions limited to a single table to allow index usage.
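MySQL traditionally executes joins as nested loops: for each row from the first table, it looks up matching rows in the next one. The Python sketch below mimics that, with a dictionary standing in for an index on customers.id. The tables and data are invented. An index on orders.customer_id would not be consulted in this join order.

```python
from collections import defaultdict

# Invented tables: orders JOIN customers ON orders.customer_id = customers.id
orders = [(100, 1), (101, 2), (102, 1), (103, 9)]  # (order_id, customer_id)
customers = [(1, "Ann"), (2, "Bo"), (3, "Cy")]      # (id, name)

# "Index" on the join column of the second table in the join order.
customers_by_id = defaultdict(list)
for cid, name in customers:
    customers_by_id[cid].append(name)

# Nested-loop join: for each outer row, probe the inner table's index
# instead of scanning all of its rows. Order 103 has no match and is dropped.
joined = [
    (order_id, name)
    for order_id, customer_id in orders
    for name in customers_by_id.get(customer_id, [])
]
print(joined)
```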

Optimizing LIMIT Pagination

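Large offsets such as LIMIT 10000,20 are costly: MySQL reads 10,020 rows and discards the first 10,000. There are two alternatives:
- A “deferred join”: use a covering index to find just the ids of the requested page, then join back to fetch the full rows.
- Keyset pagination: remember the last id of the previous page and seek past it.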

-- keyset pagination: 10000 is the last id returned on the previous page
SELECT id FROM t WHERE id > 10000 ORDER BY id LIMIT 10;
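The Python sketch below is a simplified cost model of offset and keyset pagination, with a sorted list of ids standing in for the primary-key index. The function names are hypothetical. The second return value counts the index entries stepped through: offset pagination walks through every skipped entry, while keyset pagination reads only the page after its seek.

```python
import bisect

ids = list(range(1, 100_001))  # hypothetical dense primary keys, in index order


def offset_page(ids, offset, size):
    """Model of LIMIT offset, size: every skipped entry is still read."""
    page = ids[offset:offset + size]
    rows_read = min(offset, len(ids)) + len(page)
    return page, rows_read


def keyset_page(ids, last_id, size):
    """Model of WHERE id > last_id ORDER BY id LIMIT size: seek, then read."""
    start = bisect.bisect_right(ids, last_id)  # the index seek
    page = ids[start:start + size]
    return page, len(page)


page_a, cost_a = offset_page(ids, 10_000, 20)
page_b, cost_b = keyset_page(ids, 10_000, 20)  # previous page ended at id 10000
print(page_a == page_b, cost_a, cost_b)
```

Both calls return the same page, but the offset version steps through 10,020 entries versus 20 for the keyset version.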

Optimizing UNION

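MySQL executes UNION by filling a temporary table with the results of each SELECT. Use UNION ALL unless you need duplicates removed, because plain UNION adds a DISTINCT check over the whole temporary table. MySQL also does not push outer WHERE, LIMIT, or ORDER BY clauses down into the individual SELECTs, so copy them into each SELECT yourself. For example, with UNION ALL and an outer ORDER BY x LIMIT 20, each branch can also use ORDER BY x LIMIT 20.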
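The Python sketch below illustrates both points with two invented branch results. UNION ALL simply concatenates, while UNION must also deduplicate. Copying ORDER BY x LIMIT 3 into each branch means at most three rows per branch reach the final merge, yet for UNION ALL the overall top three are unchanged.

```python
import heapq
from itertools import islice

# Invented results of two SELECT branches, each already filtered by its WHERE.
archive = [5, 3, 9, 3, 1]
current = [4, 3, 8, 2]

union_all = archive + current           # UNION ALL: plain concatenation
union = list(dict.fromkeys(union_all))  # UNION: duplicates removed as well

# Outer "ORDER BY x LIMIT 3" copied into each branch: every branch sends at
# most three rows, and the outer query merges those short sorted lists.
top_archive = sorted(archive)[:3]
top_current = sorted(current)[:3]
top3 = list(islice(heapq.merge(top_archive, top_current), 3))
print(len(union_all), len(union), top3)
```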

Conclusion

Understanding MySQL’s execution pipeline and the cost of each phase, combined with solid indexing and schema design, enables developers to make informed performance decisions. The principles and examples above should help bridge theory and practice.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


