Unlock MySQL Performance: Deep Dive into Query Optimization Principles
This article explains MySQL's logical architecture, query execution flow, and the inner workings of its optimizer, then provides practical performance‑tuning advice—including index design, query cache handling, and specific query‑type optimizations—to help developers make informed decisions for real‑world workloads.
MySQL Logical Architecture
If you can picture how MySQL components cooperate, you will better understand its server behavior. The diagram below shows MySQL's logical architecture.
The architecture has three layers. The top client layer handles connection processing, authentication, and security. The middle service layer provides query parsing, analysis, optimization, caching, built‑in functions, and cross‑engine features such as stored procedures, triggers, and views. The bottom storage‑engine layer manages data storage and retrieval, with a stable API that abstracts engine differences.
MySQL Query Process
Understanding how MySQL optimizes and executes queries reveals why many optimization tips simply guide the optimizer to a reasonable plan.
When a client sends a request, MySQL performs the steps illustrated below.
Client/Server Communication Protocol
The protocol is half‑duplex: at any moment only one side transmits data. The client sends the query in a single packet (large queries require increasing max_allowed_packet), and the server replies with one or more packets that the client must read completely before sending more data. Keeping queries simple and limiting result size reduces packet count, which is why avoiding SELECT * and using LIMIT are recommended.
Query Cache
Before parsing, MySQL checks the query cache. If a cache hit occurs, the result is returned after a permission check, bypassing parsing and execution plan generation. The cache is a hash table keyed by the query text, database, client protocol version, etc.; any difference (including whitespace) prevents a hit.
Queries that use user‑defined functions, stored functions, user variables, temporary tables, or system tables are never cached because their results vary.
Cache invalidation occurs when any table referenced by a cached query changes. This adds overhead on writes and, to a lesser extent, on reads. Therefore cache benefits appear only when the saved I/O outweighs the extra work of checking and maintaining the cache.
Every query is checked for cache eligibility, even if it will never hit.
If the result can be cached, storing it incurs additional system cost.
Open the cache only when testing shows a net gain; for write‑intensive workloads, set query_cache_type=DEMAND and mark cache‑eligible queries with SQL_CACHE.
Syntax Parsing and Pre‑processing
The parser tokenizes the SQL, builds a parse tree, and validates syntax. Pre‑processing then checks object existence (tables, columns) and other semantic rules.
Query Optimization
The optimizer transforms the parse tree into an execution plan, choosing the lowest‑cost plan based on statistics such as table and index page counts, index cardinality, and data distribution.
mysql> select * from t_message limit 10;
...省略结果集
mysql> show status like 'last_query_cost';
+-----------------+-------------+
| Variable_name | Value |
+-----------------+-------------+
| Last_query_cost | 6391.799000 |
+-----------------+-------------+The cost indicates an estimated 6391 random page reads for the example query. Inaccurate statistics, ignored user‑defined functions, or optimizer assumptions can lead to sub‑optimal plans.
Execution Engine
After optimization, the execution engine follows the plan, invoking storage‑engine handler APIs for each table. Handlers expose column metadata, index statistics, and row data.
Returning Results to the Client
The server streams result rows back to the client packet by packet. If the query cache is enabled and the query is cacheable, the result is also stored.
Performance‑Optimization Recommendations
Below are three groups of practical tips; always validate assumptions with real‑world testing.
Schema Design and Data Types
Prefer NOT NULL columns when you plan to index them.
Integer width specifications (e.g., INT(11)) have no effect; the storage size is fixed.
Use UNSIGNED to double the positive range when negative values are unnecessary.
Prefer BIGINT over DECIMAL for most numeric data to avoid costly decimal arithmetic.
TIMESTAMP uses 4 bytes and is limited to 1970‑2038; DATETIME uses 8 bytes and has a wider range.
Avoid unnecessary ENUM columns; altering them requires table rebuilds.
Keep the number of columns reasonable; excessive columns increase CPU usage during row decoding.
Large ALTER TABLE operations rebuild the table; consider online tools or partitioning for massive tables.
High‑Performance Index Creation
Indexes accelerate queries but consume disk and memory. Create indexes deliberately, understanding the underlying data structures.
Index Data Structures and Algorithms
Most MySQL indexes are B‑Tree (or B+Tree in InnoDB). A B+Tree stores only keys in leaf pages, with internal pages holding routing information. This reduces I/O because a single page read brings many keys.
Balanced trees keep height low; MySQL chooses a page size that matches the OS page size, allowing a whole node to be read with one I/O.
When inserting values, leaf pages split when full; internal pages split as needed. Rotations (left/right) can sometimes avoid splits.
High‑Performance Strategies
Example table creation:
CREATE TABLE People(
last_name varchar(50) NOT NULL,
first_name varchar(50) NOT NULL,
dob date NOT NULL,
gender enum('m','f') NOT NULL,
KEY(last_name,first_name,dob)
);The index stores the three columns in the order defined; this is the “leftmost prefix” rule.
When Indexes Are Not Used
Indexes cannot be used on expressions (e.g., WHERE id+1=5) or functions.
select * from t where id+1 = 5;Prefix Indexes
Index only the leading characters of long columns to save space.
Multi‑Column Indexes and Order
Separate single‑column indexes rarely outperform a combined multi‑column index because the optimizer can merge indexes only in limited cases. Place the most selective column first.
Index selectivity = distinct values / total rows; a value close to 1 is ideal.
Avoid Multiple Range Conditions
MySQL can use an index on only one range condition per query; additional range predicates cause full scans.
Covering Indexes
Scanning only the index reduces I/O because index entries are smaller than full rows.
Ordered index scans produce sorted results without extra sorting steps.
Using Index Scan for Sorting
If the ORDER BY column order matches the index order, MySQL can return rows already sorted.
SELECT staff_id,customer_id FROM demo WHERE date='2015-06-01' ORDER BY staff_id,customer_id;Redundant and Duplicate Indexes
Avoid creating identical indexes on the same column set; remove them promptly. Occasionally a duplicate may be justified for very large tables where a narrower index reduces I/O.
Dropping Unused Indexes
Periodically drop indexes that have not been used for a long time.
Specific Query Optimizations
Optimizing COUNT()
COUNT(*)counts rows efficiently; counting a specific column skips NULLs. For approximate counts, use EXPLAIN row estimates or maintain a summary table.
Optimizing Joins
Ensure the columns used in ON or USING have indexes, preferably on the second table of the join order.
Make sure GROUP BY and ORDER BY reference only one table to allow index usage.
MySQL executes joins as nested loops; the outer iterator fetches rows from the first table, the inner iterator looks up matching rows in the second table using an index.
outer_iterator = SELECT A.xx,A.c FROM A WHERE A.xx IN (5,6);
while (outer_row) {
inner_iterator = SELECT B.yy FROM B WHERE B.c = outer_row.c;
while (inner_row) {
output[inner_row.yy, outer_row.xx];
inner_row = inner_iterator.next();
}
outer_row = outer_iterator.next();
}Optimizing LIMIT Pagination
Large offsets cause MySQL to read and discard many rows. Use covering indexes or “bookmark” pagination (e.g., WHERE id > last_id LIMIT 10) to avoid the offset cost.
SELECT id FROM t WHERE id > 10000 LIMIT 10;Optimizing UNION
UNION builds a temporary table; use UNION ALL when duplicate removal is unnecessary, and push down WHERE, LIMIT, and ORDER BY into each subquery.
Conclusion
Understanding the stages of MySQL query execution and where time is spent, combined with the optimization techniques described, helps you apply theory to practice and achieve measurable performance gains.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
