Databases 38 min read

Unlock MySQL Performance: Deep Dive into Query Optimization Principles

This article explains MySQL's logical architecture, query execution flow, and the inner workings of its optimizer, then provides practical performance‑tuning advice—including index design, query cache handling, and specific query‑type optimizations—to help developers make informed decisions for real‑world workloads.

MaGe Linux Operations

May 21, 2018

Unlock MySQL Performance: Deep Dive into Query Optimization Principles

MySQL Logical Architecture

If you can picture how MySQL components cooperate, you will better understand its server behavior. The diagram below shows MySQL's logical architecture.

The architecture has three layers. The top client layer handles connection processing, authentication, and security. The middle service layer provides query parsing, analysis, optimization, caching, built‑in functions, and cross‑engine features such as stored procedures, triggers, and views. The bottom storage‑engine layer manages data storage and retrieval, with a stable API that abstracts engine differences.

MySQL Query Process

Understanding how MySQL optimizes and executes queries reveals why many optimization tips simply guide the optimizer to a reasonable plan.

When a client sends a request, MySQL performs the steps illustrated below.

Client/Server Communication Protocol

The protocol is half‑duplex: at any moment only one side transmits data. The client sends the query in a single packet (large queries require increasing max_allowed_packet), and the server replies with one or more packets that the client must read completely before sending more data. Keeping queries simple and limiting result size reduces packet count, which is why avoiding SELECT * and using LIMIT are recommended.

Query Cache

Before parsing, MySQL checks the query cache. If a cache hit occurs, the result is returned after a permission check, bypassing parsing and execution plan generation. The cache is a hash table keyed by the query text, database, client protocol version, etc.; any difference (including whitespace) prevents a hit.

Queries that use user‑defined functions, stored functions, user variables, temporary tables, or system tables are never cached because their results vary.

Cache invalidation occurs when any table referenced by a cached query changes. This adds overhead on writes and, to a lesser extent, on reads. Therefore cache benefits appear only when the saved I/O outweighs the extra work of checking and maintaining the cache.

Every query is checked for cache eligibility, even if it will never hit.

If the result can be cached, storing it incurs additional system cost.

Open the cache only when testing shows a net gain; for write‑intensive workloads, set query_cache_type=DEMAND and mark cache‑eligible queries with SQL_CACHE.

Syntax Parsing and Pre‑processing

The parser tokenizes the SQL, builds a parse tree, and validates syntax. Pre‑processing then checks object existence (tables, columns) and other semantic rules.

Query Optimization

The optimizer transforms the parse tree into an execution plan, choosing the lowest‑cost plan based on statistics such as table and index page counts, index cardinality, and data distribution.

mysql> select * from t_message limit 10;
...省略结果集
mysql> show status like 'last_query_cost';
+-----------------+-------------+
| Variable_name   | Value       |
+-----------------+-------------+
| Last_query_cost | 6391.799000 |
+-----------------+-------------+

The cost indicates an estimated 6391 random page reads for the example query. Inaccurate statistics, ignored user‑defined functions, or optimizer assumptions can lead to sub‑optimal plans.

Execution Engine

After optimization, the execution engine follows the plan, invoking storage‑engine handler APIs for each table. Handlers expose column metadata, index statistics, and row data.

Returning Results to the Client

The server streams result rows back to the client packet by packet. If the query cache is enabled and the query is cacheable, the result is also stored.

Performance‑Optimization Recommendations

Below are three groups of practical tips; always validate assumptions with real‑world testing.

Schema Design and Data Types

Prefer NOT NULL columns when you plan to index them.

Integer width specifications (e.g., INT(11)) have no effect; the storage size is fixed.

Use UNSIGNED to double the positive range when negative values are unnecessary.

Prefer BIGINT over DECIMAL for most numeric data to avoid costly decimal arithmetic.

TIMESTAMP uses 4 bytes and is limited to 1970‑2038; DATETIME uses 8 bytes and has a wider range.

Avoid unnecessary ENUM columns; altering them requires table rebuilds.

Keep the number of columns reasonable; excessive columns increase CPU usage during row decoding.

Large ALTER TABLE operations rebuild the table; consider online tools or partitioning for massive tables.

High‑Performance Index Creation

Indexes accelerate queries but consume disk and memory. Create indexes deliberately, understanding the underlying data structures.

Index Data Structures and Algorithms

Most MySQL indexes are B‑Tree (or B+Tree in InnoDB). A B+Tree stores only keys in leaf pages, with internal pages holding routing information. This reduces I/O because a single page read brings many keys.

Balanced trees keep height low; MySQL chooses a page size that matches the OS page size, allowing a whole node to be read with one I/O.

When inserting values, leaf pages split when full; internal pages split as needed. Rotations (left/right) can sometimes avoid splits.

High‑Performance Strategies

Example table creation:

CREATE TABLE People(
    last_name varchar(50) NOT NULL,
    first_name varchar(50) NOT NULL,
    dob date NOT NULL,
    gender enum('m','f') NOT NULL,
    KEY(last_name,first_name,dob)
);

The index stores the three columns in the order defined; this is the “leftmost prefix” rule.

When Indexes Are Not Used

Indexes cannot be used on expressions (e.g., WHERE id+1=5) or functions.

select * from t where id+1 = 5;

Prefix Indexes

Index only the leading characters of long columns to save space.

Multi‑Column Indexes and Order

Separate single‑column indexes rarely outperform a combined multi‑column index because the optimizer can merge indexes only in limited cases. Place the most selective column first.

Index selectivity = distinct values / total rows; a value close to 1 is ideal.

Avoid Multiple Range Conditions

MySQL can use an index on only one range condition per query; additional range predicates cause full scans.

Covering Indexes

Scanning only the index reduces I/O because index entries are smaller than full rows.

Ordered index scans produce sorted results without extra sorting steps.

Using Index Scan for Sorting

If the ORDER BY column order matches the index order, MySQL can return rows already sorted.

SELECT staff_id,customer_id FROM demo WHERE date='2015-06-01' ORDER BY staff_id,customer_id;

Redundant and Duplicate Indexes

Avoid creating identical indexes on the same column set; remove them promptly. Occasionally a duplicate may be justified for very large tables where a narrower index reduces I/O.

Dropping Unused Indexes

Periodically drop indexes that have not been used for a long time.

Specific Query Optimizations

Optimizing COUNT()

COUNT(*)

counts rows efficiently; counting a specific column skips NULLs. For approximate counts, use EXPLAIN row estimates or maintain a summary table.

Optimizing Joins

Ensure the columns used in ON or USING have indexes, preferably on the second table of the join order.

Make sure GROUP BY and ORDER BY reference only one table to allow index usage.

MySQL executes joins as nested loops; the outer iterator fetches rows from the first table, the inner iterator looks up matching rows in the second table using an index.

outer_iterator = SELECT A.xx,A.c FROM A WHERE A.xx IN (5,6);
while (outer_row) {
    inner_iterator = SELECT B.yy FROM B WHERE B.c = outer_row.c;
    while (inner_row) {
        output[inner_row.yy, outer_row.xx];
        inner_row = inner_iterator.next();
    }
    outer_row = outer_iterator.next();
}

Optimizing LIMIT Pagination

Large offsets cause MySQL to read and discard many rows. Use covering indexes or “bookmark” pagination (e.g., WHERE id > last_id LIMIT 10) to avoid the offset cost.

SELECT id FROM t WHERE id > 10000 LIMIT 10;

Optimizing UNION

UNION builds a temporary table; use UNION ALL when duplicate removal is unnecessary, and push down WHERE, LIMIT, and ORDER BY into each subquery.

Conclusion

Understanding the stages of MySQL query execution and where time is spent, combined with the optimization techniques described, helps you apply theory to practice and achieve measurable performance gains.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

index design Query Optimization Performance Tuning

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.