Understanding MySQL Query Optimization and Index Design
This article explains MySQL’s logical architecture, query processing steps, and the principles behind query optimization, covering topics such as client‑server protocol, query cache, parsing, cost‑based optimizer, execution engine, and practical index design strategies to improve performance.
MySQL Logical Architecture
MySQL consists of three layers: the client layer (handling connection, authentication, and security), the service layer (parsing, analysis, optimization, caching, built‑in functions, and cross‑engine features like stored procedures, triggers, and views), and the storage engine layer (responsible for actual data storage and retrieval). Communication between the service layer and storage engines is done via a unified API that abstracts engine differences.
MySQL Query Process
When a client sends a query, MySQL follows five main steps: (1) check the query cache (MySQL 5.7 and earlier only), (2) parse and preprocess the SQL, (3) generate an execution plan with the cost‑based optimizer, (4) execute the plan by invoking storage‑engine handlers, and (5) return results to the client while optionally storing them in the cache.
Client/Server Communication Protocol
The protocol is half‑duplex: at any moment only one side sends data, and the other side must receive the whole message before it can reply. The max_allowed_packet setting caps the size of a single message, so a query larger than that limit is rejected by the server.
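The limit can be inspected and raised as sketched below. The 64 MB value is only an illustrative assumption, not a recommendation taken from the text above.

```sql
-- Show the current limit, in bytes, for this server.
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the server-side limit to 64 MB (67108864 bytes, illustrative value).
-- Needs administrative privileges and only affects connections opened afterwards.
SET GLOBAL max_allowed_packet = 67108864;
```

Client programs keep their own max_allowed_packet setting, so a large statement may need the limit raised on both sides.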
Query Cache
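Before parsing, MySQL checks whether the exact query string hits the cache. The comparison is byte for byte, so differences in letter case, whitespace, or comments produce a miss. Cached results are returned without parsing or execution. Queries involving user‑defined functions, non‑deterministic functions such as NOW(), temporary tables, or system tables are never cached. All cached results for a table are invalidated when that table changes. The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this step applies only to older versions.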
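On a server that still has the query cache, its configuration and hit counters can be examined roughly as follows. The statements reuse the t_message table that appears in the optimizer example in this article.

```sql
-- MySQL 5.7 and earlier only; these variables do not exist in 8.0.
SHOW VARIABLES LIKE 'query_cache%';   -- e.g. query_cache_type, query_cache_size
SHOW STATUS LIKE 'Qcache%';           -- e.g. Qcache_hits, Qcache_inserts

-- These two statements differ only in letter case, so the cache
-- treats them as two different entries.
SELECT * FROM t_message LIMIT 10;
select * from t_message limit 10;

-- Skip the cache for a single statement.
SELECT SQL_NO_CACHE * FROM t_message LIMIT 10;
```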
Parsing and Preprocessing
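The parser splits the statement into tokens, builds a syntax tree, and validates keywords and structure. The preprocessor then checks what the grammar alone cannot, such as whether the referenced tables and columns exist and whether names are unambiguous.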
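The difference between the two stages shows up in the error each one raises. In the sketch below, no_such_column and no_such_table are hypothetical names assumed not to exist.

```sql
-- Rejected by the parser: misspelled keyword (error 1064, syntax error).
SELEC * FROM t_message;

-- Valid syntax, rejected during name resolution:
-- error 1054 (unknown column), assuming t_message has no such column.
SELECT no_such_column FROM t_message;

-- error 1146 (table doesn't exist), assuming no table has this name.
SELECT * FROM no_such_table;
```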
Query Optimization
The optimizer transforms the syntax tree into candidate execution plans and selects the one with the lowest estimated cost. The estimate for the most recently compiled query in the current session can be inspected via the Last_query_cost status variable:
<code style='padding: 16px; background-color: rgb(39, 40, 34); color: rgb(221, 221, 221); font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace; font-size: 12px'>mysql> select * from t_message limit 10;
... (result set omitted) ...
mysql> show status like 'last_query_cost';
+-----------------+-------------+
| Variable_name   | Value       |
+-----------------+-------------+
| Last_query_cost | 6391.799000 |
+-----------------+-------------+</code>
Execution Engine
The engine follows the chosen plan, invoking handler APIs of the storage engine for each table. Handlers provide a small set of operations that abstract the underlying engine implementation.
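One way to observe these handler calls is through the session's Handler_* status counters. This is a rough sketch; which counters move depends on the plan the optimizer picked.

```sql
-- Reset the session counters (requires the RELOAD privilege).
FLUSH STATUS;

SELECT * FROM t_message LIMIT 10;

-- Handler_read_rnd_next grows during a table scan,
-- Handler_read_key and Handler_read_next during index access.
SHOW SESSION STATUS LIKE 'Handler_read%';
```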
Returning Results
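Results are sent incrementally: the server starts returning rows as soon as it produces them, and each row travels in its own protocol packet (several packets may be batched at the TCP layer). Because the protocol is half‑duplex, the client cannot interrupt the transfer with another message; it has to receive the complete result set or close the connection.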
Performance Optimization Recommendations
Understanding the underlying mechanisms is essential before applying any optimization. The following sections present practical advice.
Schema and Data‑Type Design
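Prefer NOT NULL for columns you plan to index. Nullable columns complicate indexes, index statistics, and comparisons, although the speed gain from NOT NULL alone is usually small. Primary‑key columns must always be NOT NULL.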
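Display widths such as the 11 in INT(11) affect neither storage nor value range: an INT always occupies 4 bytes, and the width only influences how some clients pad displayed values.
Use UNSIGNED for values that are never negative, prefer TIMESTAMP over DATETIME where its range is sufficient because it takes less space, and avoid overusing ENUM, since changing its list of values requires an ALTER TABLE.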
Keep the number of columns reasonable to reduce row‑buffer copying overhead.
Avoid costly ALTER TABLE operations on large tables, which may rebuild the whole table; consider online schema change tools such as pt‑online‑schema‑change or gh‑ost.
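A table definition following these recommendations might look like the sketch below. The table demo_message and all of its columns are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_message (
    id         BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    sender_id  INT UNSIGNED     NOT NULL,            -- never negative, so UNSIGNED
    status     TINYINT UNSIGNED NOT NULL DEFAULT 0,  -- small numeric code instead of a long ENUM
    body       VARCHAR(500)     NOT NULL,
    created_at TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY idx_sender_created (sender_id, created_at)   -- indexed columns are NOT NULL
) ENGINE = InnoDB;
```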
High‑Performance Index Creation
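Indexes accelerate lookups but add write overhead and consume space. Follow the “leftmost prefix” rule: a composite index on (a, b, c) can serve conditions on a, on (a, b), or on (a, b, c), but not on b or c alone. Within a composite index, put columns with high selectivity (many distinct values relative to the row count) first.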
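The leftmost‑prefix rule and a rough selectivity check can be tried out as follows. The table demo_orders, its columns, and the index name are hypothetical, and the plans EXPLAIN reports depend on the data and the server version.

```sql
-- Hypothetical table with one composite index.
CREATE TABLE demo_orders (
    id          INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED     NOT NULL,
    status      TINYINT UNSIGNED NOT NULL,
    created_at  DATETIME         NOT NULL,
    amount      DECIMAL(10, 2)   NOT NULL,
    KEY idx_cust_status_created (customer_id, status, created_at)
) ENGINE = InnoDB;

-- The conditions cover a leftmost prefix, so the index is usable.
EXPLAIN SELECT * FROM demo_orders WHERE customer_id = 42;
EXPLAIN SELECT * FROM demo_orders WHERE customer_id = 42 AND status = 1;

-- The leading column is missing, so the index cannot be used for a lookup.
EXPLAIN SELECT * FROM demo_orders WHERE status = 1;

-- Rough selectivity per column: values closer to 1 are more selective.
SELECT COUNT(DISTINCT customer_id) / COUNT(*) AS customer_selectivity,
       COUNT(DISTINCT status)      / COUNT(*) AS status_selectivity
FROM demo_orders;
```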
Index Data Structures and Algorithms
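MySQL primarily uses B‑Tree indexes, which InnoDB implements as B+Trees. In a B+Tree the internal nodes hold only keys and child pointers, while the leaf pages hold the actual entries and are linked together, allowing efficient range scans. Each node is one storage‑engine page (16 KB by default in InnoDB), so descending one level typically costs one I/O. Because a page holds hundreds of keys, the depth grows as O(log N) with a large base, and even tables with many millions of rows usually need only three or four levels.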
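The page size and the number of pages an index occupies can be looked up as sketched below. This assumes InnoDB persistent statistics are enabled and that the account can read the mysql schema; t_message is the table from the optimizer example.

```sql
-- Page size in bytes (16384 unless configured otherwise).
SHOW VARIABLES LIKE 'innodb_page_size';

-- Pages used by each index of the table:
-- 'size' is the total page count, 'n_leaf_pages' the leaf-level pages.
SELECT index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 't_message'
  AND stat_name IN ('size', 'n_leaf_pages');
```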
Specific Query Optimizations
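COUNT() : Use COUNT(*) for row counts; it is clearer and often faster than counting a specific column. COUNT(col) answers a different question, because it counts only the rows where col is not NULL.
JOINs : MySQL traditionally executes joins as nested loops, so ensure the join column of the second (inner) table has an index; the first table does not need one on its join column if it drives the iteration.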
LIMIT with large offsets : Prefer covering‑index scans or “bookmark” pagination (e.g., WHERE id > last_id LIMIT 10) to avoid scanning millions of rows.
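UNION : Plain UNION deduplicates the combined rows, which costs extra work, so use UNION ALL when duplicate elimination is unnecessary. The optimizer generally does not push outer clauses into the branches, so repeat WHERE, LIMIT, and ORDER BY inside each subquery where they apply.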
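The four recommendations above can be sketched as follows. All tables (demo_message, demo_message_archive, demo_user) and columns are hypothetical, and the id value 1000000 stands in for the last id seen on the previous page.

```sql
-- COUNT(*) counts rows; COUNT(read_at) counts only rows where read_at IS NOT NULL.
SELECT COUNT(*) AS total_rows, COUNT(read_at) AS rows_with_read_at
FROM demo_message;

-- JOIN: index the join column of the table probed inside the loop.
ALTER TABLE demo_message ADD INDEX idx_sender (sender_id);
SELECT u.name, m.body
FROM demo_user AS u
JOIN demo_message AS m ON m.sender_id = u.id
WHERE u.id = 42;

-- Offset pagination: reads and discards the first 1,000,000 rows.
SELECT * FROM demo_message ORDER BY id LIMIT 1000000, 10;

-- Bookmark pagination: seek directly past the last id already shown.
SELECT * FROM demo_message WHERE id > 1000000 ORDER BY id LIMIT 10;

-- Covering-index variant: page over ids only, then fetch the full rows.
SELECT m.*
FROM demo_message AS m
JOIN (SELECT id FROM demo_message ORDER BY id LIMIT 1000000, 10) AS page
  ON page.id = m.id
ORDER BY m.id;

-- UNION ALL with WHERE, ORDER BY, and LIMIT repeated inside each branch.
(SELECT id, created_at FROM demo_message
  WHERE sender_id = 42 ORDER BY created_at DESC LIMIT 20)
UNION ALL
(SELECT id, created_at FROM demo_message_archive
  WHERE sender_id = 42 ORDER BY created_at DESC LIMIT 20)
ORDER BY created_at DESC
LIMIT 20;
```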
Covering Indexes and Order‑by Optimization
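If an index contains all columns needed by the query, MySQL can satisfy the query from the index alone, eliminating the need for a table lookup; EXPLAIN reports this as “Using index” in the Extra column. For ORDER BY to use an index, the index column order must match the ORDER BY clause and all columns must sort in the same direction. Leading index columns that the WHERE clause fixes to a constant may be left out of the ORDER BY.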
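Both effects can be checked with EXPLAIN, as in the sketch below. The table demo_event and its index are hypothetical, and the Extra values mentioned in the comments are what one would typically expect rather than guaranteed output.

```sql
-- Hypothetical table with a composite secondary index.
CREATE TABLE demo_event (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    created_at DATETIME     NOT NULL,
    payload    TEXT,
    KEY idx_user_created (user_id, created_at)
) ENGINE = InnoDB;

-- Covered query: every referenced column is in idx_user_created,
-- so Extra typically shows "Using index".
EXPLAIN SELECT user_id, created_at FROM demo_event WHERE user_id = 42;

-- user_id is fixed by equality and created_at is the next index column,
-- so rows can be read in index order without a separate sort.
EXPLAIN SELECT user_id, created_at FROM demo_event
WHERE user_id = 42 ORDER BY created_at;

-- Column order differs from the index, so Extra typically shows "Using filesort".
EXPLAIN SELECT user_id, created_at FROM demo_event
ORDER BY created_at, user_id;
```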
Conclusion
By understanding MySQL’s internal workflow—from client protocol through cache, parsing, optimization, execution, and result delivery—developers can make informed decisions about schema design, index selection, and query formulation, leading to measurable performance gains.
