Unveiling MySQL’s Query Execution: From Architecture to Optimization Strategies
This article explains MySQL’s logical architecture, the client‑server communication protocol, how queries are parsed, optimized and executed, the role of the query cache, and provides concrete performance‑tuning advice on schema design, indexing, B+Tree mechanics, and common pitfalls.
MySQL Logical Architecture
MySQL is organized into three layers: the client layer (handling connections, authentication, and security), the service layer (parsing, analysis, optimization, caching, built‑in functions, and cross‑engine features such as stored procedures, triggers, and views), and the storage‑engine layer (actual data storage and retrieval). Each layer communicates via well‑defined APIs, allowing different storage engines to plug in underneath the same server core.
Query Process Overview
When a client sends a query, MySQL follows six steps: (1) the client sends the request, (2) the server checks the query cache, (3) the SQL is parsed, preprocessed, and optimized into an execution plan, (4) the execution engine runs the plan via the storage‑engine API, (5) results are returned to the client, and (6) the result may be stored in the cache.
Client → Server request
Cache hit? Return cached rows
Parse → Preprocess → Optimizer → Execution plan
Handler API executes plan
Return rows (and optionally cache them)
Client/Server Communication Protocol
The MySQL protocol is half‑duplex: at any moment only one side transmits. A query is sent to the server as a single unit, so very long queries may require raising max_allowed_packet. The server’s response may consist of many packets, and the client must receive the entire result set; it cannot take the first few rows and then tell the server to stop sending. This is one reason to keep result sets small, for example by adding LIMIT.
Query Cache
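If the query cache is enabled, MySQL computes a hash of the query text, the current database, and the protocol version. Because the lookup uses the raw query text, any difference in case, whitespace, or comments produces a miss. A cache hit returns the stored result without parsing or planning. Queries that use user‑defined functions, temporary tables, or system tables are never cached. Any write to a table invalidates all cache entries that reference that table, which adds overhead for both reads and writes. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this step applies only to older versions.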
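The lookup and invalidation rules described above can be sketched with a toy model. This is an illustration only, not MySQL's implementation: the class name ToyQueryCache, the use of SHA‑1, and the default protocol version value are all assumptions made for the example.

```python
import hashlib


class ToyQueryCache:
    """Illustrative model of the behaviour described above (hypothetical class)."""

    def __init__(self):
        self._results = {}   # cache key -> stored rows
        self._by_table = {}  # table name -> set of cache keys that read it

    @staticmethod
    def _key(database, sql, protocol_version):
        # The key is derived from the raw query text, so a change in case
        # or whitespace yields a different key and therefore a miss.
        raw = f"{database}\0{protocol_version}\0{sql}".encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def get(self, database, sql, protocol_version=10):
        # Returns the cached rows, or None on a miss.
        return self._results.get(self._key(database, sql, protocol_version))

    def put(self, database, sql, tables, rows, protocol_version=10):
        key = self._key(database, sql, protocol_version)
        self._results[key] = rows
        for table in tables:
            self._by_table.setdefault(table, set()).add(key)

    def invalidate_table(self, table):
        # A write to a table drops every cached result that referenced it.
        for key in self._by_table.pop(table, set()):
            self._results.pop(key, None)


cache = ToyQueryCache()
cache.put("app", "SELECT * FROM t_message LIMIT 10", ["t_message"], [("hello",)])

hit = cache.get("app", "SELECT * FROM t_message LIMIT 10")
miss = cache.get("app", "select * from t_message limit 10")  # same query, different text
cache.invalidate_table("t_message")  # simulates an INSERT/UPDATE/DELETE on the table
after_write = cache.get("app", "SELECT * FROM t_message LIMIT 10")
print(hit, miss, after_write)
```

The per-table bookkeeping is what makes writes more expensive: every write has to find and discard the entries tied to that table.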
Parsing and Preprocessing
The parser builds a syntax tree from the SQL keywords, then the preprocessor validates object existence (tables, columns) and other semantic rules.
Optimization
MySQL uses a cost‑based optimizer. The cost of a plan can be inspected with SHOW STATUS LIKE 'last_query_cost'. Example:
mysql> SELECT * FROM t_message LIMIT 10;
mysql> SHOW STATUS LIKE 'last_query_cost';
+-----------------+-------------+
| Variable_name   | Value       |
+-----------------+-------------+
| Last_query_cost | 6391.799000 |
+-----------------+-------------+
Common optimization strategies include reordering joins, using MIN()/MAX() efficiently, early termination with LIMIT, and improving sort operations.
Execution Engine
After optimization, MySQL creates a handler instance for each table. The handler API abstracts storage‑engine operations, allowing the engine to fetch rows, index entries, and perform updates.
Result Delivery
Results are streamed back to the client packet by packet. If a query is cacheable, the result is also stored in the query cache for future identical queries.
Performance Optimization Advice
1. Schema and Data‑Type Design
Prefer small, simple data types (e.g., INT UNSIGNED instead of VARCHAR for IPv4 addresses).
Declaring a column NOT NULL brings little performance benefit by itself; it matters most for columns you plan to index.
Display widths such as INT(11) affect neither storage size nor value range. UNSIGNED roughly doubles the positive range of an integer type.
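Use TIMESTAMP (4 bytes) instead of DATETIME (8 bytes) when its range (1970 to 2038) is sufficient. Since MySQL 5.6.4, DATETIME takes 5 bytes plus fractional‑second storage, so the saving is smaller on current versions.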
Avoid unnecessary ENUM columns; changing the value list requires an ALTER TABLE, which in many cases rebuilds the whole table.
Keep the number of columns reasonable to reduce row‑buffer decoding overhead.
Large ALTER TABLE operations rewrite the whole table; consider tools like pt‑online‑schema‑change.
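To make the integer advice concrete, the short sketch below computes the signed and unsigned ranges of a 32‑bit integer and converts a dotted IPv4 address to its numeric form, which is the same value MySQL's INET_ATON() returns. The helper name int_range and the sample address are chosen for illustration only.

```python
import ipaddress


def int_range(bits, unsigned):
    """Return (min, max) for an integer type of the given bit width."""
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


signed_int = int_range(32, unsigned=False)    # MySQL INT
unsigned_int = int_range(32, unsigned=True)   # MySQL INT UNSIGNED

# Numeric form of an IPv4 address (sample address, for illustration).
ip_as_int = int(ipaddress.IPv4Address("192.168.0.1"))
round_trip = str(ipaddress.IPv4Address(ip_as_int))

fits_signed = signed_int[0] <= ip_as_int <= signed_int[1]
fits_unsigned = unsigned_int[0] <= ip_as_int <= unsigned_int[1]

print(signed_int, unsigned_int)
print(ip_as_int, round_trip, fits_signed, fits_unsigned)
```

The address does not fit in a signed INT but does fit in INT UNSIGNED, which is why the unsigned type is the usual choice for storing IPv4 addresses in 4 bytes.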
2. High‑Performance Indexing
Use B‑Tree (actually B+Tree) indexes; they store keys in leaf pages linked together for range scans.
Each node is sized to one page, so reading a node costs a single I/O and the tree stays shallow.
Choose the most selective column as the first index column.
Prefix indexes save space for long string columns.
Multi‑column indexes must follow the “left‑most prefix” rule; order matters.
A covering index (all needed columns in the index) avoids back‑table lookups.
Index scans can produce ordered results, eliminating a separate sort step.
Avoid redundant or duplicate indexes; drop them unless a specific workload justifies keeping them.
Periodically drop indexes that are never used.
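A rough back‑of‑the‑envelope calculation shows why page‑sized nodes keep a B+Tree shallow. The sketch below is an estimate under stated assumptions, not a description of any engine's exact on‑disk format: 16 KiB pages (InnoDB's default), 8‑byte keys, a hypothetical 6‑byte child pointer, and rows of about 1 KiB stored in the leaves. It ignores page headers and fill factor.

```python
def btree_height(total_rows, page_bytes=16 * 1024, key_bytes=8,
                 pointer_bytes=6, row_bytes=1024):
    """Estimate how many levels a B+Tree needs to hold total_rows.

    All sizes are illustrative assumptions; real pages carry headers
    and are rarely 100% full, so real trees can be slightly taller.
    """
    fanout = page_bytes // (key_bytes + pointer_bytes)  # children per internal node
    rows_per_leaf = page_bytes // row_bytes             # rows stored in one leaf page

    height = 1
    capacity = rows_per_leaf  # rows reachable with a single (leaf) level
    while capacity < total_rows:
        capacity *= fanout    # each extra level multiplies capacity by the fan-out
        height += 1
    return height


for rows in (16, 1_000, 10_000_000, 100_000_000):
    print(rows, btree_height(rows))
```

Under these assumptions a lookup among ten million rows touches three pages, and the upper levels are usually cached in memory, so the number of disk reads is often lower still.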
3. Specific Query Optimizations
COUNT(): Use COUNT(*) to count rows. COUNT(col) counts only the rows where col is not NULL, so COUNT(*) states the intent more clearly and is typically faster.
JOINs: MySQL executes joins as nested loops; create an index on the join column of the second table in the join order, because that column is looked up once for every row of the first table.
LIMIT with large offsets: Prefer “keyset pagination” (e.g., WHERE id > last_id ORDER BY id LIMIT 10) over OFFSET, which forces the server to read and discard every skipped row.
UNION vs UNION ALL: Use UNION ALL when duplicate elimination is not required; it avoids the costly temporary table and DISTINCT pass that plain UNION uses to remove duplicates.
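The keyset pagination advice can be tried out with a small script. The sketch below uses Python's built‑in sqlite3 module purely as a stand‑in so that it runs without a MySQL server; the two SELECT statements use syntax that MySQL also accepts, but the execution cost discussed above is specific to MySQL and is not measured here. The table contents, page size, and helper names are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only as a runnable stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_message (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO t_message (id, body) VALUES (?, ?)",
    [(i, f"message {i}") for i in range(1, 101)],
)

PAGE_SIZE = 10


def page_by_offset(page_number):
    """OFFSET pagination: the server must step over all skipped rows."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, body FROM t_message ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_id):
    """Keyset pagination: seek directly past the last id already shown."""
    return conn.execute(
        "SELECT id, body FROM t_message WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE),
    ).fetchall()


offset_page = page_by_offset(6)   # rows 51..60 via OFFSET 50
keyset_page = page_after(50)      # rows 51..60 via id > 50
print(offset_page == keyset_page, [row[0] for row in keyset_page])
```

Both calls return the same page, but the keyset form needs the client to remember the last id it displayed, and it only works when the sort column is unique and indexed.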
Conclusion
Understanding MySQL’s internal query flow, the cost model, and the data‑structure choices behind indexes empowers developers to apply optimization techniques wisely, test their impact, and avoid blind reliance on “rules of thumb.”
ITPUB