
MySQL Logical Architecture, Query Process, and Performance Optimization

This article explains MySQL's three‑layer logical architecture, the end‑to‑end query execution flow—including client/server protocol, query cache, parsing, cost‑based optimization, execution engine, and result delivery—followed by practical performance‑tuning advice on schema design, data types, index creation, and specific query optimizations such as COUNT(), JOINs, LIMIT pagination, and UNION handling.

Big Data Technology & Architecture

Dec 14, 2019

MySQL Logical Architecture

MySQL is organized into three logical layers. The top client layer handles connections, authentication, and security. The middle server layer performs query parsing, analysis, optimization, caching, built‑in functions, and provides a unified API for all storage engines. The bottom storage‑engine layer manages actual data storage and retrieval, similar to a file system, with APIs that hide engine differences.

MySQL Query Process

When a client sends a request, MySQL follows six steps: (1) the client sends the query; (2) the server checks the query cache and returns the cached result if there is a hit; (3) the server parses and preprocesses the SQL, and the optimizer generates an execution plan; (4) the execution engine invokes storage-engine APIs to fetch data; (5) the result is returned to the client and, where eligible, stored in the query cache; (6) the client receives the result set incrementally.

Client/Server Communication Protocol

The protocol is half-duplex: at any moment either the client is sending or the server is sending, never both. The client transmits a query as a single packet, so the max_allowed_packet setting matters for large statements; the server rejects a query whose packet exceeds that limit.
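One way to stay under the packet limit is to split bulk inserts into several smaller statements. The sketch below is a minimal illustration, not a production client: the helper name chunk_inserts, the table and column names, the 1024-byte budget, and the naive repr()-based quoting are all assumptions made for the example. Real code should use the driver's parameter binding and read the actual max_allowed_packet value from the server.

```python
def chunk_inserts(table, columns, rows, max_bytes):
    """Yield multi-row INSERT statements whose UTF-8 size stays within max_bytes.

    Illustrative only: values are rendered with repr(), which is NOT safe SQL
    quoting. A single row larger than max_bytes is still emitted on its own.
    """
    prefix = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    prefix_size = len(prefix.encode("utf-8"))
    batch, size = [], prefix_size
    for row in rows:
        rendered = "(" + ", ".join(repr(v) for v in row) + ")"
        row_size = len(rendered.encode("utf-8"))
        # The ", " separator costs 2 bytes for every row after the first.
        extra = row_size + (2 if batch else 0)
        if batch and size + extra > max_bytes:
            yield prefix + ", ".join(batch)
            batch, size = [], prefix_size
            extra = row_size
        batch.append(rendered)
        size += extra
    if batch:
        yield prefix + ", ".join(batch)


# Hypothetical data: 100 rows with a 50-character payload each.
rows = [(i, "x" * 50) for i in range(100)]
statements = list(chunk_inserts("t", ["id", "payload"], rows, max_bytes=1024))
```

Each generated statement can then be sent separately, so no single packet approaches the server's limit.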

Query Cache

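Before parsing, MySQL checks whether the query cache is enabled and, if so, whether the statement hits the cache. A cache hit bypasses parsing, optimization, and execution, returning the stored result directly. Every cached entry that involves a table is invalidated when that table changes, so write-heavy workloads pay extra overhead and can perform worse with the cache enabled. The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this step applies only to earlier versions.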
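The trade-off can be seen in a toy model. The class below is a simplified sketch, not MySQL's implementation: ToyQueryCache and its methods are hypothetical names, and the model only captures two behaviors, namely lookup by exact statement text and dropping every entry that read from a table once that table is written.

```python
class ToyQueryCache:
    """Toy model of a statement-text cache with per-table invalidation."""

    def __init__(self):
        self._results = {}    # exact statement text -> cached result
        self._by_table = {}   # table name -> statement texts that read it

    def get(self, sql):
        # Lookup is by exact text: different case or spacing is a miss.
        return self._results.get(sql)

    def put(self, sql, tables, result):
        self._results[sql] = result
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)

    def invalidate(self, table):
        """Drop every cached statement that read from `table`."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


cache = ToyQueryCache()
cache.put("SELECT * FROM users", ["users"], [("alice",)])

hit = cache.get("SELECT * FROM users")           # same text: served from cache
miss_on_case = cache.get("select * from users")  # different text: no hit
cache.invalidate("users")                        # simulates any write to users
after_write = cache.get("SELECT * FROM users")   # entry is gone
```

Even in this small model, one write discards every cached result for the table, which is why frequently updated tables gain little from caching.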

Syntax Parsing and Preprocessing

SQL is parsed into a syntax tree, validated against grammar rules, and preprocessed to ensure referenced tables and columns exist.

Query Optimization

The optimizer is cost-based: it estimates the cost of the candidate execution plans and chooses the cheapest. The estimated cost of the most recent query can be inspected through the session status variable Last_query_cost, for example with SHOW STATUS LIKE 'Last_query_cost'.
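The idea of picking the cheapest plan can be sketched in a few lines. This is a deliberately crude illustration: the function choose_plan, the two candidate plans, and the per-row cost constants are all invented for the example and are not MySQL's actual cost model.

```python
def choose_plan(table_rows, matching_rows, seq_cost=1.0, random_cost=4.0):
    """Pick the cheaper of a full table scan and an index lookup.

    The per-row costs are made-up constants: a sequential read is assumed
    to be cheaper per row than a random read through an index.
    """
    plans = {
        "full_scan": table_rows * seq_cost,
        "index_lookup": matching_rows * random_cost,
    }
    best = min(plans, key=plans.get)
    return best, plans


# A selective predicate favors the index; an unselective one favors the scan.
selective, _ = choose_plan(table_rows=1_000_000, matching_rows=100)
unselective, _ = choose_plan(table_rows=1_000_000, matching_rows=600_000)
```

The point is only that the choice depends on estimated row counts, so stale or misleading statistics can lead a cost-based optimizer to a poor plan.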

Execution Engine

The chosen plan is executed using the storage‑engine handler API. Each table is represented by a handler instance that provides metadata and data access. The engine performs the operations defined by the plan.

Result Return

Results are streamed back to the client incrementally in packets. Even a query that matches no rows gets a response carrying information about the query, such as the number of rows affected.

Performance Optimization Suggestions

Schema Design and Data‑Type Optimization

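Prefer the smallest, simplest data type that correctly holds the data. Declare columns NOT NULL where possible, especially columns you plan to index, because NULL makes indexes and comparisons more complex. A display width such as INT(11) has no effect on storage size or value range. UNSIGNED roughly doubles the maximum positive value for the same storage. DECIMAL gives exact arithmetic but costs more space and CPU than integer types; for high-volume exact values, consider storing scaled integers in a BIGINT instead. TIMESTAMP uses 4 bytes and covers 1970 to 2038, while DATETIME covers a much larger range and uses 8 bytes (5 bytes plus fractional-seconds storage from MySQL 5.6.4 onward). ENUM columns are rarely necessary, and tables with a very large number of columns increase CPU overhead.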
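The numeric claims above can be checked with simple arithmetic. The snippet is a sketch in Python rather than SQL; int_range and to_cents are hypothetical helper names, and to_cents assumes amounts with at most two decimal places.

```python
from datetime import datetime, timezone
from decimal import Decimal


def int_range(num_bytes, unsigned=False):
    """Return (min, max) for a two's-complement integer of num_bytes bytes."""
    bits = num_bytes * 8
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# A 4-byte INT: UNSIGNED moves the whole range to non-negative values.
int_signed = int_range(4)                   # (-2147483648, 2147483647)
int_unsigned = int_range(4, unsigned=True)  # (0, 4294967295)

# A 4-byte count of seconds since 1970 runs out in January 2038.
timestamp_max = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


def to_cents(amount):
    """Scale a decimal string such as '19.99' to integer cents for a BIGINT.

    Assumes at most two decimal places; extra digits would be truncated.
    """
    return int(Decimal(amount) * 100)


price_in_cents = to_cents("19.99")  # 1999
```

Storing 1999 in an integer column keeps arithmetic exact while avoiding DECIMAL's extra cost; the application divides by 100 only for display.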

Creating High‑Performance Indexes

Indexes (primarily B‑Tree/B+Tree) dramatically speed lookups but consume disk and memory. Over‑indexing harms write performance. Understanding the underlying data structures helps design efficient indexes.

Index Data Structures and Algorithms

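MySQL typically uses B+Tree indexes. Internal pages store only keys and child pointers; leaf pages hold the data itself or a reference to it (in InnoDB, clustered-index leaves hold the full rows and secondary-index leaves hold the primary key value; in MyISAM, leaves hold pointers to the rows). Because each node is sized to a disk page and holds many keys, the tree stays shallow, so a lookup needs only a few page reads.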
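A back-of-the-envelope calculation shows why high fan-out matters. The function below is a rough estimate under stated assumptions: 16 KiB pages, 16 bytes per key-plus-pointer entry in internal pages, and 100 rows per leaf page. All three numbers are illustrative defaults, not measurements of any real table.

```python
import math


def btree_height(row_count, page_bytes=16 * 1024, entry_bytes=16,
                 rows_per_leaf=100):
    """Estimate how many levels a B+Tree needs to index row_count rows."""
    fanout = page_bytes // entry_bytes        # children per internal page
    pages = math.ceil(row_count / rows_per_leaf)  # number of leaf pages
    height = 1                                # the leaf level itself
    while pages > 1:
        pages = math.ceil(pages / fanout)     # pages needed one level up
        height += 1
    return height


ten_million = btree_height(10_000_000)     # 3 levels
one_billion = btree_height(1_000_000_000)  # 4 levels

# For comparison, a balanced binary tree over the same 10 million keys.
binary_levels = math.ceil(math.log2(10_000_000))  # 24 levels
```

Under these assumptions a lookup among a billion rows touches about four pages, and the upper levels are usually cached in memory, whereas a binary structure would need far more node visits.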

High‑Performance Strategies

Use multi-column (composite) indexes, ordering the columns by the "most selective first" rule of thumb. Avoid redundant indexes and drop unused ones. Leverage covering indexes so a query can be answered from the index alone, without reading the rows. Align ORDER BY with the index column order to avoid an extra sort. Where the query cache is available, SQL_CACHE and SQL_NO_CACHE control caching for individual queries.
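The covering-index idea can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in so that it runs without a MySQL server; the orders table and idx_customer_status index are hypothetical. SQLite reports "COVERING INDEX" in its plan, while in MySQL the analogous signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# SQLite stand-in (assumption): used only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, note TEXT)"
)
conn.execute(
    "CREATE INDEX idx_customer_status ON orders (customer_id, status)"
)


def plan_detail(sql):
    """Return the human-readable detail column of the first plan row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]


# Both needed columns are in the index, so the table rows are never read.
covered_detail = plan_detail(
    "SELECT status FROM orders WHERE customer_id = 7"
)

# `note` is not in the index, so each match needs an extra row lookup.
uncovered_detail = plan_detail(
    "SELECT note FROM orders WHERE customer_id = 7"
)
```

Selecting only the columns a query really needs is often what makes a covering index possible in the first place.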

Specific Query Optimizations

Optimizing COUNT()

COUNT(*) counts rows efficiently, while COUNT(col) counts only the rows where col is not NULL. For approximate counts, use the row estimate from EXPLAIN or maintain summary tables.
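The difference between the two forms is easy to verify. This sketch runs the query against SQLite via Python's sqlite3 module as a stand-in for MySQL (the NULL-handling of COUNT is standard SQL); the users table and its data are made up for the example.

```python
import sqlite3

# SQLite stand-in (assumption): COUNT semantics here match standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), (None,), ("c@example.com",), (None,)],
)

# COUNT(*) counts every row; COUNT(email) skips rows where email IS NULL.
total_rows, rows_with_email = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM users"
).fetchone()
```

Here total_rows is 4 and rows_with_email is 2, so using COUNT(col) when the row count is wanted can silently return a smaller number.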

Optimizing JOINs

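MySQL has traditionally executed joins as nested loops. Index the join column on the second (inner) table; the first (outer) table may not need an index on that column, because its rows are read first and then used to probe the inner table. Ensure the columns in ON / USING clauses are indexed, and that GROUP BY / ORDER BY expressions reference columns from only one table so that MySQL can use an index for them.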
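The reason the inner table's index matters can be shown with a small simulation. This is a conceptual sketch, not MySQL's executor: the function names and sample data are hypothetical, and a Python dict built ahead of time stands in for a pre-existing index on the inner table's join column.

```python
def join_without_index(outer_rows, inner_rows, key):
    """Nested loop join that scans the whole inner table per outer row."""
    comparisons, joined = 0, []
    for outer in outer_rows:
        for inner in inner_rows:
            comparisons += 1
            if outer[key] == inner[key]:
                joined.append({**outer, **inner})
    return joined, comparisons


def join_with_index(outer_rows, inner_index, key):
    """Nested loop join that probes an existing index on the inner table."""
    lookups, joined = 0, []
    for outer in outer_rows:
        lookups += 1
        for inner in inner_index.get(outer[key], []):
            joined.append({**outer, **inner})
    return joined, lookups


customers = [{"customer_id": 1, "name": "Ann"},
             {"customer_id": 2, "name": "Bo"}]
orders = [{"order_id": 10, "customer_id": 1},
          {"order_id": 11, "customer_id": 1},
          {"order_id": 12, "customer_id": 3}]

# Stand-in for an index on orders.customer_id, built before the query runs.
orders_by_customer = {}
for order in orders:
    orders_by_customer.setdefault(order["customer_id"], []).append(order)

slow_result, comparisons = join_without_index(customers, orders, "customer_id")
fast_result, lookups = join_with_index(customers, orders_by_customer,
                                       "customer_id")
```

Both calls return the same two joined rows, but the unindexed version performs outer times inner comparisons (6 here) while the indexed version performs one probe per outer row (2 here). The outer table is scanned once in either case, which is why an index on its join column adds little.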

Optimizing LIMIT Pagination

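Large offsets force MySQL to read and discard every row before the offset. Prefer keyset pagination, which remembers the last key seen (e.g., WHERE id > last_id ORDER BY id LIMIT n), or a deferred join, which paginates over a covering index first and then joins back to fetch full rows for only the requested page.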
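The three pagination styles can be compared side by side. The sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so it demonstrates that the queries return the same page, not MySQL's actual performance; the items table and function names are hypothetical.

```python
import sqlite3

# SQLite stand-in (assumption): shows query shape, not MySQL timings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (id, title) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 101)],
)


def page_by_offset(offset, n):
    # The engine must step over `offset` rows before returning any.
    return conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        (n, offset),
    ).fetchall()


def page_by_keyset(last_id, n):
    # Seeks directly to the first id after the last one already shown.
    return conn.execute(
        "SELECT id, title FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, n),
    ).fetchall()


def page_by_deferred_join(offset, n):
    # Paginates over ids only, then joins back for the full rows.
    return conn.execute(
        "SELECT i.id, i.title FROM items AS i"
        " JOIN (SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?) AS page"
        " ON page.id = i.id ORDER BY i.id",
        (n, offset),
    ).fetchall()


offset_page = page_by_offset(90, 5)
keyset_page = page_by_keyset(90, 5)
deferred_page = page_by_deferred_join(90, 5)
```

All three return ids 91 through 95. Keyset pagination requires the client to carry the last seen key between requests and works best with a unique, ordered column, so it suits "next page" navigation better than jumping to an arbitrary page number.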

Optimizing UNION

Prefer UNION ALL unless duplicates must be removed, since plain UNION adds a costly de-duplication step. Push WHERE predicates, LIMIT, and ORDER BY into each sub-query by hand so that each branch produces a smaller result set before the branches are combined.
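The semantic difference between the two operators is shown below. As with the other runnable snippets, SQLite via Python's sqlite3 module is only a stand-in for MySQL, and the tables a and b are made up; the example demonstrates result contents, not the relative cost.

```python
import sqlite3

# SQLite stand-in (assumption): UNION semantics here match standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (v INTEGER)")
conn.execute("CREATE TABLE b (v INTEGER)")
conn.executemany("INSERT INTO a (v) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b (v) VALUES (?)", [(3,), (4,)])

# UNION ALL simply concatenates the branches, keeping the duplicate 3.
union_all_rows = conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b"
).fetchall()

# UNION must compare rows across branches to remove duplicates.
union_rows = conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b"
).fetchall()
```

UNION ALL returns five rows and UNION returns four. When the branches cannot overlap, or duplicates are acceptable, UNION ALL gives the same useful answer without the extra work.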

Conclusion

Understanding MySQL’s execution flow and the cost of each step, combined with solid schema design, appropriate data types, and well‑crafted indexes, enables developers to write queries that are both correct and performant.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
