Databases 35 min read

Deep Dive into MySQL: Execution Flow, Logs, Locks, MVCC, and Performance Tuning

This comprehensive guide explores MySQL’s internal architecture, covering the SQL execution process, server and storage engine layers, binlog, redo and undo logs, index structures, transaction isolation levels, lock mechanisms, MVCC implementation, buffer pool management, table optimization, query techniques, and advanced performance tuning strategies.

Java Backend Technology

Dec 15, 2020

Deep Dive into MySQL: Execution Flow, Logs, Locks, MVCC, and Performance Tuning

1. SQL Execution Flow

MySQL is divided into the Server layer and the Storage Engine layer. The Server layer includes the connector, query cache (removed after MySQL 8.0), parser, optimizer, and executor. The Storage Engine layer is plugin‑based, with InnoDB being the default engine.

Server Layer

Connector

: Performs TCP handshake, validates user credentials, and fixes permissions for the duration of the connection. Query Cache: Stores result sets; disabled in recent versions due to high invalidation cost. Parser: Checks SQL syntax against MySQL grammar. Optimizer: Generates multiple execution plans and selects the most efficient one. Executor: Executes the chosen plan and writes results to the storage engine.

Storage Engine Layer

Supports multiple engines (InnoDB, MyISAM, etc.). InnoDB stores data in a clustered B+‑tree and provides transaction support, while MyISAM uses separate data and index files.

2. BinLog, RedoLog, UndoLog

BinLog

records all data‑definition and data‑modification events for replication. It has three modes:

STATEMENT : Logs only the SQL statements that change data; low I/O but can produce nondeterministic results.

ROW : Logs before‑and‑after images of each row; safe but generates large files.

MIXED : Uses STATEMENT for most operations and ROW for those that require row‑level accuracy. RedoLog (write‑ahead log) records changes before they are flushed to data pages, ensuring crash‑safe recovery. It consists of a write position and a checkpoint that determines which part of the log can be reclaimed. UndoLog stores the previous version of rows for rollback and MVCC. It includes insert‑undo and update‑undo entries.

3. Indexes

MySQL supports several index types:

Hash : Fast equality lookups, unsuitable for range queries.

Ordered Array : Used by some engines; costly for inserts.

B+ Tree : Default for InnoDB; supports range scans and efficient lookups.

Primary Key : Clustered index storing the full row.

Unique Index : Guarantees uniqueness, allows NULL.

Secondary (Non‑clustered) Index : Stores the primary key as the leaf value.

Covering Index : Contains all columns needed by a query, eliminating the need for a table lookup.

Composite Index : Combines multiple columns; follows the left‑most prefix rule.

Key design principles: avoid excessive indexes, skip indexes on frequently updated columns, and prioritize columns with high selectivity.

4. Transaction Isolation Levels

MySQL implements the ACID properties. The four isolation levels are: READ UNCOMMITTED: Allows dirty reads. READ COMMITTED (Oracle default): Prevents dirty reads. REPEATABLE READ (MySQL default): Prevents dirty and non‑repeatable reads; uses gap locks to avoid phantom reads. SERIALIZABLE: Highest isolation, blocks concurrent writes.

Common anomalies include dirty reads, non‑repeatable reads, and phantom reads. Proper use of SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE can enforce current‑read locking.

5. Locks

MySQL provides three lock granularities: table, page, and row. InnoDB supports many lock types, such as shared, exclusive, intention, record, gap, next‑key, insert‑intention, AUTO‑INC, and predicate locks.

Pessimistic locks (e.g., SELECT ... FOR UPDATE) guarantee exclusive access but can reduce concurrency. Optimistic locks rely on version numbers or timestamps and retry on conflict.

6. MVCC (Multi‑Version Concurrency Control)

MVCC enables non‑blocking reads by keeping multiple row versions. Each row stores hidden fields: DB_TRX_ID (creating transaction), DB_ROLL_PTR (pointer to previous version), DB_ROW_ID (hidden primary key), and a delete flag.

During a snapshot read, MySQL selects rows whose creating transaction ID is ≤ the transaction’s snapshot and whose delete ID is either NULL or > the snapshot. This provides a consistent view without acquiring locks.

7. Buffer Pool

The InnoDB buffer pool caches data pages (default 16 KB) to reduce disk I/O. It uses an LRU algorithm split into “old” and “new” generations to mitigate cache pollution and improve hot‑data retention.

8. Table Shrinking

DELETE marks rows as reusable but does not shrink the .ibd file, leaving “holes”. Rebuilding a table (or using ALTER TABLE … ENGINE=InnoDB or tools like gh‑ost) compacts the data and releases space.

9. Joins, Statistics, Random Queries

MySQL supports seven join types (inner, left/right outer, cross, etc.). For row counting, COUNT(*) is the most efficient. Random rows should be fetched using a pre‑computed count and LIMIT offset,1 rather than ORDER BY RAND() to avoid temporary tables and filesorts.

10. MySQL Optimization

Optimization focuses on four areas: SQL & indexes, schema design, system configuration, and hardware. Key steps include using the slow‑query log, analyzing EXPLAIN output (key, key_len, type, Extra), creating covering indexes, avoiding full‑table scans, and batching DML.

11. Schema Optimization

Prefer tiny integer types (TINYINT, SMALLINT) and UNSIGNED where possible.

Limit VARCHAR length to the actual maximum needed.

Use TIMESTAMP instead of DATETIME when appropriate.

Keep the number of columns per table modest (≈20).

Avoid NULL columns; use empty strings or sentinel values.

12. Read/Write Splitting & Sharding

Read/write splitting directs writes to the primary and reads to replicas, implemented via application code or middleware. Sharding can be vertical (separate modules into different databases) or horizontal (split large tables by a key, e.g., user_id % 1024). Tools such as Mycat, Sharding‑Sphere, and gh‑ost assist with these patterns.

13. TiDB

TiDB is an open‑source, MySQL‑compatible distributed SQL database that provides horizontal scalability, strong consistency, and automatic failover. It is suitable for workloads exceeding 50 million rows, high‑concurrency OLTP/OLAP, and multi‑region deployments, but is overkill for small, single‑node use cases.

End

Additional resources: SQL basics, interview questions, MySQL deep‑dive articles, and open‑source projects.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Optimization MySQL Indexes MVCC

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.