
Understanding MySQL Transactions, MVCC, Storage Engines, Indexes, and Optimization Techniques

This article explains MySQL transaction concepts and ACID properties, isolation levels, MVCC mechanics, differences between InnoDB and MyISAM, query execution flow, redo and binlog, index structures, optimization strategies, lock mechanisms, deadlock handling, and best practices for schema changes and sharding.

Code Ape Tech Column

Sep 1, 2021

1. What is a MySQL transaction? What are its four properties? What problems can it cause?

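A transaction is a group of SQL statements that executes as a single unit of work: either every statement takes effect or none does. When transactions run concurrently without enough isolation, they can produce dirty reads, non-repeatable reads, and phantom reads. To control these, MySQL defines four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.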

The four ACID properties are Atomicity, Consistency, Isolation, and Durability.

Atomicity: the transaction is all-or-nothing; changes are applied completely or not at all. InnoDB implements rollback with the undo log.

Consistency: the transaction moves the database from one valid state to another, so constraints hold both before and after it.

Isolation: concurrent transactions do not interfere with each other; how strictly this holds depends on the chosen isolation level.

Durability: once a transaction commits, its changes survive a crash. InnoDB relies on the redo log for this.

REPEATABLE READ is InnoDB's default isolation level, and many deployments keep it because it offers a reasonable balance between consistency and performance.

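Consider a user table (fields id and age) in which a row starts with age = 24. Transaction 2 updates the value to 25 and later commits, while transaction 1 issues five reads, a1 to a5. What each read returns depends on the isolation level:

READ UNCOMMITTED: a1 and a2 read the initial value 24; as soon as transaction 2 updates the row, a3–a5 read 25, even though the update is not yet committed.

READ COMMITTED: a1 and a2 read 24; a3 still reads 24 because the update is not yet committed; a4 and a5 read 25 after the commit.

REPEATABLE READ: a1–a4 all read 24, because transaction 1 keeps its snapshot until it ends; a5, read after transaction 1 itself has committed, returns 25.

SERIALIZABLE: a1–a4 read 24 because locks block transaction 2's write; a5 reads 25 after the lock is released.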
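The scenario above can be tried by hand with two client sessions. The following is a minimal sketch, not the article's original figure: the table name `iso_demo` is a hypothetical stand-in for the user table, and the comments describe what each level is expected to show.

```sql
-- Setup (run once). iso_demo is a hypothetical stand-in for the user table.
CREATE TABLE iso_demo (id INT PRIMARY KEY, age INT) ENGINE = InnoDB;
INSERT INTO iso_demo (id, age) VALUES (1, 24);

-- Session 1: pick the level to test, then open a transaction.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT age FROM iso_demo WHERE id = 1;   -- first read, before any change

-- Session 2: change the row but do not commit yet.
START TRANSACTION;
UPDATE iso_demo SET age = 25 WHERE id = 1;

-- Session 1: READ UNCOMMITTED sees the uncommitted 25 (a dirty read);
-- READ COMMITTED and REPEATABLE READ still see 24.
SELECT age FROM iso_demo WHERE id = 1;

-- Session 2:
COMMIT;

-- Session 1: READ COMMITTED now sees 25; REPEATABLE READ still sees 24.
SELECT age FROM iso_demo WHERE id = 1;
COMMIT;

-- Session 1: a read after its own commit sees 25 at every level.
SELECT age FROM iso_demo WHERE id = 1;
```

Under SERIALIZABLE the interleaving changes: session 1's plain SELECT takes a shared lock, so session 2's UPDATE blocks until session 1 commits.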

2. Do you understand MVCC? How does it work?

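MVCC (Multi-Version Concurrency Control) uses consistent read views to support READ COMMITTED and REPEATABLE READ. Each row can have multiple versions; every update creates a new version. Every transaction receives a monotonically increasing transaction ID, and each row version records the ID of the transaction that created it (row trx_id). Older versions are not stored as separate copies; they are reconstructed from the undo log when needed. Under REPEATABLE READ the read view is created at the first consistent read in the transaction and reused until the transaction ends; under READ COMMITTED a fresh read view is created for each statement. Because a snapshot read only checks version visibility against the read view, InnoDB can serve it without taking row locks and without copying the data.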
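A short sketch of how a snapshot read differs from a locking read, assuming the session runs at REPEATABLE READ. The table `mvcc_demo` is hypothetical, and the concurrent update is only indicated by a comment.

```sql
-- Hypothetical table for illustration.
CREATE TABLE mvcc_demo (id INT PRIMARY KEY, age INT) ENGINE = InnoDB;
INSERT INTO mvcc_demo VALUES (1, 24);

-- Current isolation level (variable name since MySQL 5.7.20;
-- older versions use tx_isolation).
SELECT @@transaction_isolation;

-- Take the snapshot immediately instead of at the first read.
START TRANSACTION WITH CONSISTENT SNAPSHOT;

SELECT age FROM mvcc_demo WHERE id = 1;   -- snapshot read, no row lock

-- ... another session commits: UPDATE mvcc_demo SET age = 25 WHERE id = 1 ...

SELECT age FROM mvcc_demo WHERE id = 1;   -- same snapshot, same result as before

-- A locking ("current") read bypasses the snapshot and returns the latest
-- committed version, so it may differ from the two reads above.
SELECT age FROM mvcc_demo WHERE id = 1 FOR UPDATE;

COMMIT;
```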

3. Differences between InnoDB and MyISAM

Both are MySQL storage engines. InnoDB supports transactions and row-level locks, while MyISAM supports neither and offers only table-level locks. InnoDB's primary-key index stores the full row in its leaf nodes; MyISAM's index leaf nodes store pointers to rows in the data file. With innodb_file_per_table enabled, InnoDB keeps each table's data and indexes in an .ibd file, whereas MyISAM uses separate files for data (.MYD) and indexes (.MYI); before MySQL 8.0, table definitions were also kept in .frm files. MyISAM can be faster for some read-only workloads, but InnoDB is preferred for write-heavy or transactional workloads and has been the default engine since MySQL 5.5. COUNT(*) without a WHERE clause is fast in MyISAM because the engine stores the row count, whereas InnoDB has to scan an index, since under MVCC different transactions may see different row counts.
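The transactional difference is easy to observe. This is a small sketch with hypothetical table names, assuming a default configuration in which mixing engines inside one transaction is allowed:

```sql
-- Engines supported by this server, and which one is the default.
SHOW ENGINES;

-- Two hypothetical tables that differ only in engine.
CREATE TABLE engine_demo_innodb (id INT PRIMARY KEY, note VARCHAR(50)) ENGINE = InnoDB;
CREATE TABLE engine_demo_myisam (id INT PRIMARY KEY, note VARCHAR(50)) ENGINE = MyISAM;

-- Confirm the engine of each table.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('engine_demo_innodb', 'engine_demo_myisam');

-- InnoDB honors ROLLBACK; MyISAM applies the insert at once and cannot undo it.
START TRANSACTION;
INSERT INTO engine_demo_innodb VALUES (1, 'rolled back');
INSERT INTO engine_demo_myisam VALUES (1, 'stays');
ROLLBACK;

SELECT COUNT(*) FROM engine_demo_innodb;  -- the InnoDB insert was undone
SELECT COUNT(*) FROM engine_demo_myisam;  -- the MyISAM row is still there
```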

4. What is the execution flow of a query statement?

Client sends the SQL to the server.

Server authenticates the user and checks privileges.

Server checks the query cache. This step exists only before MySQL 8.0, which removed the query cache.

Parser performs lexical and syntactic analysis, then the optimizer generates an execution plan.

Executor runs the plan, invoking the storage engine, and returns the result.

MySQL is divided into the Server layer (connection handling, parser, optimizer, executor) and the Engine layer (e.g., InnoDB, MyISAM).
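The optimizer stage of this flow can be inspected from a client session. The sketch below uses the optimizer trace feature (available since MySQL 5.6); the table `flow_demo` is hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE flow_demo (
  id INT PRIMARY KEY,
  name VARCHAR(20),
  INDEX idx_name (name)
) ENGINE = InnoDB;

-- Record what the optimizer does for the statements that follow.
SET optimizer_trace = 'enabled=on';

SELECT id FROM flow_demo WHERE name = 'Alice';

-- The trace is a JSON document describing the preparation and
-- optimization steps for the last traced statement.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';

-- EXPLAIN gives the condensed result: the plan handed to the executor.
EXPLAIN SELECT id FROM flow_demo WHERE name = 'Alice';
```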

5. redo log and binlog

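The redo log is InnoDB's physical log and implements write-ahead logging (WAL): a change is recorded in the log before the modified data page is flushed to disk. This turns random page writes into sequential log writes during transaction execution and makes InnoDB crash-safe. The log has a fixed size and is written in a circle, with a write position and a checkpoint. The space between the checkpoint and the write position holds records whose pages have not been flushed yet; advancing the checkpoint frees space for reuse.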

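The binlog is a logical (archive) log written at the Server layer, so it is available to every storage engine. It records the original SQL statements (STATEMENT format), the changed row data (ROW format), or a mix of the two (MIXED format). Unlike the circular redo log, the binlog is append-only: when one file fills up, MySQL switches to a new one. The redo log is used for crash recovery, while the binlog is used for replication and point-in-time recovery; InnoDB keeps the two consistent with a two-phase commit.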
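The settings that govern both logs can be checked with standard statements. This is a read-only sketch; it changes nothing on the server.

```sql
-- How often the redo log is flushed to disk.
-- 1 means a flush at every commit, the safest setting.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Whether the binlog is enabled, and how often it is synced to disk.
-- sync_binlog = 1 means a sync at every commit.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'sync_binlog';

-- STATEMENT, ROW, or MIXED.
SHOW VARIABLES LIKE 'binlog_format';

-- The binlog files currently kept by the server (requires the binlog to be enabled).
SHOW BINARY LOGS;
```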

6. How to add a column to a hot table in production?

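Adding a column requires an exclusive metadata lock (MDL), and on older versions it also rebuilds the table. If a long-running transaction holds a metadata lock on the table, the ALTER waits, and every later query on that table queues behind it, which can look like downtime. Strategies include running the ALTER during low-traffic windows, ending long transactions first, requesting online DDL with ALGORITHM=INPLACE, LOCK=NONE (or ALGORITHM=INSTANT on MySQL 8.0.12 and later), and setting a short lock wait timeout so that the ALTER gives up quickly and can be retried.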
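A sketch of that procedure in SQL follows. The table `hot_demo`, the column names, and the 60-second and 5-second thresholds are assumptions chosen for illustration, not recommendations.

```sql
-- Hypothetical table for illustration.
CREATE TABLE hot_demo (id INT PRIMARY KEY, name VARCHAR(20)) ENGINE = InnoDB;

-- 1. Look for long-running transactions that may hold a metadata lock.
SELECT trx_id, trx_started, trx_mysql_thread_id
FROM information_schema.INNODB_TRX
WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 60;

-- 2. Make the ALTER give up after 5 seconds instead of waiting on the
--    metadata lock while other queries pile up behind it.
SET SESSION lock_wait_timeout = 5;

-- 3a. MySQL 8.0.12 and later: a metadata-only change, no table rebuild.
ALTER TABLE hot_demo
  ADD COLUMN nickname VARCHAR(32) NULL,
  ALGORITHM = INSTANT;

-- 3b. Alternative for servers without INSTANT: online DDL that permits
--     concurrent reads and writes. The statement fails with an error rather
--     than silently taking a stronger lock if the request cannot be honored.
ALTER TABLE hot_demo
  ADD COLUMN remark VARCHAR(64) NULL,
  ALGORITHM = INPLACE, LOCK = NONE;
```

If the ALTER times out in step 3, it can simply be retried later; no partial change is left behind.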

7. How are MySQL indexes implemented? Why not ordered arrays, hash, or binary trees?

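MySQL (InnoDB) uses B+-tree indexes. Each page is 16 KB by default, so an internal node can hold on the order of a thousand child pointers for an integer key. With a fan-out of about 1,200, a tree four levels high can hold about 1,200³ ≈ 1.7 billion values, so a lookup touches only a handful of pages, which minimizes disk I/O. Hash indexes support only equality searches and cannot serve range queries. Ordered arrays are efficient to search but require costly element shifts on inserts. Binary trees have a fan-out of two, so they grow tall and cause many random I/O operations.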

8. How to check if an index is used and when it becomes ineffective?

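Use EXPLAIN; the key column shows the name of the index the optimizer chose, or NULL if none is used. An index is often skipped in cases such as:

OR conditions in the WHERE clause where one branch has no usable index.

LIKE patterns that start with a wildcard (e.g., LIKE '%abc'), which break the left-most prefix rule.

Type mismatches that force an implicit conversion (e.g., comparing a string column to a numeric literal).

IS NULL or IS NOT NULL checks that match a large share of the rows.

Using != or <> when most rows satisfy the condition.

Applying functions or expressions to the indexed column.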
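Several of these cases can be compared side by side with EXPLAIN. The table `explain_demo` and its columns are hypothetical, and the comments describe the typical outcome; on a very small or empty table the optimizer may choose differently.

```sql
-- Hypothetical table for illustration.
CREATE TABLE explain_demo (
  id INT PRIMARY KEY,
  phone VARCHAR(20),
  nickname VARCHAR(20),
  created_at DATETIME,
  INDEX idx_phone (phone),
  INDEX idx_created (created_at)
) ENGINE = InnoDB;

-- Index usable: string column compared with a string literal.
EXPLAIN SELECT * FROM explain_demo WHERE phone = '13800000000';

-- Implicit conversion: the string column is compared as a number,
-- so idx_phone cannot be used for the lookup.
EXPLAIN SELECT * FROM explain_demo WHERE phone = 13800000000;

-- Leading wildcard: no usable prefix.
EXPLAIN SELECT * FROM explain_demo WHERE phone LIKE '%0000';

-- OR with an unindexed column (nickname): typically a full scan.
EXPLAIN SELECT * FROM explain_demo WHERE phone = '13800000000' OR nickname = 'tom';

-- Function applied to the indexed column: idx_created is bypassed.
EXPLAIN SELECT * FROM explain_demo WHERE YEAR(created_at) = 2021;

-- Equivalent range condition on the bare column: idx_created is usable again.
EXPLAIN SELECT * FROM explain_demo
WHERE created_at >= '2021-01-01' AND created_at < '2022-01-01';
```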

9. Types of indexes

By data structure: B+‑tree, hash, R‑Tree, FULLTEXT. By storage: clustered (leaf nodes store rows) and non‑clustered. By logical purpose: primary, unique, normal, composite, and spatial indexes.
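The categories above map onto DDL roughly as follows. This is an illustrative sketch with hypothetical table and column names; the hash index is shown on a MEMORY table because that engine supports explicit hash indexes.

```sql
-- Hypothetical table showing one index of each logical kind.
CREATE TABLE index_demo (
  id INT AUTO_INCREMENT,
  email VARCHAR(100) NOT NULL,
  name VARCHAR(20),
  city VARCHAR(20),
  bio TEXT,
  location POINT NOT NULL,
  PRIMARY KEY (id),                    -- primary (clustered in InnoDB)
  UNIQUE KEY uk_email (email),         -- unique
  KEY idx_name (name),                 -- normal
  KEY idx_city_name (city, name),      -- composite
  FULLTEXT KEY ft_bio (bio),           -- full-text
  SPATIAL KEY sp_location (location)   -- spatial (R-tree)
) ENGINE = InnoDB;

-- Explicit hash index on a MEMORY table.
CREATE TABLE hash_demo (
  id INT,
  KEY idx_id (id) USING HASH
) ENGINE = MEMORY;

-- The Index_type column reports the structure of each index.
SHOW INDEX FROM index_demo;
SHOW INDEX FROM hash_demo;
```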

10. How to perform SQL optimization

Key steps include adding appropriate indexes (primary, unique, full-text, normal), using EXPLAIN to analyze execution plans, avoiding patterns that prevent index use, and choosing the index type based on the read/write workload. InnoDB's change buffer speeds up writes to normal secondary indexes by deferring changes to pages that are not in memory. Unique indexes cannot use it, because checking uniqueness requires reading the page first, so they incur more random I/O on writes.

11. Clustered vs non‑clustered indexes

In a clustered index (the InnoDB primary key), leaf nodes contain the full row. In a non-clustered (secondary) index, leaf nodes contain the indexed columns plus the primary-key value of the row, which is then used to look the row up in the clustered index.

12. What is a “back-to-table” lookup and how does it happen?

When a secondary index is used to locate rows, its leaf nodes hold only the indexed columns and the primary-key value. If the query needs other columns, MySQL must fetch the full row from the primary-key index; this extra step is called a “back-to-table” lookup.

13. How to avoid the back-to-table lookup

Create a composite index that includes all columns needed by the query so the engine can satisfy the query directly from the index. Such an index is called a covering index for that query.

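-- A query that reads only id, name, and sex can be answered from index(name, sex).
create table user (
  id int primary key,   -- clustered index: leaf nodes hold the full row
  name varchar(20),
  sex varchar(5),
  index(name, sex)      -- composite secondary index; its entries also carry id
) engine = innodb;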
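To see the extra primary-key lookup appear and disappear, a column outside the index is needed. The sketch below uses a hypothetical table `cover_demo` that adds an `age` column for that purpose; the Extra values in the comments are what EXPLAIN typically reports.

```sql
-- Hypothetical table: like the one above, plus a column outside the index.
CREATE TABLE cover_demo (
  id INT PRIMARY KEY,
  name VARCHAR(20),
  sex VARCHAR(5),
  age INT,
  INDEX idx_name_sex (name, sex)
) ENGINE = InnoDB;

-- Covered: id, name, and sex are all stored in idx_name_sex.
-- Extra typically shows "Using index".
EXPLAIN SELECT id, name, sex FROM cover_demo WHERE name = 'Alice';

-- Not covered: age lives only in the clustered index, so each match
-- needs a lookup by primary key.
EXPLAIN SELECT age FROM cover_demo WHERE name = 'Alice';

-- Extending the index makes the second query covered as well.
ALTER TABLE cover_demo ADD INDEX idx_name_sex_age (name, sex, age);
EXPLAIN SELECT age FROM cover_demo WHERE name = 'Alice';
```

The trade-off is a wider index that costs more space and more work on every write, so covering indexes are usually reserved for frequent queries.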

14. What is the left‑most prefix principle?

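For a composite index, the optimizer can use the index when the conditions cover its leftmost N columns. For an index on a string column, it can use the leftmost M characters (e.g., LIKE 'prefix%'). Conditions that skip the leftmost column or start with a wildcard (e.g., LIKE '%suffix') cannot use the index for lookups and usually result in a full scan.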
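A few EXPLAIN statements make the rule concrete. The table `prefix_demo` is hypothetical, and the comments state the generally expected behavior rather than guaranteed output.

```sql
-- Hypothetical table with a composite index on (name, sex).
CREATE TABLE prefix_demo (
  id INT PRIMARY KEY,
  name VARCHAR(20),
  sex VARCHAR(5),
  age INT,
  INDEX idx_name_sex (name, sex)
) ENGINE = InnoDB;

-- Leftmost column only: the index is usable.
EXPLAIN SELECT * FROM prefix_demo WHERE name = 'Alice';

-- Both columns: usable. The order of conditions in WHERE does not matter.
EXPLAIN SELECT * FROM prefix_demo WHERE sex = 'F' AND name = 'Alice';

-- Leftmost characters of the leftmost column: usable as a range scan.
EXPLAIN SELECT * FROM prefix_demo WHERE name LIKE 'Al%';

-- Skips the leftmost column: the index cannot be used for the lookup.
EXPLAIN SELECT * FROM prefix_demo WHERE sex = 'F';

-- Leading wildcard: no prefix to search on.
EXPLAIN SELECT * FROM prefix_demo WHERE name LIKE '%ice';
```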

15. What is index condition pushdown?

Starting with MySQL 5.6, the storage engine can evaluate WHERE conditions on columns contained in the secondary index while scanning it, before fetching the row. This reduces the number of rows that have to be fetched from the primary-key index.
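The effect can be observed by toggling the optimizer switch. The table `icp_demo` is hypothetical; whether Extra actually shows "Using index condition" depends on the data and the plan the optimizer picks.

```sql
-- Hypothetical table with a composite index on (name, sex).
CREATE TABLE icp_demo (
  id INT PRIMARY KEY,
  name VARCHAR(20),
  sex VARCHAR(5),
  age INT,
  INDEX idx_name_sex (name, sex)
) ENGINE = InnoDB;

-- The range on name selects index entries; with pushdown enabled, the
-- sex condition is checked on the index entry before the row is fetched.
-- Extra may show "Using index condition".
EXPLAIN SELECT * FROM icp_demo WHERE name LIKE 'Al%' AND sex = 'F';

-- Turn pushdown off for this session and compare the plan.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM icp_demo WHERE name LIKE 'Al%' AND sex = 'F';

-- Restore the default.
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
```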

16. Auto‑increment ID vs UUID as primary key

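Auto-increment IDs are sequential and take only 4–8 bytes. New rows are appended to the end of the primary-key index, which largely avoids page splits and gives better insert performance. UUIDs take 16 bytes in binary form (36 characters as text) and are not sequential, so inserts land on random pages and cause page splits and fragmentation. Because every secondary index entry stores the primary-key value, a larger key also makes every secondary index larger. UUID primary keys are generally discouraged unless business logic requires them.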
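For comparison, here is how the two key styles might be declared. Table names are hypothetical, and the UUID variant assumes MySQL 8.0, where UUID_TO_BIN and BIN_TO_UUID are available.

```sql
-- Sequential 8-byte key: new rows go to the end of the clustered index.
CREATE TABLE pk_auto_demo (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(50)
) ENGINE = InnoDB;

INSERT INTO pk_auto_demo (payload) VALUES ('a'), ('b');

-- 16-byte binary UUID key: values are not sequential, so inserts
-- land on arbitrary pages of the clustered index.
CREATE TABLE pk_uuid_demo (
  id BINARY(16) PRIMARY KEY,
  payload VARCHAR(50)
) ENGINE = InnoDB;

INSERT INTO pk_uuid_demo (id, payload) VALUES (UUID_TO_BIN(UUID()), 'a');

-- Convert back to the 36-character text form for display.
SELECT BIN_TO_UUID(id) AS id, payload FROM pk_uuid_demo;
```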

17. How does MySQL control concurrent access?

MySQL uses locks. MyISAM provides table-level read/write locks. InnoDB provides row-level locks, gap locks, and next-key locks (a row lock combined with a lock on the gap before it), allowing higher concurrency for mixed read/write workloads.
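InnoDB's locks can be requested explicitly and then inspected. The sketch below assumes MySQL 8.0 (for FOR SHARE and performance_schema.data_locks) and the REPEATABLE READ level; the table `lock_demo` is hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE lock_demo (
  id INT PRIMARY KEY,
  age INT,
  INDEX idx_age (age)
) ENGINE = InnoDB;
INSERT INTO lock_demo VALUES (1, 20), (5, 30), (10, 40);

START TRANSACTION;

-- Shared row lock (older versions spell this LOCK IN SHARE MODE).
SELECT * FROM lock_demo WHERE id = 5 FOR SHARE;

-- Exclusive row lock on the same row.
SELECT * FROM lock_demo WHERE id = 5 FOR UPDATE;

-- A locking range read on a secondary index: under REPEATABLE READ this
-- takes next-key locks, which also block inserts into the scanned gaps.
SELECT * FROM lock_demo WHERE age BETWEEN 25 AND 35 FOR UPDATE;

-- Inspect the locks this transaction holds.
SELECT OBJECT_NAME, INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_DATA
FROM performance_schema.data_locks;

COMMIT;
```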

18. MySQL deadlocks: how they happen and how to resolve them

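Deadlocks mainly arise with InnoDB's row-level locking, where locks are acquired one at a time as statements run. A typical scenario is two transactions locking the same rows in opposite order. InnoDB detects the cycle and rolls back one of the transactions (error 1213), which the application should then retry. Two parameters control this behavior: innodb_deadlock_detect (ON by default) enables active detection, and innodb_lock_wait_timeout (50 seconds by default) bounds how long a transaction waits for a row lock. Accessing rows in a consistent order and keeping transactions short reduce the chance of deadlocks.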
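The opposite-order scenario can be reproduced with two sessions. The table `deadlock_demo` and the transfer amounts are hypothetical; the statements must be run in the interleaved order shown.

```sql
-- Hypothetical table for illustration.
CREATE TABLE deadlock_demo (id INT PRIMARY KEY, balance INT) ENGINE = InnoDB;
INSERT INTO deadlock_demo VALUES (1, 100), (2, 100);

-- Session A: locks row 1.
START TRANSACTION;
UPDATE deadlock_demo SET balance = balance - 10 WHERE id = 1;

-- Session B: locks row 2.
START TRANSACTION;
UPDATE deadlock_demo SET balance = balance - 10 WHERE id = 2;

-- Session A: needs row 2, so it waits for session B.
UPDATE deadlock_demo SET balance = balance + 10 WHERE id = 2;

-- Session B: needs row 1, which closes the cycle. InnoDB rolls back one of
-- the two transactions with error 1213 (deadlock found); the other proceeds.
UPDATE deadlock_demo SET balance = balance + 10 WHERE id = 1;

-- Either session: the LATEST DETECTED DEADLOCK section shows the details.
SHOW ENGINE INNODB STATUS;

-- Current values of the two related settings.
SHOW VARIABLES LIKE 'innodb_deadlock_detect';
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
```

If both sessions updated row 1 first and row 2 second, the second session would simply wait for the first instead of deadlocking.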

19. MySQL master‑slave replication

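Replication enables read/write separation: writes go to the master, reads are served by slaves. The master records changes in its binary log; each slave's I/O thread copies those events into a local relay log, and its SQL thread replays them to stay in sync. Replication is asynchronous by default, so a slave can lag behind the master.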
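A few read-only statements are enough to check the state of a replication setup. This sketch assumes replication is already configured; the statement name on the replica depends on the server version.

```sql
-- On the master: the binlog must be enabled and the server needs a unique ID.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'server_id';

-- On a slave (MySQL 8.0.22 and later): look at Replica_IO_Running,
-- Replica_SQL_Running, and Seconds_Behind_Source in the output.
SHOW REPLICA STATUS;

-- On older versions the equivalent statement is SHOW SLAVE STATUS, with
-- Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master.
-- SHOW SLAVE STATUS;
```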

20. Sharding (database/table partitioning)

Vertical partitioning splits a table by columns (e.g., moving a large content field to a separate table). Horizontal partitioning splits rows across multiple tables or databases based on a key (e.g., user ID modulo N). Tools such as MyCat and Sharding-JDBC are commonly used.
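Both kinds of split can be sketched in plain SQL. All table names and the choice of N = 4 are hypothetical; in practice the routing rule lives in the application or in middleware, and it is written in SQL here only to make the arithmetic visible.

```sql
-- Vertical split: the large content column moves to its own table,
-- linked back by the article's primary key.
CREATE TABLE article_demo (
  id BIGINT PRIMARY KEY,
  title VARCHAR(100),
  author_id BIGINT
) ENGINE = InnoDB;

CREATE TABLE article_content_demo (
  article_id BIGINT PRIMARY KEY,
  content LONGTEXT
) ENGINE = InnoDB;

-- Horizontal split: N = 4 tables with an identical schema.
CREATE TABLE order_demo_0 (
  id BIGINT PRIMARY KEY,
  user_id BIGINT NOT NULL,
  amount DECIMAL(10, 2)
) ENGINE = InnoDB;
CREATE TABLE order_demo_1 LIKE order_demo_0;
CREATE TABLE order_demo_2 LIKE order_demo_0;
CREATE TABLE order_demo_3 LIKE order_demo_0;

-- Routing rule: user ID modulo N picks the table.
-- For user 10086: 10086 = 4 * 2521 + 2, so the target is order_demo_2.
SELECT CONCAT('order_demo_', MOD(10086, 4)) AS target_table;

INSERT INTO order_demo_2 (id, user_id, amount) VALUES (1, 10086, 99.00);
```

Modulo routing spreads rows evenly but makes changing N expensive, because most rows map to a different table afterwards.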

