Mastering Data Consistency: Transactions, Isolation Levels, and MVCC Explained
This article explores how to achieve data consistency by examining transaction properties, isolation levels, and common concurrency problems such as dirty, non‑repeatable, and phantom reads. It also compares lock‑based concurrency control (LBCC) with multi‑version concurrency control (MVCC), covering undo logs, the read‑view mechanism, and a practical example.
Data Consistency and How to Achieve It
Data consistency is a key indicator of data accuracy. This article studies consistency from the perspective of transaction characteristics and isolation levels.
1. Consistency
Data consistency usually refers to whether the logical relationships among related data are correct and complete.
Example: A system uses read‑write separation. User Li updates his education level from high school to bachelor's degree on the primary database, but the read replica has not yet synchronized, so HR still sees the old record. This is a consistency problem.
Database consistency means the database moves from one consistent state to another, which is the definition of transaction consistency.
Example: A warehouse holds 100 items of product A and a store holds 10. After 50 items are transferred from the warehouse to the store, the total should remain 110 (warehouse 50, store 60). If the store shows 60 while the warehouse still shows 100, the database is inconsistent.
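The warehouse example can be sketched with Python's built‑in sqlite3 module. The table name stock, the column names, and the helper functions are hypothetical and chosen only for illustration. The point is that both updates run inside one transaction, so the total of 110 holds before and after the transfer.

```python
import sqlite3

def make_inventory():
    """Create an in-memory database with the stock levels from the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (location TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO stock VALUES (?, ?)", [("warehouse", 100), ("store", 10)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` items from src to dst as a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE location = ?", (amount, src)
        )
        conn.execute(
            "UPDATE stock SET qty = qty + ? WHERE location = ?", (amount, dst)
        )

def total(conn):
    """Return the combined quantity across all locations."""
    return conn.execute("SELECT SUM(qty) FROM stock").fetchone()[0]

conn = make_inventory()
total_before = total(conn)
transfer(conn, "warehouse", "store", 50)
levels = dict(conn.execute("SELECT location, qty FROM stock"))
total_after = total(conn)
```

After the transfer, levels holds 50 for the warehouse and 60 for the store, and the total is unchanged. The inconsistent state described above (store 60, warehouse 100) would require the first update to be lost while the second one survives, which a single transaction rules out.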
2. Database Transactions
A transaction is a sequence of database operations that must be executed entirely or not at all. Its properties are:
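Atomicity: all operations in the transaction take effect together or not at all.
Consistency: a transaction moves the database from one valid state to another, preserving all defined rules and constraints.
Isolation: concurrent transactions do not interfere with each other; intermediate results of a transaction are invisible to others, and at the strictest level the outcome is equivalent to some serial execution order.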
Durability: once committed, changes survive failures.
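As a rough illustration of atomicity and consistency working together, the sketch below uses Python's sqlite3 module with a hypothetical stock table and a CHECK constraint that forbids negative quantities. The first statement of the transaction succeeds, the second violates the constraint, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock ("
    " location TEXT PRIMARY KEY,"
    " qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)", [("warehouse", 100), ("store", 10)]
)
conn.commit()

def snapshot(conn):
    """Return the current stock levels as a dict."""
    return dict(conn.execute("SELECT location, qty FROM stock"))

before = snapshot(conn)
try:
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1 succeeds: the store receives 150 items.
        conn.execute("UPDATE stock SET qty = qty + 150 WHERE location = 'store'")
        # Step 2 fails: the warehouse only has 100, so qty would go negative.
        conn.execute(
            "UPDATE stock SET qty = qty - 150 WHERE location = 'warehouse'"
        )
except sqlite3.IntegrityError:
    rolled_back = True
else:
    rolled_back = False
after = snapshot(conn)
```

Because step 1 is undone when step 2 fails, after equals before. No partial transfer is ever visible.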
3. Concurrency Problems
In concurrent environments, databases may encounter dirty reads, non‑repeatable reads, and phantom reads.
3.1 Dirty Read
Transaction A reads data written by Transaction B before B commits. If B rolls back, A has read “dirty” data.
3.2 Non‑repeatable Read
The same query within a transaction returns different results because another transaction modified the data in between.
3.3 Phantom Read
New rows appear (or disappear) in the result set of a repeated query because another transaction inserted or deleted rows.
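The three anomalies can be reproduced with a toy in‑memory store that offers no isolation at all. UnisolatedStore and its methods are hypothetical names invented for this sketch; it is not how a real database engine is structured, only a way to make each anomaly observable.

```python
MISSING = object()  # marks "key did not exist before this write"

class UnisolatedStore:
    """Toy key-value store where uncommitted writes are visible to everyone."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.pending = {}  # txn name -> list of (key, previous value)

    def write(self, txn, key, value):
        previous = self.rows.get(key, MISSING)
        self.pending.setdefault(txn, []).append((key, previous))
        self.rows[key] = value  # visible immediately, before commit

    def read(self, key):
        return self.rows.get(key)

    def scan(self, predicate):
        return sorted(k for k, v in self.rows.items() if predicate(v))

    def commit(self, txn):
        self.pending.pop(txn, None)

    def rollback(self, txn):
        for key, previous in reversed(self.pending.pop(txn, [])):
            if previous is MISSING:
                del self.rows[key]
            else:
                self.rows[key] = previous

# Dirty read: A reads B's uncommitted write, then B rolls back.
store = UnisolatedStore({"a": 100})
store.write("B", "a", 50)
dirty_value = store.read("a")
store.rollback("B")
value_after_rollback = store.read("a")

# Non-repeatable read: A reads the same row twice around B's committed update.
store = UnisolatedStore({"a": 100})
first_read = store.read("a")
store.write("B", "a", 70)
store.commit("B")
second_read = store.read("a")

# Phantom read: A repeats a range query around B's committed insert.
store = UnisolatedStore({"a": 100, "b": 5})
first_scan = store.scan(lambda qty: qty >= 50)
store.write("B", "c", 80)
store.commit("B")
second_scan = store.scan(lambda qty: qty >= 50)
```

In the dirty read, A observes 50 even though that value is never committed. In the non‑repeatable read, the same key yields 100 and then 70. In the phantom read, the second scan contains a key that the first scan did not.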
4. Transaction Isolation Levels
Read Uncommitted: Allows dirty reads, non‑repeatable reads, and phantom reads.
Read Committed: Prevents dirty reads but allows non‑repeatable reads and phantom reads.
Repeatable Read: Prevents dirty and non‑repeatable reads; phantom reads may still occur.
Serializable: Guarantees a result equivalent to executing transactions one after another, eliminating all three problems.
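The relationship between levels and anomalies can be written down as a small lookup table. The sketch below follows the SQL‑standard view of the four levels; the names ALLOWED, possible_anomalies, and weakest_level_preventing are hypothetical helpers, and specific engines may be stricter than this table suggests.

```python
ANOMALIES = ("dirty read", "non-repeatable read", "phantom read")

# True means the anomaly may occur at that level.
# Levels are listed from weakest to strongest.
ALLOWED = {
    "READ UNCOMMITTED": (True, True, True),
    "READ COMMITTED": (False, True, True),
    "REPEATABLE READ": (False, False, True),
    "SERIALIZABLE": (False, False, False),
}

def possible_anomalies(level):
    """List the anomalies that may occur at the given isolation level."""
    flags = ALLOWED[level.upper()]
    return [name for name, allowed in zip(ANOMALIES, flags) if allowed]

def weakest_level_preventing(anomaly):
    """Return the weakest level at which the given anomaly cannot occur."""
    index = ANOMALIES.index(anomaly)
    for level, flags in ALLOWED.items():  # dicts keep insertion order
        if not flags[index]:
            return level
    return None

rc = possible_anomalies("read committed")
needed_for_phantoms = weakest_level_preventing("phantom read")
```

Here rc lists non‑repeatable and phantom reads, and needed_for_phantoms is SERIALIZABLE.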
5. Solving Read Consistency
Two main approaches are Lock‑Based Concurrency Control (LBCC) and Multi‑Version Concurrency Control (MVCC).
5.1 LBCC
LBCC locks the data being read or written, forcing conflicting transactions to wait until the lock is released. This pessimistic method can degrade performance under high concurrency.
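A minimal sketch of the idea, assuming the common scheme of shared (S) locks for reads and exclusive (X) locks for writes. The article itself does not specify lock modes, and LockTable is a hypothetical toy that reports whether a request would be granted instead of actually blocking.

```python
class LockTable:
    """Toy lock manager: shared locks coexist, exclusive locks do not."""

    def __init__(self):
        self.holders = {}  # row -> {txn: "S" or "X"}

    def try_lock(self, txn, row, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.holders.setdefault(row, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            compatible = all(m == "S" for m in others)
        else:
            compatible = not others
        if not compatible:
            return False
        if held.get(txn) != "X":  # never downgrade an exclusive lock
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Release every lock held by txn, as happens at commit or rollback."""
        for held in self.holders.values():
            held.pop(txn, None)

locks = LockTable()
a_reads = locks.try_lock("A", "row1", "S")        # granted
b_reads = locks.try_lock("B", "row1", "S")        # granted, readers share
c_writes = locks.try_lock("C", "row1", "X")       # refused, readers hold the row
locks.release_all("A")
locks.release_all("B")
c_writes_later = locks.try_lock("C", "row1", "X") # granted once readers finish
a_reads_again = locks.try_lock("A", "row1", "S")  # refused while C is writing
```

The refused requests are where a real engine would make the transaction wait, which is the source of the contention described above.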
5.2 MVCC
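MVCC keeps multiple versions of each row so that every transaction reads from a consistent snapshot, independent of concurrent writes. In InnoDB under Repeatable Read, the snapshot is established by the transaction's first consistent read and reused for the rest of the transaction; under Read Committed, each consistent read gets a fresh snapshot.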
In InnoDB, MVCC is implemented via undo logs and a read‑view.
5.2.1 Undo Log
The undo log records the original state of rows before they are modified, enabling rollback and providing previous versions for snapshot reads.
Insert: stores the primary key.
Delete: stores the full row data.
Update (no PK change): stores the previous values.
Update (PK change): handled as a delete of the old row followed by an insert of the new row, so the full old row is stored.
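The first three cases can be sketched with a toy table that appends an undo record before each change. The Table class and its record layout are hypothetical and much simpler than InnoDB's actual undo format; the sketch only shows why each operation needs to remember a different amount of data in order to be reversed.

```python
class Table:
    """Toy table that records undo information before every change."""

    def __init__(self, rows=None):
        self.rows = {pk: dict(row) for pk, row in (rows or {}).items()}
        self.undo = []  # undo records, newest last

    def insert(self, pk, row):
        # Undoing an insert only needs the primary key.
        self.undo.append(("insert", pk))
        self.rows[pk] = dict(row)

    def delete(self, pk):
        # Undoing a delete needs the full row.
        self.undo.append(("delete", pk, dict(self.rows[pk])))
        del self.rows[pk]

    def update(self, pk, changes):
        # Undoing an update needs the previous values of the changed columns.
        old = {col: self.rows[pk][col] for col in changes}
        self.undo.append(("update", pk, old))
        self.rows[pk].update(changes)

    def rollback(self):
        """Apply undo records in reverse order to restore the original state."""
        while self.undo:
            record = self.undo.pop()
            kind, pk = record[0], record[1]
            if kind == "insert":
                del self.rows[pk]
            elif kind == "delete":
                self.rows[pk] = record[2]
            else:
                self.rows[pk].update(record[2])

table = Table({1: {"name": "A", "qty": 100}})
table.update(1, {"qty": 50})
table.insert(2, {"name": "B", "qty": 10})
table.delete(1)
undo_kinds = [record[0] for record in table.undo]
rows_before_rollback = {pk: dict(row) for pk, row in table.rows.items()}
table.rollback()
```

After rollback, the table again contains only row 1 with qty 100. The same stored old values are what MVCC uses to serve previous versions to snapshot reads.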
5.2.2 Read‑View
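Each read‑view records the ID of the transaction that created it (creator_trx_id), the list of transaction IDs that were active when the view was created (m_ids), the smallest of those IDs (min_trx_id), and the next transaction ID to be assigned (max_trx_id). For a row version stamped with trx_id, visibility is decided as follows:
If trx_id equals creator_trx_id, the version is visible (the transaction wrote it itself).
If trx_id is less than min_trx_id, the version is visible (its transaction committed before the view was created).
If trx_id is greater than or equal to max_trx_id, the version is invisible (its transaction started after the view was created).
Otherwise trx_id lies between min_trx_id and max_trx_id, and the version is visible only if trx_id is not in m_ids, that is, its transaction had already committed when the view was created.
If a version is invisible, the reader follows roll_pointer to the previous version in the undo log and repeats the check.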
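The visibility check can be expressed as a short function. This is a sketch: the field names follow the text, the ReadView class and is_visible function are hypothetical, and the code treats trx_id equal to max_trx_id as invisible because max_trx_id is the next ID to be assigned and so cannot belong to a transaction that existed when the view was created.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int   # transaction that owns this view
    m_ids: frozenset      # transactions active when the view was created
    min_trx_id: int       # smallest ID in m_ids
    max_trx_id: int       # next transaction ID to be assigned

def is_visible(view, trx_id):
    """Return True if a row version written by trx_id is visible to the view."""
    if trx_id == view.creator_trx_id:
        return True                       # the transaction's own change
    if trx_id < view.min_trx_id:
        return True                       # committed before the view existed
    if trx_id >= view.max_trx_id:
        return False                      # started after the view was created
    return trx_id not in view.m_ids       # in range: visible only if committed

# Transaction 5 creates a view while transaction 3 is still active.
view = ReadView(
    creator_trx_id=5, m_ids=frozenset({3, 5}), min_trx_id=3, max_trx_id=7
)
results = {trx_id: is_visible(view, trx_id) for trx_id in (2, 3, 4, 5, 6, 7)}
```

In this example, versions written by transactions 2, 4, 5, and 6 are visible, while those written by 3 (still active) and 7 (not yet started) are not.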
5.2.3 Data Retrieval Methods
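Snapshot Read: Non‑locking read (a plain SELECT) that returns the version visible to the transaction's read‑view.
Current Read: Locking read (for example SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, or DELETE) that returns the latest committed version and locks it.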
5.2.4 Example
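Transaction A (trx_id=1) inserts a row and commits. Transaction B then performs a snapshot read, and its read‑view sees the version with trx_id=1. Transaction C (trx_id=3) updates the row and commits: the new version carries trx_id=3, and its roll_pointer points to the undo log record that holds the previous version (trx_id=1). If B repeats the snapshot read under Repeatable Read, it follows roll_pointer and still gets the trx_id=1 version, because trx_id=3 is not visible to its read‑view. Changes to other rows, such as a delete by a Transaction D, are handled the same way. When B later performs a current read, it sees the latest committed state (trx_id=3), which may differ from its earlier snapshot.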
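The example can be traced with a toy version chain. This is a sketch under stated assumptions: B is given trx_id=2 (the text does not state it), the values "v1" and "v2" are placeholders, the prev field stands in for roll_pointer, and current_read simply returns the newest version because every writer in this scenario has already committed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    trx_id: int
    value: str
    prev: Optional["Version"] = None  # stands in for roll_pointer

def snapshot_read(head, creator_trx_id, m_ids, min_trx_id, max_trx_id):
    """Walk the chain from newest to oldest and return the first visible value."""
    version = head
    while version is not None:
        visible = (
            version.trx_id == creator_trx_id
            or version.trx_id < min_trx_id
            or (version.trx_id < max_trx_id and version.trx_id not in m_ids)
        )
        if visible:
            return version.value
        version = version.prev
    return None

def current_read(head):
    """Return the newest version (assumed committed in this toy)."""
    return head.value if head is not None else None

# A (trx_id=1) inserts the row and commits.
inserted = Version(trx_id=1, value="v1")

# B (assumed trx_id=2) creates its read-view: only B itself is active.
b_view = dict(creator_trx_id=2, m_ids={2}, min_trx_id=2, max_trx_id=3)
b_first_read = snapshot_read(inserted, **b_view)

# C (trx_id=3) updates the row and commits; the old version stays reachable.
head = Version(trx_id=3, value="v2", prev=inserted)

b_second_read = snapshot_read(head, **b_view)  # same view, same result
b_current_read = current_read(head)            # latest committed version
```

Both snapshot reads return "v1", while the current read returns "v2".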
6. Conclusion
Both LBCC and MVCC can solve read‑consistency problems; the choice depends on business scenarios. MVCC often offers better concurrency with lower lock contention, while lock‑based methods may be simpler for certain workloads.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
