How Snapshot Isolation and MVCC Prevent Read Skew in Modern Databases
The article explains how snapshot isolation and multi-version concurrency control (MVCC) address read skew and nonrepeatable reads in databases. It covers transaction IDs, version visibility rules, naming differences across PostgreSQL, MySQL, and Oracle, and the implications for backups, analytics, and index handling.
Snapshot isolation (SI) and multi-version concurrency control (MVCC) are key techniques for preventing anomalies such as read skew and nonrepeatable reads in relational databases.
Read Skew Example
A classic scenario involves Alice, who has two accounts, each holding $500. A transaction transfers $100 from account 1 to account 2. If Alice reads the balance of account 2 before the transfer commits and the balance of account 1 after it commits, she sees account 2 still at $500 and account 1 already reduced to $400, so her total appears to be $900 instead of $1,000. This anomaly is called a nonrepeatable read or read skew.
The term skew is overloaded: elsewhere it describes an unbalanced workload with hot spots, but here it refers to a timing anomaly between reads.
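The anomaly can be reproduced with a toy model. The sketch below is only an illustration under simplifying assumptions: a plain dict stands in for the committed state of the database, the hypothetical helper transfer() commits both writes at once, and copying the dict stands in for taking a snapshot. It does not reflect how any real engine is implemented.

```python
# Toy model: a dict stands in for the committed state of the database.
committed = {"account_1": 500, "account_2": 500}


def transfer(db, src, dst, amount):
    """Hypothetical helper: commit a transfer so both writes appear together."""
    db[src] -= amount
    db[dst] += amount


# Read committed: every read sees whatever is committed at that moment.
seen_account_2 = committed["account_2"]   # read before the transfer commits
transfer(committed, "account_1", "account_2", 100)
seen_account_1 = committed["account_1"]   # read after the transfer commits
skewed_total = seen_account_1 + seen_account_2

# Snapshot isolation: both reads come from one snapshot taken at the start.
committed = {"account_1": 500, "account_2": 500}
snapshot = dict(committed)                # stand-in for a consistent snapshot
transfer(committed, "account_1", "account_2", 100)
snapshot_total = snapshot["account_1"] + snapshot["account_2"]

print(skewed_total, snapshot_total)       # prints: 900 1000
```

The money never disappears from the database itself; only the reader that mixes values from two different points in time sees the wrong total.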
Why Snapshot Isolation Is Needed
Read committed (RC) isolation prevents dirty reads but still allows read skew. That is unacceptable for backups, long‑running analytical queries, and integrity checks, all of which require a consistent view of the data.
Snapshot isolation solves this by giving each transaction a consistent snapshot of the database as of the transaction start time. All reads see the same committed state, regardless of concurrent writes.
Implementing Snapshot Isolation
Typical implementation steps:
Assign a unique, monotonically increasing transaction ID (txid) when a transaction begins.
Write operations store the writer’s transaction ID with the new version of each row.
Read operations do not acquire locks; they simply filter visible versions based on transaction IDs.
A key principle of snapshot isolation is that readers never block writers and writers never block readers. This lets long‑running read‑only queries operate on a stable snapshot without lock contention while writes continue normally.
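The three steps above can be sketched in a few lines. This is a minimal illustration, not any engine's actual design: the class VersionedStore and its methods are hypothetical names, and the read filter assumes every writer has already committed, so it compares transaction IDs only.

```python
import itertools


class VersionedStore:
    """Hypothetical in-memory store that keeps every version of every key."""

    def __init__(self):
        self._txids = itertools.count(1)  # step 1: monotonically increasing IDs
        self.versions = {}                # key -> {writer_txid: value}

    def begin(self):
        return next(self._txids)

    def write(self, txid, key, value):
        # Step 2: tag the new version with the writer's transaction ID.
        self.versions.setdefault(key, {})[txid] = value

    def read(self, txid, key):
        # Step 3: no locks; pick the newest version whose writer is not
        # later than the reader. (Assumes all writers have committed.)
        visible = [w for w in self.versions.get(key, {}) if w <= txid]
        return self.versions[key][max(visible)] if visible else None


store = VersionedStore()
setup = store.begin()                     # txid 1
store.write(setup, "account_1", 500)
reader = store.begin()                    # txid 2
writer = store.begin()                    # txid 3
store.write(writer, "account_1", 400)

print(store.read(reader, "account_1"))          # 500: the later write is filtered out
print(store.read(store.begin(), "account_1"))   # 400: a newer reader sees it
```

Because the write only adds a version and the read only filters, neither operation has to wait for the other.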
MVCC Versioning Details
Each row carries a created_by field (the transaction that inserted it) and a deleted_by field (the transaction that marked it deleted). An UPDATE is internally transformed into a DELETE of the old version and an INSERT of the new version.
In the example, transaction 13 deducts $100 from account 1: it marks the $500 row as deleted and inserts a new $400 row, leaving two versions of the same logical row.
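A sketch of how an UPDATE might be recorded as a delete plus an insert is shown below. The field names created_by and deleted_by follow the text; the RowVersion class, the update_balance helper, the account id, and the creating txid of 3 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    account_id: int
    balance: int
    created_by: int                    # txid that inserted this version
    deleted_by: Optional[int] = None   # txid that marked it deleted, if any


def update_balance(table, txid, account_id, new_balance):
    """Hypothetical UPDATE: mark the live version deleted, append a new one."""
    for row in table:
        if row.account_id == account_id and row.deleted_by is None:
            row.deleted_by = txid      # the old version stays in the table
            break
    table.append(RowVersion(account_id, new_balance, created_by=txid))


debited = 1                            # hypothetical id of the debited account
table = [RowVersion(debited, 500, created_by=3)]
update_balance(table, txid=13, account_id=debited, new_balance=400)

for row in table:
    print(row)
# Two versions now exist: the $500 row with deleted_by=13
# and the $400 row with created_by=13.
```

Nothing is overwritten in place, which is what allows older snapshots to keep reading the $500 version.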
Visibility Rules for Consistent Snapshots
At the start of each transaction, the database records which other transactions are still in progress. Writes made by those transactions are ignored, even if they commit later.
All changes made by aborted transactions are invisible.
Rows written by transactions with a later txid than the current transaction are invisible, even if those transactions have already committed.
All other rows are visible to the transaction.
Put another way, a row is visible if (1) the transaction that created it had already committed when the reading transaction started, and (2) the row is not marked as deleted, or the transaction that deleted it had not yet committed when the reading transaction started.
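The visibility rules can be expressed as a small predicate. The sketch below is one possible reading of them, not a real engine's code: Snapshot, writer_counts, and is_visible are hypothetical names, and treating a transaction's own writes as visible is an added assumption that the rules above do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    txid: int                # the reading transaction's ID
    in_progress: frozenset   # txids still running when the reader started
    aborted: frozenset       # txids known to have aborted


def writer_counts(writer, snap):
    """True if the writer's changes had taken effect when the reader started."""
    if writer == snap.txid:              # assumption: own writes are visible
        return True
    return (writer < snap.txid           # later txids are ignored
            and writer not in snap.in_progress
            and writer not in snap.aborted)


def is_visible(created_by, deleted_by, snap):
    if not writer_counts(created_by, snap):
        return False
    # A deletion hides the row only if the deleting transaction counts.
    return deleted_by is None or not writer_counts(deleted_by, snap)


# Reader 12 started before transaction 13 replaced a $500 row with a $400 row.
reader_12 = Snapshot(txid=12, in_progress=frozenset(), aborted=frozenset())
print(is_visible(3, 13, reader_12))      # True: old version still visible
print(is_visible(13, None, reader_12))   # False: new version has a later txid

# Reader 14 started after transaction 13 committed.
reader_14 = Snapshot(txid=14, in_progress=frozenset(), aborted=frozenset())
print(is_visible(3, 13, reader_14))      # False
print(is_visible(13, None, reader_14))   # True
```

Each reader sees exactly one of the two versions, so the totals it computes stay consistent.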
Indexes and Snapshot Isolation
Indexes can point to all versions of a row, and the query engine filters out invisible versions during lookup.
Some systems add optimizations on top of this: PostgreSQL, for example, can avoid index updates when the different versions of a row fit on the same page.
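Other systems (CouchDB, Datomic, LMDB) use append‑only (copy‑on‑write) B‑trees: each write transaction creates a new tree root, and every root is a consistent snapshot of the database at the time it was created, which eliminates the need for per‑row visibility checks at read time.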
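The copy‑on‑write idea can be illustrated with a much simpler structure than a B‑tree. The sketch below uses an immutable binary search tree with path copying as a stand‑in; Node, insert, and lookup are hypothetical names, and none of this reflects the actual storage format of CouchDB, Datomic, or LMDB.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root; only nodes on the path to `key` are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return replace(root, left=insert(root.left, key, value))
    if key > root.key:
        return replace(root, right=insert(root.right, key, value))
    return replace(root, value=value)


def lookup(root, key):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


# Each root is an immutable snapshot; a reader just keeps the root it started with.
root_v1 = insert(insert(None, "account_1", 500), "account_2", 500)
root_v2 = insert(insert(root_v1, "account_1", 400), "account_2", 600)

print(lookup(root_v1, "account_1"), lookup(root_v1, "account_2"))  # 500 500
print(lookup(root_v2, "account_1"), lookup(root_v2, "account_2"))  # 400 600
```

A reader holding root_v1 needs no transaction ID filtering, because nothing reachable from that root ever changes; subtrees that a write does not touch are shared between versions rather than copied.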
Naming Confusion Across Vendors
Oracle calls SI “Serializable”.
PostgreSQL and MySQL label it “Repeatable Read”.
The SQL standard defines “Repeatable Read” but does not specify SI, leading to divergent implementations (e.g., IBM DB2’s “Repeatable Read” actually provides serializable guarantees).
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
