Understanding Database Transaction Isolation: From ACID Basics to TDSQL Implementation
This article explains the ACID properties of database transactions, clarifies the often‑confused concepts of consistency and isolation, details the four SQL standard isolation levels, compares lock‑based and MVCC concurrency controls, and examines concrete implementations in MySQL, PostgreSQL, and the distributed TDSQL system.
1.1 Transaction Basics and ACID Properties
A transaction is a user‑defined sequence of database operations that must either all succeed or all fail, forming an indivisible unit of work. It satisfies the ACID properties:
Atomicity (A) : all operations are performed or none are.
Consistency (C) : a committed transaction moves the database from one consistent state to another.
Isolation (I) : concurrently executing transactions do not interfere with each other.
Durability (D) : once committed, changes persist permanently.
Atomicity and durability are intuitive, while consistency and isolation are more complex and often cause confusion.
1.2 Understanding Consistency
Consistency is an application‑level guarantee. The database ensures that a transaction never violates user‑defined integrity constraints such as primary‑key, foreign‑key, unique, NOT‑NULL, and CHECK constraints. Whether the data is semantically consistent must be judged by the application logic.
1.3 Causes of Inconsistency
Conflicts occur when concurrent transactions access the same data and at least one of them writes. Unresolved conflicts produce anomalies such as dirty writes, dirty reads, non‑repeatable reads, and phantom reads.
Time dimension : separate conflicting operations in time, typically using locks.
Space dimension : separate conflicting operations in space, typically using multi‑version concurrency control (MVCC).
1.4 Understanding Isolation
Isolation prevents concurrent transactions from interfering with each other. The SQL standard defines four isolation levels—Read Uncommitted, Read Committed, Repeatable Read, and Serializable—each prohibiting a specific set of anomalies.
2.1 SQL Standard Isolation Levels
The four levels are defined by which anomalies they forbid:
Read Uncommitted: allows all three anomalies.
Read Committed: prevents dirty reads but allows non‑repeatable reads and phantom reads.
Repeatable Read: prevents dirty reads and non‑repeatable reads; phantom reads may still occur.
Serializable: forbids all three anomalies.
Because the definition relies on natural language, different systems may implement the levels differently; for example, snapshot isolation can avoid the three classic anomalies but may still exhibit write‑skew.
2.2 Lock‑Based Concurrency Control
Locks come in several types: read locks, write locks, and predicate locks. Read/write locks protect individual rows, while predicate locks protect a range of rows. The duration of a lock varies: some are released immediately after the operation, others persist until transaction commit.
Key differences among isolation levels lie in how read locks are handled:
Read Uncommitted : reads do not acquire locks; writes acquire write locks that are held until commit.
Read Committed : each read acquires a read lock; the lock is released after the read, allowing other transactions to modify the data before the next read.
Repeatable Read : read locks are held until the end of the transaction, preventing non‑repeatable reads but still allowing phantom reads because predicate locks are released earlier.
Serializable : both read and write locks are held until commit, providing the strongest isolation.
2.3 Multi‑Version Concurrency Control (MVCC)
Each database object can have multiple versions, each associated with a timestamp. A transaction reads the latest version whose timestamp is ≤ the transaction’s start‑timestamp (snapshot). Writes create a new version.
Two main ways to choose timestamps:
Use the transaction’s start time (e.g., PostgreSQL). The snapshot may exclude versions created after the start time.
Use the transaction’s end time (e.g., TDSQL). The commit timestamp becomes the version’s timestamp.
Typical isolation levels under MVCC are Read Committed, Snapshot Isolation (SI), and Serializable Snapshot Isolation (SSI).
2.3.1 Read Committed Implementation
Three theoretical implementations exist:
Read any committed version for each row (rarely used because it may read stale data).
Read the latest committed version for each row (can cause read‑skew).
At the start of each SQL statement, obtain a fresh snapshot and use it for the whole statement (the most common approach).
The third method avoids read‑skew by using the same snapshot throughout the statement.
2.3.2 Snapshot Isolation (SI) Implementation
At transaction start, a snapshot is taken and used for the entire transaction, guaranteeing repeatable reads. If a write conflict is detected during update, the transaction aborts.
SI eliminates dirty reads, non‑repeatable reads, and phantom reads, but it can still suffer from write‑skew.
2.3.3 Serializable Snapshot Isolation (SSI) Implementation
SSI builds on SI by recording the set of rows read by a transaction and running a detection algorithm to abort transactions that could lead to non‑serializable schedules. This may produce false positives but never misses a real conflict, thus eliminating write‑skew.
SSI is considered a true serializable isolation level.
2.3.4 Handling Write‑Write Conflicts
Two common policies are:
First commit wins : the transaction that commits first succeeds.
First write wins : the transaction that writes first blocks later writers.
2.4 MySQL Isolation Levels
MySQL distinguishes between current reads (locking reads) and snapshot reads (non‑locking). It implements the four standard levels as follows:
Read Uncommitted: no locks, possible dirty reads.
Read Committed: SELECT (non‑FOR UPDATE) uses snapshot reads; DML uses current reads with row‑level locks.
Repeatable Read: mixes current reads for DML and snapshot reads for plain SELECT, leading to some surprising behaviors.
Serializable: pure two‑phase locking; all reads and writes acquire locks, including gap locks.
2.5 PostgreSQL Isolation Levels
PostgreSQL is MVCC‑only; every transaction uses snapshot reads. It supports three levels:
Read Committed: each statement gets a fresh snapshot; update conflicts trigger EPQ (EvalPlanQual) re‑execution.
Snapshot Isolation (SI): a single snapshot for the whole transaction; conflicting updates cause abort.
Serializable Snapshot Isolation (SSI): adds predicate‑locking (SIREAD) to detect dangerous structures and abort if a non‑serializable schedule is possible.
3.1 TDSQL Architecture
TDSQL aims to provide a distributed database that behaves like a single‑node MySQL instance. Its key features include full MySQL compatibility, global consistency, transparent scaling, and online schema changes. The system consists of three layers:
SQL Engine (stateless compute layer)
TDStore (distributed KV storage layer)
TDMetaCluster (metadata and global timestamp service)
Data is sharded into Regions; each Region has three replicas synchronized via the Raft consensus algorithm.
3.2 Multi‑Replica Consistency
TDSQL uses Raft to ensure high availability and strong consistency among replicas. Raft handles leader election, log replication, safety, and configuration changes.
Leader election : guarantees a unique leader for the cluster.
Log replication : logs are replicated unidirectionally from leader to followers.
Safety : a leader must contain the most up‑to‑date log before being elected.
Configuration changes : performed in two phases to avoid split‑brain scenarios.
3.3 TDSQL Concurrency Control
TDSQL employs timestamp‑based MVCC. A global timestamp service in TDMetaCluster provides monotonically increasing timestamps.
When a transaction starts, it obtains a start‑ts (snapshot). Reads select the latest version whose timestamp ≤ start‑ts. Writes are buffered locally and only become visible at commit time, when a commit‑ts is assigned.
Example: for UPDATE A = A + 5, if the latest version of A has timestamp 3 with value 10, a transaction with start‑ts = 4 reads 10, computes 15, buffers the new version, and later commits with commit‑ts > 4.
If two concurrent transactions modify the same key, the one whose commit‑ts is larger wins; the other aborts after conflict detection.
This optimistic model checks at commit time whether any read key’s latest timestamp exceeds start‑ts. If so, the transaction aborts; otherwise it commits.
3.4 Distributed Commit Protocol
TDSQL uses a two‑phase commit (2PC) to guarantee atomicity across multiple nodes. The protocol assumes each node has durable write‑ahead logs, no permanent failures, and reliable communication. The coordinator drives the prepare phase; participants write a prepare record and acknowledge. After all participants are prepared, the coordinator issues the commit command.
If the coordinator crashes, Raft‑based storage elects a new leader, which can resume the transaction by replaying logs and completing the commit or abort.
3.5 Consistent Reads in TDSQL
During the prepare phase, TDSQL assigns a temporary prepare_ts. The final commit_ts is assigned at commit and is always ≥ prepare_ts. When a transaction reads a row that was prepared but not yet committed, it can determine visibility simply by comparing its start_ts with prepare_ts, avoiding unnecessary waiting.
References
[1] 王珊, 萨师煊. 《数据库系统概论》.
[2] 张树杰. 《PostgreSQL技术内幕:事务处理深度探索》.
[3] Berenson et al., "A critique of ANSI SQL isolation levels", ACM SIGMOD Record, 1995.
[4] Cahill & Röhm, "Serializable isolation for snapshot databases", TODS, 2009.
[5] Ports & Grittner, "Serializable snapshot isolation in PostgreSQL", arXiv, 2012.
[6] PostgreSQL source code related to EvalPlanQual.
[7] MySQL source code related to locking.
[8] MySQL documentation on isolation levels and locks.
[9] Gray & Lamport, "Consensus on transaction commit", ACM TODS, 2006.
[10] https://en.wikipedia.org/wiki/Two-phase_commit_protocol
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
