What Is Copy‑On‑Write (COW) and How It Powers MVCC and Java Concurrency
The article explains the Copy‑On‑Write (COW) technique, its read‑write separation design, step‑by‑step flow, practical demos, Java's CopyOnWriteArrayList implementation, and MySQL InnoDB MVCC usage, while also discussing its performance benefits and inherent limitations.
Copy‑On‑Write Overview
Copy‑On‑Write (COW) is a data‑update strategy for the "read‑many, write‑few" concurrency pattern. Reads access the original data without locks, while writes first duplicate the data, modify the copy, and then atomically switch the reference to the new version.
Core Process
Read operation: Directly accesses the original data page, lock‑free, providing high‑concurrency low‑latency reads.
Write operation: Creates a new copy of the original data, applies the modification to the copy, and then atomically switches the shared pointer to the new copy; the old copy is reclaimed once no reader still references it.
Step‑by‑Step Flow
Multiple threads initially share the same data copy.
When a write is needed, a new copy of the data page is created.
The modification is applied to the new copy.
Read threads continue to read the old copy without interference.
After modification, an atomic operation switches the reference to the new copy.
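The flow above can be sketched in a few lines of Java. This is a minimal illustration, not code from any library: the class name CowIntArray and its methods are made up for this example, and it assumes readers treat the array they receive as read-only.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; not taken from the JDK.
final class CowIntArray {
    // Readers and writers share this reference to the current version.
    private final AtomicReference<int[]> current;

    CowIntArray(int... initial) {
        current = new AtomicReference<>(initial.clone());
    }

    // Read path: a single volatile read, no lock. A published array is
    // never mutated again, so callers must treat it as read-only.
    int[] snapshot() {
        return current.get();
    }

    // Write path: copy, modify the copy, then atomically switch the reference.
    // If another writer published first, the compare-and-set fails and we retry.
    void set(int index, int value) {
        while (true) {
            int[] old = current.get();
            int[] copy = Arrays.copyOf(old, old.length); // create the new copy
            copy[index] = value;                         // modify only the copy
            if (current.compareAndSet(old, copy)) {      // atomic switch
                return;
            }
        }
    }
}
```

A reader that called snapshot() before the switch keeps a valid, unchanged array; in Java the old array is simply garbage-collected once the last such reader drops it.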
Simple Demo
Assume a table account(id, balance) with a hot row id=42. Under a traditional exclusive lock, every read of that row blocks while the write holds the lock, so read throughput on the hot row collapses whenever a write is in progress. With COW, reads never block, and each write incurs only the cost of one copy.
Read threads A and B are never blocked.
Write thread C alone pays the copy cost; read throughput is unaffected.
Consistency: reads that started before the switch see the old value, and reads that start after it see the new value.
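The demo can be approximated in memory as follows. This is only a sketch: Account and HotRow are hypothetical names, the starting balance of 100 is an arbitrary assumption, and a real database row is of course not a Java object.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable stand-in for one account(id, balance) row.
final class Account {
    final long id;
    final long balance;

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

// Hypothetical holder for the hot row id=42.
final class HotRow {
    // Starting balance of 100 is an assumption made for the example.
    private final AtomicReference<Account> row =
            new AtomicReference<>(new Account(42, 100));

    // Readers A and B: lock-free read of whichever version is current.
    Account read() {
        return row.get();
    }

    // Writer C: builds a new Account (the copy) and swaps it in atomically.
    Account deposit(long amount) {
        return row.updateAndGet(old -> new Account(old.id, old.balance + amount));
    }
}
```

If reader A calls read() before deposit(50) and reader B calls it afterwards, A still holds the version with balance 100 while B sees 150, and neither call ever waits for the writer.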
COW in Java
Since JDK 1.5, the java.util.concurrent package provides CopyOnWriteArrayList and CopyOnWriteArraySet, which embody the COW idea.
Internally a volatile Object[] array holds the elements.
Read methods (e.g., get()) return the element directly from the array without any lock.
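Write methods (e.g., add()) acquire a lock (a ReentrantLock in JDK 8; some later releases, JDK 11 for example, synchronize on an internal lock object instead), copy the current array, modify the copy, and then replace the volatile array reference so that readers see the new version.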
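The read and write paths just described can be sketched as a stripped-down class. MiniCowList is a hypothetical name and this is a simplified model written for this article, not the JDK source; it keeps only get(), size(), and add().

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch modeled on the description above; not the JDK source.
final class MiniCowList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Object[] array = new Object[0];

    // Read path: one volatile read of the array reference, no lock.
    @SuppressWarnings("unchecked")
    E get(int index) {
        return (E) array[index];
    }

    int size() {
        return array.length;
    }

    // Write path: exclude other writers, copy, modify the copy, publish.
    boolean add(E e) {
        lock.lock();
        try {
            Object[] old = array;
            Object[] copy = Arrays.copyOf(old, old.length + 1);
            copy[old.length] = e;
            array = copy; // volatile write makes the new version visible
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The lock serializes writers only, so two concurrent add() calls cannot lose an update; readers never touch it.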
public class CopyOnWriteArrayList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    // Guards all mutating operations
    final transient ReentrantLock lock = new ReentrantLock();
    // Current version of the data; accessed only through getArray()/setArray()
    private transient volatile Object[] array;
    final Object[] getArray() { return array; }
    final void setArray(Object[] a) { array = a; }
    // ... other methods omitted
}
The iterator is a snapshot iterator: it captures the array at creation time, reads only from that snapshot, and throws UnsupportedOperationException on any modification attempt (remove(), set(), add()).
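The snapshot behaviour can be observed through the public API alone. The sketch below uses the real java.util.concurrent.CopyOnWriteArrayList; only the wrapper class SnapshotIteratorDemo and its method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo wrapper around the real CopyOnWriteArrayList.
final class SnapshotIteratorDemo {
    // Returns how many elements an iterator sees when the list grows
    // after the iterator was created.
    static int countSeenByIterator() {
        List<String> list = new CopyOnWriteArrayList<String>(Arrays.asList("a", "b"));
        Iterator<String> it = list.iterator(); // captures the 2-element array
        list.add("c");                         // publishes a new 3-element array
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen; // the iterator still walks the old snapshot
    }

    // Returns true if the iterator rejects remove().
    static boolean removeIsUnsupported() {
        List<String> list = new CopyOnWriteArrayList<String>(Arrays.asList("a"));
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

Because the iterator never observes the concurrent add, iterating a CopyOnWriteArrayList cannot throw ConcurrentModificationException, at the price of possibly reading slightly stale data.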
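static final class COWIterator<E> implements ListIterator<E> {
    // Array captured when the iterator was created; never modified
    private final Object[] snapshot;
    // Index of the element that the next call to next() returns
    private int cursor;
    COWIterator(Object[] elements, int initialCursor) {
        cursor = initialCursor;
        snapshot = elements;
    }
    public boolean hasNext() { return cursor < snapshot.length; }
    @SuppressWarnings("unchecked")
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        return (E) snapshot[cursor++];
    }
    // Mutation through the iterator is not supported
    public void remove() { throw new UnsupportedOperationException(); }
    public void set(E e) { throw new UnsupportedOperationException(); }
    // ... other methods omitted
}
COW in MySQL InnoDB MVCC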
InnoDB implements MVCC using the COW principle. Each row has three hidden columns:
DB_TRX_ID (6 bytes): the ID of the transaction that last inserted or updated the row.
DB_ROLL_PTR (7 bytes): a pointer to the previous version of the row, stored in the undo log.
DB_ROW_ID (6 bytes): a monotonically increasing row ID, used as the clustered index key when the table has no primary key.
When a transaction updates a row, InnoDB does not simply overwrite the row and discard the old values. Instead:
Copy: The current version of the row is copied into the undo log, creating an old version.
Write: The actual data page is updated with the new values, and DB_TRX_ID is set to the current transaction ID.
Rollback pointer update: DB_ROLL_PTR is set to point to the undo‑log entry containing the old version.
This embodies COW: only the write operation triggers a copy, while reads can access the appropriate version without locking.
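The three update steps can be modeled with a small linked chain of versions. This is a toy sketch under heavy simplification: RowVersion and ToyRow are hypothetical names, the fields only mirror the roles of DB_TRX_ID and DB_ROLL_PTR, and nothing here reflects InnoDB's actual on-disk layout or undo log format.

```java
// Toy model of one row version; not InnoDB's real record format.
final class RowVersion {
    final long balance;
    final long trxId;         // plays the role of DB_TRX_ID
    final RowVersion rollPtr; // plays the role of DB_ROLL_PTR (null = no older version)

    RowVersion(long balance, long trxId, RowVersion rollPtr) {
        this.balance = balance;
        this.trxId = trxId;
        this.rollPtr = rollPtr;
    }
}

// Toy model of a row whose older versions stay reachable.
final class ToyRow {
    private RowVersion latest; // stands in for the record in the data page

    ToyRow(long balance, long trxId) {
        latest = new RowVersion(balance, trxId, null);
    }

    // Copy: the current version is kept as the "undo log" entry.
    // Write: the new values are stamped with the writer's transaction ID.
    // Rollback pointer update: the new version links back to the old one.
    synchronized void update(long newBalance, long writerTrxId) {
        RowVersion undoEntry = latest;
        latest = new RowVersion(newBalance, writerTrxId, undoEntry);
    }

    synchronized RowVersion latest() {
        return latest;
    }
}
```

After new ToyRow(100, 10) followed by update(150, 20), the latest version carries balance 150 and transaction ID 20, and its rollPtr leads to the version with balance 100 written by transaction 10.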
Read Process (Snapshot Read)
For a SELECT, InnoDB creates a Read View that records the set of active transactions. The engine then:
Scans the latest row version.
If the row's DB_TRX_ID belongs to a transaction that started after the Read View was created, or that was still uncommitted when the Read View was created, the version is not visible and the engine follows DB_ROLL_PTR to the undo log to find an older version.
Repeats until a visible version is found.
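Snapshot reads are non‑locking and return data as of the moment the Read View was created. Under REPEATABLE READ the view is created by the transaction's first consistent read and reused afterwards; under READ COMMITTED each statement gets a fresh view.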
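The visibility walk can be sketched as follows. This is a simplified model, not InnoDB code: Version and ReadView are hypothetical classes, and the three fields (creator ID, next transaction ID to be assigned, set of active IDs) are an assumed minimal representation of what a Read View records.

```java
import java.util.HashSet;
import java.util.Set;

// Toy row version linked to its predecessor (the role of DB_ROLL_PTR).
final class Version {
    final long value;
    final long trxId;    // the role of DB_TRX_ID
    final Version older; // null when there is no older version

    Version(long value, long trxId, Version older) {
        this.value = value;
        this.trxId = trxId;
        this.older = older;
    }
}

// Simplified read view; field names are assumptions for this sketch.
final class ReadView {
    private final long creatorId; // transaction that owns the view
    private final long nextTrxId; // IDs >= this started after the view was created
    private final Set<Long> activeIds = new HashSet<>(); // uncommitted at creation

    ReadView(long creatorId, long nextTrxId, long... activeIds) {
        this.creatorId = creatorId;
        this.nextTrxId = nextTrxId;
        for (long id : activeIds) {
            this.activeIds.add(id);
        }
    }

    boolean isVisible(long trxId) {
        if (trxId == creatorId) return true;  // a transaction sees its own changes
        if (trxId >= nextTrxId) return false; // writer started after the view
        return !activeIds.contains(trxId);    // writer must have committed already
    }

    // Walks from the latest version toward older ones until one is visible.
    // Returns null if no version is visible to this view.
    Version firstVisible(Version latest) {
        for (Version v = latest; v != null; v = v.older) {
            if (isVisible(v.trxId)) {
                return v;
            }
        }
        return null;
    }
}
```

For a chain written by transactions 10, 20, and 30 (oldest to newest), a view created by transaction 25 while 20 was still active and 30 had not yet started skips both newer versions and returns the one written by transaction 10.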
Current Read (Locking Read)
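Current reads (for example SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE) acquire locks on the latest row version, ensuring that no other transaction can modify the row until the locking transaction commits or rolls back.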
Comparison: COW vs. MVCC
Core idea: Write‑time copy, read‑time shared original (COW) vs. Write‑time copy to undo log, read‑time version selection (MVCC).
Copy trigger: On write (COW) vs. On UPDATE/DELETE (MVCC).
Copy location: Memory or disk copy (COW) vs. Undo log (rollback segment) (MVCC).
Original data: Original page (COW) vs. Latest version in data page (MVCC).
Purpose: Performance and read consistency (COW) vs. High concurrency and consistent non‑locking reads (MVCC).
Limitations of COW
High write cost: Every write copies the whole structure. A CopyOnWriteArrayList with 100 k elements, for example, copies the entire array on each write, so the CPU cost of a single write grows linearly with the size of the list.
Memory overhead: The old and new copies coexist until the old one is reclaimed, so memory usage for the structure can temporarily double during a write.
Delayed consistency: Readers see the old version until the pointer switch completes, which is unsuitable for real‑time consistency requirements such as financial transactions.
Not for write‑heavy workloads: When writes dominate, the copy overhead becomes a bottleneck.
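The write cost can be made concrete by counting copied elements. The sketch below is a hand-rolled illustration (CopyCostCounter is a hypothetical class, and it uses a plain int array rather than a real CopyOnWriteArrayList): appending n elements one at a time copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 existing elements in total.

```java
import java.util.Arrays;

// Hypothetical helper that counts the copying done by one-at-a-time appends.
final class CopyCostCounter {
    // Appends n elements one at a time to a copy-on-write array and returns
    // how many already-present elements had to be copied in total.
    static long elementsCopied(int n) {
        int[] array = new int[0];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            int[] next = Arrays.copyOf(array, array.length + 1); // full copy per write
            copied += array.length; // every existing element was copied once more
            next[array.length] = i;
            array = next;
        }
        return copied;
    }
}
```

For n = 1000 this already amounts to 499500 element copies, which is why batching writes helps: CopyOnWriteArrayList.addAll, for instance, adds a whole collection with a single array copy.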
