How MVCC Works in PostgreSQL vs InnoDB: A Deep Dive into Concurrency
This article explains the fundamentals of Multi-Version Concurrency Control (MVCC), compares its implementation in PostgreSQL and InnoDB, and highlights key differences in version storage, transaction handling, vacuuming, and index updates.
What is MVCC?
Multi‑Version Concurrency Control (MVCC) is a concurrency model that stores several versions of a data object. Each transaction sees a snapshot of the data that is consistent with its isolation level, allowing reads and writes to proceed without blocking each other.
MVCC in PostgreSQL
PostgreSQL stores rows (tuples) in heap pages (default 8 KB). Two system columns are added to every tuple:
xmin – the transaction ID that created the tuple (INSERT or UPDATE).
xmax – the transaction ID that deleted or superseded the tuple; NULL for the current version.
When a row is inserted, a new tuple with xmin set to the inserting transaction ID and xmax = NULL is written to the heap. An UPDATE creates a new tuple with a fresh xmin; the old tuple’s xmax is set to the updating transaction ID, forming a version chain. DELETE only sets xmax, leaving the old tuple in place.
Visibility rules (READ COMMITTED example) :
Transaction 495 inserts a tuple → xmin=495, xmax=NULL.
Transaction 496 starts before 495 commits; it cannot see the uncommitted tuple because its snapshot does not include transaction 495.
After transaction 495 commits, both sessions see the tuple because the tuple’s xmin is now visible.
A SELECT scans the version chain until it finds a tuple whose xmin is visible to the current snapshot and whose xmax is either NULL or belongs to a transaction that has not yet committed.
Old versions remain in the heap until the VACUUM process removes tuples whose xmax belongs to a committed transaction and that are no longer needed by any active snapshot.
MVCC in InnoDB
InnoDB adds two hidden columns to each row:
DB_TRX_ID – the transaction ID that inserted or last updated the row.
DB_ROLL_PTR – a pointer to the undo‑log record that stores the previous version of the row.
Only the latest version is kept in the clustered index. Older versions are stored in the undo segment (rollback segment). The workflow is:
INSERT : an undo record is written; the new row is stored in the clustered index.
UPDATE : a new row version is inserted; the old version’s data is written to the undo log and linked via DB_ROLL_PTR.
DELETE : the row is marked as deleted in the clustered index and an undo record describing the prior state is created.
During a rollback, InnoDB reconstructs the previous row version from the undo log. The undo log is split into INSERT‑UNDO (released immediately after commit) and UPDATE‑UNDO (released when no active snapshot requires the old version). A background purge thread removes obsolete undo records; there is no explicit VACUUM command.
Key Differences Between PostgreSQL and InnoDB
Size of old versions : PostgreSQL keeps a full copy of each old tuple; InnoDB stores only the delta (changed columns) in the undo log, making old versions smaller.
INSERT handling : InnoDB writes an undo record for every INSERT; PostgreSQL creates new versions only on UPDATE.
Rollback behavior : PostgreSQL only needs transaction IDs to determine visibility; InnoDB must rebuild the row from undo data.
Space reclamation : PostgreSQL relies on the VACUUM process to delete dead tuples; InnoDB’s purge thread frees UPDATE‑UNDO records when they become invisible.
Impact of delayed vacuum : In PostgreSQL, infrequent vacuuming can cause table bloat under heavy UPDATE/DELETE workloads, sometimes requiring a costly VACUUM FULL.
Index updates : PostgreSQL may need to update indexes for each version (mitigated by HOT – Heap‑Only Tuple); InnoDB’s clustered index stores only the latest version, avoiding repeated index maintenance.
Conclusion
PostgreSQL’s MVCC provides robust concurrency but can suffer from table bloat when updates and deletes are frequent; careful VACUUM tuning is required. InnoDB’s undo‑based MVCC avoids some of these bloat issues but introduces additional complexity in undo management and rollback. Future PostgreSQL releases are experimenting with undo‑based MVCC (e.g., the ZHEAP project) to combine the advantages of both approaches.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
