Why PostgreSQL Vacuums Can’t Reclaim Space After Deletes and Updates
The article examines why PostgreSQL tables often retain large disk footprints after massive inserts, updates, and deletions, showing experiments on PG 13/PG 14, analyzing factors such as long‑running transactions, replication slots, uncommitted transactions, fillfactor settings, and the old_snapshot_threshold bug, and offering practical diagnostics and mitigation steps.
Background
Goal: verify the claim that a PostgreSQL table that once held around one hundred million rows can retain a large disk‑space footprint even after vacuuming.
Experiment
Using PostgreSQL 13, a table test(id int, name text) was created. Ten million rows containing large text values were inserted. After the load, 99 % of the rows were deleted, leaving roughly 1 000 rows, and a VACUUM was executed. The table size remained about 1 710 MB. Further inserts were performed without any UPDATE statements; the size continued to grow, showing that the space freed by the deletions was not reused and new pages were allocated.
Observations
Page‑level inspection with heap_page_items(get_raw_page('test', n)) showed that after deleting rows 1 and 2 and reinserting them, the new rows occupied the first two pages, confirming that the space freed by the deletions was not immediately reusable. Subsequent inserts forced the creation of additional pages even after running VACUUM and VACUUM FULL.
Root‑Cause Factors
Long‑running transactions keep old tuple versions visible, preventing vacuum from reclaiming space. Example query to locate such backends:
SELECT pid, datname, usename, state, backend_xmin, backend_xid
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
ORDER BY GREATEST(age(backend_xmin), age(backend_xid)) DESC;Replication slots (logical or physical) that are inactive or have errors retain a xmin horizon, blocking vacuum. Inspect slots with:
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;Prepared or uncommitted transactions hold tuple versions. Query prepared transactions with:
SELECT gid, prepared, owner, database, transaction AS xmin
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;Hot standby feedback enabled on a replica ( hot_standby_feedback = on) causes the primary to retain old row versions needed by the standby, preventing page truncation.
Additional Contributing Settings
Inserts that become larger after an UPDATE can cause page bloat if the table’s fillfactor is not tuned.
The old_snapshot_threshold parameter (PG 13) can make vacuum report dead tuples but be unable to reclaim them when set improperly.
Recommendations
Set an appropriate fillfactor (e.g., 80‑85 %) for tables that experience frequent updates and deletes.
Avoid long‑running transactions that hold old snapshots.
Monitor and drop stale replication slots.
Leave old_snapshot_threshold at its default value unless a specific need dictates otherwise.
Use heap_page_items (or pg_freespace) to inspect page‑level space usage and verify that space is actually reclaimed.
References
Bug discussion: https://github.com/EnterpriseDB/zheap/blob/master/src/backend/access/table/vacuumblk.c
DBA StackExchange thread on size discrepancies: https://dba.stackexchange.com/questions/320395/significant-size-differences-between-size-calculations-for-tables-in-postgresql
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
