
Recover Lost PostgreSQL Data: Tools, Techniques, and Best Practices

This article explains why PostgreSQL data loss occurs, compares DDL and DML causes, and reviews several recovery tools—including pg_resetwal, pg_dirtyread, pg_recovery, pg_filedump, WalMiner, and pageinspect—providing usage examples, summaries, and practical recommendations for effective data restoration.

Qingyun Technology Community

Data Loss Causes

Data loss in PostgreSQL is usually caused by DDL and DML operations.

DDL

In PostgreSQL, each table is stored as a file under PGDATA/base/DatabaseId/relfilenode. DROP TABLE removes the file entirely, so recovery must rely on disk-level (filesystem undelete) methods, which have a low success rate, especially on cloud databases where you have no filesystem access.

DML

DML here means UPDATE and DELETE. Because of MVCC, affected rows are not physically removed right away; they remain as dead tuples until VACUUM reclaims them, and they can still be recovered if vacuum_defer_cleanup_age is set so that the dead tuples are retained.
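To buy recovery time, the parameter can be raised before vacuum removes the dead tuples. A minimal sketch, assuming a self-hosted instance with superuser access (1000000 is an arbitrary example value; note the parameter was removed in PostgreSQL 16):

```sql
-- Retain dead tuples for the most recent 1,000,000 transactions
-- (example value; tune to your workload). Requires superuser.
ALTER SYSTEM SET vacuum_defer_cleanup_age = 1000000;
SELECT pg_reload_conf();        -- apply without restarting
SHOW vacuum_defer_cleanup_age;  -- confirm the new setting
```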

Data Recovery Solutions

pg_resetwal

pg_resetwal (called pg_resetxlog before PostgreSQL 10) clears the write-ahead log and can overwrite control information in pg_control, including the next transaction ID. Rewinding the next transaction ID to a point before the damaging transaction makes the affected dead tuples visible again.

Usage Example

pg_resetwal recovers data by rewinding the next transaction ID, so you must first determine the transaction ID to rewind to.

1. View current LSN

-- online query
select pg_current_wal_lsn();

-- offline query
./pg_controldata -D dj | grep 'checkpoint location'

2. Get transaction ID

./pg_waldump -b -s 0/2003B58 -p dj
... (output omitted) ...

3. Set transaction ID

-- stop database
./pg_resetwal -D dj -x 595
-- start database

4. Verify data

select * from xx;

Summary

pg_resetwal recovers data quickly, but it requires shell access and high privileges on the database server, so it usually cannot be used on managed cloud databases.

If the underlying files were removed by DDL, pg_resetwal can restore the table's metadata but not its data files, so queries fail with errors such as ERROR: could not open file "base/16392/16396".

After restarting the database, avoid any operation that advances the transaction ID; otherwise the recovered data becomes invisible again.

Back up the entire PGDATA directory before using pg_resetwal and restore only the needed data.
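A minimal sketch of that precaution, assuming the data directory is named dj as in the example above and the server is already stopped:

```shell
# Back up the whole data directory before touching it (server must be stopped)
cp -a dj dj.backup

# Rewind the next transaction ID (595 is the value found via pg_waldump)
./pg_resetwal -D dj -x 595

# If anything goes wrong, restore the copy:
# rm -rf dj && mv dj.backup dj
```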

pg_resetwal is complex and requires deep PostgreSQL knowledge.

pg_dirtyread

pg_dirtyread is a plugin (needs to be compiled) that reads dead tuples via MVCC, allowing recovery of UPDATE, DELETE, DROP COLUMN, ROLLBACK, etc.
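A build-and-install sketch, assuming the commonly used df7cb/pg_dirtyread source tree and a PostgreSQL installation whose pg_config is on the PATH:

```shell
git clone https://github.com/df7cb/pg_dirtyread.git
cd pg_dirtyread
make              # builds via PGXS, using pg_config to find PostgreSQL
sudo make install
psql -d test -c 'CREATE EXTENSION pg_dirtyread;'
```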

Usage Example

CREATE TABLE foo(bar bigint, baz text);
INSERT INTO foo VALUES (1,'Test'), (2,'New Test');
DELETE FROM foo WHERE bar = 1;
SELECT * FROM pg_dirtyread('foo') as t(bar bigint, baz text);
-- result:
 bar |   baz
-----+----------
   1 | Test
   2 | New Test

Summary

pg_dirtyread is easy to use; installing the plugin enables data recovery.

It returns all rows, including those not deleted, e.g., bar=2 in the example.

Only DML data can be recovered via MVCC.

pg_recovery

pg_recovery is similar to pg_dirtyread but returns only the rows that need to be recovered by default. Future versions will add debugging information to help locate needed data.

Usage Example

CREATE TABLE foo(bar bigint, baz text);
INSERT INTO foo VALUES (1,'Test'), (2,'New Test');
DELETE FROM foo WHERE bar = 1;
SELECT * FROM pg_recovery('foo') as t(bar bigint, baz text);
-- result:
 bar |  baz
-----+------
   1 | Test

Summary

pg_recovery focuses on data recovery and is simpler to use.

Setting recoveryrow => false returns all rows.

It can only recover DML data.
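The recoveryrow switch mentioned above is passed as a named argument; a sketch, assuming pg_recovery's named-parameter calling convention and the foo table from the example:

```sql
-- Return every row in the file, live and dead alike
SELECT * FROM pg_recovery('foo', recoveryrow => false) AS t(bar bigint, baz text);
```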

pg_filedump

pg_filedump is a command‑line tool that reads raw data files without connecting to the database, useful for severe disasters but requires knowledge of exact file locations.

Usage Example

./pg_filedump -D int,varchar dj/base/24679/24777
Item 1 -- Length: 30 Offset: 8160 (0x1fe0) Flags: NORMAL
COPY: 1 a
... (output truncated) ...

Summary

pg_filedump can read files directly, suitable for catastrophic cases, but needs file paths and is not ideal for cloud databases.

With custom scripting, its COPY-style output can be turned into statements that reload the recovered data in a single pass.

It cannot recover custom data types and is unsuitable for cloud environments.
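A sketch of such scripting, reusing the file path and column types from the earlier example; recovered_table is a placeholder for a re-created table with matching columns:

```shell
# Extract the recovered rows into a tab-separated file
./pg_filedump -D int,varchar dj/base/24679/24777 \
  | grep '^COPY:' \
  | sed 's/^COPY: //' > recovered.tsv

# Reload into a table with a matching column list
psql -c "\copy recovered_table FROM 'recovered.tsv'"
```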

WalMiner

WalMiner parses PostgreSQL write‑ahead logs (WAL) to extract useful information, including SQL statements and undo SQL, enabling data recovery even when logical decoding is unavailable.

Usage Example

postgres=# select record_database, record_user, op_text, op_undo from walminer_contents;
-[ RECORD 1 ]---+-------------------------------------------------
record_database | postgres
record_user     | lichuancheng
op_text         | INSERT INTO "public"."t2"("i","j","k") VALUES(1,1,'qqqqqq');
op_undo         | DELETE FROM "public"."t2" WHERE "i"=1 AND "j"=1 AND "k"='qqqqqq' AND ctid='(0,1)';

Summary

WalMiner recovers data from WAL as long as the logs are retained.

It can be combined with stored procedures for one‑click recovery.
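A typical session might look like the following sketch; the function names follow the WalMiner README and may differ between versions, and the WAL path is a placeholder:

```sql
-- Register the WAL directory, parse it, then read the results
select walminer_wal_add('/pgdata/pg_wal');
select walminer_all();
select record_database, op_text, op_undo from walminer_contents;
select walminer_stop();  -- release resources when done
```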

pageinspect

pageinspect is a built‑in PostgreSQL extension (in contrib) that inspects raw page data and can read dead tuples, but it is low‑level and difficult to use for data recovery.

Data Structure

struct varlena {
    char vl_len_[4];    /* Do not touch this field directly! */
    char vl_dat[FLEXIBLE_ARRAY_MEMBER];  /* Data content is here */
};

Usage Example

test=# SELECT tuple_data_split('lzzhang'::regclass, t_data, t_infomask, t_infomask2, t_bits)
FROM heap_page_items(get_raw_page('lzzhang', 0));
-- output omitted

Summary

pageinspect is mainly for low‑level storage analysis; recovery is extremely hard and not recommended.

Data is not human‑readable; interpreting it requires deep knowledge of PostgreSQL internals.
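As a concrete illustration of what the extension exposes, dead tuples can at least be located by their nonzero t_xmax (the ID of the deleting or updating transaction); a sketch against a hypothetical table foo:

```sql
-- List line pointers on page 0 whose tuples were deleted or updated
SELECT lp, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('foo', 0))
WHERE t_xmax <> 0;
```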

Tips

Recovering invisible PostgreSQL data generally relies on reading dead tuples; the amount of retained dead tuples affects recoverability.

Because of MVCC, PostgreSQL relies on autovacuum to clean up dead tuples, but with default settings a table is vacuumed only after dead tuples reach roughly 20 % of its rows (autovacuum_vacuum_scale_factor, default 0.2, plus autovacuum_vacuum_threshold).

Setting vacuum_defer_cleanup_age preserves recent dead tuples at the cost of some extra bloat. If you later want the deferred tuples cleaned up, advance the transaction ID, e.g. by repeated select txid_current(); calls (one per transaction): once enough IDs have passed, the dead tuples fall out of the defer window and vacuum can remove them.

Even without the parameter, timely recovery operations can still retrieve data.
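A sketch of deliberately advancing the transaction ID as described above; each psql invocation runs in its own transaction, so each call consumes one ID (1000 is an arbitrary count):

```shell
# Consume 1000 transaction IDs, one per connection/transaction
for i in $(seq 1 1000); do
  psql -At -c 'select txid_current();' > /dev/null
done
```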

Conclusion

Different tools suit different scenarios. Ranking by ease of use (personal recommendation):

1. pg_recovery – simple, returns only the needed rows.

2. pg_dirtyread – simple, returns all rows.

3. WalMiner – requires deep knowledge and preparation.

4. pg_resetwal – requires more understanding.

5. pg_filedump – needs custom scripts.

6. pageinspect – very difficult.

If you have no preparation, recommended steps:

1. Set vacuum_defer_cleanup_age promptly.

2. Install pg_recovery or pg_dirtyread.

3. If plugins cannot be installed, use pg_resetwal, which needs no extra tools.

Mastering data‑recovery tools is essential, but preventing data loss before an incident occurs is even more important.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: PostgreSQL, Data Recovery, WAL, MVCC, Database Tools, pg_resetwal, pg_dirtyread
Written by

Qingyun Technology Community

Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!
