How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption
This guide explains how to diagnose and fix a PostgreSQL instance that fails to start after a power outage, covering log inspection, missing socket files, PID checks, checkpoint corruption detection, and recovery using pg_resetwal.
Problem
After a power outage, the PostgreSQL database fails to start, showing the error “psql: could not connect to server: No such file or directory”.
Background Analysis
The database runs as a single‑node deployment on Kubernetes. Two clusters shared overlapping pod CIDRs, causing occasional IP conflicts. After adjusting the pod network and restarting all pods, the PostgreSQL pod could not start. Container logs indicated the issue.
Solution Process
1. Check whether the Unix domain socket file exists:
kubectl exec -it -n namespace containerId /bin/shInside the container,
/var/run/postgresql/.s.PGSQL.5432was missing, confirming the PostgreSQL process was not running.
2. Verify the main PID file:
cat /var/lib/postgresql/11/main/master.pidThe PID file contains several fields (e.g., PID 154, data directory, start time, port 5432, socket directory). The PostgreSQL process was not present in the system.
3. Attempt to start PostgreSQL:
/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main startThe output included “invalid primary checkpoint record”, indicating checkpoint corruption.
4. Repair the checkpoint using the built‑in tool:
/usr/lib/postgresql/11/bin/pg_resetwal -D /var/lib/postgresql/11/mainAfter running
pg_resetwal, start the server again:
/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main startThe database starts successfully and can be accessed with client tools such as Navicat.
Summary
Root cause: Power loss or abnormal restart corrupted data files, causing an invalid checkpoint that prevented PostgreSQL from starting.
Resolution: Use
pg_resetwalto repair the checkpoint, then restart the PostgreSQL service.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.