Databases 5 min read

How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption

This guide explains how to diagnose and fix a PostgreSQL instance that fails to start after a power outage, covering log inspection, missing socket files, PID checks, checkpoint corruption detection, and recovery using pg_resetwal.

Raymond Ops
Raymond Ops
Raymond Ops
How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption

Problem

After a power outage, the PostgreSQL database fails to start, showing the error “psql: could not connect to server: No such file or directory”.

Error screenshot
Error screenshot

Background Analysis

The database runs as a single‑node deployment on Kubernetes. Two clusters shared overlapping pod CIDRs, causing occasional IP conflicts. After adjusting the pod network and restarting all pods, the PostgreSQL pod could not start. Container logs indicated the issue.

Solution Process

1. Check whether the Unix domain socket file exists:

kubectl exec -it -n namespace containerId /bin/sh

Inside the container,

/var/run/postgresql/.s.PGSQL.5432

was missing, confirming the PostgreSQL process was not running.

2. Verify the main PID file:

cat /var/lib/postgresql/11/main/master.pid

The PID file contains several fields (e.g., PID 154, data directory, start time, port 5432, socket directory). The PostgreSQL process was not present in the system.

PID file screenshot
PID file screenshot

3. Attempt to start PostgreSQL:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start

The output included “invalid primary checkpoint record”, indicating checkpoint corruption.

4. Repair the checkpoint using the built‑in tool:

/usr/lib/postgresql/11/bin/pg_resetwal -D /var/lib/postgresql/11/main
pg_resetwal command screenshot
pg_resetwal command screenshot

After running

pg_resetwal

, start the server again:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start
Successful start screenshot
Successful start screenshot

The database starts successfully and can be accessed with client tools such as Navicat.

Summary

Root cause: Power loss or abnormal restart corrupted data files, causing an invalid checkpoint that prevented PostgreSQL from starting.

Resolution: Use

pg_resetwal

to repair the checkpoint, then restart the PostgreSQL service.

KubernetesPostgreSQLDatabase RecoveryCheckpoint Corruptionpg_resetwal
Raymond Ops
Written by

Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.