
How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption

This guide explains how to diagnose and fix a PostgreSQL instance that fails to start after a power outage, covering log inspection, missing socket files, PID checks, checkpoint corruption detection, and recovery using pg_resetwal.

Raymond Ops

Problem

After a power outage, PostgreSQL fails to start, and connection attempts with psql fail with “psql: could not connect to server: No such file or directory”.


Background Analysis

The database runs as a single‑node deployment on Kubernetes. Two clusters shared overlapping pod CIDRs, causing occasional IP conflicts. After adjusting the pod network and restarting all pods, the PostgreSQL pod could not start. Container logs indicated the issue.

Solution Process

1. Check whether the Unix domain socket file exists. Open a shell in the PostgreSQL pod:

kubectl exec -it -n <namespace> <pod-name> -- /bin/sh

Inside the container, /var/run/postgresql/.s.PGSQL.5432 was missing, which indicated that the PostgreSQL server was not running.
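The socket check in step 1 can be scripted as a small helper. This is a minimal sketch: the function name socket_present and the SOCKET_PATH variable are hypothetical, and the default path assumes the socket directory and port 5432 used in this article.

```sh
# Hypothetical helper: succeed only if the path exists and is a Unix domain socket.
socket_present() {
    [ -S "$1" ]
}

# Default path assumes the socket directory and port used in this article.
SOCKET_PATH="${SOCKET_PATH:-/var/run/postgresql/.s.PGSQL.5432}"

if socket_present "$SOCKET_PATH"; then
    echo "socket found: $SOCKET_PATH"
else
    echo "socket missing: $SOCKET_PATH (the server is probably not running)"
fi
```

A missing socket is only a hint: the server may also be listening on a different port or socket directory, so the PID check in the next step is still worth doing.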

2. Inspect the postmaster PID file:

cat /var/lib/postgresql/11/main/postmaster.pid

The file records several fields, including the PID (154 here), the data directory, the start time, the port (5432), and the socket directory. No PostgreSQL process with that PID was present on the system, so the PID file was stale.
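The PID check can be sketched as follows. The function name pid_file_status is hypothetical; the only fact it relies on is that the first line of postmaster.pid holds the postmaster's process ID. It should be run as the postgres user or root, because kill -0 also fails when the process exists but belongs to another user.

```sh
# Hypothetical helper: print "running", "stale", or "unknown" for a PID file.
pid_file_status() {
    # The first line of postmaster.pid is the postmaster's process ID.
    pid=$(head -n 1 "$1" | tr -d '[:space:]')

    # Anything that is not a plain positive number is reported as unknown.
    case "$pid" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac

    # kill -0 sends no signal; it only tests whether the process exists
    # and whether the caller is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Example (path from this article, not executed here):
#   pid_file_status /var/lib/postgresql/11/main/postmaster.pid
```

Inside a freshly restarted container a low PID such as 154 may have been reused by an unrelated process, so a "running" result still needs a look at what that process actually is.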


3. Attempt to start PostgreSQL:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start

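The output included “invalid primary checkpoint record”: the server could not read the checkpoint record that pg_control points to in the write-ahead log (WAL), so crash recovery could not begin.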
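When the startup output is long, or when it comes from kubectl logs rather than an interactive pg_ctl run, a filter like the one below can confirm that the failure is checkpoint-related. It is a sketch only: the function name has_checkpoint_error is hypothetical, and it matches two messages that PostgreSQL 11 emits in this situation, the one quoted above and the PANIC line "could not locate a valid checkpoint record".

```sh
# Hypothetical helper: succeed if the text on stdin contains a
# checkpoint-record startup failure.
has_checkpoint_error() {
    grep -q -E 'invalid primary checkpoint record|could not locate a valid checkpoint record'
}

# Example (namespace and pod name are placeholders, not executed here):
#   kubectl logs -n <namespace> <pod-name> | has_checkpoint_error && echo "checkpoint problem"
```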

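4. Reset the write-ahead log with the built-in pg_resetwal tool. It discards the existing WAL and rewrites part of pg_control, so recently committed transactions can be lost; it is a last resort, best run only after the data directory has been copied elsewhere: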

/usr/lib/postgresql/11/bin/pg_resetwal -D /var/lib/postgresql/11/main
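Because pg_resetwal modifies the data directory irreversibly, it can help to copy the directory and preview the changes before running the command for real. The sketch below assumes the server is stopped (as it is in this scenario) and that there is enough disk space next to the data directory; the function name backup_data_dir and the .bak.<timestamp> naming are hypothetical.

```sh
# Hypothetical helper: copy a data directory to a timestamped sibling
# directory and print the path of the copy.
backup_data_dir() {
    src=${1%/}
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    if [ -e "$dest" ]; then
        echo "backup target already exists: $dest" >&2
        return 1
    fi
    # -a copies recursively and preserves ownership, permissions, and timestamps.
    cp -a "$src" "$dest" && echo "$dest"
}

# Example using the paths from this article (not executed here):
#   backup_data_dir /var/lib/postgresql/11/main
#   # -n prints the values pg_resetwal would write and changes nothing
#   /usr/lib/postgresql/11/bin/pg_resetwal -n -D /var/lib/postgresql/11/main
```

If pg_resetwal reports that the server was not shut down cleanly and refuses to continue, it asks for the -f option to force the reset; that is the point at which having the copy matters most.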

After running pg_resetwal, start the server again:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start

The database starts successfully and can be accessed with client tools such as Navicat.
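A scripted check that the server accepts connections, followed by a full dump, is a reasonable next step, since data recovered with pg_resetwal may be inconsistent. The sketch below is one way to do it: retry is a hypothetical helper, and the socket directory, port, and dump path in the commented example are assumptions based on the values in this article.

```sh
# Hypothetical helper: run a command until it succeeds, at most N times,
# pausing one second between attempts.
retry() {
    attempts=$1
    shift
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep 1
    done
}

# Example, run as the postgres user (not executed here):
#   retry 30 pg_isready -h /var/run/postgresql -p 5432
#   pg_dumpall -h /var/run/postgresql -p 5432 > /tmp/all_databases.sql
```

pg_isready only reports that the server is accepting connections; it says nothing about data consistency, which is why the dump is taken as well.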

Summary

Root cause: Power loss or an abnormal restart corrupted on-disk data, leaving an invalid checkpoint record that prevented PostgreSQL from starting.

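Resolution: Use pg_resetwal to reset the WAL and control information, then start PostgreSQL again. Because pg_resetwal can leave the database in an inconsistent state, the PostgreSQL documentation recommends dumping the data, running initdb, and restoring the dump afterwards.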
