Databases 5 min read

How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption

This guide explains how to diagnose and fix a PostgreSQL instance that fails to start after a power outage, covering log inspection, missing socket files, PID checks, checkpoint corruption detection, and recovery using pg_resetwal.

Raymond Ops

Apr 8, 2025

How to Recover a PostgreSQL Database After Power Loss and Checkpoint Corruption

Problem

After a power outage, the PostgreSQL database fails to start, showing the error “psql: could not connect to server: No such file or directory”.

Background Analysis

The database runs as a single‑node deployment on Kubernetes. Two clusters shared overlapping pod CIDRs, causing occasional IP conflicts. After adjusting the pod network and restarting all pods, the PostgreSQL pod could not start. Container logs indicated the issue.

Solution Process

1. Check whether the Unix domain socket file exists: kubectl exec -it -n namespace containerId /bin/sh Inside the container, /var/run/postgresql/.s.PGSQL.5432 was missing, confirming the PostgreSQL process was not running.

2. Verify the main PID file: cat /var/lib/postgresql/11/main/master.pid The PID file contains several fields (e.g., PID 154, data directory, start time, port 5432, socket directory). The PostgreSQL process was not present in the system.

3. Attempt to start PostgreSQL:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start

The output included “invalid primary checkpoint record”, indicating checkpoint corruption.

4. Repair the checkpoint using the built‑in tool:

/usr/lib/postgresql/11/bin/pg_resetwal -D /var/lib/postgresql/11/main

After running pg_resetwal, start the server again:

/usr/lib/postgresql/11/bin/pg_ctl -D /var/lib/postgresql/11/main start

The database starts successfully and can be accessed with client tools such as Navicat.

Summary

Root cause: Power loss or abnormal restart corrupted data files, causing an invalid checkpoint that prevented PostgreSQL from starting.

Resolution: Use pg_resetwal to repair the checkpoint, then restart the PostgreSQL service.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

PostgreSQL Database Recovery Checkpoint Corruption pg_resetwal

Written by

Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.