
Recovering a Ceph PG from Inactive and Incomplete States After a Power Outage

This article details a step‑by‑step recovery of a Ceph placement group that entered inactive and incomplete states after a data‑center power loss, explaining the meanings of these states, attempted repairs, use of ceph‑objectstore‑tool to manipulate PG replicas, and restoration of lost RBD image metadata.


Background

After a flash flood caused a complete power loss in the data center, all servers rebooted and the virtual-machine file systems were corrupted. The Ceph cluster showed one placement group (PG) stuck in the inactive and incomplete states, and client I/O to the affected images hung.

Investigation Process

Explanation of Inactive and Incomplete States

inactive

Meaning: the PG is unavailable.

Effect: client read/write requests to this PG are blocked.

Possible causes:

Not enough OSDs are up to provide service.

The PG is not assigned to suitable OSDs.

OSDs have not completed the peering process.

In other words, inactive means the PG cannot provide normal I/O services.
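To see which PGs are stuck inactive, the usual starting point is the health detail output and the stuck-PG listing (standard Ceph commands):

ceph health detail
ceph pg dump_stuck inactive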

incomplete

Meaning: during peering the PG lacks required data replicas, preventing consistency.

Effect: the PG's data is incomplete and cannot serve reads/writes.

Common reasons:

Some OSDs crashed or lost data, so the PG cannot gather enough object copies.

New OSDs joined or data migration lost necessary replicas.

Disk failure or accidental deletion caused actual data loss.

Usually incomplete is more severe than inactive; inactive only means the PG is temporarily unavailable, while incomplete indicates missing data replicas that require manual intervention.
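To understand why a specific PG is incomplete, querying it shows the peering history; the recovery_state section of the JSON output explains what the PG is waiting for:

ceph pg 2.1c query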

Attempt PG Repair

All OSDs were up, but the conventional repair command did not help:

ceph pg repair 2.1c

Checking the PG showed an object count of zero and no unfound objects:

ceph pg ls | grep 2.1c
ceph pg 2.1c list_unfound

Rolling back the PG version and restarting OSD services also failed.

ceph pg 2.1c mark_unfound_lost revert
ceph pg repair 2.1c

Marking OSDs out and back in did not change the state.

ceph osd out <id>
ceph osd in <id>

Using ceph-objectstore-tool to Operate on PG Replicas

The cluster remained unhealthy, so ceph-objectstore-tool was used to keep a single replica of the PG, delete the other two, re-import the kept replica onto those OSDs, and finally mark the PG as complete.

Preparation

# Show OSDs holding the PG
ceph pg map 2.1c
# Prevent rebalancing during replica manipulation
ceph osd set noout
# Temporarily lower min_size
ceph osd pool set libvirt-pool min_size 1
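
Before deleting any replica, it is worth checking which copy holds the most objects, since that is the one to keep. A minimal sketch, assuming each OSD can be stopped briefly (--op list prints one line per object in the PG):

systemctl stop ceph-osd@14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 --type bluestore --pgid 2.1c --op list | wc -l
systemctl start ceph-osd@14
# Repeat for osd.8 and osd.11, then keep the replica with the most objects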

Export PG Replicas

The exported replica files were only a few tens of kilobytes each.

systemctl stop ceph-osd@8
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_8
systemctl stop ceph-osd@14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_14
systemctl stop ceph-osd@11
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_11
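
Comparing the export sizes is a quick sanity check before deleting anything; the largest file generally corresponds to the most complete replica:

ls -lh /opt/2.1c.obj_osd_*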

Delete Faulty Replicas on Two OSD Nodes

systemctl stop ceph-osd@8
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8/ --type bluestore --pgid 2.1c --op remove --force
systemctl stop ceph-osd@11
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --type bluestore --pgid 2.1c --op remove --force

Import the Remaining Replica to the Other Two OSD Nodes

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --type bluestore --pgid 2.1c --op import --file /opt/2.1c.obj_osd_14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8/ --type bluestore --pgid 2.1c --op import --file /opt/2.1c.obj_osd_14
systemctl start ceph-osd@11
systemctl start ceph-osd@8

Mark the Remaining Replica as Complete

Standard repair still left the PG in the incomplete state:

ceph pg repair 2.1c

Stop the third OSD, mark its copy of the PG as complete, then restore the pool settings:

systemctl stop ceph-osd@14
# Mark complete
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-14 --pgid 2.1c --op mark-complete
ceph osd pool set libvirt-pool min_size 2
ceph osd unset noout
systemctl start ceph-osd@14

After this the cluster reported a healthy state.
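This can be confirmed with the usual status commands; the PG should now be reported as active+clean:

ceph -s
ceph pg ls | grep 2.1c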

However, the RBD images appeared to be missing even though the pool's used capacity was unchanged.

Recovering RBD Image List Data

The RBD header objects still existed, but the rbd_directory object, which maps image names to IDs, was lost.

rados -p libvirt-pool ls | grep '^rbd_header\.' | head
rbd info -p libvirt-pool --image-id <id>

The header object's OMAP metadata was still present, confirming that only the rbd_directory object was missing.

rados -p libvirt-pool listomapkeys rbd_header.d8b1996ee6b524 | head

Using rados we could still query the header object.

rados -p libvirt-pool stat rbd_header.d8b1996ee6b524
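
Since the header objects survived, the full set of image IDs can be enumerated from their names; a small helper one-liner (plain text processing, assuming all images live in libvirt-pool):

rados -p libvirt-pool ls | grep '^rbd_header\.' | sed 's/^rbd_header\.//'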

Recreating the Directory Object

Two scripts were used: one to map header IDs to image names, and another to write the missing OMAP entries back into rbd_directory.

#!/bin/bash
# find_rbd_name.sh <IMAGE_ID>
set -euo pipefail
POOL="libvirt-pool"
ID="$1"
found=0
for obj in $(rados -p "$POOL" ls | grep '^rbd_id\.'); do
  # rbd_id.<name> stores the ID as a 4-byte little-endian length prefix
  # followed by the ID string (see struct.pack below); strip the prefix
  got=$(rados -p "$POOL" get "$obj" - 2>/dev/null | tail -c +5 | tr -d '\r\n')
  if [ "$got" = "$ID" ]; then
    echo "Found: $obj -> name = ${obj#rbd_id.}"
    found=1
  fi
done
if [ $found -eq 0 ]; then
  echo "No image for ID=$ID"
  exit 2
fi
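
Usage is one image ID at a time, taking the ID from a surviving header object; the ID below is the one queried earlier:

./find_rbd_name.sh d8b1996ee6b524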

Then the OMAP keys were restored directly:

#!/usr/bin/env bash
# fix_rbd_mapping.sh <NAME> <ID>
set -euo pipefail
POOL="libvirt-pool"
NAME="$1"
ID="$2"
# Ensure directory object exists
rbd pool init "$POOL"
# Backup existing keys (ignore errors)
rados -p "$POOL" getomapval rbd_directory "name_$NAME" /tmp/old_name_val.bin 2>/dev/null || true
rados -p "$POOL" getomapval rbd_directory "id_$ID" /tmp/old_id_val.bin 2>/dev/null || true
# Write name_<NAME> -> <ID>; the heredoc is unquoted so the shell
# expands $ID into the Python source before it runs
python3 - <<PY | rados -p "$POOL" setomapval rbd_directory "name_$NAME"
import sys, struct
img_id = "$ID"
# rbd_directory values are length-prefixed strings: 4-byte LE length + bytes
sys.stdout.buffer.write(struct.pack("<I", len(img_id)))
sys.stdout.buffer.write(img_id.encode())
PY
# Write id_<ID> -> <NAME>
python3 - <<PY | rados -p "$POOL" setomapval rbd_directory "id_$ID"
import sys, struct
name = "$NAME"
sys.stdout.buffer.write(struct.pack("<I", len(name)))
sys.stdout.buffer.write(name.encode())
PY
# Verify both directions of the mapping
rados -p "$POOL" listomapvals rbd_directory | grep -E "name_$NAME|id_$ID"
rbd -p "$POOL" ls | grep -F -- "$NAME" || true
rbd -p "$POOL" info "$NAME"
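
With a name/ID pair recovered by the first script, the mapping can be restored per image; the image name below is hypothetical:

./fix_rbd_mapping.sh vm-disk-01 d8b1996ee6b524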

After applying the scripts, the RBD image list was restored and the images became visible again.

Tags: Ceph, RBD, PG recovery, ceph-objectstore-tool, inactive, incomplete, storage operations