Recovering a Ceph PG from Inactive and Incomplete States After a Power Outage
This article details a step‑by‑step recovery of a Ceph placement group that entered inactive and incomplete states after a data‑center power loss, explaining the meanings of these states, attempted repairs, use of ceph‑objectstore‑tool to manipulate PG replicas, and restoration of lost RBD image metadata.
Background
After a flash‑flood caused a complete power loss in the data‑center, all servers rebooted and the virtual‑machine file systems were corrupted. The Ceph cluster showed a placement group (PG) in inactive and incomplete states, and new image I/O was stuck.
Investigation Process
Explanation of Inactive and Incomplete States
inactive
Meaning: the PG is unavailable.
Effect: client read/write requests to this PG are blocked.
Possible causes:
Not enough OSDs are up to provide service.
The PG is not assigned to suitable OSDs.
OSDs have not completed the peering process.
In other words, inactive means the PG cannot provide normal I/O services.
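To spot PGs in this state quickly, `ceph pg dump_stuck inactive` reports them directly; the same information can be filtered out of `ceph pg ls`. A minimal sketch, using hand-written sample output rather than data from this incident:

```shell
# Filter `ceph pg ls`-style output for PGs whose state is not active.
# The two sample lines below are illustrative only; on a live cluster,
# pipe the real command output instead:  ceph pg ls | awk '...'
sample='1.0f 12 0 0 0 50331648 0 0 0 active+clean
2.1c 0 0 0 0 0 0 0 0 incomplete'
echo "$sample" | awk '$NF !~ /active/ {print $1, $NF}'
```

The check only relies on the PG id being the first column and the state the last; the exact column count of `ceph pg ls` varies between releases.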
incomplete
Meaning: during peering the PG lacks required data replicas, preventing consistency.
Effect: the PG's data is incomplete and cannot serve reads/writes.
Common reasons:
Some OSDs crashed or lost data, so the PG cannot gather enough object copies.
New OSDs joined or data migration lost necessary replicas.
Disk failure or accidental deletion caused actual data loss.
Usually incomplete is more severe than inactive; inactive only means the PG is temporarily unavailable, while incomplete indicates missing data replicas that require manual intervention.
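`ceph pg <pgid> query` is the most direct way to see why peering is stuck: the recovery_state array usually carries a comment naming the blocking condition. A sketch of extracting it, run against a hand-written JSON fragment rather than output captured from this cluster:

```shell
# On a live cluster:  ceph pg 2.1c query
# The fragment below is an illustrative stand-in for the real JSON output.
query='{"recovery_state":[{"name":"Started/Primary/Peering/Incomplete","comment":"not enough complete instances of this pg"}]}'
echo "$query" | python3 -c '
import sys, json
for state in json.load(sys.stdin)["recovery_state"]:
    print(state["name"], "-", state.get("comment", ""))'
```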
Attempt PG Repair
All OSDs were up, but conventional repair commands did not help.
ceph pg repair 2.1c
Checking object count on the PG showed zero objects.
ceph pg ls | grep 2.1c
ceph pg 2.1c list_unfound
Rolling back the PG version and restarting OSD services also failed.
ceph pg 2.1c mark_unfound_lost revert
ceph pg repair 2.1c
Marking OSDs out and back in did not change the state.
ceph osd out <id>
ceph osd in <id>
Using ceph-objectstore-tool to Operate on PG Replicas
The cluster remained unhealthy, so the ceph-objectstore-tool was used to keep only one replica of the PG, delete the other two, and then back‑fill from the remaining replica, finally marking the PG as complete.
Preparation
# Show OSDs holding the PG
ceph pg map 2.1c
# Prevent rebalancing during replica manipulation
ceph osd set noout
# Temporarily lower min_size
ceph osd pool set libvirt-pool min_size 1
Export PG Replicas
The exported replica files were about a few tens of kilobytes each.
systemctl stop ceph-osd@8
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_8
systemctl stop ceph-osd@14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_14
systemctl stop ceph-osd@11
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 --type bluestore --pgid 2.1c --op export --file /opt/2.1c.obj_osd_11
Delete Faulty Replicas on Two OSD Nodes
systemctl stop ceph-osd@8
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8/ --type bluestore --pgid 2.1c --op remove --force
systemctl stop ceph-osd@11
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --type bluestore --pgid 2.1c --op remove --force
Import the Remaining Replica to the Other Two OSD Nodes
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --type bluestore --pgid 2.1c --op import --file /opt/2.1c.obj_osd_14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8/ --type bluestore --pgid 2.1c --op import --file /opt/2.1c.obj_osd_14
systemctl start ceph-osd@11
systemctl start ceph-osd@8
Mark the Remaining Replica as Complete
Standard repair still left the PG in incomplete state.
ceph pg repair 2.1c
Stop the third OSD and mark its PG as complete, then restore pool settings.
systemctl stop ceph-osd@14
# Mark complete
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-14 --pgid 2.1c --op mark-complete
ceph osd pool set libvirt-pool min_size 2
ceph osd unset noout
systemctl start ceph-osd@14
After this the cluster reported a healthy state.
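Before trusting the repair, it is worth confirming both the overall health flag and the PG's own state; a small sketch of that check, against an illustrative captured status line:

```shell
# On a live cluster:
#   ceph -s                  # expect HEALTH_OK
#   ceph pg ls | grep 2.1c   # expect the PG back in active+clean
# Offline sketch of the health check against a sample status line:
status='health: HEALTH_OK'
case "$status" in
  *HEALTH_OK*) echo "cluster healthy" ;;
  *)           echo "still degraded: $status" ;;
esac
```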
However, RBD images appeared missing even though storage size was unchanged.
Recovering RBD Image List Data
The header objects still existed, but the directory objects were lost.
rados -p libvirt-pool ls | grep '^rbd_header\.' | head
rbd -p libvirt-pool --image-id <id> info
Metadata of the header object was still present, indicating only the directory object was missing.
rados -p libvirt-pool listomapkeys rbd_header.d8b1996ee6b524 | head
Using rados we could still query the header object.
rados -p libvirt-pool stat rbd_header.d8b1996ee6b524
Recreating the Directory Object
Two scripts were used: one to map header IDs to image names, and another to write the missing OMAP entries back into rbd_directory.
#!/bin/bash
# find_rbd_name.sh <IMAGE_ID>
set -euo pipefail
POOL="libvirt-pool"
ID="$1"
found=0
for obj in $(rados -p "$POOL" ls | grep '^rbd_id\.'); do
got=$(rados -p "$POOL" get "$obj" - 2>/dev/null | tr -d '\n\r')
if [ "$got" = "$ID" ]; then
echo "Found: $obj -> name = ${obj#rbd_id.}"
found=1
fi
done
if [ $found -eq 0 ]; then
echo "No image for ID=$ID"
exit 2
fi
Then the OMAP keys were restored directly:
#!/usr/bin/env bash
# fix_rbd_mapping.sh <NAME> <ID>
set -euo pipefail
POOL="libvirt-pool"
NAME="$1"
ID="$2"
# Ensure directory object exists
rbd pool init "$POOL"
# Backup existing keys (ignore errors)
rados -p "$POOL" getomapval rbd_directory "name_$NAME" /tmp/old_name_val.bin 2>/dev/null || true
rados -p "$POOL" getomapval rbd_directory "id_$ID" /tmp/old_id_val.bin 2>/dev/null || true
# Write name_<NAME> -> <ID>
python3 - "$ID" <<'PY' | rados -p "$POOL" setomapval rbd_directory "name_$NAME"
import sys, struct
img_id = sys.argv[1]  # passed as an argument: the quoted heredoc does not expand $ID
sys.stdout.buffer.write(struct.pack("<I", len(img_id)))
sys.stdout.buffer.write(img_id.encode())
PY
# Write id_<ID> -> <NAME>
python3 - "$NAME" <<'PY' | rados -p "$POOL" setomapval rbd_directory "id_$ID"
import sys, struct
name = sys.argv[1]  # passed as an argument: the quoted heredoc does not expand $NAME
sys.stdout.buffer.write(struct.pack("<I", len(name)))
sys.stdout.buffer.write(name.encode())
PY
# Verify
rados -p "$POOL" listomapvals rbd_directory | grep -E -n "name_$NAME|id_$ID"
rbd -p "$POOL" ls | grep -F -- "$NAME" || true
rbd -p "$POOL" info "$NAME"
After applying the scripts, the RBD image list was restored and the images became visible again.
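The rbd_directory values written by the script assume RBD's length-prefixed string encoding: a 32-bit little-endian length followed by the raw bytes. A self-contained round-trip check of that framing (the image name is hypothetical):

```shell
python3 - <<'PY'
import struct

# Hypothetical image name, used only to exercise the encoding.
name = "vm-disk-01"

# Encode exactly as fix_rbd_mapping.sh does: <u32 little-endian length><bytes>.
blob = struct.pack("<I", len(name)) + name.encode()

# Decode it back and verify.
n = struct.unpack("<I", blob[:4])[0]
assert blob[4:4 + n].decode() == name
print("round-trip ok:", n, "bytes")
PY
```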
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.