Greenplum Segment Failure Diagnosis and Recovery Procedures
This article explains how to simulate and diagnose segment failures in a Greenplum cluster, including identifying master, segment, and tablespace issues, generating recovery configuration files, and using gprecoverseg and gpstate commands to restore segment roles and ensure all nodes are operational.
Greenplum clusters consist of master and segment servers, and failures can be categorized as master, segment, or data anomalies. This article focuses on diagnosing and resolving segment failures.
Local fault simulation
Two scenarios are demonstrated: (1) segment failure and (2) tablespace failure. The following commands are used to inspect the cluster state.
[gpadmin@master ~]$ gpstate
20221127:22:39:00:022659 gpstate:master:gpadmin-[INFO]:-Starting gpstate with args:
... (output truncated for brevity) ...
[gpadmin@master ~]$ gpstate -m
20221127:22:44:55:023196 gpstate:master:gpadmin-[INFO]:-Starting gpstate with args: -m
... (output truncated for brevity) ...
For the tablespace fault, the problematic tablespace directory is removed:
[gpadmin@data05 ~]$ cd /greenplum/gpdata/mirror/gpseg10
[gpadmin@data05 gpseg10]$ ls
... (directory listing) ...
[gpadmin@data05 gpseg10]$ rm -rf pg_tblspc/
After reproducing the failures, the recovery process involves generating a configuration file with gprecoverseg -o and applying it with gprecoverseg -i ... -a. The cluster status is then verified using gpstate -e and psql queries.
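The generate, apply, verify sequence described above could be wrapped in a small shell function. This is only a sketch: recover_segments and its argument are hypothetical names, and it assumes the Greenplum environment is already sourced so that gprecoverseg and gpstate are on PATH.

```shell
# Sketch only: recover_segments is a hypothetical helper, not a Greenplum tool.
# Assumes gprecoverseg and gpstate are on PATH for the current user.

recover_segments() {
    local cfg="${1:-./recover1}"   # recovery file path, as used in the article

    # 1. Write the list of segments needing recovery to the file.
    gprecoverseg -o "$cfg" || return 1

    # 2. Stop early if the file was not produced.
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }

    # 3. Print the file for review, then apply it without prompting (-a).
    cat "$cfg"
    gprecoverseg -i "$cfg" -a || return 1

    # 4. Report segments that still show error conditions.
    gpstate -e
}
```

Printing the file before applying it keeps a record of exactly which segments were targeted. The session below shows the same steps run by hand.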
[gpadmin@master ~]$ gprecoverseg -o ./recover1
20221127:22:48:41:023405 gprecoverseg:master:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recover1
... (output truncated) ...
[gpadmin@master ~]$ more recover1
data05|55000|/greenplum/gpdata/primary/gpseg12
data05|55001|/greenplum/gpdata/primary/gpseg13
data05|55002|/greenplum/gpdata/primary/gpseg14
data05|55003|/greenplum/gpdata/primary/gpseg19
[gpadmin@master ~]$ gprecoverseg -i ./recover1 -a
[gpadmin@master ~]$ gpstate -e
20221127:22:56:57:024771 gpstate:master:gpadmin-[INFO]:-All segments are running normally
The segment mirroring status report shows all segments up, though some may be running in a role other than their preferred one (for example, a mirror still acting as primary). Running gprecoverseg -r returns the segments to their preferred roles; another status check follows.
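Whether any roles are swapped can also be read from gp_segment_configuration by comparing the role and preferred_role columns. The sketch below assumes psql reaches the master with the same environment as the article's own psql call; list_swapped_roles and needs_rebalance are hypothetical helper names.

```shell
# Sketch only: both helpers are hypothetical names.
# Extra arguments (for example -d <database>) are passed through to psql.

list_swapped_roles() {
    # -A: unaligned output, -t: rows only, -F: field separator.
    psql -At -F '|' "$@" -c "
        select content, hostname, role, preferred_role
        from gp_segment_configuration
        where role <> preferred_role
        order by content, dbid;"
}

needs_rebalance() {
    # Exit status 0 when at least one segment is outside its preferred role.
    [ -n "$(list_swapped_roles "$@")" ]
}
```

A caller might run needs_rebalance && gprecoverseg -r so that the rebalance happens only when it is needed. The session below runs the rebalance by hand.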
[gpadmin@master ~]$ gprecoverseg -r
[gpadmin@master ~]$ gpstate -e
... (final status output confirming all segments up) ...
For the tablespace issue, a manual recovery file can be created and applied similarly:
[gpadmin@master ~]$ vi recover2
data05|56001|/greenplum/gpdata/mirror/gpseg10
[gpadmin@master ~]$ gprecoverseg -i ./recover2 -a
... (recovery output) ...
Final checks confirm that all segments are running normally and data is consistent across nodes.
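The "all segments running" part of that check can be scripted by counting rows in gp_segment_configuration whose status is not 'u' (up). This is a minimal sketch with hypothetical helper names. It looks only at the status column and says nothing about data consistency.

```shell
# Sketch only: count_down_segments and all_segments_up are hypothetical names.
# Extra arguments are passed through to psql for connection settings.

count_down_segments() {
    psql -At "$@" -c "
        select count(*)
        from gp_segment_configuration
        where status <> 'u';"
}

all_segments_up() {
    # Exit status 0 only when no segment is marked down.
    [ "$(count_down_segments "$@")" = "0" ]
}
```

The article's own check, shown next, lists the full configuration table instead.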
[gpadmin@master ~]$ psql -c "select * from gp_segment_configuration order by content asc,dbid;"
... (configuration table output) ...
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.