Recovering a Deleted Greenplum Standby Node: Analysis and Step‑by‑Step Procedure
This article describes the background, architecture, and a detailed troubleshooting process for restoring a mistakenly deleted Greenplum standby data directory, including status checks, configuration cleanup, standby initialization, verification commands, and additional recommendations for handling master failures in a distributed database cluster.
Background
The standby node of a Greenplum test environment was accidentally deleted, rendering it unusable; if the master also fails, the whole cluster would become unavailable, so rapid recovery of the standby is required.
Greenplum Cluster Architecture Overview
The cluster consists of a Master Server and multiple Segment Servers. The Master handles client connections, SQL parsing, workload distribution, and stores global metadata. Segments store portions of data and execute the bulk of query processing; each segment has a mirrored replica to ensure availability.
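On a running cluster, this layout can be read directly from the catalog. A small helper like the following (the function name is illustrative; `gp_segment_configuration` is the standard Greenplum catalog table, also used later in this article) wraps the query:

```shell
# Illustrative helper: list the master (content = -1, role 'p'), the
# standby (content = -1, role 'm'), and every segment with its mirror.
show_layout() {
    psql -c "SELECT dbid, content, role, preferred_role, mode, status,
                    hostname, port
             FROM gp_segment_configuration
             ORDER BY content, dbid;"
}
```

Run it as `gpadmin` on the master: rows with `content = -1` describe the master pair, and `content` 0..N the data segments.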
Local Simulation of the Customer Environment
A local Greenplum setup was created with one master, one standby master, and five segment hosts, each running four primary segments and four mirrors, for 40 segments in total.
Fault Simulation
[root@standby01 ~]# su - gpadmin
[gpadmin@standby01 ~]$ rm -rf /greenplum/gpdata/master/*
[gpadmin@standby01 ~]$ exit
[root@standby01 ~]# reboot
[root@standby01 ~]# su - gpadmin
[gpadmin@standby01 ~]$ ps -ef | grep postgres
[gpadmin@standby01 ~]$ cd /greenplum/gpdata/master/
[gpadmin@standby01 master]$ ll
total 0
Fault Analysis and Resolution
4.1 Check Status
[gpadmin@master01 ~]$ gpstate -f
... Standby status = Standby process not running ...
[gpadmin@master01 ~]$ psql -c "select * from gp_segment_configuration order by content asc,dbid;"
... Standby node is down ...
4.2 Remove Faulty Standby Configuration
[gpadmin@master01 ~]$ gpinitstandby -r
Do you want to continue with deleting the standby master? Yy|Nn (default=N): y
4.3 Add New Standby
[gpadmin@master01 ~]$ gpinitstandby -s standby01
Do you want to continue with standby master initialization? Yy|Nn (default=N): y
4.4 Verify Cluster State
[gpadmin@master01 ~]$ gpstate -f
... Standby address = standby01 ... Standby status = Standby host passive ...
[gpadmin@master01 ~]$ psql -c "select * from gp_segment_configuration order by content asc,dbid;"
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
... (shows master and standby entries) ...
4.5 Check Standby Processes
[gpadmin@standby01 ~]$ ps -ef |grep postgres
... postgres processes running under /greenplum-db-6.7.0/bin/postgres -D /greenplum/gpdata/master/gpseg-1 -p 5432 ...
[gpadmin@standby01 master]$ pwd
/greenplum/gpdata/master
[gpadmin@standby01 master]$ ls
gpseg-1
[gpadmin@standby01 master]$ cd gpseg-1
[gpadmin@standby01 gpseg-1]$ ls
backup_label.old  base  global  gpexpand.status_detail  gpmetrics  gpperfmon  gpsegconfig_dump  gpssh.conf  internal.auto.conf  pg_clog  pg_distributedlog  pg_dynshmem  pg_hba.conf  pg_ident.conf  pg_log  pg_logical  pg_multixact  pg_notify  pg_replslot  pg_snapshots  pg_stat  pg_stat_tmp  pg_subtrans  pg_tblspc  pg_twophase  pg_utilitymodedtmredo  PG_VERSION  pg_xlog  postgresql.auto.conf  postgresql.conf  postmaster.pid  recovery.conf

The standby data directory was restored and the cluster returned to normal operation.
Additional Recommendations
If the master node fails, first promote the standby to become the new master so that service continues, then repair the original master and add it back as the standby. To revert to the original master-standby layout, take the new master offline, promote the original master, and add a new standby.
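The failover sequence above can be sketched as a short script. `gpactivatestandby` and `gpinitstandby` are the standard Greenplum utilities; the host names and data directory are those of this test environment. The function only prints each command so the order can be reviewed before running the steps by hand:

```shell
# Sketch of a master failover: promote the standby, then re-add the
# repaired host as the new standby. Each step is echoed rather than
# executed; drop the echo when running against a real cluster.
failover_plan() {
    # Step 1, on standby01 as gpadmin: promote the standby to master.
    echo "gpactivatestandby -d /greenplum/gpdata/master/gpseg-1"
    # Step 2, after repairing master01: attach it as the new standby.
    echo "gpinitstandby -s master01"
}
failover_plan
```

Reverting to the original layout is the same two steps with the host names swapped, after taking the interim master offline.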
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise-grade open-source tools and services around MySQL, releasing a premium open-source component each year on October 24 ("1024") and maintaining it on an ongoing basis.