Scaling and Optimizing a Greenplum Data Warehouse Cluster: Key Lessons
This article details the background, goals, design decisions, deployment steps, kernel tuning, fault‑recovery testing, performance optimization, and TPCH benchmark results of a Greenplum data‑warehouse cluster expansion, highlighting practical challenges and concrete solutions for large‑scale database environments.
1. Goals of the Cluster Architecture Redesign
The hardware refresh provided an opportunity to address several pain points:
Over‑provisioned GP segment count (10 Primary + 10 Mirror per node) caused instability when a node failed.
RAID‑5 storage risked high rebuild cost; switched to RAID‑10 for better reliability and performance.
Insufficient fault‑recovery testing for node loss and rolling replacements.
Outdated Greenplum version (four years old) limited features and performance.
Operating system upgrade needed from CentOS 6 to CentOS 7.
TPCH benchmark validation required after deployment.
2. Design Choices and Planning
Key decisions included:
Choosing the open‑source Greenplum 6.16.2 distribution for stability and bug fixes.
Using a PostgreSQL-based data mart for simple logic while leveraging Greenplum for complex queries, resulting in separate GP clusters for the warehouse and the data mart.
Capacity planning: 12 servers, each hosting 12 segment instances (6 Primary + 6 Mirror), plus dedicated Master and Standby nodes.
Four deployment schemes were evaluated:
Scheme 1: Mixed Master/Standby/Segment deployment.
Scheme 2: Separate Master/Standby and Segment nodes (fewer total nodes).
Scheme 3: Independent Segment deployment with Master/Standby on virtual machines.
Scheme 4: Minimal single‑node cluster (baseline for data‑mart).
3. Detailed Design and Implementation
Architecture diagram (Primary/Mirror per server, two‑node Master/Standby):
Kernel parameter tuning (based on official recommendations):
vm.swappiness=10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296
vm.min_free_kbytes = 3943084
vm.overcommit_memory=2
kernel.sem = 500 2048000 200 4096
4. Cluster Deployment Steps
Populate /etc/hosts with all node IPs and hostnames.
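Getting /etc/hosts identical and correct on every node matters later (see the 127.0.0.1 pitfall in section 5). A hypothetical layout, with placeholder hostnames and addresses not taken from the article:

```
192.168.1.10   mdw    # Master
192.168.1.11   smdw   # Standby Master
192.168.1.21   sdw1   # Segment host 1
192.168.1.22   sdw2   # Segment host 2
```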
Create gpadmin group and user:
groupadd gpadmin
useradd gpadmin -g gpadmin
passwd gpadmin
Configure sysctl.conf and resource limits.
Install Greenplum via RPM:
# yum install -y apr apr-util bzip2 krb5-devel zip
# rpm -ivh open-source-greenplum-db-6.16.2-rhel7-x86_64.rpm
Prepare host files, enable sudo for gpadmin, and distribute SSH keys with gpssh-exkeys -f hostlist.
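The hostlist file passed to gpssh-exkeys (and to gpscp below) is simply one hostname per line. A hypothetical example with placeholder names:

```
mdw
smdw
sdw1
sdw2
sdw3
```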
Package the Greenplum binaries and copy to segment nodes:
gpscp -f /usr/local/greenplum-db/conf/hostlist /tmp/greenplum-db-6.16.2.tar.gz =:/tmp
Create data directories on each node, e.g.:
mkdir -p /data1/gpdata/gpdatap1
mkdir -p /data1/gpdata/gpdatap2
mkdir -p /data2/gpdata/gpdatam1
mkdir -p /data2/gpdata/gpdatam2
Run gpinitsystem with a custom config file:
gpinitsystem -c gpinitsystem_config -s standby_hostname
Sample gpinitsystem_config excerpt:
MASTER_HOSTNAME=xxxx
declare -a DATA_DIRECTORY=(/data1/gpdata/gpdatap1 /data1/gpdata/gpdatap2 /data1/gpdata/gpdatap3 /data1/gpdata/gpdatap4 /data1/gpdata/gpdatap5 /data1/gpdata/gpdatap6)
TRUSTED_SHELL=ssh
declare -a MIRROR_DATA_DIRECTORY=(/data2/gpdata/gpdatam1 /data2/gpdata/gpdatam2 /data2/gpdata/gpdatam3 /data2/gpdata/gpdatam4 /data2/gpdata/gpdatam5 /data2/gpdata/gpdatam6)
MACHINE_LIST_FILE=/usr/local/greenplum-db/conf/seg_hosts
5. Issue Checklist
Common pitfalls discovered during deployment:
Insufficient open-file limit in /etc/security/limits.conf (warning that the configured 51200 is below the required 65535).
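To clear that warning, the limits can be raised in /etc/security/limits.conf. A sketch of values commonly recommended for Greenplum (verify the exact figures against the documentation for your version):

```
gpadmin soft nofile 65536
gpadmin hard nofile 65536
gpadmin soft nproc  131072
gpadmin hard nproc  131072
```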
Network interconnect errors caused by missing firewall rules; both INPUT and OUTPUT must allow traffic:
-A INPUT -p all -s xxxxx -j ACCEPT
-A OUTPUT -p all -s xxxxx -j ACCEPT
An incorrect /etc/hosts entry for 127.0.0.1 on mixed Master/Standby/Segment setups caused query hangs; fixing the entry resolved the issue.
6. Fault Recovery Tests
Two scenarios were exercised:
Node crash recovery – restart the server, then:
select * from gp_segment_configuration where status!='u';
gprecoverseg -o ./recov;
gprecoverseg -r;
select * from gp_segment_configuration where status='u';
Node permanently unavailable – rebuild data directories and run:
gprecoverseg -F;
select * from gp_segment_configuration where status='u' and role!=preferred_role;
gprecoverseg -r;
Recovery of ~50 GB of data took roughly 3 minutes.
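As a rough sanity check on that figure, 50 GB restored in about 3 minutes corresponds to the following aggregate throughput:

```shell
# Approximate aggregate recovery throughput implied by the numbers above:
# 50 GB restored in ~180 s.
MBPS=$(awk 'BEGIN { printf "%.0f", 50 * 1024 / 180 }')
echo "$MBPS MB/s"   # about 284 MB/s across the cluster
```

That rate is spread across all recovering segments, so the per-disk load is far lower.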
7. Optimization Review
Architecture optimization – after testing, the mixed Master/Standby/Segment deployment was chosen as the best balance of performance and resource usage.
SQL performance analysis – example of a slow plan vs. an efficient plan for SELECT count(*) FROM customer;:
# explain analyze select count(*) from customer;
--- slow plan (≈24 s) ---
Aggregate (cost=0.00..431.00 rows=1 width=8) (actual time=24792.916..24792.916 rows=1 loops=1)
-> Gather Motion 36:1 (slice1; segments: 36) (actual time=3.255..16489.394 rows=150000000 loops=1)
-> Seq Scan on customer (actual time=0.780..1267.878 rows=4172607 loops=1)
--- fast plan (≈1.5 s) ---
Aggregate (cost=0.00..842.08 rows=1 width=8) (actual time=1519.311..1519.311 rows=1 loops=1)
-> Gather Motion 36:1 (slice1; segments: 36) (actual time=634.787..1519.214 rows=36 loops=1)
-> Aggregate (actual time=1473.296..1473.296 rows=1 loops=1)
-> Seq Scan on customer (actual time=0.758..438.319 rows=4172607 loops=1)
The root cause was stale statistics after loading 1 TB of data; running ANALYZE customer; refreshed the statistics and restored performance.
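The gp_vmem_protect_limit value set in the tuning step below can be cross-checked against the sizing formula in the Greenplum documentation: gp_vmem = ((SWAP + RAM) − (7.5 GB + 0.05 × RAM)) / 1.7, divided across the acting primaries on a host. A sketch with assumed hardware figures (256 GB RAM and 64 GB swap are illustrative, not stated in the article) and the 6 primaries per host from the design:

```shell
# Estimate gp_vmem_protect_limit (MB) from the documented sizing formula.
# RAM and swap figures are assumptions for illustration, not from the article.
RAM_GB=256
SWAP_GB=64
PRIMARIES=6   # acting primary segments per host

# gp_vmem (GB) = ((SWAP + RAM) - (7.5 + 0.05 * RAM)) / 1.7
# per-segment limit (MB) = gp_vmem * 1024 / PRIMARIES
LIMIT_MB=$(awk -v r="$RAM_GB" -v s="$SWAP_GB" -v p="$PRIMARIES" 'BEGIN {
  gp_vmem = ((s + r) - (7.5 + 0.05 * r)) / 1.7
  printf "%.0f", gp_vmem * 1024 / p
}')
echo "$LIMIT_MB"
```

With these assumed figures the estimate lands near 30 GB per segment, in the same range as the 32000 MB the article configures.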
Additional configuration tuning:
gpconfig -c statement_mem -m 2457600 -v 2457600
gpconfig -c gp_vmem_protect_limit -m 32000 -v 32000
8. Performance Results
TPCH benchmark timings for different cluster sizes:
10 physical nodes (6 Primary + 6 Mirror × 10 + 2):
count(*) from customer: 1.24 s
count(*) from lineitem: 10.66 s
6 physical nodes (6 × 6):
customer: 1.35 s
lineitem: 18.15 s
4 physical nodes (6 × 4):
customer: 1.53 s
lineitem: 25.07 s
Overall, the TPCH query set (19 queries) showed that the mixed deployment achieved near-expected performance on 1 TB of data, while the other schemes suffered from network and I/O bottlenecks.
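The lineitem timings above give a quick gauge of how close the scan comes to linear scaling between the 4-node and 10-node layouts, using only the numbers quoted in the article:

```shell
# Scaling efficiency of the lineitem count between 4 and 10 physical nodes:
# observed speedup vs. the ideal 10/4 = 2.5x.
EFF=$(awk 'BEGIN {
  t4 = 25.07; t10 = 10.66          # seconds, from the benchmark figures above
  printf "%.1f", (t4 / t10) / (10.0 / 4.0) * 100
}')
echo "$EFF% scaling efficiency"
```

With the quoted timings this comes out slightly above 94%, consistent with the "near-expected performance" conclusion.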
dbaplus Community
