
Scaling and Optimizing a Greenplum Data Warehouse Cluster: Key Lessons

This article details the background, goals, design decisions, deployment steps, kernel tuning, fault‑recovery testing, performance optimization, and TPCH benchmark results of a Greenplum data‑warehouse cluster expansion, highlighting practical challenges and concrete solutions for large‑scale database environments.


1. Goals of the Cluster Architecture Redesign

The hardware refresh provided an opportunity to address several pain points:

An over‑provisioned segment count (10 Primary + 10 Mirror instances per node) caused instability when a node failed, because the mirrors promoted on surviving hosts sharply increased their load.

RAID‑5 storage carried a high rebuild cost after disk failure; the new cluster switched to RAID‑10 for better reliability and performance.

Insufficient fault‑recovery testing for node loss and rolling replacements.

Outdated Greenplum version (four years old) limited features and performance.

Operating system upgrade needed from CentOS 6 to CentOS 7.

TPCH benchmark validation required after deployment.

2. Design Choices and Planning

Key decisions included:

Choosing the open‑source Greenplum 6.16.2 distribution for stability and bug fixes.

Serving simple data‑mart logic from PostgreSQL and reserving Greenplum for complex queries, which resulted in separate GP clusters for the warehouse and the data mart.

Capacity planning: 12 servers, each hosting 12 segment instances (6 Primary + 6 Mirror), plus dedicated Master and Standby nodes.

Four deployment schemes were evaluated:

Scheme 1: Mixed Master/Standby/Segment deployment.

Scheme 2: Separate Master/Standby and Segment nodes (fewer total nodes).

Scheme 3: Independent Segment deployment with Master/Standby on virtual machines.

Scheme 4: Minimal single‑node cluster (baseline for data‑mart).

Cluster capacity planning diagram
Deployment scheme comparison

3. Detailed Design and Implementation

Architecture diagram (Primary/Mirror per server, two‑node Master/Standby):

Detailed deployment diagram

Kernel parameter tuning (based on official recommendations):

vm.swappiness=10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296
vm.min_free_kbytes = 3943084
vm.overcommit_memory=2
kernel.sem = 500 2048000 200 4096
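
These settings go into /etc/sysctl.conf on every host. As a sketch, vm.min_free_kbytes is usually derived from the host's RAM rather than hard‑coded; the official guidance computes roughly 3% of MemTotal:

# Derive vm.min_free_kbytes from this host's RAM (~3% of MemTotal) and append it
awk 'BEGIN {OFMT = "%.0f";} /MemTotal/ {print "vm.min_free_kbytes =", $2 * .03;}' /proc/meminfo >> /etc/sysctl.conf
sysctl -p    # apply the settings without a reboot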

4. Cluster Deployment Steps

Populate /etc/hosts with all node IPs and hostnames.
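
For illustration, a minimal layout (the addresses and the mdw/smdw/sdwN names are placeholders following common Greenplum convention):

# /etc/hosts (identical on every node)
192.168.1.10   mdw     # master
192.168.1.11   smdw    # standby master
192.168.1.21   sdw1    # segment host 1
192.168.1.22   sdw2    # segment host 2
# ... one entry per remaining segment host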

Create gpadmin group and user:

groupadd gpadmin
useradd gpadmin -g gpadmin
passwd gpadmin

Configure sysctl.conf and resource limits.
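
A limits.conf fragment in line with the Greenplum 6 documentation; anything at or above the 65535 open‑file floor noted in section 5 will pass the check:

# /etc/security/limits.conf additions for the gpadmin workload
* soft nofile 524288
* hard nofile 524288
* soft nproc  131072
* hard nproc  131072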

Install Greenplum via RPM:

# yum install -y apr apr-util bzip2 krb5-devel zip
# rpm -ivh open-source-greenplum-db-6.16.2-rhel7-x86_64.rpm

Prepare host files, enable sudo for gpadmin, and distribute SSH keys with gpssh-exkeys -f hostlist.
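
A minimal sketch of that step (the hostnames are the placeholders from the /etc/hosts example above):

# hostlist: one hostname per line, covering master, standby, and all segment hosts
cat > hostlist <<'EOF'
mdw
smdw
sdw1
sdw2
EOF
# As gpadmin, with the environment sourced, exchange SSH keys between all hosts
source /usr/local/greenplum-db/greenplum_path.sh
gpssh-exkeys -f hostlist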

Package the Greenplum binaries and copy to segment nodes:

gpscp -f /usr/local/greenplum-db/conf/hostlist /tmp/greenplum-db-6.16.2.tar.gz =:/tmp
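
The tarball has to be created on the master first and unpacked on each segment host afterwards; a sketch of those surrounding steps (the directory name assumes the RPM's default install path):

# Package the installed binaries on the master
tar czf /tmp/greenplum-db-6.16.2.tar.gz -C /usr/local greenplum-db-6.16.2
# After the gpscp above, unpack on every segment host in one pass
gpssh -f /usr/local/greenplum-db/conf/hostlist -e 'tar xzf /tmp/greenplum-db-6.16.2.tar.gz -C /usr/local'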

Create data directories on each node, e.g.:

mkdir -p /data1/gpdata/gpdatap1
mkdir -p /data1/gpdata/gpdatap2
mkdir -p /data2/gpdata/gpdatam1
mkdir -p /data2/gpdata/gpdatam2
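
Because the gpinitsystem config below declares six primary and six mirror directories per host, it is easiest to create the full set everywhere in one pass; a sketch using gpssh:

# Create all six primary and six mirror directories on every segment host
gpssh -f /usr/local/greenplum-db/conf/seg_hosts -e \
  'mkdir -p /data1/gpdata/gpdatap{1..6} /data2/gpdata/gpdatam{1..6} && chown -R gpadmin:gpadmin /data1/gpdata /data2/gpdata'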

Run gpinitsystem with a custom config file:

gpinitsystem -c gpinitsystem_config -s standby_hostname

Sample gpinitsystem_config excerpt:

MASTER_HOSTNAME=xxxx   # master hostname (placeholder in the original)
declare -a DATA_DIRECTORY=(/data1/gpdata/gpdatap1 /data1/gpdata/gpdatap2 /data1/gpdata/gpdatap3 /data1/gpdata/gpdatap4 /data1/gpdata/gpdatap5 /data1/gpdata/gpdatap6)   # six entries = six primary segments per host
TRUSTED_SHELL=ssh
declare -a MIRROR_DATA_DIRECTORY=(/data2/gpdata/gpdatam1 /data2/gpdata/gpdatam2 /data2/gpdata/gpdatam3 /data2/gpdata/gpdatam4 /data2/gpdata/gpdatam5 /data2/gpdata/gpdatam6)   # six entries = six mirror segments per host
MACHINE_LIST_FILE=/usr/local/greenplum-db/conf/seg_hosts   # list of segment hosts
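
Once gpinitsystem completes, a quick health check is worthwhile before loading data; a brief sketch using the standard utilities:

gpstate -s    # detailed status of every primary and mirror
gpstate -m    # mirror synchronization summary
psql -d postgres -c "select count(*) from gp_segment_configuration where status = 'u';"   # expect all segments up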

5. Issue Checklist

Common pitfalls discovered during deployment:

An insufficient open‑file limit in /etc/security/limits.conf triggered a warning (51200 < 65535).

Network interconnect errors caused by missing firewall rules; both the INPUT and OUTPUT chains must allow cluster traffic (note the OUTPUT rule matches the destination, -d, rather than the source):

-A INPUT -p all -s xxxxx -j ACCEPT
-A OUTPUT -p all -d xxxxx -j ACCEPT

An incorrect /etc/hosts entry mapping the node's hostname to 127.0.0.1 on mixed Master/Standby/Segment hosts caused query hangs; correcting the entry so the hostname resolves to the real interconnect address fixed the issue.
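
A quick preflight check for this pitfall (a sketch; the exact commands are illustrative):

# The hostname must resolve to a routable interconnect address, not loopback
hostname -i                  # should print this node's LAN IP, not 127.0.0.1
grep 127.0.0.1 /etc/hosts    # the loopback line must not include this node's hostname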

6. Fault Recovery Tests

Two scenarios were exercised:

Node crash recovery – restart the server, then:

select * from gp_segment_configuration where status != 'u';   -- list segments marked down
gprecoverseg -o ./recov    # write the recovery plan to ./recov for review (execute it with gprecoverseg -i ./recov)
gprecoverseg -r            # once segments are back in sync, rebalance to preferred roles
select * from gp_segment_configuration where status = 'u';    -- confirm every segment is up

Node permanently unavailable – rebuild data directories and run:

gprecoverseg -F            # full recovery: rebuild the failed segments from their mirror copies
select * from gp_segment_configuration where status = 'u' and role != preferred_role;   -- segments up but running in the non-preferred role
gprecoverseg -r            # rebalance back to preferred roles
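
In both scenarios, resynchronization can be watched from the master before issuing the rebalance; a brief sketch:

# -e highlights primary/mirror pairs with issues while resync is in flight
watch -n 10 gpstate -e
# Or poll the catalog directly
psql -d postgres -c 'select content, role, mode, status from gp_segment_configuration order by content;'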

Recovery of ~50 GB of data took roughly 3 minutes.

7. Optimization Review

Architecture optimization – after testing, the mixed deployment (Scheme 1) was chosen as the best balance of performance and resource usage.

SQL performance analysis – a slow plan vs. an efficient plan for SELECT count(*) FROM customer:

# explain analyze select count(*) from customer;
--- slow plan (≈24 s) ---
Aggregate (cost=0.00..431.00 rows=1 width=8) (actual time=24792.916..24792.916 rows=1 loops=1)
  -> Gather Motion 36:1 (slice1; segments: 36) (actual time=3.255..16489.394 rows=150000000 loops=1)
       -> Seq Scan on customer (actual time=0.780..1267.878 rows=4172607 loops=1)
--- fast plan (≈1.5 s) ---
Aggregate (cost=0.00..842.08 rows=1 width=8) (actual time=1519.311..1519.311 rows=1 loops=1)
  -> Gather Motion 36:1 (slice1; segments: 36) (actual time=634.787..1519.214 rows=36 loops=1)
       -> Aggregate (actual time=1473.296..1473.296 rows=1 loops=1)
            -> Seq Scan on customer (actual time=0.758..438.319 rows=4172607 loops=1)

The root cause was stale statistics after loading 1 TB of data: without them, the planner shipped all 150 million rows to the master before aggregating instead of pre‑aggregating on each segment. Running ANALYZE customer refreshed the statistics and restored the fast plan.
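
For routine use after large loads, Greenplum also ships the analyzedb utility, which collects statistics incrementally and in parallel; a sketch (the database name is a placeholder):

analyzedb -d warehouse_db -p 5                 # analyze changed tables, five at a time
analyzedb -d warehouse_db -t public.customer   # or target just the offending table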

Additional configuration tuning:

gpconfig -c statement_mem -m 2457600 -v 2457600        # per-statement memory, in kB (~2.4 GB)
gpconfig -c gp_vmem_protect_limit -m 32000 -v 32000    # per-segment memory limit, in MB (32 GB)
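
gpconfig writes the values to every postgresql.conf but does not apply them by itself; a sketch of the follow‑up (gp_vmem_protect_limit only takes effect after a restart):

gpstop -u                           # reload config files; sufficient for statement_mem
gpstop -ar                          # restart the cluster; required for gp_vmem_protect_limit
gpconfig -s statement_mem           # verify what master and segments now report
gpconfig -s gp_vmem_protect_limit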

8. Performance Results

TPCH count(*) timings for different cluster sizes (each physical node runs 6 Primary + 6 Mirror segments):

Cluster size                          customer   lineitem
10 physical nodes (+ 2 master hosts)  1.24 s     10.66 s
6 physical nodes                      1.35 s     18.15 s
4 physical nodes                      1.53 s     25.07 s
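
Timings like those above can be reproduced with a small psql loop; a sketch, where the TPCH database name is a placeholder:

# Time the two count(*) probes from the table above; \timing reports client-side elapsed time
for t in customer lineitem; do
psql -d tpch <<SQL
\timing on
select count(*) from $t;
SQL
done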

Across the full TPCH query set used (19 queries), the mixed deployment achieved near‑expected performance on 1 TB of data, while the other schemes suffered from network and I/O bottlenecks.

TPCH performance comparison
1 TB TPCH benchmark results
Tags: Linux, Data Warehouse, Cluster Deployment, Greenplum, TPCH
Written by

dbaplus Community

Enterprise-level professional community for database, big data, and AIOps practitioners. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS and DAMS conferences, delivered by industry experts.
