
Design and Implementation of a DB2 pureScale GDPC Dual‑Active Database Platform

This article presents a comprehensive technical guide to building a dual‑active DB2 pureScale GDPC solution, covering the shortcomings of traditional disaster recovery, the rationale for choosing DB2 pureScale, architectural principles, site and network design, storage, resource allocation, client access, benefits, and remaining challenges.

Architects' Tech Alliance

Author: Kong Zaihua, database architect with experience in DB2 pureScale cluster consulting and performance tuning.

During the construction of a two‑site, three‑center architecture, three major problems were encountered with traditional disaster‑recovery techniques: excessive switchover time, high operational risk due to many manual steps, and prohibitive cost because of idle servers.

To address these issues, a dual‑active platform was proposed to reduce RTO, lower cost, and minimize switchover risk.

Given the requirements—traditional (non‑Internet) workloads, base databases DB2 and Oracle, no changes to application code, and the need for equal‑level dual‑active data centers—the DB2 pureScale GDPC solution was identified as the most suitable.

Why base the dual‑active solution on DB2 pureScale?

The goal is zero data loss across multiple centers, reduced switchover time, minimized human error, and lower cost. Only DB2 pureScale and Oracle RAC can meet these criteria, but DB2 GDPC (geographically dispersed pureScale) offers true equal‑level dual‑active capabilities, better vendor support, and superior scalability and transparency compared with Oracle RAC.

Key design principles for the DB2 pureScale dual‑active solution

1. Generality: based on the open DB2 LUW platform, deployable on any vendor's storage, servers, and OS; avoid proprietary appliances.

2. Equality: both data centers handle transactions equally, with no primary/secondary distinction.

3. High availability: minimize intra‑city switchover time; a failure in one site must not affect the other.

4. Maintainability: allow major changes without downtime via rolling upgrades.

5. Migratability: transparent to applications; no code changes required for deployment or migration.

6. Stability: aim for five‑nines (99.999%) availability.
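The five-nines target translates into a concrete yearly downtime budget; a quick back-of-the-envelope check (plain arithmetic, nothing DB2-specific):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)

# Five nines leaves roughly 5.3 minutes of downtime per year, which is
# why unplanned switchover time dominates the design.
```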

Overall Design – Site and Arbitration Node Placement

The GDPC topology requires three sites: two active data‑center sites and one arbitration site.

Active site requirements:

Reliable TCP/IP links with RDMA (RoCE or InfiniBand); distance ≤50 km (up to 70–80 km under low load).

Each site has a CF and an equal number of member nodes, sharing a single IP subnet.

Dedicated local SAN controllers with mirrored LUNs across sites; GPFS used for synchronous replication.

RDMA network with redundant RoCE adapters (or single‑port InfiniBand) and separate private TCP/IP VLAN for GPFS heartbeat.
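The 50 km guideline above is driven largely by propagation delay: light in fiber travels at roughly 5 µs per km one way, so every synchronous round trip between sites pays a fixed latency tax before any processing happens. A rough estimate (assuming ~5 µs/km and ignoring switch, DWDM, and protocol overhead):

```python
US_PER_KM = 5.0  # ~5 microseconds per km one-way in optical fiber (approximation)

def round_trip_us(distance_km: float) -> float:
    """Fiber propagation delay for one request/response round trip."""
    return 2 * distance_km * US_PER_KM

# At the 50 km design limit, each synchronous round trip costs ~500 us
# (~0.5 ms) of pure propagation delay; at 80 km it is ~800 us.
```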

Arbitration site requirements:

One non‑member host (can be a VM) dedicated to cluster arbitration; no SAN access needed.

No RDMA or private network required.

Provides arbitration disks (50‑100 MB each) for each shared file system, stored on local physical disks or LVs.

LV creation guidelines: create logical volumes within a volume group, allocate at least one physical disk per VG, use two physical disks for redundancy when possible.
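The arbitration site exists to break ties: with three sites voting, the cluster keeps running as long as a majority of quorum votes survives, so losing either data center, or the arbitration host alone, never halts the cluster. A minimal majority-quorum sketch (site names and one-vote-per-site are illustrative, not pureScale internals):

```python
def has_quorum(alive_votes: int, total_votes: int) -> bool:
    """Majority quorum: strictly more than half the votes must survive."""
    return alive_votes > total_votes // 2

# Three sites, one vote each: active sites A and B, arbitration site T.
VOTES = {"A": 1, "B": 1, "T": 1}

def surviving_quorum(failed_sites: set) -> bool:
    """True if the cluster retains quorum after the given sites fail."""
    alive = sum(v for site, v in VOTES.items() if site not in failed_sites)
    return has_quorum(alive, sum(VOTES.values()))
```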

Communication Network Design

The arbitration site needs only basic TCP/IP connectivity. For the active sites:

DWDM links between sites should be redundant and use different carriers.

Ethernet external service interfaces should be dual‑NIC with active‑standby bonding; switches also redundant.

RoCE NICs should have two ports connected to separate switches; ports are not bonded. All RoCE interfaces reside in a single VLAN.

A private VLAN for GPFS heartbeat and data traffic should also be dual‑port bonded and share the same VLAN as RoCE.

Shared Storage Design

GPFS maintains two consistent replicas (failure groups 1 and 2) for each file system, storing both data and metadata. The arbitration host only needs access to the arbitration disks, not the full storage arrays.
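Conceptually, a write is acknowledged only after both failure groups hold a copy, and if one site's array is lost, reads fall back to the surviving replica. A toy model of that availability property (this illustrates the replication semantics only, not the GPFS implementation):

```python
class MirroredFS:
    """Toy model of two-way synchronous replication across failure groups."""

    def __init__(self):
        self.replicas = {1: {}, 2: {}}      # failure group -> {block: data}
        self.online = {1: True, 2: True}    # site storage availability

    def write(self, block: int, data: bytes) -> None:
        targets = [g for g, up in self.online.items() if up]
        if not targets:
            raise IOError("no failure group available")
        for g in targets:                   # synchronous mirror to every online group
            self.replicas[g][block] = data

    def read(self, block: int) -> bytes:
        for g, up in self.online.items():
            if up and block in self.replicas[g]:
                return self.replicas[g][block]
        raise IOError("block unavailable")
```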

Resource Design

CF and member node resources are primarily CPU, memory, and RDMA NICs. Recommended sizing:

Member CPU: roughly 2× the single‑node requirement for comparable workload.

CF CPU: one RDMA port per 6–8 CPU cores (IBM recommendation).

Member memory: about twice the memory of a comparable single‑node database to accommodate larger locklists.

CF memory: sized for GBP (LOCAL BUFFERPOOL pages × 1.25 KB × member count) plus GLM (total locklist).
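The sizing rules above can be wrapped in a small calculator. The GBP term follows the article's rule of thumb (local buffer pool pages × 1.25 KB × member count) and the member figure uses the 2× rule; treat the constants as planning heuristics to validate against your workload, not exact formulas:

```python
def cf_gbp_gb(local_bufferpool_pages: int, members: int) -> float:
    """Group buffer pool estimate: local buffer pool pages x 1.25 KB x members."""
    kb = local_bufferpool_pages * 1.25 * members
    return kb / (1024 * 1024)  # KB -> GB

def cf_memory_gb(local_bufferpool_pages: int, members: int,
                 total_locklist_gb: float) -> float:
    """CF memory = GBP estimate + GLM sized to the total locklist."""
    return cf_gbp_gb(local_bufferpool_pages, members) + total_locklist_gb

def member_memory_gb(comparable_single_node_gb: float) -> float:
    """Members get roughly 2x a comparable single-node database."""
    return 2.0 * comparable_single_node_gb
```

For example, four members each with a one-million-page local buffer pool imply a GBP on the order of 5 GB before the GLM is added.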

Client Access Design

Use client affinity to connect application servers to preferred member nodes, avoiding cross‑site traffic. If a member fails, automatic client reroute (ACR) redirects connections to another live member. In addition, place batch nodes in the same room as the primary CF, and start the first member on the same site as the primary CF.
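The affinity behavior amounts to preference-ordered failover: each application server carries an ordered member list (local members first, remote members as fallback), and a connection goes to the first live entry. This is a sketch of the routing policy only; in practice it is configured declaratively in the DB2 client (db2dsdriver.cfg), not coded by hand, and the member names below are illustrative:

```python
def route(affinity_order: list, alive: set) -> str:
    """Pick the first live member in the client's affinity order."""
    for member in affinity_order:
        if member in alive:
            return member
    raise ConnectionError("no pureScale member available")

# A site-A app server lists local members first, site-B members last,
# so cross-site connections happen only after a local failure.
site_a_order = ["member0", "member1", "member2", "member3"]
```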

Benefits and Limitations

The dual‑active DB2 pureScale platform has been in production for three years, supporting six high‑priority systems with verified high availability and maintainability. However, the need for zero data loss and geographic distance introduces latency that can degrade performance, especially for batch workloads.

Remaining challenges include hotspot mitigation (despite partitioning and random indexing) and further reduction of inter‑node communication latency and concurrency bottlenecks.

Continuous improvements in these areas will expand the applicability of the solution.

Tags: high availability, dual active, database architecture, DB2, GDPC, pureScale