
How to Build a True Dual‑Active Data Center: Architecture, Technologies, and Key Pitfalls

This article explains why dual‑active data centers are needed, outlines the evolution from primary‑backup to active‑active designs, and provides detailed guidance on implementing dual‑active solutions across the data, storage, application, and virtualization layers, including critical networking and performance considerations.

dbaplus Community

Why Dual‑Active Data Centers?

Dual‑active architectures keep both sites serving traffic simultaneously, eliminating the long fail‑over windows of traditional primary‑backup designs. Continuous service requires coordinated design across data, storage, access, virtualization, and networking layers.

1. Data Layer

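Traditional databases can be run across two sites using three main approaches, which differ in how much work the second site does:

Active‑Standby (Oracle ADG): Redo or archived logs are shipped from the primary to a standby database, which can be opened read‑only for queries. On primary failure the standby is promoted to production. Because the standby accepts reads only, this offloads queries but does not give two writable sites.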

Active‑Active (Oracle RAC / Extended RAC) : Multiple nodes read and write concurrently to a shared storage pool, providing seamless fail‑over without manual intervention.

Logical Replication (GoldenGate, DSG, etc.) : Transaction‑log changes are captured and applied to a remote database. Supports one‑to‑one, one‑to‑many, many‑to‑one and bidirectional topologies, with table‑level granularity and read‑write capability on both sites.

Key implementation notes:

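Oracle ADG ships redo over an IP network rather than relying on storage‑level replication, so the two sites can use heterogeneous storage. It suits emergency fail‑over and DR purposes.

Logical replication works best when source tables have primary or unique keys (without one, GoldenGate falls back to treating all columns as the key), can compress data in transit, and benefits from tuned extract/replicat parameters.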

Extended RAC relies on shared storage (e.g., Oracle ASM) and Clusterware to allow parallel access across sites.

In‑memory databases such as Oracle TimesTen or Altibase can also be deployed in active‑active mode, delivering sub‑millisecond latency for real‑time billing or read‑write‑separation scenarios.
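To illustrate the logical‑replication note above, here is a minimal sketch of why row keys matter: each captured change has to be matched to exactly one row on the target. The `Change` record and `apply_changes` function are hypothetical names for this illustration, not GoldenGate or DSG APIs, and the target table is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One captured change (hypothetical record layout for illustration)."""
    op: str                      # "insert", "update", or "delete"
    key: Tuple                   # primary-key value(s) identifying the row
    row: Optional[dict] = None   # new column values; None for deletes


def apply_changes(target: Dict[Tuple, dict], changes: Iterable[Change]) -> Dict[Tuple, dict]:
    """Apply captured changes to an in-memory target table, in commit order."""
    for ch in changes:
        if ch.op == "delete":
            target.pop(ch.key, None)
        elif ch.op == "insert":
            target[ch.key] = dict(ch.row)
        elif ch.op == "update":
            # Merge changed columns into the row found by its key.
            target[ch.key] = {**target.get(ch.key, {}), **ch.row}
        else:
            raise ValueError(f"unknown operation: {ch.op}")
    return target


replica = apply_changes({}, [
    Change("insert", (1,), {"id": 1, "balance": 10}),
    Change("update", (1,), {"balance": 25}),
    Change("insert", (2,), {"id": 2, "balance": 5}),
    Change("delete", (2,)),
])
```

Without a stable key, the update and delete steps would have to compare every column to find the matching row, which is slower and ambiguous when duplicate rows exist.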

2. Storage Layer

Dual‑active storage is the backbone of the architecture. Three families are commonly used:

Remote Volume Management (e.g., Symantec Storage Foundation, IBM GPFS, Oracle ASM): Logical volume mirroring across sites.

Storage‑Gateway Virtualization (e.g., EMC VPLEX, IBM SVC): Virtual gateways in each data center present a unified LUN to hosts while handling cross‑site replication.

Volume‑Mirror Technology : Two disk arrays are clustered and exposed as a single virtual volume.

Design considerations include dual LUN provisioning, low‑latency DWDM fiber links, ASM disk‑group configuration with failure groups (one per site), a third‑site arbitration node (often using NFS), and continuous monitoring of inter‑site links.
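The role of the third‑site arbitration node can be sketched as a simple majority vote. This is an illustrative model only, not the algorithm of any particular product: the member names, the one‑vote‑each weighting, and the assumption that the arbiter ends up in exactly one partition (it sides with whichever site reaches it first) are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set


def surviving_partition(votes: Dict[str, int],
                        partitions: List[Set[str]]) -> Optional[Set[str]]:
    """Return the partition holding a strict majority of votes, if any.

    `partitions` must be disjoint groups of members that can still reach
    each other after a failure. Because the groups are disjoint, at most
    one of them can hold more than half of the votes.
    """
    total = sum(votes.values())
    for part in partitions:
        if 2 * sum(votes[m] for m in part) > total:
            return part
    return None  # no majority: every partition must stop serving I/O


votes = {"site_a": 1, "site_b": 1, "arbiter": 1}

# Inter-site link is cut; the arbiter sides with site A.
winner = surviving_partition(votes, [{"site_a", "arbiter"}, {"site_b"}])

# Everything is isolated, including the arbiter: nobody has a majority.
nobody = surviving_partition(votes, [{"site_a"}, {"site_b"}, {"arbiter"}])
```

With only two voters, a cut link leaves each site with exactly half of the votes, so neither can safely continue. The third vote is what breaks the tie.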

3. Access / Application Layer

To expose services from both sites, the following techniques are typical:

Global Server Load Balancing (e.g., F5 GTM) combined with DNS to direct clients to the optimal site.

Intra‑network load balancers (SLB) for internal services, providing automatic fail‑over.

Front‑end CDN or edge caching to distribute traffic across regions.
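The site‑selection idea behind global server load balancing can be sketched as follows. This is a simplified model, not F5 GTM's actual decision logic: the `Site` record, the use of round‑trip time as the only metric, and the documentation‑range IP addresses are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    address: str     # address handed back in the DNS answer
    healthy: bool    # result of the latest health probe
    rtt_ms: float    # measured round-trip time from the client's region


def resolve(sites: List[Site]) -> Optional[str]:
    """Answer a DNS query with the closest healthy site, or None if all are down."""
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s.rtt_ms).address


sites = [
    Site("dc-a", "192.0.2.10", healthy=True, rtt_ms=12.0),
    Site("dc-b", "198.51.100.10", healthy=True, rtt_ms=30.0),
]
normal_answer = resolve(sites)      # closest site wins

sites[0].healthy = False            # dc-a fails its health probe
failover_answer = resolve(sites)    # traffic shifts to dc-b
```

In practice the DNS record's TTL bounds how quickly clients notice the change, so a short TTL is usually part of the same design.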

Application clusters should be deployed either as independent clusters in each center or as a single cross‑center cluster with a unified database access interface and automatic reconnection logic to avoid manual switchover.
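The automatic reconnection logic mentioned above might look like the sketch below: the caller tries each site's endpoint in turn and only fails when every endpoint has been exhausted. The function name, the retry count, and the endpoint labels are hypothetical; a real application would normally get this behaviour from its database driver or connection pool.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str],
                       operation: Callable[[str], T],
                       retries_per_endpoint: int = 2,
                       backoff_s: float = 0.0) -> T:
    """Run `operation` against the first endpoint that accepts it."""
    last_error = None
    for endpoint in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return operation(endpoint)
            except ConnectionError as exc:
                last_error = exc
                # Linear backoff before the next attempt.
                time.sleep(backoff_s * (attempt + 1))
    raise ConnectionError("all endpoints failed") from last_error


attempts = []


def fake_query(endpoint: str) -> str:
    """Stand-in for a database call; the site A endpoint is down."""
    attempts.append(endpoint)
    if endpoint == "db-site-a":
        raise ConnectionError("site A unreachable")
    return f"ok from {endpoint}"


result = call_with_failover(["db-site-a", "db-site-b"], fake_query)
```

Only idempotent operations should be retried this way; a write that may have partially succeeded needs application‑level handling.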

4. Virtualization & Cloud Platform

Virtualization introduces additional constraints. Four patterns are commonly adopted:

Traditional load‑balanced active‑active clusters on each site.

Distributed coordination (e.g., ZooKeeper) to build a single cross‑center cluster.

Big‑data platforms (Hadoop, MPP) using dual‑write or data‑replica mechanisms.

Virtualization platform dual‑active (e.g., VMware vSphere HA/DRS) with cross‑site storage replication.

Recommended practices:

10 GbE heartbeat links between sites.

Dedicated paths for vMotion traffic.

Configure ESXi clusters for HA and DRS, ensuring compatible hardware and firmware.

5. Critical Technical Points

5.1 Large‑Scale L2 Interconnect

Extending a flat L2 fabric across data centers can be achieved with one of the following:

EVN/OTV (MAC‑in‑IP) to create a stretched VLAN.

Direct fiber links with link aggregation and storm‑control.

MPLS‑based VPLS VPNs.

Overlay networks (VXLAN) that encapsulate VLAN traffic over an underlay.
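As an illustration of the overlay option, the sketch below builds the 8‑byte VXLAN header defined in RFC 7348 and prepends it to an inner Ethernet frame. It covers only the VXLAN header itself; the outer UDP (destination port 4789), IP, and Ethernet headers that the underlay adds are omitted, and the function names are chosen for this example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner L2 frame with its VXLAN header."""
    return vxlan_header(vni) + inner_frame


header = vxlan_header(5000)
packet = encapsulate(5000, b"\x00" * 64)   # 64-byte placeholder inner frame
```

The 24‑bit VNI is what lets an overlay carry far more segments than the 4094 usable IDs of a 12‑bit VLAN tag.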

5.2 GoldenGate Performance Tuning

Extract process – split per schema, tune eofdelay and flushsecs, increase I/O buffer intervals.

Pump process – ensure source tables have primary keys, enable compression, enlarge TCP buffers, adjust queue read intervals (≈3 s) and memory flush intervals (≈5 s).

Replicat process – merge small transactions, increase maxtransops for large batches, and consider process partitioning by table or range.
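The idea of partitioning replicat work by table or range can be sketched as a stable hash from row key to process index. This is an illustration of the technique, not GoldenGate's implementation or parameter syntax; the function names and the choice of CRC32 as the hash are assumptions.

```python
import zlib
from typing import List, Sequence, Tuple


def replicat_for(key: str, n_replicats: int) -> int:
    """Map a row key to a replicat index in [0, n_replicats), stably."""
    return zlib.crc32(key.encode("utf-8")) % n_replicats


def partition(changes: Sequence[Tuple[str, str]],
              n_replicats: int) -> List[List[Tuple[str, str]]]:
    """Split (key, payload) changes into per-replicat lanes.

    Changes for the same key always land in the same lane and keep their
    original relative order, so per-row ordering is preserved.
    """
    lanes: List[List[Tuple[str, str]]] = [[] for _ in range(n_replicats)]
    for key, payload in changes:
        lanes[replicat_for(key, n_replicats)].append((key, payload))
    return lanes


changes = [
    ("acct-1", "insert"), ("acct-2", "insert"),
    ("acct-1", "update"), ("acct-3", "insert"),
    ("acct-1", "delete"),
]
lanes = partition(changes, 3)
```

The trade‑off is that transactions touching rows in different lanes are no longer applied atomically on the target, so this split suits workloads where cross‑row ordering is not critical.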

5.3 Oracle ADG Observations

In a test with an 11 GB database on 40 GB of storage over a 1 GbE link, redo‑log traffic averaged 16 MB/s and peaked at 52 MB/s. Coordinating the RAC and GPFS arbitration timers is essential to avoid split‑brain scenarios.
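To put those redo rates in context, the short calculation below converts them into a share of the 1 GbE link. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignores protocol overhead, so real utilization would be somewhat higher.

```python
def link_utilization(throughput_mb_s: float, link_gbit_s: float) -> float:
    """Fraction of a link consumed by a given throughput (decimal units)."""
    bits_per_second = throughput_mb_s * 8e6
    return bits_per_second / (link_gbit_s * 1e9)


average = link_utilization(16, 1.0)   # about 0.128, i.e. roughly 13% of the link
peak = link_utilization(52, 1.0)      # about 0.416, i.e. roughly 42% of the link
```

On these figures the link has headroom at peak, but a workload a few times larger would saturate 1 GbE and start to delay redo shipping.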

5.4 Split‑Brain Prevention

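Design redundant network paths, SAN links, and a third‑site arbitration node. Ensure lower‑layer heartbeats have shorter time‑outs than higher‑layer ones, so that storage arbitration completes before database arbitration begins during a link failure. If the order is reversed, the database layer may evict a node before the storage layer has decided which site keeps the volumes.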
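A simple configuration check can confirm that arbitration will fire in the intended order, with storage deciding before the database does. The layer names and timeout values below are purely illustrative, not recommended settings for GPFS or Oracle Clusterware.

```python
from typing import List, Tuple


def check_timeout_order(layers: List[Tuple[str, float]]) -> List[str]:
    """Validate heartbeat time-outs listed from lowest layer to highest.

    Each layer must time out strictly before the layer above it, so that
    lower-layer arbitration finishes first. Returns a list of violations.
    """
    problems = []
    for (low_name, low_t), (high_name, high_t) in zip(layers, layers[1:]):
        if low_t >= high_t:
            problems.append(
                f"{low_name} ({low_t}s) must time out before {high_name} ({high_t}s)"
            )
    return problems


good = check_timeout_order([("storage", 10), ("cluster", 30), ("database", 60)])
bad = check_timeout_order([("storage", 60), ("database", 30)])
```

Running a check like this whenever a timer is changed helps keep the layers from drifting out of order over time.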

5.5 Comprehensive Testing

Simulate failures across network, storage, and compute layers. Verify Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets for each failure mode.
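The results of such drills can be checked against targets mechanically. The sketch below is one possible shape for that check; the `DrillResult` record, the failure‑mode labels, and the target values are hypothetical examples, not measurements from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrillResult:
    failure_mode: str    # what was broken during the drill
    data_loss_s: float   # seconds of committed data lost (compare to RPO)
    downtime_s: float    # seconds until service was restored (compare to RTO)


def failed_drills(results: List[DrillResult], rpo_s: float, rto_s: float) -> List[str]:
    """Return the failure modes whose drill missed the RPO or RTO target."""
    return [
        r.failure_mode
        for r in results
        if r.data_loss_s > rpo_s or r.downtime_s > rto_s
    ]


results = [
    DrillResult("inter-site link cut", data_loss_s=0, downtime_s=20),
    DrillResult("storage array loss", data_loss_s=0, downtime_s=95),
    DrillResult("site power loss", data_loss_s=3, downtime_s=40),
]
misses = failed_drills(results, rpo_s=0, rto_s=60)
```

Recording each drill in a structured form also makes it easy to rerun the same matrix after every architecture change.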

Conclusion

Implementing a dual‑active data‑center model requires coordinated design across five layers (data, storage, access, virtualization, and networking), plus careful attention to performance bottlenecks, arbitration mechanisms, split‑brain avoidance, and exhaustive testing. Only with all of these in place can the two sites keep serving traffic through a site failure.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
