How to Build a True Dual‑Active Data Center: Architecture, Technologies, and Key Pitfalls
This article explains why dual‑active data centers are needed, outlines the evolution from primary‑backup to active‑active designs, and provides detailed guidance on implementing dual‑active solutions across the data, storage, application, and virtualization layers, including critical networking and performance considerations.
Why Dual‑Active Data Centers?
Dual‑active architectures keep both sites serving traffic simultaneously, eliminating the long fail‑over windows of traditional primary‑backup designs. Continuous service requires coordinated design across data, storage, access, virtualization, and networking layers.
1. Data Layer
Traditional databases can be deployed across two sites using three main approaches, ranging from active‑standby to fully active‑active:
Active‑Standby (Oracle ADG): Redo or archived logs are shipped from the primary to a standby database, which can be opened read‑only for queries. On primary failure the standby is promoted to production.
Active‑Active (Oracle RAC / Extended RAC): Multiple nodes read and write concurrently to a shared storage pool, providing seamless fail‑over without manual intervention.
Logical Replication (GoldenGate, DSG, etc.): Transaction‑log changes are captured and applied to a remote database. Supports one‑to‑one, one‑to‑many, many‑to‑one and bidirectional topologies, with table‑level granularity and read‑write capability on both sites.
Key implementation notes:
Oracle ADG works over a network, supports heterogeneous storage, and can be used for emergency or DR purposes.
Logical replication requires primary keys on source tables, can be compressed, and benefits from tuned extract/replicat parameters.
Extended RAC relies on shared storage (e.g., Oracle ASM) and Clusterware to allow parallel access across sites.
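The primary‑key requirement for logical replication can be illustrated with a toy apply step. This is a minimal sketch, not GoldenGate or DSG code: the Change record and apply_change function are hypothetical names, and the target table is modelled as an in‑memory dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Change:
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary-key value of the affected row
    row: Optional[dict] = None   # new column values (None for a delete)


def apply_change(table: Dict[int, dict], change: Change) -> None:
    """Apply one captured change to a target table keyed by primary key."""
    if change.op == "insert":
        table[change.key] = dict(change.row)
    elif change.op == "update":
        # The key is what lets the apply side find exactly one target row.
        table[change.key].update(change.row)
    elif change.op == "delete":
        table.pop(change.key, None)
    else:
        raise ValueError(f"unknown operation: {change.op}")


target: Dict[int, dict] = {}
captured = [
    Change("insert", 1, {"name": "a", "balance": 10}),
    Change("update", 1, {"balance": 25}),
    Change("insert", 2, {"name": "b", "balance": 5}),
    Change("delete", 2),
]
for change in captured:
    apply_change(target, change)
```

Without a unique key, the update and delete branches would have no reliable way to identify the row they refer to, which is why tables lacking one are harder to replicate logically.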
Memory‑resident databases such as Oracle TimesTen or Altibase can also be deployed in active‑active mode, delivering sub‑millisecond latency for real‑time billing or read‑write‑separation scenarios.
2. Storage Layer
Dual‑active storage is the backbone of the architecture. Three families are commonly used:
Remote Volume Management (e.g., Symantec Storage Foundation, IBM GPFS, Oracle ASM): Logical volume mirroring across sites.
Storage‑Gateway Virtualization (e.g., EMC VPLEX, IBM SVC): Virtual gateways in each data center present a unified LUN to hosts while handling cross‑site replication.
Volume‑Mirror Technology: Two disk arrays are clustered and exposed as a single virtual volume.
Design considerations include dual LUN provisioning, low‑latency DWDM fiber links, ASM disk‑group configuration with one failure group per site, a third‑site arbitration node (often using NFS), and continuous monitoring of inter‑site links.
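The role of the third‑site arbitration node can be sketched as a simple decision function. This is a simplified model under stated assumptions, not the logic of any particular product: the function name, the site labels "A" and "B", and the preferred‑site tie‑break are all hypothetical.

```python
def surviving_sites(inter_site_up: bool,
                    a_reaches_arbiter: bool,
                    b_reaches_arbiter: bool,
                    preferred: str = "A") -> set:
    """Return the set of sites allowed to keep accepting writes."""
    if inter_site_up:
        # Both sites still see each other, so mirroring continues on both.
        return {"A", "B"}
    # Inter-site link is down: only a site that reaches the arbiter may continue,
    # and the arbiter grants the right to exactly one site.
    candidates = [site for site, ok in (("A", a_reaches_arbiter),
                                        ("B", b_reaches_arbiter)) if ok]
    if not candidates:
        return set()            # nobody can prove ownership: stop writes everywhere
    if preferred in candidates:
        return {preferred}
    return {candidates[0]}
```

The key property is that a broken inter‑site link never leaves both sites writable, which is what the arbitration node exists to guarantee.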
3. Access / Application Layer
To expose services from both sites, the following techniques are typical:
Global Server Load Balancing (e.g., F5 GTM) combined with DNS to direct clients to the optimal site.
Intra‑network load balancers (SLB) for internal services, providing automatic fail‑over.
Front‑end CDN or edge caching to distribute traffic across regions.
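As a rough illustration of how global load balancing chooses a site, the sketch below picks the healthy site with the lowest measured latency. It is not F5 GTM logic; the pick_site function, the site names, and the latency figures are assumptions made for the example.

```python
from typing import Dict, Optional


def pick_site(health: Dict[str, bool], latency_ms: Dict[str, float]) -> Optional[str]:
    """Return the healthy site with the lowest latency, or None if none is healthy."""
    healthy = [site for site, ok in health.items() if ok]
    if not healthy:
        return None
    # Sites without a latency measurement sort last.
    return min(healthy, key=lambda site: latency_ms.get(site, float("inf")))


latency = {"site-a": 12.0, "site-b": 30.0}
normal = pick_site({"site-a": True, "site-b": True}, latency)
after_failure = pick_site({"site-a": False, "site-b": True}, latency)
```

In normal operation the client is steered to site-a; once the health check for site-a fails, the same call returns site-b without any manual switchover.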
Application clusters should be deployed either as independent clusters in each center or as a single cross‑center cluster with a unified database access interface and automatic reconnection logic to avoid manual switchover.
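The automatic reconnection logic might look something like the following sketch. The connect_with_failover helper and the endpoint names are hypothetical, and the connect callable stands in for whatever database driver the application actually uses.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(endpoints: Sequence[str],
                          connect: Callable[[str], T],
                          rounds: int = 3,
                          backoff_s: float = 0.5) -> T:
    """Try each endpoint in order; repeat with exponential backoff between rounds."""
    last_error = None
    for attempt in range(rounds):
        for endpoint in endpoints:
            try:
                return connect(endpoint)
            except ConnectionError as exc:
                last_error = exc
        if attempt < rounds - 1:
            time.sleep(backoff_s * (2 ** attempt))
    raise ConnectionError(f"all endpoints failed after {rounds} rounds") from last_error


def fake_connect(endpoint: str) -> str:
    # Stand-in driver for the example: site A is down, site B answers.
    if endpoint == "db.site-a.example":
        raise ConnectionError("site A unreachable")
    return "connected:" + endpoint


conn = connect_with_failover(["db.site-a.example", "db.site-b.example"],
                             fake_connect, backoff_s=0)
```

Listing the local site first keeps traffic local in normal operation, while the second entry gives the application a path to the surviving site.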
4. Virtualization & Cloud Platform
Virtualization introduces additional constraints. Four patterns are commonly adopted:
Traditional load‑balanced active‑active clusters on each site.
Distributed coordination (e.g., ZooKeeper) to build a single cross‑center cluster.
Big‑data platforms (Hadoop, MPP) using dual‑write or data‑replica mechanisms.
Virtualization platform dual‑active (e.g., VMware vSphere HA/DRS) with cross‑site storage replication.
Recommended practices:
10 GbE heartbeat links between sites.
Dedicated paths for vMotion traffic.
Configure ESXi clusters for HA and DRS, ensuring compatible hardware and firmware.
5. Critical Technical Points
5.1 Large‑Scale L2 Interconnect
Extending a flat L2 fabric across data centers can be achieved with one of the following:
EVN/OTV (MAC‑in‑IP) to create a stretched VLAN.
Direct fiber links with link aggregation and storm‑control.
MPLS‑based VPLS VPNs.
Overlay networks (VXLAN) that encapsulate VLAN traffic over an underlay.
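To make the overlay option concrete, the sketch below builds the 8‑byte VXLAN header defined in RFC 7348 and estimates the underlay MTU needed to carry a full‑size inner packet. The function names are hypothetical, and the MTU figure assumes an untagged inner frame and an IPv4 underlay.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def underlay_ip_mtu(inner_ip_mtu: int = 1500) -> int:
    """IP MTU the underlay needs so the encapsulated packet is not fragmented."""
    inner_ethernet = 14   # inner frame header travels as payload (no VLAN tag)
    vxlan = 8
    udp = 8
    outer_ipv4 = 20
    return inner_ip_mtu + inner_ethernet + vxlan + udp + outer_ipv4


header = vxlan_header(5000)
needed_mtu = underlay_ip_mtu()
```

The 50 bytes of added encapsulation are the reason inter‑site links carrying VXLAN are usually configured with an MTU above the 1500‑byte default.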
5.2 GoldenGate Performance Tuning
Extract process – split per schema, tune eofdelay and flushsecs, increase I/O buffer intervals.
Pump process – ensure source tables have primary keys, enable compression, enlarge TCP buffers, adjust queue read intervals (≈3 s) and memory flush intervals (≈5 s).
Replicat process – merge small transactions, increase maxtransops for large batches, and consider process partitioning by table or range.
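The idea of partitioning the apply side by table or range can be sketched with a stable hash over the row key. This only illustrates the principle; GoldenGate has its own mechanism for splitting work, and the partition_for and split_rows helpers here are hypothetical.

```python
import zlib
from typing import Iterable, List


def partition_for(key: str, partitions: int) -> int:
    """Map a row key to an apply process using a stable (non-randomized) hash."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def split_rows(keys: Iterable[str], partitions: int) -> List[List[str]]:
    """Distribute row keys across apply processes; a key always lands in the same one."""
    buckets: List[List[str]] = [[] for _ in range(partitions)]
    for key in keys:
        buckets[partition_for(key, partitions)].append(key)
    return buckets


keys = [f"order-{n}" for n in range(1000)]
buckets = split_rows(keys, 4)
```

Because a given key always maps to the same process, changes to one row are applied in order, while unrelated rows proceed in parallel.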
5.3 Oracle ADG Observations
In a test with an 11 GB database on 40 GB of storage over a 1 GbE link, average redo‑log bandwidth was 16 MB/s and the peak was 52 MB/s. Coordination of RAC and GPFS arbitration timers is essential to avoid split‑brain scenarios.
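The redo‑log figures above can be put in terms of link utilization with a little arithmetic. The sketch assumes 1 MB means 10^6 bytes and ignores protocol overhead; link_utilization is a hypothetical helper.

```python
def link_utilization(throughput_mb_per_s: float, link_gbit_per_s: float = 1.0) -> float:
    """Fraction of raw link capacity used, taking 1 MB as 10**6 bytes."""
    bits_per_s = throughput_mb_per_s * 1e6 * 8
    return bits_per_s / (link_gbit_per_s * 1e9)


average = link_utilization(16)   # 16 MB/s is 128 Mbit/s
peak = link_utilization(52)      # 52 MB/s is 416 Mbit/s
```

Under these assumptions the average redo stream uses roughly 13% of the 1 GbE link and the peak roughly 42%, leaving headroom that a larger or busier database would consume quickly.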
5.4 Split‑Brain Prevention
Design redundant network paths, SAN links, and a third‑site arbitration node. Ensure lower‑layer heartbeats have shorter time‑outs than higher‑layer ones so that storage arbitration completes before database arbitration begins during a link failure.
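For storage arbitration to finish before database arbitration, each layer's time‑out has to expire before that of the layer above it. The check below is a minimal sketch of that rule; the layer names and the values in seconds are illustrative assumptions, not vendor defaults.

```python
from typing import List, Tuple


def check_timeout_ordering(timeouts_s: List[Tuple[str, float]]) -> List[str]:
    """Take (layer, timeout) pairs ordered lowest layer first; return any violations."""
    problems = []
    for (lower, t_low), (upper, t_up) in zip(timeouts_s, timeouts_s[1:]):
        if t_low >= t_up:
            problems.append(
                f"{lower} ({t_low}s) must time out before {upper} ({t_up}s)"
            )
    return problems


good = check_timeout_ordering([("network", 5), ("storage", 30), ("database", 60)])
bad = check_timeout_ordering([("network", 5), ("storage", 90), ("database", 60)])
```

A check like this can run against the deployed configuration as part of change control, so that a later timer change does not silently reintroduce a split‑brain window.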
5.5 Comprehensive Testing
Simulate failures across network, storage, and compute layers. Verify Recovery Point Objective (RPO) and Recovery Time Objective (RTO) targets for each failure mode.
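Verifying RPO and RTO for a drill comes down to comparing three timestamps against the targets. The sketch below assumes the drill records when the failure was injected, the commit time of the last transaction present at the surviving site, and when service was restored; the function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def measure_drill(failure_at: datetime,
                  last_replicated_commit_at: datetime,
                  service_restored_at: datetime) -> Tuple[timedelta, timedelta]:
    """Return (achieved RPO, achieved RTO) for one failure drill."""
    rpo = max(failure_at - last_replicated_commit_at, timedelta(0))
    rto = service_restored_at - failure_at
    return rpo, rto


def meets_targets(rpo: timedelta, rto: timedelta,
                  rpo_target: timedelta, rto_target: timedelta) -> bool:
    return rpo <= rpo_target and rto <= rto_target


rpo, rto = measure_drill(
    failure_at=datetime(2024, 1, 1, 12, 0, 0),
    last_replicated_commit_at=datetime(2024, 1, 1, 11, 59, 58),
    service_restored_at=datetime(2024, 1, 1, 12, 1, 30),
)
passed = meets_targets(rpo, rto,
                       rpo_target=timedelta(seconds=5),
                       rto_target=timedelta(minutes=2))
```

Recording one such result per failure mode (network, storage, compute) makes it clear which scenarios meet their targets and which do not.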
Conclusion
Implementing a dual‑active data‑center model requires coordinated design of five layers (data, storage, access, virtualization, and networking), plus careful attention to performance bottlenecks, arbitration mechanisms, split‑brain avoidance, and exhaustive testing to achieve true zero‑downtime service.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.