Building a Reliable Geo‑Distributed Active‑Active Architecture for Massive Online Games
This article details a two‑stage approach for creating a geo‑distributed active‑active infrastructure for large‑scale games, covering a pseudo active‑active design with private lines and smart DNS, followed by a true active‑active solution using Redis Sentinel and MySQL clustering with performance comparisons.
Stage 1: Pseudo Active‑Active
To protect a large‑world game from data‑center‑level failures (e.g., DDoS or backbone instability), a private network link was installed between northern and southern data centers. Entry nodes are deployed in both locations. A reverse proxy in the south forwards all traffic to the north over the private line and automatically falls back to the public Internet if the line fails. Smart DNS resolves the player’s domain to the nearest region (province‑level). A monitoring service (D‑monitor) watches DNS resolution and switches it to the healthy side on failure.
The private line reduces latency by roughly 10 ms compared with public routing.
The reverse proxy provides automatic failover from the private line to the public Internet.
Players connect via domain:port; DNS returns the nearest entry node.
D‑monitor detects DNS resolution failures and triggers automatic redirection.
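The failover decision D‑monitor makes can be sketched as a small health check plus a record-selection rule. The Python below is illustrative only: `probe`, `choose_records`, and the data-center names are hypothetical stand-ins for the real monitor and its DNS provider API.

```python
import socket

def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to an entry node succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def choose_records(health: dict[str, bool], preferred: str) -> list[str]:
    """Decide which data center(s) DNS should resolve to.

    Prefer the player's nearest region (province-level resolution);
    if that side is unhealthy, redirect everyone to any healthy side.
    """
    healthy = [dc for dc, ok in health.items() if ok]
    if not healthy:
        return []              # total outage: nothing safe to return
    if preferred in healthy:
        return [preferred]     # nearest side is up: keep local routing
    return healthy             # nearest side is down: fail over
```

In the real monitor, `probe` would run on a short interval against both entry nodes and push the result of `choose_records` to the smart-DNS provider.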
Stage 2: True Active‑Active
The goal is true active‑active operation with strong consistency and multi‑write capability for both cache and persistent data.
Cache layer: Redis with Sentinel for monitoring and automatic failover.
Persistent layer: MySQL clustering.
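To make the cache layer concrete, a minimal Sentinel configuration might look like the fragment below. The master name `game-cache`, the addresses, and the thresholds are placeholders, not the production values.

```conf
# sentinel.conf — minimal sketch; addresses and timeouts are placeholders
port 26379
sentinel monitor game-cache 10.0.1.10 6379 2
sentinel down-after-milliseconds game-cache 5000
sentinel failover-timeout game-cache 60000
sentinel parallel-syncs game-cache 1
```

The quorum of 2 means at least two Sentinels must agree the master is down before a failover begins.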
Two MySQL clustering solutions were evaluated:
MySQL Cluster (NDB engine)
Percona XtraDB Cluster (PXC)
Traditional MySQL Replication and Group Replication were rejected: both depend on binlog shipping, which introduces replication lag and cannot guarantee real‑time consistency under multi‑write workloads.
Performance tests under high concurrency measured write throughput, average latency, and data consistency.
MySQL Cluster delivered higher insert throughput and lower latency at large scale due to its in‑memory NDB engine, but it consumes large memory, is complex to manage, and is not recommended for production by Oracle.
Percona XtraDB Cluster, when paired with SSD storage, provided acceptable performance with simpler maintenance, making it a better fit for the game’s workload.
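The benchmark methodology, write throughput plus average per-statement latency, can be sketched in Python. This harness uses SQLite purely as a stand-in so the example is self-contained and runnable; the actual tests ran against MySQL Cluster and PXC, and `bench_inserts` is a hypothetical helper, not the tool used in the evaluation.

```python
import sqlite3
import time

def bench_inserts(conn, n: int = 1000):
    """Run n single-row inserts, committing each one, and return
    (writes per second, average per-insert latency in seconds)."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, payload TEXT)")
    latencies = []
    start = time.perf_counter()
    for i in range(n):
        t0 = time.perf_counter()
        cur.execute("INSERT INTO t (payload) VALUES (?)", (f"row-{i}",))
        conn.commit()  # one commit per write, mirroring a per-action game write
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return n / elapsed, sum(latencies) / n
```

Against a real cluster the connection would go through the MySQL driver instead, but the shape of the measurement (commit-per-write, aggregate throughput, mean latency) is the same.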
Second‑Stage Architecture
Building on the pseudo active‑active design, the following enhancements were added:
The PXC cluster replicates over the private line and automatically falls back to public Internet communication if the line fails.
Non‑cache data is kept consistent through the PXC cluster.
Game nodes are deployed in both north and south data centers; any node failure triggers automatic migration.
Smart DNS with province‑level resolution directs players to the nearest node.
D‑monitor provides automatic failover for entry nodes.
The reverse proxy now only handles battle‑node allocation; full traffic forwarding is no longer required because battle nodes are geographically isolated.
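For reference, a PXC node's replication behavior is driven by the wsrep_* variables in my.cnf. The fragment below is a minimal sketch with placeholder IPs and a hypothetical cluster name, not the deployment's actual configuration.

```ini
# my.cnf (fragment) — placeholder addresses; list the real node IPs in wsrep_cluster_address
[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_name=game-pxc
wsrep_cluster_address=gcomm://10.0.1.11,10.0.2.11,10.0.2.12
wsrep_node_address=10.0.1.11
wsrep_sst_method=xtrabackup-v2
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
```

ROW binlog format, InnoDB as the default engine, and interleaved auto-increment locking (mode 2) are requirements of Galera-based replication, which PXC uses underneath.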
References
https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-install-linux-rpm.html
https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndbd-definition.html
https://www.percona.com/doc/percona-xtradb-cluster/5.7/index.html
http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html
http://www.cnblogs.com/lizhi221/p/7325401.html
http://www.cnblogs.com/52php/p/5675374.html
dbaplus Community
