High‑Availability Architecture for a Membership System: Elasticsearch Dual‑Center, Redis Caching, MySQL Migration and Fine‑Grained Flow Control

This article details the design and implementation of a high‑availability membership system, covering Elasticsearch dual‑center master‑slave clusters, traffic‑isolated three‑cluster ES architecture, Redis multi‑center caching, MySQL dual‑center partitioning, data migration strategies, and refined flow‑control and degradation mechanisms to ensure stable, low‑latency service under massive concurrent load.

IT Architects Alliance

Mar 22, 2022

Background

The membership system is a core service that supports all business lines; any failure blocks order placement across the company. After the merger of Tongcheng and eLong, the system must handle cross‑platform member queries and traffic spikes exceeding 20,000 TPS during holidays.

1. Elasticsearch High‑Availability Solution

We deploy a dual‑center ES master‑slave setup: the primary cluster runs in Data Center A and the standby cluster in Data Center B. Writes are replicated to the standby through a message queue (MQ), so reads and writes can be switched to the standby cluster immediately if the primary fails.
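The primary/standby switch can be sketched as follows. This is a minimal in-memory illustration, not the production design: `EsCluster`, `DualCenterRouter`, and the list standing in for the MQ are hypothetical names, and failover is triggered by an explicit call rather than by health checks.

```python
from dataclasses import dataclass, field


@dataclass
class EsCluster:
    """In-memory stand-in for one ES cluster (hypothetical, illustration only)."""
    name: str
    healthy: bool = True
    docs: dict = field(default_factory=dict)

    def index(self, doc_id, doc):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.docs[doc_id] = doc

    def get(self, doc_id):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.docs.get(doc_id)


class DualCenterRouter:
    """Serves traffic from the active cluster and replays writes to the other one."""

    def __init__(self, primary, standby):
        self.active, self.passive = primary, standby
        self.mq = []  # stands in for the MQ that carries replication events

    def write(self, doc_id, doc):
        self.active.index(doc_id, doc)
        self.mq.append((doc_id, doc))

    def replicate(self):
        # Drain queued events into the passive cluster; keep them while it is down.
        while self.mq and self.passive.healthy:
            doc_id, doc = self.mq[0]
            self.passive.index(doc_id, doc)
            self.mq.pop(0)

    def read(self, doc_id):
        return self.active.get(doc_id)

    def fail_over(self):
        # Swap roles: the former standby now takes reads and writes.
        self.active, self.passive = self.passive, self.active
```

After `fail_over()`, writes accepted by the new active cluster stay in the queue until the failed cluster is healthy again, at which point `replicate()` brings it back in sync.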

To isolate high‑TPS marketing traffic, we add a third ES cluster dedicated to flash‑sale requests, separating it from the main ES cluster.
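One simple way to express this isolation is a routing rule keyed on the calling service. The cluster names and caller identifiers below are assumptions made for illustration; the article does not say how requests are classified.

```python
# Hypothetical cluster names and caller identifiers.
MAIN_CLUSTER = "es-main"
FLASH_SALE_CLUSTER = "es-flash-sale"
FLASH_SALE_CALLERS = {"marketing-flash-sale", "coupon-rush"}


def pick_cluster(caller: str) -> str:
    """Send flash-sale callers to their own cluster so spikes cannot starve core queries."""
    return FLASH_SALE_CLUSTER if caller in FLASH_SALE_CALLERS else MAIN_CLUSTER
```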

2. ES Deep Optimization

Balance shard distribution to avoid hot nodes.

Set thread‑pool size to cpu_core * 3 / 2 + 1 (integer arithmetic, the same formula Elasticsearch uses for its default search thread pool) to prevent CPU spikes.

Keep each shard at ≤50 GB of data.

Remove unnecessary text fields; keep only keyword fields for member queries.

Prefer filter context over query context to skip relevance scoring.

Use routing keys to target specific shards.

These optimizations reduced CPU usage and improved query latency dramatically.
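Three of the points above can be made concrete in a few lines. This is a sketch under stated assumptions: the `phone` field and the use of the same value as the routing key are hypothetical, and the request is built as a plain dictionary rather than sent through a client library.

```python
def search_pool_size(cpu_cores: int) -> int:
    """Thread-pool size from the rule cpu_core * 3 / 2 + 1, using integer division."""
    return cpu_cores * 3 // 2 + 1


def member_lookup(phone: str) -> dict:
    """Build an exact-match lookup that runs in filter context and targets one shard.

    Assumes `phone` is mapped as a keyword field and that documents were
    indexed with the same routing value (both are illustrative choices).
    """
    return {
        "routing": phone,  # request parameter: only the shard owning this key is searched
        "body": {
            "query": {
                "bool": {
                    # filter clauses skip relevance scoring and can be cached
                    "filter": [{"term": {"phone": phone}}]
                }
            }
        },
    }
```

For an 8-core node the formula gives 13 threads, and for a 16-core node it gives 25. Routing only helps when the value used at query time matches the one used at index time.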

3. Redis Caching Scheme

We introduced a dual‑center multi‑cluster Redis architecture. Writes are performed to both data centers; reads are served locally. Because Elasticsearch is near‑real‑time, a read issued right after an update can still return the old document, so a 2‑second distributed lock is used to keep that stale value from causing cache inconsistency.

Cache hit rate exceeds 90%, greatly relieving ES pressure.
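A minimal sketch of how such a lock might work is shown below. It assumes the lock is taken on every update and that readers skip repopulating the cache while it is held; the article does not spell out the exact protocol. `FakeRedis` is an in-memory stand-in, and the key names are hypothetical.

```python
import time

LOCK_TTL_SECONDS = 2  # the 2-second window mentioned in the text


class FakeRedis:
    """Tiny in-memory stand-in for a Redis cluster with key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def set(self, key, value, ttl=None):
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        value, expires = self._data.get(key, (None, None))
        if expires is not None and self._clock() >= expires:
            self._data.pop(key, None)
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


def update_member(member_id, doc, es_store, caches):
    """Write the source of truth, then lock and invalidate the cache in every data center."""
    es_store[member_id] = doc
    for cache in caches:
        cache.set(f"lock:{member_id}", 1, ttl=LOCK_TTL_SECONDS)
        cache.delete(f"member:{member_id}")


def get_member(member_id, es_read, local_cache):
    """Serve from the local cache; on a miss, only repopulate if no update lock is held."""
    cached = local_cache.get(f"member:{member_id}")
    if cached is not None:
        return cached
    doc = es_read(member_id)
    if doc is not None and local_cache.get(f"lock:{member_id}") is None:
        local_cache.set(f"member:{member_id}", doc)
    return doc
```

While the lock is held, reads still return whatever ES serves, but the result is not cached, so a stale document cannot outlive the refresh delay.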

4. High‑Availability MySQL Primary Store

We migrated from a single‑instance SQL Server to a dual‑center MySQL partitioned cluster (over 1,000 shards, each with 1 master + 3 slaves). Data is routed via DBRoute: writes go to the master in Data Center A, and reads are served from the local data center. The cluster sustains >20,000 TPS with ~10 ms latency.
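The routing rule can be illustrated with a short sketch. This is not DBRoute's actual API: the shard count of 1024, the CRC32 hash, and the node naming scheme are all assumptions chosen for the example.

```python
import zlib

SHARD_COUNT = 1024  # assumption: the article only says "over 1,000 shards"


def shard_for(member_id: str) -> int:
    """Map a member ID to a shard with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(member_id.encode("utf-8")) % SHARD_COUNT


def pick_node(member_id: str, operation: str, local_dc: str) -> str:
    """Writes always go to the master in Data Center A; reads stay in the caller's center."""
    shard = shard_for(member_id)
    if operation == "write":
        return f"dc-a/shard-{shard}/master"
    return f"dc-{local_dc}/shard-{shard}/replica"
```

A stable hash matters here: Python's built-in `hash()` is randomized per process for strings, so it would route the same member to different shards across restarts.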

Migration followed a “full sync + incremental sync + real‑time gray switch” strategy, using dual writes, retry logic, and gradual (gray) A/B traffic shifting to keep the old and new stores consistent.
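The dual-write and gray-switch steps might look roughly like this. It is a sketch under assumptions: the function names, the retry count, and the percentage-based cohort rule are hypothetical, and the old and new stores are represented by plain callables.

```python
import zlib


def in_gray_cohort(member_id: str, percent: int) -> bool:
    """Deterministically place `percent`% of members on the new store."""
    return zlib.crc32(member_id.encode("utf-8")) % 100 < percent


def dual_write(member_id, row, write_old, write_new, retry_queue, attempts=3):
    """Write the old store first, then the new one; queue the row if the new store keeps failing."""
    write_old(member_id, row)  # the old store stays authoritative during migration
    for _ in range(attempts):
        try:
            write_new(member_id, row)
            return True
        except OSError:
            continue  # transient failure: try again
    retry_queue.append((member_id, row))  # replayed later by a background job
    return False


def read_member(member_id, read_old, read_new, percent):
    """Serve gray-cohort members from the new store and everyone else from the old one."""
    reader = read_new if in_gray_cohort(member_id, percent) else read_old
    return reader(member_id)
```

Raising `percent` from 0 to 100 in steps shifts read traffic gradually, and setting it back to 0 is an immediate rollback because the old store has received every write.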

Once MySQL was fully in service, we kept an ES master‑slave cluster as a backup store: if MySQL or the DAL components fail, reads and writes can fall back to ES, and the data is resynchronized afterward.
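The fallback path can be sketched with two in-memory stores. `Store` and `MemberRepository` are hypothetical names, and the pending list is an assumed mechanism for the later resynchronization, which the article does not describe in detail.

```python
class Store:
    """In-memory stand-in for MySQL or ES that can be switched off to simulate an outage."""

    def __init__(self, name):
        self.name, self.up, self.rows = name, True, {}

    def put(self, key, row):
        if not self.up:
            raise ConnectionError(f"{self.name} unavailable")
        self.rows[key] = row

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} unavailable")
        return self.rows.get(key)


class MemberRepository:
    """MySQL is the primary store; ES receives every write and serves as the fallback."""

    def __init__(self, mysql, es):
        self.mysql, self.es = mysql, es
        self.pending = []  # writes MySQL missed, replayed after it recovers

    def write(self, key, row):
        try:
            self.mysql.put(key, row)
        except ConnectionError:
            self.pending.append((key, row))
        self.es.put(key, row)

    def read(self, key):
        try:
            return self.mysql.get(key)
        except ConnectionError:
            return self.es.get(key)

    def resync(self):
        # Replay missed writes in order; an entry is removed only after it succeeds.
        while self.pending:
            key, row = self.pending[0]
            self.mysql.put(key, row)
            self.pending.pop(0)
```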

5. Abnormal Member Relationship Governance

We identified and fixed rare cases where member accounts became incorrectly linked, preventing cross‑account data leakage and ensuring correct order visibility.

6. Future: Fine‑Grained Flow Control and Degradation

We plan to implement hotspot throttling, per‑caller flow rules, and global traffic caps to protect the system from extreme spikes, as well as response‑time‑based and error‑rate‑based circuit‑breakers.
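Two of these mechanisms, per-caller flow rules and an error-rate circuit breaker, can be sketched as below. Since this section describes planned work, everything here is an assumption: the token-bucket algorithm, the class names, and the thresholds are illustrative, and the breaker simply closes again after a cooldown instead of using a half-open probe.

```python
import time
from collections import deque


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens, self.updated = float(capacity), clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerCallerLimiter:
    """Keeps one bucket per caller, so a noisy caller only exhausts its own quota."""

    def __init__(self, rules, default_rate, clock=time.monotonic):
        self.rules, self.default_rate, self.clock = rules, default_rate, clock
        self.buckets = {}

    def allow(self, caller):
        if caller not in self.buckets:
            rate = self.rules.get(caller, self.default_rate)
            self.buckets[caller] = TokenBucket(rate, rate, self.clock)
        return self.buckets[caller].allow()


class ErrorRateBreaker:
    """Opens when the failure share of the last `window` calls reaches `threshold`."""

    def __init__(self, window=20, threshold=0.5, cooldown=5.0, clock=time.monotonic):
        self.results = deque(maxlen=window)
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None  # cooldown elapsed: close and start a fresh window
            self.results.clear()
            return True
        return False

    def record(self, ok):
        self.results.append(ok)
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.results.count(False) / len(self.results) >= self.threshold:
            self.opened_at = self.clock()
```

A global traffic cap could reuse `TokenBucket` with a single shared instance checked before the per-caller one; a response-time-based breaker would record latencies instead of booleans.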
