JDHBase Multi‑Active Disaster Recovery: Replication, Auto‑Failover & Consistency
JDHBase, JD.com's large-scale KV store, powers billions of daily reads and writes across 7,000 nodes. This article details its multi-active, cross-region architecture, including HBase replication fundamentals, Fox Manager routing, automatic failover policies, dynamic replication tuning, and serial replication to keep replicated writes in order.
Introduction
JDHBase is JD.com’s online key‑value storage system that handles massive online traffic, reaching trillion‑level read/write requests during major sales events such as 11.11 and 6.18. The cluster now exceeds 7,000 nodes with a storage capacity of 90 PB, serving more than 700 business scenarios including orders, recommendations, user profiling, finance, logistics, and monitoring.
To guarantee uninterrupted service, JDHBase implements a geographically distributed multi‑active system. The following sections describe the design and practice of this system.
HBase Replication Basics
HBase uses a Log-Structured Merge-Tree (LSM) architecture. Each write is appended to a sequential Write-Ahead Log (WAL) on HDFS and applied to an in-memory MemStore, so the write remains durable even if the RegionServer fails before the MemStore is flushed.
Replication in HBase is WAL‑based. Each RegionServer runs a ReplicationSource thread that reads WAL entries, applies configured filters, and sends them via replicateWALEntry RPC to the backup cluster. The backup’s ReplicationSink thread converts received entries into put/delete operations and writes them in batches.
This asynchronous replication typically introduces a latency of a few seconds before the standby cluster receives the latest writes.
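The read, filter, ship, and apply steps described above can be sketched as a small in-memory model. This is only an illustration: the class names borrow HBase terms, but `WalEntry`, `ship`, `replicate_wal_entries`, and the batch size are hypothetical and do not reflect the real HBase classes or RPC signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class WalEntry:
    seq_id: int
    table: str
    row: str
    op: str          # "put" or "delete"
    value: str = ""


class ReplicationSink:
    """Standby side: turns received entries into put/delete operations."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], str] = {}

    def replicate_wal_entries(self, batch: List[WalEntry]) -> None:
        for entry in batch:
            key = (entry.table, entry.row)
            if entry.op == "put":
                self.store[key] = entry.value
            elif entry.op == "delete":
                self.store.pop(key, None)


class ReplicationSource:
    """Active side: tails the WAL, filters entries, ships them in batches."""

    def __init__(self, sink: ReplicationSink,
                 entry_filter: Callable[[WalEntry], bool],
                 batch_size: int = 2) -> None:
        self.sink = sink
        self.entry_filter = entry_filter
        self.batch_size = batch_size
        self.offset = 0  # position of the next unread WAL entry

    def ship(self, wal: List[WalEntry]) -> int:
        """Ship every unread entry that passes the filter; return the count sent."""
        sent = 0
        batch: List[WalEntry] = []
        while self.offset < len(wal):
            entry = wal[self.offset]
            self.offset += 1
            if self.entry_filter(entry):
                batch.append(entry)
            if len(batch) == self.batch_size:
                self.sink.replicate_wal_entries(batch)
                sent += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            self.sink.replicate_wal_entries(batch)
            sent += len(batch)
        return sent


wal = [
    WalEntry(1, "orders", "r1", "put", "a"),
    WalEntry(2, "tmp", "r9", "put", "x"),      # dropped by the filter
    WalEntry(3, "orders", "r2", "put", "b"),
    WalEntry(4, "orders", "r1", "delete"),
]
sink = ReplicationSink()
source = ReplicationSource(sink, lambda e: e.table == "orders")
sent = source.ship(wal)
```

Because the source tracks its own offset, calling `ship` again after new entries are appended sends only the new ones, which mirrors how the standby lags the active cluster by whatever has not yet been shipped.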
JDHBase Multi‑Active Architecture
The system consists of three main components: the Client, the JDHBase cluster, and the Fox Manager.
1. Fox Manager Configuration Center
Fox Manager maintains user and cluster metadata, providing configuration services to clients and administrators.
Policy Server – a stateless distributed service that handles external requests; its data is persisted in MySQL or ZooKeeper; optionally includes a Rule Engine for automatic configuration adjustments based on cluster state.
Service Center – UI for administrators to manage configurations.
VIP Load Balance – presents a unified access address for Policy Servers and provides load balancing.
2. JDHBase Cluster
The cluster offers high-throughput OLTP capabilities. For critical workloads, clusters are deployed as active-standby pairs that replicate to each other.
Active Cluster – normal business runs here; data is asynchronously replicated to the standby cluster with second‑level delay.
Standby Cluster – activated when the active cluster experiences failures; it also asynchronously replicates its data back to the active side.
Replication between the two clusters ensures eventual consistency. In production, a multi‑cluster replication topology is built so that each cluster can act as both primary and backup for different services, forming a mesh of replication relationships.
3. Client
When a client starts, it first contacts Fox Manager to report user information. After authentication, Fox Manager returns the appropriate cluster connection details. The client then creates an HConnection to interact with the designated cluster.
Clients also receive configuration parameters (e.g., retry counts, timeouts) from Fox Manager, which can be adjusted per business needs or in extreme scenarios.
Metrics are embedded in the client SDK to monitor service availability from the client’s perspective. Heartbeats report client status and metrics back to Fox Manager, enabling real‑time configuration updates and automatic client‑side failover when availability drops.
Cluster Switching
Data routing in HBase involves three steps: (1) the client obtains the ZooKeeper address of the target cluster and retrieves the META table location; (2) it queries the META table to get region information and caches it; (3) it accesses the appropriate RegionServer for data operations.
JDHBase adds an extra step where the client first contacts Fox Manager for user authentication, cluster information, and client parameters. This allows seamless switching between active and standby clusters without client‑side changes.
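A minimal sketch of this indirection, assuming an in-memory stand-in for Fox Manager. `FoxManagerStub`, `ClusterInfo`, `resolve`, `switch`, and `heartbeat` are hypothetical names used for illustration; the article does not describe the actual Fox Manager API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    zk_quorum: str  # ZooKeeper address the client uses to locate the META table


class FoxManagerStub:
    """Hypothetical in-memory stand-in for the Fox Manager Policy Server."""

    def __init__(self, clusters: Dict[str, ClusterInfo],
                 routes: Dict[str, str]) -> None:
        self._clusters = clusters
        self._routes = dict(routes)  # user -> name of the cluster serving it

    def resolve(self, user: str) -> ClusterInfo:
        if user not in self._routes:
            raise PermissionError(f"unknown user: {user}")
        return self._clusters[self._routes[user]]

    def switch(self, user: str, cluster_name: str) -> None:
        if cluster_name not in self._clusters:
            raise KeyError(cluster_name)
        self._routes[user] = cluster_name


class Client:
    """Asks Fox Manager where to connect instead of hard-coding a cluster."""

    def __init__(self, user: str, manager: FoxManagerStub) -> None:
        self.user = user
        self.manager = manager
        self.cluster = manager.resolve(user)
        self.reconnects = 0

    def heartbeat(self) -> None:
        """Re-resolve the route; reconnect only if the target cluster changed."""
        latest = self.manager.resolve(self.user)
        if latest != self.cluster:
            self.cluster = latest
            self.reconnects += 1


clusters = {
    "active-a": ClusterInfo("active-a", "zk-a:2181"),
    "standby-b": ClusterInfo("standby-b", "zk-b:2181"),
}
manager = FoxManagerStub(clusters, {"order-service": "active-a"})
client = Client("order-service", manager)

manager.switch("order-service", "standby-b")  # operator or rule engine decision
client.heartbeat()                            # client follows without a code change
```

The application only knows its user identity, so moving it to another cluster is a routing change on the Fox Manager side.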
Automatic Failover
Manual failover can take minutes, which is unacceptable for some online services. JDHBase therefore implements policy‑driven automatic failover that can react within seconds.
HMaster runs a status‑checking plugin that collects availability metrics and reports them via heartbeat to the Policy Server. The Policy Server stores policies in MySQL and can be horizontally scaled.
A Rule Engine, built on Raft for high availability, evaluates these metrics across multiple time windows and triggers failover actions when thresholds are breached. The engine can also apply temporary parameters for extreme cases.
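Evaluating metrics across several time windows could look roughly like the sketch below. The window lengths, thresholds, and the `FailoverRule` class are assumptions made for illustration; the article does not state the actual policies, and Raft coordination is left out entirely.

```python
from collections import deque
from typing import Deque, List, Tuple


class FailoverRule:
    """Triggers when mean availability in any window drops below its threshold."""

    def __init__(self, windows: List[Tuple[float, float]]) -> None:
        # Each item: (window length in seconds, minimum acceptable mean availability)
        self.windows = windows
        self.max_window = max(length for length, _ in windows)
        self.samples: Deque[Tuple[float, float]] = deque()

    def report(self, ts: float, availability: float) -> None:
        """Record one heartbeat sample and drop samples older than the longest window."""
        self.samples.append((ts, availability))
        while self.samples and self.samples[0][0] <= ts - self.max_window:
            self.samples.popleft()

    def should_fail_over(self, now: float) -> bool:
        for length, threshold in self.windows:
            values = [a for ts, a in self.samples if now - length < ts <= now]
            if values and sum(values) / len(values) < threshold:
                return True
        return False


# Hypothetical policy: react fast to an outage (10 s window),
# and more slowly to sustained degradation (60 s window).
rule = FailoverRule([(10.0, 0.5), (60.0, 0.9)])
for ts in range(0, 50, 5):
    rule.report(ts, 1.0)
healthy = rule.should_fail_over(45)

rule.report(50, 0.0)
rule.report(55, 0.0)
outage = rule.should_fail_over(55)
```

Combining a short and a long window lets a policy react quickly to a hard outage while ignoring a single bad sample.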
Dynamic Parameters & Automatic Throttling
Replication can become a bottleneck under heavy write loads or hotspot traffic, leading to backlog. JDHBase exposes dynamic replication parameters that can be updated at runtime without restarting RegionServers.
When the backlog exceeds defined thresholds, the system automatically raises the replication parameters (typically to 1-2× their baseline values) to accelerate data transfer. Fox Manager can also schedule bandwidth-aware throttling during peak periods to avoid saturating inter-data-center links.
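One way to derive a scaling factor in the 1-2× range from the backlog size is a linear ramp between two thresholds, as sketched here. The thresholds, the linear ramp, and the parameter names `batch_entries` and `batch_bytes` are all assumptions; the article does not say which parameters are tuned or how the factor is chosen.

```python
from typing import Dict


def scale_factor(backlog: int, low: int, high: int, max_factor: float = 2.0) -> float:
    """Return 1.0 at or below `low`, `max_factor` at or above `high`, linear in between."""
    if high <= low:
        raise ValueError("high threshold must be greater than low threshold")
    if backlog <= low:
        return 1.0
    fraction = min(1.0, (backlog - low) / (high - low))
    return 1.0 + fraction * (max_factor - 1.0)


def tuned_parameters(baseline: Dict[str, int], backlog: int,
                     low: int, high: int) -> Dict[str, int]:
    """Scale every baseline replication parameter by the backlog-derived factor."""
    factor = scale_factor(backlog, low, high)
    return {name: int(value * factor) for name, value in baseline.items()}


# Hypothetical baseline values and thresholds (backlog counted in queued WAL files).
baseline = {"batch_entries": 20000, "batch_bytes": 64 * 1024 * 1024}
calm = tuned_parameters(baseline, backlog=50, low=100, high=500)
busy = tuned_parameters(baseline, backlog=300, low=100, high=500)
saturated = tuned_parameters(baseline, backlog=900, low=100, high=500)
```

Capping the factor keeps a runaway backlog from pushing batch sizes beyond what the standby cluster or the inter-data-center link can absorb.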
Serial Replication
Standard asynchronous replication may produce out‑of‑order writes when RegionServers move or fail, potentially causing permanent inconsistencies (e.g., a Delete arriving before the corresponding Put).
Serial Replication guarantees that mutations are applied in the standby cluster in the same order as in the primary cluster. It relies on two pieces of state stored in ZooKeeper: Barriers and lastPushedSequenceId. Each time a Region opens, a new Barrier (max SequenceId + 1) is recorded. After each batch is replicated, lastPushedSequenceId is updated. A RegionServer pushes a Region's data only once lastPushedSequenceId shows that every entry before the Region's current Barrier has already been delivered, which ensures ordered delivery.
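The barrier check can be modelled for a single Region in a few lines. This sketch keeps the state in memory rather than in ZooKeeper, ignores region splits and merges, and assumes the push condition is `last_pushed >= barrier - 1` for the Barrier that opens the range containing the entry. That condition is one reading of the rule described above, not a confirmed implementation detail, and `SerialReplicationTracker` is a hypothetical name.

```python
import bisect
from typing import List


class SerialReplicationTracker:
    """Per-region ordering state: barriers plus the last pushed sequence id."""

    def __init__(self) -> None:
        self.barriers: List[int] = []  # ascending; one per region open
        self.last_pushed = 0           # highest sequence id delivered so far

    def open_region(self, max_seq_id: int) -> None:
        """Record a barrier of max sequence id + 1 when the region opens."""
        self.barriers.append(max_seq_id + 1)

    def can_push(self, seq_id: int) -> bool:
        """An entry may be pushed only if all earlier ranges are fully delivered."""
        idx = bisect.bisect_right(self.barriers, seq_id) - 1
        if idx <= 0:
            return True  # first range: nothing earlier to wait for
        return self.last_pushed >= self.barriers[idx] - 1

    def mark_pushed(self, seq_id: int) -> None:
        self.last_pushed = max(self.last_pushed, seq_id)


tracker = SerialReplicationTracker()
tracker.open_region(0)             # region opens on RegionServer A, barrier = 1
for seq in (1, 2, 3):
    tracker.mark_pushed(seq)       # A replicates entries 1..3, then fails

tracker.open_region(5)             # region reopens on RegionServer B, barrier = 6
blocked = tracker.can_push(6)      # entries 4 and 5 are still pending

tracker.mark_pushed(4)             # A's leftover WAL is drained first
tracker.mark_pushed(5)
unblocked = tracker.can_push(6)
```

In this model RegionServer B holds back entry 6 until the entries written before the move have been delivered, which is what prevents a Delete from overtaking its Put.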
Summary and Outlook
Through continuous adoption of industry disaster‑recovery practices and internal innovations, JDHBase now achieves a 99.98 % SLA, providing stable and reliable storage for JD.com’s massive online services. Future work will focus on synchronous replication, eliminating Zookeeper dependencies, client‑side automatic switching, and reducing data redundancy to further improve reliability and efficiency.
dbaplus Community
