Design and Evolution of JD Elastic Database: Architecture, Sharding, and Automatic Failover
This article traces the evolution of JD's Elastic Database. It describes the challenges of scaling MySQL and the staged solutions (sharding, then JProxy, then the final elastic architecture built from services such as Topology, JED‑Gate, and JED‑Tablet), and it explains the system's query processing, dynamic resharding, and automatic failover mechanisms.
Lv Xin, a senior architect at JD, introduces his background in data product development and the motivation behind building JD Elastic Database.
The discussion begins with the dominance of MySQL in internet companies and the recurring, stage‑specific problems encountered as data volume grows.
First stage: A single MySQL instance cannot keep up with growing data volume, so teams adopt database and table sharding. Sharding adds complexity: the application has to route data itself, and every scale‑out requires code changes.
Second stage: To decouple data routing from business logic, JD developed JProxy. It removes the tight coupling, but resharding still has to be done offline, which means service downtime.
Third stage: JD Elastic Database is created on the JDOS platform, providing online dynamic resharding, automatic failover, and data recovery capabilities.
The elastic database consists of five core services: Topology, JED‑Gate, JED‑Tablet, JED‑ctl, and JED‑ctld. Topology manages metadata for clusters, shards, and routing. JED‑ctl is a command‑line tool for metadata operations; JED‑ctld offers the same functionality via an HTTP API used by other services.
JED‑Gate acts as a lightweight proxy that routes queries to the appropriate JED‑Tablet, which sits alongside each MySQL instance. Each shard contains three POD types: Master, Replica‑Slave, and ReadOnly‑Slave. The suffix of the user name that issues a query determines which POD type serves it.
The data model includes three concepts: KeySpace (logical database), KeyRange (horizontal partition within a KeySpace), and Shard (the physical storage unit). Each KeyRange maps to a Shard, which hosts the three POD types.
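The relationship between KeySpace, KeyRange, and Shard can be sketched in a few lines of Python. This is a minimal illustration, not JD's implementation: the 8-bit key space, the MD5-based `keyspace_id` function, and the shard names are all assumptions made for readability, since the article does not say how a sharding key is mapped onto a KeyRange.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    """A horizontal partition of a KeySpace: [start, end)."""
    start: int  # inclusive
    end: int    # exclusive

    def contains(self, key_id: int) -> bool:
        return self.start <= key_id < self.end


@dataclass(frozen=True)
class Shard:
    """Physical storage unit that serves exactly one KeyRange."""
    name: str
    key_range: KeyRange


def keyspace_id(sharding_key: str) -> int:
    """Map a sharding key to an integer in [0, 256).

    Assumption: the first byte of an MD5 digest is used only to keep
    the example small; the real mapping is not given in the article.
    """
    return hashlib.md5(sharding_key.encode("utf-8")).digest()[0]


@dataclass
class KeySpace:
    """Logical database made of shards whose ranges do not overlap."""
    name: str
    shards: list

    def shard_for(self, sharding_key: str) -> Shard:
        key_id = keyspace_id(sharding_key)
        for shard in self.shards:
            if shard.key_range.contains(key_id):
                return shard
        raise LookupError(f"no shard covers key id {key_id}")


# Hypothetical keyspace split into two equal KeyRanges.
orders = KeySpace(
    name="orders",
    shards=[
        Shard("shard-0", KeyRange(0, 128)),
        Shard("shard-1", KeyRange(128, 256)),
    ],
)
```

Calling `orders.shard_for("order-1001")` returns whichever shard's KeyRange contains the hashed key. Because the ranges cover the key space without overlapping, every key resolves to exactly one shard.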
Query processing flow: an application connects to JED‑Gate via the MySQL driver, JED‑Gate retrieves user and keyspace information, consults Topology via JED‑ctld to locate the relevant shard, determines the correct POD based on user suffix, forwards the query, and streams results back to the client.
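The routing decision in this flow can be sketched as follows. Everything concrete here is hypothetical: the article does not name the user suffixes, so `_rw`, `_rr`, and `_ro` are invented, the fallback to Master for unsuffixed users is an assumption, and `FakeTopology` is an in-memory stand-in for the lookup that JED‑Gate performs against Topology through JED‑ctld.

```python
from dataclasses import dataclass

# Hypothetical suffix-to-POD mapping; the real suffixes are not given.
POD_BY_SUFFIX = {
    "_rw": "master",
    "_rr": "replica-slave",
    "_ro": "readonly-slave",
}


def pod_type_for_user(user: str) -> str:
    """Pick the POD type from the user name suffix."""
    for suffix, pod_type in POD_BY_SUFFIX.items():
        if user.endswith(suffix):
            return pod_type
    return "master"  # assumption: users without a known suffix go to Master


@dataclass(frozen=True)
class Route:
    keyspace: str
    shard: str
    pod_type: str
    address: str


class FakeTopology:
    """In-memory stand-in for the Topology service (illustrative only)."""

    def __init__(self, shards, pods):
        self.shards = shards  # keyspace -> [(start, end, shard_name), ...]
        self.pods = pods      # (keyspace, shard_name) -> {pod_type: address}

    def locate_shard(self, keyspace: str, key_id: int) -> str:
        for start, end, name in self.shards[keyspace]:
            if start <= key_id < end:
                return name
        raise LookupError(f"no shard in {keyspace} covers key id {key_id}")

    def pod_address(self, keyspace: str, shard: str, pod_type: str) -> str:
        return self.pods[(keyspace, shard)][pod_type]


def route(topology: FakeTopology, user: str, keyspace: str, key_id: int) -> Route:
    """Resolve where a query should be forwarded."""
    pod_type = pod_type_for_user(user)                         # from user suffix
    shard = topology.locate_shard(keyspace, key_id)            # from topology
    address = topology.pod_address(keyspace, shard, pod_type)  # target POD
    return Route(keyspace, shard, pod_type, address)


# Hypothetical topology: one keyspace, two shards, three PODs per shard.
TOPOLOGY = FakeTopology(
    shards={"orders": [(0, 128, "shard-0"), (128, 256, "shard-1")]},
    pods={
        ("orders", "shard-0"): {
            "master": "10.0.0.1:3306",
            "replica-slave": "10.0.0.2:3306",
            "readonly-slave": "10.0.0.3:3306",
        },
        ("orders", "shard-1"): {
            "master": "10.0.1.1:3306",
            "replica-slave": "10.0.1.2:3306",
            "readonly-slave": "10.0.1.3:3306",
        },
    },
)
```

For example, `route(TOPOLOGY, "report_ro", "orders", 200)` resolves to the ReadOnly‑Slave of `shard-1`. Forwarding the query and streaming results back over the MySQL protocol are left out of the sketch.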
Dynamic resharding is performed in four steps using JED‑ctl: (1) create target shards and configure replication, (2) copy table structures, (3) initialize data via filtered replication (including ReadOnly‑Slave handling), and (4) stop writes on the source master, synchronize data, update topology, and decommission the source shard.
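The four steps can be modeled in memory to make the order of operations concrete. This is a simplified sketch under stated assumptions, not the JED‑ctl workflow itself: shards are plain Python objects, "filtered replication" is reduced to copying only the rows whose key falls inside the target's range, and the `-a`/`-b` target names and the `reshard` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    start: int  # inclusive lower bound of the KeyRange
    end: int    # exclusive upper bound of the KeyRange
    schema: list = field(default_factory=list)  # DDL statements
    rows: dict = field(default_factory=dict)    # key id -> row
    writable: bool = True

    def owns(self, key_id: int) -> bool:
        return self.start <= key_id < self.end

    def write(self, key_id: int, row) -> None:
        if not self.writable:
            raise RuntimeError(f"{self.name} is not accepting writes")
        self.rows[key_id] = row


def reshard(topology: dict, keyspace: str, source: Shard, split_at: int) -> list:
    """Split `source` into two target shards at `split_at`."""
    if not source.start < split_at < source.end:
        raise ValueError("split point must lie strictly inside the source range")

    # Step 1: create the target shards (replication setup is not modeled).
    targets = [
        Shard(f"{source.name}-a", source.start, split_at),
        Shard(f"{source.name}-b", split_at, source.end),
    ]

    # Step 2: copy table structures.
    for target in targets:
        target.schema = list(source.schema)

    # Step 3: initialize data, keeping only rows in each target's range.
    for target in targets:
        target.rows = {k: v for k, v in source.rows.items() if target.owns(k)}

    # Step 4: stop writes on the source, catch up, switch topology.
    source.writable = False
    for target in targets:
        # In a live system writes arrive between steps 3 and 4; here this
        # catch-up pass simply re-applies the same filter.
        target.rows.update(
            {k: v for k, v in source.rows.items() if target.owns(k)}
        )
    topology[keyspace] = targets  # the source shard is no longer referenced
    return targets
```

Stopping writes before the final catch-up is what lets the topology switch happen without losing data: once the source is read‑only, the targets can converge on exactly the same rows before traffic is redirected.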
Automatic failover leverages Orchestrator: when a master fails, a Replica‑Slave is promoted, ReadOnly‑Slaves are re‑attached, and topology metadata is updated. Manual steps are required to replenish the lost replica.
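The promotion sequence can be illustrated with a small state model. This sketch does not call Orchestrator or any of its APIs; it only mirrors the three state changes described above. The `ShardPods` structure, the choice to promote the first Replica‑Slave in the list, and the re‑pointing of any remaining Replica‑Slaves are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ShardPods:
    """PODs of one shard and who replicates from whom."""
    master: str
    replica_slaves: list
    readonly_slaves: list
    replication_source: dict = field(default_factory=dict)  # pod -> upstream pod


def fail_over(shard: ShardPods, topology: dict, shard_name: str) -> str:
    """Promote a Replica-Slave after the master fails.

    Returns the name of the failed master so an operator can replace it.
    """
    if not shard.replica_slaves:
        raise RuntimeError("no Replica-Slave available to promote")

    failed = shard.master

    # 1. Promote a Replica-Slave (assumption: simply the first in the list).
    new_master = shard.replica_slaves.pop(0)
    shard.master = new_master
    shard.replication_source.pop(new_master, None)  # a master has no upstream
    shard.replication_source.pop(failed, None)

    # 2. Re-attach the remaining PODs to the new master.
    for pod in shard.replica_slaves + shard.readonly_slaves:
        shard.replication_source[pod] = new_master

    # 3. Record the new master in the topology metadata.
    topology[shard_name] = new_master
    return failed
```

After `fail_over` returns, the shard has one fewer Replica‑Slave than before, which corresponds to the manual replenishment step mentioned above.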
JD Elastic Database now supports 11 core domestic services and nearly 300 overseas services, offering elastic resource management, online resharding, HA, automated backups, operational automation, and audit capabilities.
Recommended reading includes "Presto技术内幕" (Presto Internals), "编程珠玑" (Programming Pearls), "TCP/IP详解" (TCP/IP Illustrated), and "MySQL技术内幕" (MySQL Internals).
JD Retail Technology