Databases 11 min read

How JD Scaled Its One‑Yuan Grab Treasure System with Database Sharding and ES Aggregation

This article details JD's One‑Yuan Grab Treasure platform redesign, covering business growth drivers, database sharding estimation, hash‑plus‑range routing implementation, Elasticsearch aggregation, Canal‑based sync, historical data migration, and downgrade mechanisms to ensure high‑throughput, reliable order processing during massive sales events.

dbaplus Community
dbaplus Community
dbaplus Community
How JD Scaled Its One‑Yuan Grab Treasure System with Database Sharding and ES Aggregation

Background

One‑Yuan Grab Treasure is a JD virtual‑goods service where each product (item) is sold in multiple periods (phases). Each period has a fixed number of participants (e.g., 1000 people each paying 1 CNY). The system experienced rapid order growth, especially before the 618 promotion, leading to concerns about database capacity and latency.

Data Volume Estimation

Assuming 1 million orders per day → 3.6 billion orders per year. Each order record ≈500 bytes (10 columns, 50 bytes each). Annual storage ≈170 GB. With a 200 GB per‑node capacity, one additional node per year would satisfy raw storage, but actual capacity must also meet QPS and latency targets measured in load tests.

Sharding Design

Routing Strategy

Two classic sharding methods were considered:

Hash routing : hash(item_id) → database; evenly distributes rows but requires full data migration when a new node is added.

Range (interval) routing : range of period_id → table; supports incremental expansion but can create hotspot tables and adds a lookup step.

The final design combines both:

Hash the grab_item_id to select the target database.

Within the chosen database, use the grab_period_id range to select the specific table.

This hybrid approach keeps the DAO layer transparent to upper services while allowing dynamic addition of databases without immediate data migration.

Table Layout

Each database contains a set of tables named order_0, order_1, …, each covering a contiguous interval of grab_period_id. The interval size is configurable (e.g., 10 000 periods per table) and stored in a routing‑rule table that is cached in memory.

Aggregation and Synchronization

Aggregated queries (e.g., total participants per item) are served by Elasticsearch (ES). ES provides automatic shard scaling and a query‑only interface for the business layer.

Because ES does not guarantee strong consistency, a fallback aggregation table in MySQL stores recent aggregates (e.g., last 5 minutes) for read‑through when ES is delayed or unavailable.

Change Capture with Canal

Canal monitors the binlog of a replica MySQL instance. For each committed transaction it extracts rows that belong to the sharded order tables, builds a lightweight message {table, primary_key}, and publishes it to a message queue.

Consumers read the message, fetch the latest row from the replica, and update both ES and the MySQL aggregation table. This decouples the write path from the indexing path and enables retry on failure.

Historical Data Migration

The system originally ran on a single database. Migration to the sharded architecture proceeds in two phases:

Pre‑cutover : A batch job scans the legacy table, transforms each row to the target (db, table) based on the routing rules, and bulk‑loads it into the new shards. The job is idempotent and can be re‑run for verification.

Post‑cutover : After the cutover window, only incremental rows (inserted after the pre‑cutover snapshot) are migrated. This minimizes downtime.

Validation compares three dimensions—order_id, item_id, period_id—between source and target. Any mismatch triggers a re‑migration of the affected day.

Degradation Paths

If Canal falls behind or the replica becomes unavailable, a switch enables direct scanning of the primary database for the most recent hours and pushes the data to ES and the aggregation table.

If ES is down, the ES switch is turned off and queries fall back to the MySQL aggregation table, ensuring that aggregate results remain available.

Key Commands and Configuration Snippets

# Example of routing rule table (cached in memory)
CREATE TABLE routing_rule (
    db_index   INT NOT NULL,
    period_start BIGINT NOT NULL,
    period_end   BIGINT NOT NULL,
    table_name   VARCHAR(64) NOT NULL,
    PRIMARY KEY (db_index, period_start)
);

# Canal configuration (simplified)
instance.master.address = 10.0.1.100:3306
instance.filter.regex = ^order_.*$
instance.destination = order_sync

Conclusion

The combination of hash‑based database selection, range‑based table selection, Elasticsearch aggregation, Canal‑driven change capture, and a two‑stage migration process enabled the One‑Yuan Grab Treasure service to scale for massive traffic spikes while preserving data integrity and providing robust fallback mechanisms.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Data MigrationBackend Architecturedatabase shardingPerformance Scalingrouting strategy
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.