
Scaling an E‑commerce Order System with Sharding, ES‑HBase Search, and Zero‑Downtime Migration

This article details how a high‑traffic e‑commerce platform migrated from a single MySQL instance to a sharded architecture using Sharding‑JDBC, added Elasticsearch‑HBase for multi‑dimensional queries, and implemented zero‑downtime data migration and scaling strategies.

dbaplus Community

Background: An e‑commerce company grew from 300k to 1M daily orders, reaching 2,000 TPS on MySQL. The monolithic database could no longer sustain the load, prompting a rapid redesign.

Why Sharding?

Performance bottlenecks appear as I/O or CPU saturation, leading to excessive active connections and service outages. After exhausting code, SQL, and index optimizations, horizontal scaling via database sharding becomes necessary.

Read I/O bottleneck: hot data exceeds the cache, causing heavy disk reads. Solution: master‑slave replication with read‑only slaves, or sharding plus horizontal partitioning (e.g., splitting the order table by user_id).

Write I/O bottleneck: frequent writes overload the disk. Solution: multiple master databases (sharding) and horizontal partitioning to reduce per‑table size and index cost.

CPU bottleneck: complex SQL (joins, GROUP BY, ORDER BY on non‑indexed columns) or very large tables (>100M rows) drive up CPU usage. Horizontal sharding mitigates both.

Sharding Approach

Two main approaches were evaluated:

Proxy middleware (MyCat, KingShard) – low code impact but adds a black‑box layer and a single point of failure.

Lightweight Java libraries (Sharding‑JDBC, TSharding) – requires code changes but offers transparency and easier debugging.

The team chose Sharding‑JDBC for its lower operational risk.

Design Decisions

The target was to support 10 M daily orders for the next two years – roughly ten times current volume, implying a peak of ~20,000 TPS, or ~1,250 TPS per database across 16 shards. Sixteen databases were created, each with 16 tables (256 tables in total). Two years of orders (~7.3 B rows) average out to ~456 M rows per database and ~28.5 M rows per table – a manageable size.

Sharding key: user_id (hash‑based) to keep a user's orders in the same database/table, optimizing read performance. Order IDs embed the user_id to enable routing from the ID alone.

Hash modulo 16 aligns with power‑of‑two sizing, allowing bitwise AND for fast shard calculation.
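The routing scheme above can be sketched as follows. This is a minimal illustration, not the production code: the choice to route on the last four decimal digits of user_id (the part embedded in the order ID) and the order-ID format are assumptions for the example.

```java
// Sketch of hash-based shard routing for a 16-database x 16-table layout.
// The order-id format and the four-digit user_id suffix are assumptions.
public class ShardRouter {
    static final int DB_COUNT = 16;     // power of two
    static final int TABLE_COUNT = 16;  // power of two

    // Route on the user_id suffix that is also embedded in order ids.
    // With a power-of-two count, (x & (n - 1)) equals (x % n) for non-negative x.
    static int dbIndex(long userId) {
        return (int) ((userId % 10000) & (DB_COUNT - 1));
    }

    static int tableIndex(long userId) {
        // the next 4 bits of the suffix pick the table within the database
        return (int) (((userId % 10000) >>> 4) & (TABLE_COUNT - 1));
    }

    // Embed the user_id suffix in the order id so the id alone is routable.
    static String newOrderId(long timestampMillis, long userId) {
        return String.format("%d%04d", timestampMillis, userId % 10000);
    }

    static long userIdSuffixFromOrderId(String orderId) {
        return Long.parseLong(orderId.substring(orderId.length() - 4));
    }
}
```

Because both user_id and order ID resolve to the same suffix, a lookup by either key lands on the same database and table without a global index.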

Management‑Side Query Solution

To support multi‑dimensional queries (e.g., all orders for a specific product on a given day), the team decoupled indexing from storage: Elasticsearch handles search, HBase stores the raw order data. Query flow: search ES for matching rowkeys → fetch the corresponding rows from HBase via fast point lookups.
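The two-step query flow can be modeled in a few lines. This is a deliberately simplified in-memory simulation: plain maps stand in for the Elasticsearch index and the HBase table, and all names are illustrative, not real client APIs.

```java
import java.util.*;

// In-memory simulation of the ES->HBase query flow: search an index
// (standing in for Elasticsearch) for matching rowkeys, then fetch full
// rows from a key-value store (standing in for HBase).
public class EsHbaseQueryFlow {
    // "ES": inverted index from a query dimension (e.g. productId) to rowkeys.
    final Map<String, List<String>> searchIndex = new HashMap<>();
    // "HBase": rowkey -> full order record.
    final Map<String, String> rowStore = new HashMap<>();

    void indexOrder(String rowkey, String productId, String record) {
        searchIndex.computeIfAbsent(productId, k -> new ArrayList<>()).add(rowkey);
        rowStore.put(rowkey, record);
    }

    // Step 1: query the index for rowkeys; step 2: multi-get the raw rows.
    List<String> queryByProduct(String productId) {
        List<String> rowkeys = searchIndex.getOrDefault(productId, List.of());
        List<String> rows = new ArrayList<>();
        for (String rk : rowkeys) rows.add(rowStore.get(rk));
        return rows;
    }
}
```

The point of the split is that ES only stores the searchable dimensions plus the rowkey, keeping the index small, while HBase holds the bulky order payloads.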

Images illustrate the ES‑HBase architecture.

Real‑Time Sync

MySQL order data is captured via Canal, pushed to RocketMQ, and consumed to write into HBase and update ES indexes. Canal mimics a MySQL slave, reads binlog, and processes incremental changes.
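The consume side of that pipeline can be sketched as below. The event fields and the two map "sinks" are illustrative stand-ins for a parsed Canal binlog event and the HBase/ES clients; the real consumer would use the actual client libraries and handle ordering and retries.

```java
import java.util.*;

// Sketch of the sync consumer: a simplified change event (as Canal would
// emit after parsing the binlog, here reduced to plain fields) is applied
// to both sinks so HBase rows and ES index entries stay in step.
public class SyncConsumer {
    final Map<String, String> hbaseSink = new HashMap<>();    // rowkey -> row
    final Map<String, Set<String>> esSink = new HashMap<>();  // productId -> rowkeys

    void onEvent(String type, String rowkey, String productId, String row) {
        switch (type) {
            case "INSERT":
            case "UPDATE":
                hbaseSink.put(rowkey, row);
                esSink.computeIfAbsent(productId, k -> new HashSet<>()).add(rowkey);
                break;
            case "DELETE": {
                hbaseSink.remove(rowkey);
                Set<String> keys = esSink.get(productId);
                if (keys != null) keys.remove(rowkey);
                break;
            }
        }
    }
}
```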

Zero‑Downtime Migration

Three migration patterns were considered:

Replica promotion: build a slave, let it sync, then promote. Not suitable here due to the high concurrency and the schema change involved.

Dual‑write: write to the old and new databases simultaneously, migrate historic data in batches, then gradually shift traffic.

Data‑sync tools (Canal, DataBus): use CDC to replicate changes, avoiding dual‑write code changes.

The final plan combined Canal for incremental sync with scripted batch migration of historic data, followed by a phased dual‑read verification and an eventual cut‑over.
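The phased cut-over can be captured with a simple switch. This is a sketch under stated assumptions: the phase names, the `OrderStore` interface, and the flag-driven design are illustrative, not the team's actual code.

```java
// Sketch of a dual-write/dual-read cut-over, driven by a phase flag.
// Phase names and the store interface are illustrative assumptions.
public class MigrationSwitch {
    enum Phase { OLD_ONLY, DUAL_WRITE_READ_OLD, DUAL_WRITE_READ_NEW, NEW_ONLY }

    interface OrderStore { void write(String id, String order); String read(String id); }

    Phase phase = Phase.OLD_ONLY;
    final OrderStore oldDb;
    final OrderStore newDb;

    MigrationSwitch(OrderStore oldDb, OrderStore newDb) { this.oldDb = oldDb; this.newDb = newDb; }

    void write(String id, String order) {
        if (phase != Phase.NEW_ONLY) oldDb.write(id, order);  // old side stays authoritative...
        if (phase != Phase.OLD_ONLY) newDb.write(id, order);  // ...until cut-over completes
    }

    String read(String id) {
        boolean readNew = phase == Phase.DUAL_WRITE_READ_NEW || phase == Phase.NEW_ONLY;
        return readNew ? newDb.read(id) : oldDb.read(id);
    }
}
```

Each phase is reversible: if the dual-read comparison surfaces discrepancies, the flag drops back a phase while the old database is still receiving every write.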

Scaling & Shrinking

When capacity changes, re‑hash data into the new shard set using the same migration pipeline.

Improved Sharding Strategy

Hash‑modulo can cause uneven distribution and forces large-scale data movement on resharding; a suggested improvement is consistent hashing with virtual nodes, which balances load better and limits how much data must move when shards are added or removed.
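A minimal consistent-hash ring with virtual nodes might look like this. The hash function (FNV-1a) and the replica count are illustrative choices, not prescribed by the source.

```java
import java.util.*;

// Minimal consistent-hash ring with virtual nodes: each shard is placed on
// the ring many times, and a key maps to the first shard clockwise from its
// hash. Removing a shard only remaps the keys that pointed at it.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(Collection<String> shards, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String s : shards) addShard(s);
    }

    void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(shard + "#" + i), shard);
    }

    void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(shard + "#" + i));
    }

    String shardFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }

    private static int hash(String s) {
        // FNV-1a: a simple, reasonably well-distributed hash (illustrative choice)
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) { h ^= s.charAt(i); h *= 0x01000193; }
        return h;
    }
}
```

Compared with modulo-16, adding a 17th shard here moves only the keys whose ring positions fall into the new shard's virtual-node ranges, rather than rehashing nearly everything.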

Degradation Strategy

During peak sales, the system can fall back to asynchronous order processing: orders are queued, briefly cached in Redis, and batch‑written to MySQL, reducing write pressure at the cost of temporary data staleness.
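The fallback path can be sketched as a buffered batch writer. In this simulation, an in-memory deque stands in for the Redis/queue stage and a list of batches stands in for the batched MySQL inserts; the batch size is an illustrative parameter.

```java
import java.util.*;

// Sketch of the degradation path: orders are enqueued instead of written
// synchronously, then flushed downstream in batches. The in-memory buffer
// stands in for Redis/MQ; batchesWritten stands in for MySQL batch inserts.
public class DegradedOrderWriter {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final int batchSize;
    final List<List<String>> batchesWritten = new ArrayList<>();

    DegradedOrderWriter(int batchSize) { this.batchSize = batchSize; }

    void accept(String order) {
        buffer.addLast(order);                 // fast path: enqueue only
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {                             // one batched INSERT instead of many single writes
        if (buffer.isEmpty()) return;
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        batchesWritten.add(batch);
    }
}
```

The trade-off is exactly the staleness described above: orders sitting in the buffer are accepted but not yet durable in MySQL until the next flush.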

Overall, the article provides a comprehensive roadmap for sharding, search integration, real‑time sync, and zero‑downtime migration in a high‑throughput e‑commerce environment.

Tags: Elasticsearch, MySQL, HBase, Canal, zero‑downtime migration, high‑traffic e‑commerce
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
