Scaling Billions of Orders: MySQL Sharding, ES & Hive Strategies
This article explains how to handle massive order volumes by classifying data into hot and cold tiers, storing them in MySQL, Elasticsearch, and Hive, and implementing sharding and partitioning strategies—including shard keys, modulo routing, and combined database‑table distribution—to achieve high throughput and low cost.
Background
With daily order volumes exceeding ten million, a three‑month period can generate around one billion orders, making a single‑database, single‑table design insufficient and prompting a database redesign.
Order Data Classification
Order data is divided into three categories:
Hot data : orders from the past three months, requiring high‑frequency real‑time queries.
Cold data A : orders from three to twelve months ago, accessed less frequently.
Cold data B : orders older than one year, rarely queried and suitable for offline access.
Cold data is split into two groups because storing year‑old data in the primary DB incurs high cost and maintenance overhead; occasional access can be served via offline queries.
Storage plan:
Hot data – MySQL with sharding and table partitioning.
Cold data A – Elasticsearch, leveraging its search capabilities for faster queries.
Cold data B – Hive for batch‑oriented offline analysis.
MySQL Sharding and Partitioning
1. Business‑Based Splitting
Initially, many e‑commerce platforms place all modules (users, products, orders) in a single database with separate tables, which becomes hard to maintain as the system grows. The recommended approach is to separate different business domains into distinct databases, distributing load and improving throughput.
2. Table Partitioning Strategy
Using the order ID as a shard key, tables can be split by modulo operation (e.g., order_id % 100) to distribute rows across 100 sub‑tables. If queries need to be based on user_id, the shard key should be changed accordingly (e.g., user_id % 100).
3. Database Partitioning Strategy
Database partitioning follows a similar modulo routing, e.g., order_id % number_of_databases. For non‑integer keys, hash the value first: hash(order_id) % number_of_databases.
4. Combined Sharding and Partitioning
When both sharding and partitioning are needed, a two‑step calculation is used:
intermediate = shard_key % (db_count * tables_per_db) db_index = floor(intermediate / tables_per_db)For example, with 10 databases each containing 100 tables, an order_id of 1001 maps to database 1, table 2 (zero‑based indexing).
Overall Architecture Design
The system separates read and write paths. Write requests follow the sharding rules to store data in the appropriate MySQL instance. Read requests first determine whether the query targets hot or cold data based on the order ID timestamp; hot data is fetched from MySQL, cold data A from Elasticsearch, and cold data B from Hive.
A scheduled job periodically migrates cold data from MySQL to Elasticsearch and Hive, ensuring that older data is off‑loaded to cost‑effective storage.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
