Database Sharding and Partitioning Strategy for High‑Volume Order Systems
The article explains how to handle billions of daily orders by classifying data into hot and cold segments, storing them in MySQL, Elasticsearch, and Hive, and applying sharding and partitioning techniques at both table and database levels to achieve scalable performance.
As the company’s order volume grows to billions of records over three months, a single‑database, single‑table design can no longer meet performance and cost requirements, prompting a comprehensive database redesign.
1. Background
The massive order flow requires a new architecture to handle scalability and cost.
2. Order Data Classification
Orders are divided into three categories:
Hot data : orders from the last three months, requiring high‑frequency, low‑latency queries.
Cold data A : orders from three to twelve months ago, accessed infrequently.
Cold data B : orders older than one year, rarely queried and suitable for offline access.
This separation reduces storage cost and simplifies maintenance.
3. Storage Plan for Each Data Type
Hot data – stored in MySQL with sharding and table partitioning.
Cold data A – stored in Elasticsearch to leverage its fast search capabilities.
Cold data B – stored in Hive for batch processing and archival.
4. MySQL Sharding and Table Partitioning
Initially, a monolithic schema with four tables (user, product, order, etc.) is shown. To improve scalability, each business domain is moved to its own database, distributing load across multiple instances.
Further splitting each database into multiple tables (e.g., 100 tables per database) is achieved by using the order_id % 100 modulo operation.
create table order(
order_id bigint(11),
...
);If queries need to be based on user_id instead, the shard key can be switched accordingly.
5. Database‑Level Sharding
Database routing follows a similar modulo strategy: order_id % number_of_databases . Non‑numeric keys can be hashed first, e.g., hash(order_id) % number_of_databases .
6. Combined Sharding Strategy
When both database and table sharding are used, an intermediate variable distributes records:
For example, with 10 databases each containing 100 tables, order_id = 1001 maps to database 1, table 2.
7. Overall Architecture Design
The system separates read and write paths. Writes follow the sharding rules directly. Reads first determine whether the request targets hot or cold data by parsing the timestamp embedded in order_id . Queries for data older than a year are directed to Hive for offline access.
A scheduled job periodically migrates cold data from MySQL to Elasticsearch and Hive, keeping each storage tier up‑to‑date.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.