Database Sharding and Partitioning Strategy for High‑Volume Order Systems
This article explains how to classify massive order data into hot and cold segments, store each segment in MySQL, Elasticsearch, or Hive as appropriate, and implement sharding at both the table and database levels using modulo and hash calculations, so the system can scale to billions of orders.
Background: With the company processing over ten million orders per day, the projected three‑month total reaches about one billion, making the original single‑database, single‑table design insufficient and prompting a database redesign.
Data Classification: Order data is divided into three categories—hot data (orders within the last 3 months, requiring high real‑time query performance), cold data A (orders from 3 to 12 months ago, queried infrequently), and cold data B (orders older than one year, rarely accessed and suitable for offline retrieval).
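The three-tier split above can be expressed as a simple age check. A minimal sketch; the tier names, the class name, and the month thresholds at exactly 3 and 12 are illustrative assumptions, not part of the original design:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class OrderDataTier {
    enum Tier { HOT, COLD_A, COLD_B }

    // Classify an order by its creation date:
    //   < 3 months old  -> hot data    (MySQL)
    //   3-12 months old -> cold data A (Elasticsearch)
    //   > 12 months old -> cold data B (Hive)
    static Tier classify(LocalDate createdAt, LocalDate today) {
        long months = ChronoUnit.MONTHS.between(createdAt, today);
        if (months < 3) return Tier.HOT;
        if (months < 12) return Tier.COLD_A;
        return Tier.COLD_B;
    }
}
```

A scheduled migration job (described later in the article) would apply the same check to decide which rows to move out of MySQL.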
Storage Plan: Hot data will be stored in MySQL with sharding and table partitioning; cold data A will be stored in Elasticsearch to leverage its fast search capabilities; cold data B will be archived in Hive for batch processing.
MySQL Sharding and Partitioning:
Business‑level splitting: Separate business modules (users, products, orders, etc.) into different databases to distribute load.
Table partitioning: Use the order_id as the shard key and apply modulo (e.g., order_id % 100) to determine the target sub-table.
Database splitting: Route data to different databases using a similar modulo or hash strategy (e.g., order_id % number_of_databases or hash(order_id) % number_of_databases).
Combined sharding: When both database and table splitting are needed, compute an intermediate value shard_key % (db_count * tables_per_db), then derive the database index and table index from it.
Implementation Example: Assuming 10 databases each containing 100 tables, an order_id of 1001 yields an intermediate value of 1001 % (10 * 100) = 1; the database index is 1 / 100 = 0 and the table index is 1 % 100 = 1, so the order is routed to database 0, table 1 (zero-based indexing).
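The combined calculation can be sketched as follows, assuming 10 databases with 100 tables each; the class and method names are illustrative:

```java
public class OrderSharding {
    static final int DB_COUNT = 10;
    static final int TABLES_PER_DB = 100;

    // Intermediate value in [0, DB_COUNT * TABLES_PER_DB), i.e. [0, 1000).
    static long slot(long orderId) {
        return orderId % (DB_COUNT * TABLES_PER_DB);
    }

    // Zero-based database index: which of the 10 databases holds this order.
    static int dbIndex(long orderId) {
        return (int) (slot(orderId) / TABLES_PER_DB);
    }

    // Zero-based table index within that database.
    static int tableIndex(long orderId) {
        return (int) (slot(orderId) % TABLES_PER_DB);
    }
}
```

For order_id 1001 this gives slot = 1001 % 1000 = 1, hence database index 0 and table index 1, matching the worked example above.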
Overall Architecture Design: The system separates read and write requests. Write operations directly follow the sharding rules to store data in the appropriate MySQL instance. Read operations first determine whether the requested order belongs to hot or cold data based on the timestamp embedded in the order_id; hot data is fetched from MySQL, cold data A from Elasticsearch, and cold data B from Hive. A scheduled job periodically migrates aged data from MySQL to Elasticsearch and Hive.
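The read-path routing can be sketched as below. This assumes a snowflake-style order_id with the milliseconds-since-epoch timestamp in the high 41 bits; the bit layout, thresholds, and backend labels are assumptions for illustration, since the original only says the timestamp is embedded in the order_id:

```java
public class OrderReadRouter {
    // ASSUMPTION: snowflake-style ID layout with the millisecond timestamp in
    // the high 41 bits (above 22 low bits of worker/sequence). The real layout
    // depends on the ID generator in use.
    static long timestampMillis(long orderId) {
        return orderId >>> 22;
    }

    // Route a read to MySQL (hot), Elasticsearch (cold A), or Hive (cold B)
    // based on the order's age at read time. Backend names are placeholders.
    static String backendFor(long orderId, long nowMillis) {
        long ageMillis = nowMillis - timestampMillis(orderId);
        long threeMonths = 90L * 24 * 60 * 60 * 1000;
        long twelveMonths = 365L * 24 * 60 * 60 * 1000;
        if (ageMillis <= threeMonths) return "mysql";
        if (ageMillis <= twelveMonths) return "elasticsearch";
        return "hive";
    }
}
```

Writes skip this check entirely: they always apply the sharding rules and land in MySQL, and the scheduled migration job later moves aged rows onward.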
Images illustrating the database layout, sharding formulas, and overall architecture are included in the original document.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.