How to Scale Billion-Order Databases with Sharding, Partitioning, and Cold Data Strategies
This article explains how to handle billions of orders by classifying hot and cold data, storing them in MySQL, Elasticsearch, and Hive, and applying MySQL sharding, table splitting, and combined database‑table strategies to improve scalability and performance.
Background
As order volume grows to billions over three months, a single‑database single‑table design can no longer meet performance and cost requirements.
Order Data Segmentation
Orders are divided into hot data (last 3 months) and two tiers of cold data: A (3‑12 months) and B (older than 1 year). Hot data requires fast, real‑time queries; cold data can tolerate slower access.
Hot data: stored in MySQL with sharding and table partitioning.
Cold data A: stored in Elasticsearch for relatively fast search.
Cold data B: archived in Hive for occasional offline analysis.
MySQL Sharding and Partitioning
Business‑Level Splitting
Initially many services share one database, but as the system grows, separating business domains (user, product, order, etc.) into different databases reduces contention and improves throughput.
Separate databases for each business domain distribute load across multiple machines.
Table Splitting Strategy
Use order_id as the shard key; create many tables (e.g., 100) and route rows by order_id % 100. If queries need to be based on user_id, the shard key must be changed accordingly.
CREATE TABLE `order` (
order_id BIGINT(11) NOT NULL,
...
);Example of modulo routing for table selection:
Database Splitting Strategy
Similar modulo routing can be applied at the database level: order_id % number_of_databases. Non‑numeric IDs can be hashed before modulo.
Combined Sharding Strategy
When both database and table sharding are used, a composite formula distributes rows across databases and tables.
Example with 10 databases each containing 100 tables: order_id = 1001 maps to database 1, table 2.
Overall Architecture
Read/write requests are split: writes follow the sharding rules; reads first determine whether the order belongs to hot or cold data based on the timestamp embedded in order_id. Cold data older than a year is queried from Hive.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
