Databases 9 min read

Database Sharding and Partitioning Strategy for High‑Volume Order Systems

The article explains how to handle billions of daily orders by classifying data into hot and cold segments, storing them in MySQL, Elasticsearch, and Hive, and applying sharding and partitioning techniques at both table and database levels to achieve scalable performance.

Architect

Dec 30, 2022

Database Sharding and Partitioning Strategy for High‑Volume Order Systems

As the company’s order volume grows to billions of records over three months, a single‑database, single‑table design can no longer meet performance and cost requirements, prompting a comprehensive database redesign.

1. Background

The massive order flow requires a new architecture to handle scalability and cost.

2. Order Data Classification

Orders are divided into three categories:

Hot data : orders from the last three months, requiring high‑frequency, low‑latency queries.

Cold data A : orders from three to twelve months ago, accessed infrequently.

Cold data B : orders older than one year, rarely queried and suitable for offline access.

This separation reduces storage cost and simplifies maintenance.

3. Storage Plan for Each Data Type

Hot data – stored in MySQL with sharding and table partitioning.

Cold data A – stored in Elasticsearch to leverage its fast search capabilities.

Cold data B – stored in Hive for batch processing and archival.

4. MySQL Sharding and Table Partitioning

Initially, a monolithic schema with four tables (user, product, order, etc.) is shown. To improve scalability, each business domain is moved to its own database, distributing load across multiple instances.

Further splitting each database into multiple tables (e.g., 100 tables per database) is achieved by using the order_id % 100 modulo operation.

create table order(
  order_id bigint(11),
  ...
);

If queries need to be based on user_id instead, the shard key can be switched accordingly.

5. Database‑Level Sharding

Database routing follows a similar modulo strategy: order_id % number_of_databases. Non‑numeric keys can be hashed first, e.g., hash(order_id) % number_of_databases.

6. Combined Sharding Strategy

When both database and table sharding are used, an intermediate variable distributes records:

For example, with 10 databases each containing 100 tables, order_id = 1001 maps to database 1, table 2.

7. Overall Architecture Design

The system separates read and write paths. Writes follow the sharding rules directly. Reads first determine whether the request targets hot or cold data by parsing the timestamp embedded in order_id. Queries for data older than a year are directed to Hive for offline access.

A scheduled job periodically migrates cold data from MySQL to Elasticsearch and Hive, keeping each storage tier up‑to‑date.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Elasticsearch Hive MySQL database sharding Data Partitioning

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.