Databases 8 min read

Scaling Billions of Orders: MySQL Sharding, ES & Hive Strategies

This article explains how to handle massive order volumes by classifying data into hot and cold tiers, storing them in MySQL, Elasticsearch, and Hive, and implementing sharding and partitioning strategies—including shard keys, modulo routing, and combined database‑table distribution—to achieve high throughput and low cost.

Java High-Performance Architecture
Java High-Performance Architecture
Java High-Performance Architecture
Scaling Billions of Orders: MySQL Sharding, ES & Hive Strategies

Background

With daily order volumes exceeding ten million, a three‑month period can generate around one billion orders, making a single‑database, single‑table design insufficient and prompting a database redesign.

Order Data Classification

Order data is divided into three categories:

Hot data : orders from the past three months, requiring high‑frequency real‑time queries.

Cold data A : orders from three to twelve months ago, accessed less frequently.

Cold data B : orders older than one year, rarely queried and suitable for offline access.

Cold data is split into two groups because storing year‑old data in the primary DB incurs high cost and maintenance overhead; occasional access can be served via offline queries.

Storage plan:

Hot data – MySQL with sharding and table partitioning.

Cold data A – Elasticsearch, leveraging its search capabilities for faster queries.

Cold data B – Hive for batch‑oriented offline analysis.

MySQL Sharding and Partitioning

1. Business‑Based Splitting

Initially, many e‑commerce platforms place all modules (users, products, orders) in a single database with separate tables, which becomes hard to maintain as the system grows. The recommended approach is to separate different business domains into distinct databases, distributing load and improving throughput.

Single database with multiple tables
Single database with multiple tables
Multiple databases each handling a business domain
Multiple databases each handling a business domain

2. Table Partitioning Strategy

Using the order ID as a shard key, tables can be split by modulo operation (e.g., order_id % 100) to distribute rows across 100 sub‑tables. If queries need to be based on user_id, the shard key should be changed accordingly (e.g., user_id % 100).

3. Database Partitioning Strategy

Database partitioning follows a similar modulo routing, e.g., order_id % number_of_databases. For non‑integer keys, hash the value first: hash(order_id) % number_of_databases.

4. Combined Sharding and Partitioning

When both sharding and partitioning are needed, a two‑step calculation is used:

intermediate = shard_key % (db_count * tables_per_db)
db_index = floor(intermediate / tables_per_db)

For example, with 10 databases each containing 100 tables, an order_id of 1001 maps to database 1, table 2 (zero‑based indexing).

Routing calculation example
Routing calculation example

Overall Architecture Design

The system separates read and write paths. Write requests follow the sharding rules to store data in the appropriate MySQL instance. Read requests first determine whether the query targets hot or cold data based on the order ID timestamp; hot data is fetched from MySQL, cold data A from Elasticsearch, and cold data B from Hive.

A scheduled job periodically migrates cold data from MySQL to Elasticsearch and Hive, ensuring that older data is off‑loaded to cost‑effective storage.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ElasticsearchHivemysqldatabase scalingorder data
Java High-Performance Architecture
Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.