How to Scale 1 Billion Orders with MySQL Sharding: Strategies, Gene IDs, and Migration
This article explores the performance bottlenecks of a 700‑million‑row MySQL orders table and presents a comprehensive sharding solution—including vertical and horizontal partitioning, gene‑based shard keys, routing logic, global secondary indexes, data migration strategies, and common pitfalls—demonstrating how query latency can drop from seconds to milliseconds while supporting billions of records.
Scenario Pain Points
A MySQL orders table with 700 million rows on an e‑commerce platform suffers from severe issues: simple queries take 12 seconds, full‑table counts take 278 seconds, B+‑tree depth reaches five levels causing massive I/O, the single table exceeds 200 GB leading to backup windows over six hours, and write concurrency reaches 8 000 QPS with master‑slave replication lag up to 15 minutes.
Core Contradictions
B+‑tree index depth of five layers, causing I/O explosion.
Single table >200 GB, backup time >6 hours.
Write concurrency of 8 000 QPS, master‑slave delay up to 15 minutes.
Key Insight: When a single table exceeds 50 million rows, it is time to initiate a sharding design plan.
Given a scenario with 1 billion orders, how should we shard the database?
1. Sharding Core Strategies
1.1 Vertical Splitting (Data Reduction)
Optimization Effects:
Core table size reduced by 60%.
High‑frequency query fields concentrated, improving cache hit rate.
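As a concrete illustration of the split, the hot columns an order-list page touches stay in the core table, while wide, rarely‑read columns move to an extension table joined by order_id only on the detail page. The class and field names below are hypothetical, not the article's actual schema:

```java
// Hypothetical sketch of the vertical split. OrderCore maps to the slimmed-down
// `orders` table; OrderExt maps to an `orders_ext` table fetched only on demand.
public class VerticalSplit {
    static class OrderCore {   // high-frequency fields: small rows, better cache hit rate
        long orderId;
        long userId;
        long merchantId;
        int status;
        long amountCents;
        long createTime;
    }

    static class OrderExt {    // low-frequency, wide fields: read only on the detail page
        long orderId;          // join key back to the core table
        String shippingAddress;
        String invoiceDetail;
        String buyerRemark;
    }
}
```

Keeping the core rows narrow is what drives the 60% size reduction and the improved cache hit rate claimed above.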
1.2 Horizontal Splitting (Ultimate Solution)
Shard Key Selection Principles:
Discreteness: avoid data hotspots (e.g., user_id is better than status).
Business Relevance: 80% of queries must include the shard key.
Stability: values should not change with business logic (avoid using phone numbers).
Sharding Strategy Comparison:
Range Sharding: suits time‑range queries; expansion is simple; example: monthly tables by create_time.
Hash Modulo: uniform distribution, but resharding is costly because changing the modulus moves most rows; example: user_id % 128.
Consistent Hash: supports dynamic scaling with minimal data movement; example: Ketama algorithm.
Gene Sharding: extracts a stable gene from a business key; example: derive the shard from the lower 12 bits of user_id.
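The consistent‑hash option can be sketched in a few lines: virtual nodes are placed on a sorted ring, and a key routes to the first node clockwise from its hash. This is a minimal Ketama‑style sketch; the class name, node names, and the 160‑virtual‑node count are illustrative assumptions, not part of the article's system:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring (hypothetical). Virtual nodes smooth out the
// distribution; removing one physical node only remaps its own keys.
public class ConsistentHashRouter {
    private static final int VIRTUAL_NODES = 160; // Ketama-style virtual nodes per node
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRouter(String... nodes) {
        for (String node : nodes)
            for (int i = 0; i < VIRTUAL_NODES; i++)
                ring.put(hash(node + "#" + i), node); // place each virtual node on the ring
    }

    public String route(String key) {
        // first virtual node at or clockwise after the key's position; wrap to the start
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 digest bytes
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Routing is deterministic: the same key always lands on the same node until the node set changes.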
2. Gene Sharding
High‑frequency queries in an order system:
User queries historical orders (user_id).
Merchant queries orders (merchant_id).
Customer service queries by order number (order_no).
Solution: Gene‑based ID generator and routing engine.
```java
import java.util.concurrent.atomic.AtomicLong;

// Gene-based ID generator (64-bit ID: sign(1) + timestamp(41) + gene(12) + sequence(10))
public class OrderIdGenerator {
    private static final int GENE_BITS = 12;
    private static final long EPOCH = 1288834974657L; // custom epoch, as in Snowflake
    private static final AtomicLong SEQUENCE = new AtomicLong();

    public static long generateId(long userId) {
        long timestamp = System.currentTimeMillis() - EPOCH;
        long gene = userId & ((1L << GENE_BITS) - 1);       // low 12 bits of user_id
        long sequence = SEQUENCE.incrementAndGet() & 0x3FF; // simplified: a real generator resets per millisecond
        return (timestamp << 22) | (gene << 10) | sequence;
    }

    public static int getShardKey(long orderId) {
        return (int) ((orderId >> 10) & 0xFFF); // extract the 12 gene bits
    }
}
```

```java
// Sharding router (8 databases, 16 tables each)
public class OrderShardingRouter {
    private static final int DB_COUNT = 8;
    private static final int TABLE_COUNT_PER_DB = 16;

    public static String route(long orderId) {
        int gene = OrderIdGenerator.getShardKey(orderId);
        int dbIndex = gene % DB_COUNT;
        // Divide by DB_COUNT first so all 16 tables in each database are used.
        // A plain gene % TABLE_COUNT_PER_DB would land each database's data in
        // only 2 of its 16 tables, since the table index would share its low
        // 3 bits with the database index.
        int tableIndex = (gene / DB_COUNT) % TABLE_COUNT_PER_DB;
        return "order_db_" + dbIndex + ".orders_" + tableIndex;
    }
}
```

3. Cross‑Shard Queries
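Both user_id and order_no recover the same 12‑bit gene, so those two query paths never fan out across shards; only merchant_id queries need extra index structures. A self‑contained check of that property (the bit layout mirrors the generator above; names and sample values are illustrative):

```java
// Illustrative check: an order ID built from a user's gene routes to the
// same shard as a lookup by the user ID itself.
// Bit layout assumed, matching the text: timestamp | gene(12) | sequence(10).
public class GeneRoutingDemo {
    static final int GENE_BITS = 12;

    static long generateId(long userId, long timestamp, long sequence) {
        long gene = userId & ((1L << GENE_BITS) - 1); // low 12 bits of user_id
        return (timestamp << 22) | (gene << 10) | sequence;
    }

    static int geneOfOrder(long orderId) { return (int) ((orderId >> 10) & 0xFFF); }

    static int geneOfUser(long userId)   { return (int) (userId & 0xFFF); }

    public static void main(String[] args) {
        long userId = 123456789L;
        long orderId = generateId(userId, 1_000_000L, 1L);
        // order_no and user_id resolve to the same gene, hence the same shard
        System.out.println(geneOfOrder(orderId) == geneOfUser(userId)); // prints true
    }
}
```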
3.1 Heterogeneous Index Table
Elasticsearch Index Mapping:

```json
{
  "order_index": {
    "mappings": {
      "properties": {
        "order_no": {"type": "keyword"},
        "shard_key": {"type": "integer"},
        "create_time": {"type": "date"}
      }
    }
  }
}
```

Global Secondary Index (GSI) in ShardingSphere:
```sql
CREATE SHARDING GLOBAL INDEX idx_merchant ON orders(merchant_id)
BY SHARDING_ALGORITHM(merchant_hash)
WITH STORAGE_UNIT(ds_0, ds_1);
```

4. Data Migration
Dual‑Write Migration Scheme:
Gray‑Switch Steps:
Enable dual write (rollback old DB on new DB failure).
Migrate all historical data using paginated batch processing.
Real‑time validation of incremental data with automatic repair.
Gradually shift traffic by user ID from 1% to 100%.
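The final step can be sketched as a deterministic percentage gate keyed by user_id, so a given user always hits the same cluster while the percentage ramps from 1% to 100%. The class below is a hypothetical illustration, not the article's implementation:

```java
// Hypothetical gray-switch gate: deterministically routes a growing percentage
// of users to the new cluster, keyed by user_id so each user is sticky.
public class GraySwitch {
    private volatile int percentOnNew = 1; // start at 1%, raise gradually to 100

    public boolean readFromNew(long userId) {
        // users whose id falls in the first `percentOnNew` buckets go to the new cluster
        return Math.floorMod(userId, 100L) < percentOnNew;
    }

    public void setPercentOnNew(int percent) {
        this.percentOnNew = percent;
    }
}
```

Because the gate is a pure function of user_id, raising the percentage only ever adds users to the new cluster; no user flaps back and forth during the ramp.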
5. Pitfall Guide
5.1 Hotspot Issue
During a major sales event, orders from a popular store were all routed to the same shard.
Solution: introduce a composite shard key, e.g., (merchant_id + user_id) % 1024.
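A minimal sketch of that composite key, assuming a simple multiplicative mix (the constant 31 and the method name are illustrative, not from the article):

```java
// Hypothetical composite-key routing to break a single-merchant hotspot:
// mixing user_id into the key spreads one merchant's orders across many shards.
public class CompositeShardKey {
    public static int shardOf(long merchantId, long userId) {
        long mixed = merchantId * 31 + userId;      // simple mix; a real system might hash
        return (int) Math.floorMod(mixed, 1024L);   // 1024 logical shards, as in the text
    }
}
```

During a flash sale, the popular merchant's orders now fan out across up to 1024 shards instead of piling onto one.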
5.2 Distributed Transaction
Using RocketMQ for eventual consistency:
```java
// Producer side: commit the order locally, then publish an event for the async steps.
// (A production setup would use RocketMQ transactional messages so the message is
// only delivered after the local transaction commits.)
@Transactional
public void createOrder(Order order) {
    orderDao.insert(order);                                 // write primary DB
    rocketMQTemplate.syncSend("order_create_event", order); // publish event
}

// Consumer side: points and inventory follow asynchronously (eventual consistency).
@RocketMQMessageListener(topic = "order_create_event", consumerGroup = "order_consumer_group")
public class OrderEventListener implements RocketMQListener<OrderEvent> {
    @Override
    public void onMessage(OrderEvent event) {
        bonusService.addPoints(event.getUserId()); // async points
        inventoryService.deduct(event.getSkuId()); // async inventory
    }
}
```

5.3 Pagination Trap
Naive cross‑shard pagination must pull and merge offset + limit rows from every shard, so deep pages become slow and page boundaries inconsistent.
Solution: use Elasticsearch aggregation queries, or limit pagination to the most recent three months of data.
6. Ultimate Architecture & Performance Metrics
Performance Improvements:
User order query: 3200 ms → 68 ms.
Merchant order export: previously timed out → 8 s.
Full‑table statistics: unavailable → ~1.2 s (approximate).
Conclusion
Choosing the right shard key matters more than any amount of later tuning; gene sharding fits order systems best.
Reserve capacity for at least two years of data growth during initial design.
Avoid over‑design; small‑table joins are far more efficient than distributed joins.
Monitor shard skewness; focus on databases with skew rate >15%.
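The skew check from the last point can be expressed as max shard size over average shard size, minus one; anything above 0.15 deserves attention. A hypothetical helper:

```java
// Hypothetical skew check: skew rate = (largest shard's row count / average row count) - 1.
// A perfectly balanced cluster scores 0.0; the text flags anything above 0.15 (15%).
public class ShardSkewMonitor {
    public static double skewRate(long[] rowCounts) {
        long max = 0, total = 0;
        for (long c : rowCounts) {
            max = Math.max(max, c);
            total += c;
        }
        double avg = (double) total / rowCounts.length;
        return max / avg - 1.0;
    }
}
```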
