Databases 9 min read

Mastering MySQL Sharding: Strategies for 1 Billion Orders

This article explores the pain points of a 700‑million‑row MySQL order table, presents vertical and horizontal sharding strategies, introduces gene‑based Snowflake IDs, details routing logic, migration steps, common pitfalls, and shows performance gains after applying the final architecture.

Su San Talks Tech
Su San Talks Tech
Su San Talks Tech
Mastering MySQL Sharding: Strategies for 1 Billion Orders

Introduction

When a MySQL order table reaches 700 million rows, the platform experiences fatal issues such as 12‑second simple queries and 278‑second full‑table counts.

Core Contradictions

B+Tree index depth reaches five levels, causing a surge in disk I/O.

Single table size exceeds 200 GB, making backup windows longer than six hours.

Write concurrency hits 8,000 QPS, with master‑slave replication lag up to 15 minutes.

Key Insight: When a single table exceeds 50 million rows, a sharding plan should be initiated.

Given a scenario with one billion orders, how should we shard the database?

1. Sharding Core Strategies

1.1 Vertical Split – Reduce Data First

Optimization Effects:

Core table size reduced by 60 %.

High‑frequency query fields become cache‑friendly, improving hit rate.

1.2 Horizontal Split – The Ultimate Solution

Three Principles for Sharding Key Selection :

Discreteness : Avoid hot spots (e.g., user_id is better than status).

Business Relevance : 80 % of queries must include the key.

Stability : The value should not change with business logic (avoid using phone numbers).

Sharding Strategy Comparison :

Strategy Type   | Applicable Scenario      | Expansion Complexity
----------------------------------------------------------
Range Sharding  | Queries with time range  | Simple
Hash Modulo    | Uniform distribution      | Difficult
Consistent Hash| Dynamic scaling           | Medium
Gene Sharding   | Avoid cross‑shard queries| Complex

2. Gene Sharding

High‑frequency queries in the order system:

User queries historical orders ( user_id).

Merchant queries orders ( merchant_id).

Customer service queries by order number ( order_no).

Solution:

Snowflake Order ID Generation :

public class OrderIdGenerator {
    // 64‑bit ID: sign(1) + timestamp(41) + gene(12) + sequence(10)
    private static final int GENE_BITS = 12;
    public static long generateId(long userId) {
        long timestamp = System.currentTimeMillis() - 1288834974657L;
        // Extract lower 12 bits of userId as gene
        long gene = userId & ((1L << GENE_BITS) - 1);
        long sequence = ...; // obtain sequence
        return (timestamp << 22) | (gene << 10) | sequence;
    }
    public static int getShardKey(long orderId) {
        return (int) ((orderId >> 10) & 0xFFF); // extract middle 12 bits
    }
}

Routing Logic :

public class OrderShardingRouter {
    // 8 databases, each with 16 tables
    private static final int DB_COUNT = 8;
    private static final int TABLE_COUNT_PER_DB = 16;
    public static String route(long orderId) {
        int gene = OrderIdGenerator.getShardKey(orderId);
        int dbIndex = gene % DB_COUNT;
        int tableIndex = gene % TABLE_COUNT_PER_DB;
        return "order_db_" + dbIndex + ".orders_" + tableIndex;
    }
}
Key Breakthrough: Embedding the gene in the ID ensures that orders from the same user always land on the same shard and allows direct shard location from the order ID.

3. Cross‑Shard Queries

3.1 Heterogeneous Index Table Solution

Elasticsearch Index Structure :

{
  "order_index": {
    "mappings": {
      "properties": {
        "order_no": { "type": "keyword" },
        "shard_key": { "type": "integer" },
        "create_time": { "type": "date" }
      }
    }
  }
}

4.2 Global Secondary Index (GSI)

-- Create a global index in ShardingSphere
CREATE SHARDING GLOBAL INDEX idx_merchant ON orders(merchant_id)
BY SHARDING_ALGORITHM(merchant_hash)
WITH STORAGE_UNIT(ds_0, ds_1);

4. Data Migration

Dual‑Write Migration Plan :

Gray‑Scale Switch Steps :

Enable dual‑write (rollback to old DB if new DB write fails).

Migrate historical data in batches.

Real‑time incremental data verification and auto‑repair.

Gradually shift traffic by user ID from 1 % to 100 %.

5. Pitfall Guide

5.1 Hotspot Issues

During a major sales event, all orders of a popular store were routed to the same shard.

Solution: Use a composite sharding key, e.g., (merchant_id + user_id) % 1024.

5.2 Distributed Transactions

Final consistency is achieved with RocketMQ:

// Final consistency implementation
@Transactional
public void createOrder(Order order) {
    orderDao.insert(order); // write primary DB
    rocketMQTemplate.sendAsync("order_create_event", order); // send message
}

@RocketMQMessageListener(topic = "order_create_event")
public void handleEvent(OrderEvent event) {
    bonusService.addPoints(event.getUserId()); // async points
    inventoryService.deduct(event.getSkuId()); // async inventory
}

5.3 Pagination Traps

Cross‑shard pagination can cause page number inconsistencies.

Solution: Use Elasticsearch aggregation or limit queries to recent three months.

6. Ultimate Architecture

Performance Metrics :

Scenario            | Before Split | After Split
------------------------------------------------
User order query    | 3200 ms      | 68 ms
Merchant export     | timeout      | 8 s
Full‑table stats    | unavailable  | 1.2 s (approx.)

Conclusion

Sharding key selection matters more than effort: Gene sharding is the best partner for order systems.

Plan for growth: Design to support at least two years of data increase.

Avoid over‑design: Small‑table joins are far cheaper than distributed joins.

Monitor‑driven optimization: Focus on shards with skew > 15 %.

True architectural art lies in balancing division and integration.
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsPerformance Optimizationshardingmysqldatabase scaling
Su San Talks Tech
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.