How to Scale Billion‑Row MySQL Order Tables with Sharding and Gene‑Based Partitioning
This article explains how to handle a MySQL order table that has grown to billions of rows: vertical and horizontal sharding, gene‑based partitioning, cross‑shard query strategies, data migration techniques, and the practical pitfalls encountered along the way.
Preface
Scenario pain point: An e‑commerce platform’s MySQL order table reached 700 million rows, causing fatal issues such as a simple query taking 12 seconds and a full‑table count taking 278 seconds.
Core contradictions:
B+Tree index depth reached five levels, causing massive disk I/O.
Single table size exceeded 200 GB, making backup windows longer than six hours.
Write throughput hit 8 000 writes per second, with master‑slave replication lag of up to 15 minutes.
Key insight: When a single table exceeds 50 million rows, a sharding plan should be initiated.
Given a billion‑row order dataset, how should we shard it?
1 Sharding Core Strategies
1.1 Vertical Splitting: Reduce Data First
Optimization effects:
Core table size reduced by 60 %.
High‑frequency query fields concentrated, improving cache hit rate.
1.2 Horizontal Splitting: Ultimate Solution
Shard key selection three principles:
Discreteness: Avoid hotspots (e.g., user_id is better than status).
Business relevance: 80 % of queries must include the field.
Stability: Values should not change with business logic (avoid using phone numbers).
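The discreteness principle can be made concrete with a quick distribution check. The sketch below (hypothetical data, 8 shards) shows why a high‑cardinality key such as user_id spreads load across every shard, while a low‑cardinality key such as status can never use more shards than it has distinct values:

```java
import java.util.HashMap;
import java.util.Map;

public class ShardKeyCheck {
    static final int SHARDS = 8;

    // Count how many of the given key values land on each shard.
    static Map<Integer, Integer> distribution(long[] keys) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (long k : keys) {
            int shard = (int) (k % SHARDS);
            counts.merge(shard, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // High-cardinality key (user_id): all 8 shards receive rows.
        long[] userIds = new long[10_000];
        for (int i = 0; i < userIds.length; i++) userIds[i] = 100_000L + i;
        System.out.println("user_id shards used: " + distribution(userIds).size()); // 8

        // Low-cardinality key (status 0..2): only 3 shards are ever touched.
        long[] statuses = new long[10_000];
        for (int i = 0; i < statuses.length; i++) statuses[i] = i % 3;
        System.out.println("status shards used: " + distribution(statuses).size()); // 3
    }
}
```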
Shard strategy comparison:
| Strategy Type | Applicable Scenario | Expansion Complexity | Example |
|---|---|---|---|
| Range Sharding | Time‑range queries | Simple | create_time per month |
| Hash Modulo | Uniform distribution | Difficult | user_id % 128 |
| Consistent Hash | Dynamic scaling | Medium | Ketama algorithm |
| Gene Sharding | Avoiding cross‑shard queries | Complex | Extract gene from user_id |

2 Gene Sharding
High‑frequency queries in the order system:
User historical orders (user_id)
Merchant order lookup (merchant_id)
Customer service order lookup (order_no)
Solution:
Snowflake Order ID Generator:
public class OrderIdGenerator {
    // 64-bit ID: sign(1) + timestamp(41) + gene(12) + sequence(10)
    private static final int GENE_BITS = 12;
    private static final int SEQUENCE_BITS = 10;
    // Simplified in-process counter; a production generator needs a
    // per-millisecond sequence reset and clock-rollback handling.
    private static final java.util.concurrent.atomic.AtomicLong SEQUENCE =
            new java.util.concurrent.atomic.AtomicLong();

    public static long generateId(long userId) {
        long timestamp = System.currentTimeMillis() - 1288834974657L;
        // Extract the lower 12 bits of userId as the gene
        long gene = userId & ((1L << GENE_BITS) - 1);
        // Obtain the sequence number, wrapped to 10 bits
        long sequence = SEQUENCE.incrementAndGet() & ((1L << SEQUENCE_BITS) - 1);
        return (timestamp << 22) | (gene << SEQUENCE_BITS) | sequence;
    }

    // Extract the shard key (gene) from an order ID
    public static int getShardKey(long orderId) {
        return (int) ((orderId >> SEQUENCE_BITS) & 0xFFF); // middle 12 bits
    }
}

Routing logic:
public class OrderShardingRouter {
    // 8 databases, each with 16 tables (8 * 16 = 128 physical tables)
    private static final int DB_COUNT = 8;
    private static final int TABLE_COUNT_PER_DB = 16;

    public static String route(long orderId) {
        int gene = OrderIdGenerator.getShardKey(orderId);
        // Map the 12-bit gene onto all 128 physical tables. Taking gene % 8 for
        // the database and gene % 16 for the table would leave most tables empty,
        // because the two remainders are correlated; divide first instead.
        int slot = gene % (DB_COUNT * TABLE_COUNT_PER_DB);
        int dbIndex = slot / TABLE_COUNT_PER_DB;
        int tableIndex = slot % TABLE_COUNT_PER_DB;
        return "order_db_" + dbIndex + ".orders_" + tableIndex;
    }
}

3 Cross‑Shard Queries
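Gene sharding keeps the user and order-number lookups off the cross-shard path because the routing gene is embedded in the order ID itself: every order a user creates lands on the shard determined by that user's gene. A minimal, self-contained sanity check of that invariant (a sketch that re-declares the bit layout from section 2 with a fixed timestamp and sequence, not the production generator):

```java
public class GeneConsistencyDemo {
    static final int GENE_BITS = 12;
    static final int SEQUENCE_BITS = 10;

    // Same layout as the generator above: timestamp | gene(12) | sequence(10)
    static long generateId(long userId, long timestamp, long sequence) {
        long gene = userId & ((1L << GENE_BITS) - 1);
        return (timestamp << (GENE_BITS + SEQUENCE_BITS)) | (gene << SEQUENCE_BITS) | sequence;
    }

    // Recover the gene from the middle 12 bits of the order ID
    static int getShardKey(long orderId) {
        return (int) ((orderId >> SEQUENCE_BITS) & ((1 << GENE_BITS) - 1));
    }

    public static void main(String[] args) {
        long userId = 987654321L;
        // Two orders by the same user at different times carry the same gene,
        // so a query by user_id and a query by order_no hit the same shard.
        long order1 = generateId(userId, 1_000L, 1);
        long order2 = generateId(userId, 2_000L, 7);
        System.out.println(getShardKey(order1) == (int) (userId & 0xFFF)); // true
        System.out.println(getShardKey(order1) == getShardKey(order2));    // true
    }
}
```

The merchant lookup is the one pattern the gene cannot serve, since merchant_id is unrelated to the user gene; that is what the next two subsections address.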
3.1 Heterogeneous Index Table Solution
Elasticsearch index mapping:
{
"order_index": {
"mappings": {
"properties": {
"order_no": { "type": "keyword" },
"shard_key": { "type": "integer" },
"create_time": { "type": "date" }
}
}
}
}

3.2 Global Secondary Index (GSI)
-- Create global index in ShardingSphere
CREATE SHARDING GLOBAL INDEX idx_merchant ON orders(merchant_id)
BY SHARDING_ALGORITHM(merchant_hash)
WITH STORAGE_UNIT(ds_0, ds_1);

4 Data Migration
Dual‑write migration plan:
Gray‑switch steps:
Enable dual‑write (rollback to old DB if new DB write fails).
Full migration of historical data using paginated batch processing.
Real‑time incremental data verification with automatic repair.
Gradual traffic shift by user ID from 1 % to 100 %.
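The dual-write step above can be sketched as follows. This is a sketch, not a drop-in implementation: OrderDao is a hypothetical interface standing in for the legacy and sharded data access layers, and the old database remains the source of truth, so a failed write to the new database is recorded for the repair job instead of failing the request:

```java
public class DualWriteOrderService {
    // Hypothetical DAO abstraction over the legacy table and the new sharded tables.
    public interface OrderDao {
        void insert(long orderId, String payload) throws Exception;
    }

    private final OrderDao oldDao;
    private final OrderDao newDao;

    public DualWriteOrderService(OrderDao oldDao, OrderDao newDao) {
        this.oldDao = oldDao;
        this.newDao = newDao;
    }

    public void createOrder(long orderId, String payload) throws Exception {
        // The old database must succeed: it stays authoritative during migration.
        oldDao.insert(orderId, payload);
        try {
            newDao.insert(orderId, payload);
        } catch (Exception e) {
            // Record for the verification/repair job; do not fail the user request.
            System.err.println("new-db write failed for order " + orderId + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        DualWriteOrderService svc = new DualWriteOrderService(
                (id, p) -> System.out.println("old db insert " + id),
                (id, p) -> { throw new Exception("new db unavailable"); });
        svc.createOrder(42L, "{\"sku\":1}"); // succeeds despite the new-db failure
    }
}
```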
5 Pitfall Guide
5.1 Hotspot Issue
During Double‑Eleven, a popular shop’s orders were all routed to the same shard.
Solution: introduce a composite shard key (merchant_id + user_id) % 1024.
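A minimal sketch of that composite key: because user_id varies from order to order, even a single hot merchant's writes fan out across all 1 024 shards instead of piling onto one (hypothetical IDs for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class CompositeShardKey {
    private static final int SHARD_COUNT = 1024;

    // Composite shard key as in the article: (merchant_id + user_id) % 1024.
    // Hashing the pair instead of summing would also work.
    static int shardOf(long merchantId, long userId) {
        long mixed = merchantId + userId;
        return (int) Math.floorMod(mixed, (long) SHARD_COUNT);
    }

    public static void main(String[] args) {
        long hotMerchant = 888L;
        Set<Integer> shards = new HashSet<>();
        for (long user = 0; user < 10_000; user++) {
            shards.add(shardOf(hotMerchant, user));
        }
        // One Double-Eleven hot merchant now spreads over every shard.
        System.out.println("shards used by one hot merchant: " + shards.size()); // 1024
    }
}
```

The trade-off: a lookup by merchant_id alone now fans out across shards, which is why this pairs with the heterogeneous index from section 3.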
5.2 Distributed Transaction
Using RocketMQ for eventual consistency:
// Eventual-consistency sketch based on rocketmq-spring; DAO and service beans are assumed
@Transactional
public void createOrder(Order order) {
    orderDao.insert(order); // write to the order shard
    // Note: for strict consistency, use a RocketMQ transactional message so the
    // send is tied to the database commit rather than fired inside the transaction.
    rocketMQTemplate.asyncSend("order_create_event", order, new SendCallback() {
        @Override public void onSuccess(SendResult result) { /* acked by broker */ }
        @Override public void onException(Throwable t) { /* log and compensate */ }
    });
}

@RocketMQMessageListener(topic = "order_create_event", consumerGroup = "order_consumer_group")
public class OrderEventListener implements RocketMQListener<OrderEvent> {
    @Override
    public void onMessage(OrderEvent event) {
        bonusService.addPoints(event.getUserId());  // asynchronously award points
        inventoryService.deduct(event.getSkuId());  // asynchronously deduct inventory
    }
}

5.3 Pagination Trap
Cross‑shard pagination is a trap: to serve LIMIT offset, size correctly, every shard must return its first offset + size rows for merging and re‑sorting, so deep pages get slower on every shard at once.
Solution: route paginated queries through the Elasticsearch index, or limit them to the most recent three months of orders.
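A sketch of why every shard must over-fetch: any shard's early rows may fall inside the global page, so the merger needs the first offset + size rows from each of them (hypothetical in-memory shards stand in for per-shard ORDER BY ... LIMIT results):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CrossShardPagination {
    // Global "LIMIT offset, size" over N shards: each shard supplies its own
    // first (offset + size) rows, then the merger sorts and slices the page.
    static List<Long> page(List<List<Long>> shards, int offset, int size) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> shard : shards) {
            // Each shard's rows are assumed pre-sorted by its own ORDER BY.
            candidates.addAll(shard.subList(0, Math.min(shard.size(), offset + size)));
        }
        candidates.sort(Comparator.naturalOrder());
        int from = Math.min(offset, candidates.size());
        int to = Math.min(offset + size, candidates.size());
        return candidates.subList(from, to);
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(
                List.of(1L, 4L, 7L), List.of(2L, 5L, 8L), List.of(3L, 6L, 9L));
        // Global page "LIMIT 2, 3" over three sorted shards
        System.out.println(page(shards, 2, 3)); // [3, 4, 5]
    }
}
```

With 128 physical tables and LIMIT 100000, 10, that is 128 × 100 010 candidate rows pulled for ten results, which is why deep pagination belongs in Elasticsearch or behind a time window.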
6 Ultimate Architecture Solution
Performance metrics:
| Scenario | Before Sharding | After Sharding |
|---|---|---|
| User order query | 3 200 ms | 68 ms |
| Merchant export | timeout failure | completed in 8 s |
| Full‑table stats | unavailable | 1.2 s (approx.) |

Conclusion
Shard key selection matters more than effort: Gene sharding is the best partner for order systems.
Plan for growth: Design to support at least two years of data increase.
Avoid over‑design: Small‑table joins are far cheaper than distributed joins.
Monitor‑driven optimization: Focus on shards with skew rate > 15 %.
True architectural art lies in balancing division and integration.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
