
How to Scale an Order System with Sharding, Distributed IDs, and Seata Transactions

This article details a comprehensive redesign of a high‑traffic order system. It covers the challenges of massive data volume, concurrency pressure, and poor scalability, then presents a step‑by‑step solution: sharding strategy selection, database product comparison, unique ID generation, middleware choice, migration planning, risk mitigation, and frequently asked questions.


Requirement Background

Business growth has pushed the order system’s single‑database, single‑table architecture past its limits, exposing three main problems: massive data volume that slows queries, high concurrent request pressure that overloads the database, and poor extensibility that makes schema changes time‑consuming.

Current Business Situation & Problem Analysis

Key issues identified include:

Primary‑key design: orderId is auto‑increment while orderNo follows a custom rule, and the two are used inconsistently across internal and external scenarios.

Excessive indexes: more than ten indexes, many dedicated to consumer‑facing (C‑end) or business‑facing (B‑end) use cases.

Historical data inconsistency: legacy order numbers lack the required suffix and may contain letters.

Inconsistent foreign‑key references: some extension tables link by orderId, others by orderNo.

Table redundancy: non‑core fields such as billing, invoicing, and refunds inflate the order table.

Technical Selection

A comparison of database types and products was performed.

Database Types

Ordinary DB: traditional vertical architecture on a single server.

Distributed DB: horizontal sharding for high availability and scalability.

Cloud‑native DB: containerized, microservice‑ready, rapid deployment.

Cloud‑native Distributed DB: combines distributed and cloud‑native features.

Product Comparison

The products evaluated were PolarDB with partitioned tables, OceanBase, PolarDB‑X, TiDB, and traditional application‑level sharding. For each, the comparison covered strengths and weaknesses, the underlying sharding principle, and suitable scenarios (e.g., HTAP workloads for PolarDB‑X, strong consistency for OceanBase).

Middleware Comparison

Three middleware options were compared:

ShardingSphere‑JDBC (client‑side sharding): lightweight, easy integration, supports multiple data sources, provides distributed transactions and read/write separation.

ShardingSphere‑Proxy (database proxy): richer features, supports heterogeneous languages, but incurs higher performance overhead and requires separate deployment.

Mycat (open‑source proxy): free and feature‑complete, but community support is limited and the performance overhead is comparable to ShardingSphere‑Proxy.

Unique ID Schemes

Five ID generation approaches were examined:

DB or Redis auto‑increment: simple, but introduces a single point of failure and a throughput bottleneck.

Snowflake algorithm: 1‑bit sign, 41‑bit timestamp, 10‑bit machine ID, 12‑bit sequence; high performance and roughly time‑ordered IDs, but machine IDs must be allocated per node and the IDs carry no business semantics (a minimal sketch follows this list).

UUID / random: no central node, but the IDs are long and random, which hurts B+‑tree index performance.

Meituan Leaf (Leaf‑snowflake & Leaf‑segment): cluster‑deployed and handles clock rollback, but the Leaf‑snowflake mode depends on ZooKeeper.

Custom business ID: concatenates order type, business type, timestamp, random digits, and a user suffix; readable, but collisions become likely above roughly 10k QPS.
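To make the Snowflake bit layout concrete, here is a minimal sketch in Java. It is illustrative rather than production code: the epoch is an arbitrary assumption, and a real deployment also needs a clock‑rollback strategy (the problem Leaf‑snowflake addresses).

```java
// Minimal Snowflake sketch: 1 sign bit | 41-bit timestamp | 10-bit machine ID | 12-bit sequence.
public class SnowflakeIdGenerator {
    private static final long EPOCH = 1672531200000L; // custom epoch, e.g. 2023-01-01 UTC (assumption)
    private static final long MACHINE_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095 IDs per ms per node

    private final long machineId; // 0..1023, must be unique per node
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_BITS)) {
            throw new IllegalArgumentException("machineId out of range");
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards"); // clock-rollback guard
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) { // 4096 IDs exhausted within this millisecond
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { /* spin to next ms */ }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```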

Sharding Key & Strategy Selection

Three common strategies were evaluated:

Range (time‑based): natural for log‑type tables and easy to archive, but can cause data skew, since recent shards receive most of the writes.

List (tenant ID): suitable when data carries a clear business identifier, but maintaining the shard rules can be complex.

Hash (custom business ID): uniform distribution and balanced load, but range queries become inefficient and rescaling is harder (a routing sketch follows this list).
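To make the hash strategy concrete, the sketch below routes an order number to one of 64 physical tables. The table count anticipates the capacity estimate later in the article; the class and table names are illustrative.

```java
// Hash-sharding routing sketch: maps a shard key to one of 64 physical tables.
// The table count must be a power of two so that `hash & (N - 1)` equals `hash % N`.
public final class OrderTableRouter {
    private static final int TABLE_COUNT = 64; // 2^6, per the capacity estimate below

    public static String route(String orderNo) {
        int hash = orderNo.hashCode();
        int index = (hash & Integer.MAX_VALUE) & (TABLE_COUNT - 1); // non-negative bucket
        return "t_order_" + index; // e.g. t_order_0 .. t_order_63
    }
}
```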

Best Practice & Implementation Plan

After weighing migration cost, maintainability, and operational overhead, the team chose ShardingSphere‑JDBC for sharding.

Key Refactoring Points

Database schema: create a new order table that merges orderId and orderNo into a single string‑type custom identifier, and adjust the related extension tables to reference the new key (see the generator sketch after this list).

Field stripping: remove non‑core columns such as billing, invoicing, and refund fields from the order table.
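A hedged sketch of such a merged identifier, following the composition described in the Unique ID section (order type + business type + timestamp + random digits + user suffix). The field widths and class name are assumptions, not the article's exact rule.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;

// Custom business order-number sketch: order type + business type + timestamp + random + user suffix.
// Widths are illustrative; the user suffix doubles as an input for hash sharding.
public final class OrderNoGenerator {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    public static String generate(int orderType, int bizType, long userId) {
        String timestamp = LocalDateTime.now().format(TS);
        int random = ThreadLocalRandom.current().nextInt(0, 1000); // 3 random digits; collisions
                                                                   // become likely above ~10k QPS
        long userSuffix = userId % 10_000;                         // last 4 digits of the user ID
        return String.format("%d%d%s%03d%04d", orderType, bizType, timestamp, random, userSuffix);
    }
}
```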

Code Refactoring

Separate OLTP and OLAP data sources.

Introduce ShardingSphere‑JDBC with the shard key, data sources, and distributed‑transaction proxy configured (a sketch of a custom sharding algorithm follows this list).

Rewrite join queries to include the shard key.
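As an illustration of how the shard key plugs into ShardingSphere‑JDBC, below is a custom algorithm written against the 5.x StandardShardingAlgorithm SPI. Treat it as a sketch: the SPI details (registration file, init method, exact signatures) vary between 5.x releases, and the hash rule simply mirrors the router shown earlier.

```java
import java.util.Collection;

import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;

// Custom table-sharding algorithm: hashes the order-number shard key into t_order_0..t_order_63.
public class OrderNoShardingAlgorithm implements StandardShardingAlgorithm<String> {

    @Override
    public String doSharding(Collection<String> targets, PreciseShardingValue<String> value) {
        int bucket = (value.getValue().hashCode() & Integer.MAX_VALUE) & 63; // 64 tables, 2^6
        String target = value.getLogicTableName() + "_" + bucket;
        return targets.stream().filter(target::equals).findFirst()
                .orElseThrow(() -> new IllegalStateException("no table for bucket " + bucket));
    }

    @Override
    public Collection<String> doSharding(Collection<String> targets, RangeShardingValue<String> value) {
        return targets; // a hash key cannot serve range predicates; broadcast to all shards
    }

    @Override
    public String getType() {
        return "ORDER_NO_HASH"; // referenced from the sharding rule config; registered via SPI
    }
}
```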

Data Cleaning

Clean legacy order numbers to conform to the new rule (a sketch follows this list).

Update related tables (orderNo and orderId references) to the new identifiers.
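A hedged sketch of the cleaning step, reusing the OrderNoGenerator above: legacy numbers may contain letters and lack the user suffix, so rather than rewriting them in place, each legacy order gets a fresh identifier and the old‑to‑new pair is recorded for order_new_relation. All names besides order_new_relation are illustrative.

```java
// Offline cleaning sketch: assign each legacy order a new-format number and record the
// mapping, to be persisted in order_new_relation so downstream tables can be rewritten.
public final class LegacyOrderCleaner {

    /** Row destined for the order_new_relation mapping table. */
    public record OrderMapping(long legacyOrderId, String legacyOrderNo, String newOrderNo) {}

    public static OrderMapping clean(long legacyOrderId, String legacyOrderNo,
                                     int orderType, int bizType, long userId) {
        // Legacy numbers do not conform to the new rule, so generate a fresh identifier
        // instead of patching the old one in place.
        String newOrderNo = OrderNoGenerator.generate(orderType, bizType, userId);
        return new OrderMapping(legacyOrderId, legacyOrderNo, newOrderNo);
    }
}
```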

Data Partitioning Estimate

Peak daily volume is 200k orders; planning for 5× growth over ten years gives roughly 200,000 × 5 × 365 × 10 ≈ 3.65 billion orders. Spread across 64 tables, that averages about 57 million rows per table, close to the 50‑million‑row guideline, and 64 = 2⁶ satisfies the power‑of‑two requirement of the hash sharding algorithm.

Distributed Transaction Integration

Seata is adopted, following the Spring Cloud multi‑datasource best practices (see References); a minimal usage sketch follows.
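@GlobalTransactional is Seata's real annotation; the service, repositories, and Order record below are hypothetical, defined only to keep the sketch self‑contained.

```java
import io.seata.spring.annotation.GlobalTransactional;
import org.springframework.stereotype.Service;

// Hypothetical collaborators, included so the sketch compiles on its own.
record Order(String orderNo, long skuId, int quantity) {}
interface OrderRepository { void insert(Order order); }
interface InventoryRepository { void deduct(long skuId, int quantity); }

// One global transaction spanning the sharded order source and a separate inventory source.
// In AT mode, Seata records undo logs on each branch and rolls every branch back on failure.
@Service
public class OrderService {

    private final OrderRepository orderRepository;         // writes to the sharded order tables
    private final InventoryRepository inventoryRepository; // writes to a second data source

    public OrderService(OrderRepository orderRepository, InventoryRepository inventoryRepository) {
        this.orderRepository = orderRepository;
        this.inventoryRepository = inventoryRepository;
    }

    @GlobalTransactional(name = "create-order", rollbackFor = Exception.class)
    public void createOrder(Order order) {
        orderRepository.insert(order);                                // branch 1: order shard
        inventoryRepository.deduct(order.skuId(), order.quantity()); // branch 2: inventory DB
    }
}
```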

Implementation Steps

Create new databases and partitioned tables.

Generate new order numbers offline and store the old‑to‑new mapping in order_new_relation.

Use Flink to sync old data, replace old IDs with new ones, and propagate changes to extension tables.

Risk Assessment & Mitigation

Wrong shard key: analyze query patterns, choose high‑frequency fields, and consider dual‑write or heterogeneous indexes.

Data migration loss: a detailed migration plan, incremental sync plus full verification, and rollback preparation.

Distributed‑transaction inconsistency: use a mature framework, reduce the transaction scope, and add compensation and monitoring.

Query performance degradation: SQL optimization, avoiding cross‑shard queries, caching, read/write separation, and indexes.

Heavy code refactor: phased rollout, comprehensive test cases, and a gray‑release strategy.

FAQ

Cross‑shard pagination

Use streaming queries to avoid large offsets, or use the last record ID as the next‑page cursor (sketched below); for complex reports, query an OLAP store.
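A sketch of the cursor approach over plain JDBC, assuming t_order is the logic table from the sharding configuration and order_no is the shard key. Because the custom order number embeds a timestamp, ordering by it yields roughly chronological pages; ShardingSphere broadcasts the range query to all shards and merges the sorted streams.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Keyset ("cursor") pagination sketch: instead of OFFSET, which every shard must compute
// and the merger must re-sort, seek past the last order number seen on the previous page.
public final class OrderPageDao {

    public List<String> nextPage(Connection conn, String lastOrderNo, int pageSize) throws SQLException {
        String sql = "SELECT order_no FROM t_order WHERE order_no > ? ORDER BY order_no LIMIT ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, lastOrderNo == null ? "" : lastOrderNo); // empty cursor = first page
            ps.setInt(2, pageSize);
            List<String> page = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(rs.getString("order_no"));
                }
            }
            return page; // pass the last element back as the cursor for the next call
        }
    }
}
```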

Ensuring uniqueness after sharding

Validate uniqueness at the business layer, employ a global ID generator, or use distributed locks.

Data scaling

For hash sharding, double the number of tables (e.g., 64 → 128) and apply consistent hashing to minimize data movement (see the sketch below); use Flink CDC for real‑time sync during the cutover.
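The article recommends consistent hashing; a closely related and simpler property is worth spelling out. With power‑of‑two modulo hashing, doubling the table count means each row either stays in its bucket or moves to exactly one new bucket, so at most half the rows relocate. A sketch (names hypothetical):

```java
// Resharding sketch: when the table count doubles from 64 to 128, a row's new bucket is
// determined by one extra hash bit, so it either stays at `old` or moves to `old + 64`.
public final class ReshardPlanner {

    public static int oldBucket(String orderNo) {
        return (orderNo.hashCode() & Integer.MAX_VALUE) & 63;  // mod 64
    }

    public static int newBucket(String orderNo) {
        return (orderNo.hashCode() & Integer.MAX_VALUE) & 127; // mod 128
    }

    public static boolean mustMove(String orderNo) {
        int oldIdx = oldBucket(orderNo);
        int newIdx = newBucket(orderNo);
        assert newIdx == oldIdx || newIdx == oldIdx + 64;      // invariant of power-of-two doubling
        return newIdx != oldIdx;                               // roughly half the rows relocate
    }
}
```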

Distributed‑transaction timeout handling

Set reasonable timeouts, rely on Seata’s automatic compensation, reduce transaction size, and avoid large transactions.

Conclusion

The proposed solution balances future business growth, operational cost, and server expense. It contrasts partitioned tables, distributed databases, and sharding, discusses ID generation, shard‑key, and algorithm choices, and outlines a concrete migration plan. For zero‑downtime migration, refer to the Dazhong Dianping phased implementation guide.

References

TiDB technical deep‑dive

Dazhong Dianping order‑system sharding case study

Meituan Leaf distributed ID system

Spring Cloud multi‑datasource Seata & ShardingJDBC best practice
