
How to Scale an Order System: Sharding, ID Generation, and Database Choices

This article walks through the challenges of a growing order system, analyzes current bottlenecks, compares sharding and distributed database products, proposes a unique ID strategy, outlines sharding key selection, presents a migration plan with ShardingSphere‑JDBC, and discusses risks and FAQs for a robust backend architecture.


1. Requirement Background

As the business expands, the single‑database‑single‑table design can no longer handle the increasing data volume and traffic. The main pain points are massive data size, high concurrent load, and poor scalability when modifying the order table.

2. Current Business Situation

The order system faces several concrete problems:

Primary‑key design: orderId is an auto‑increment integer, while orderNo is a custom string whose last four digits encode a user ID suffix; different scenarios query by different keys.

Excessive indexes: more than ten indexes, some serving C‑end (consumer) queries and some B‑end (back‑office) queries.

Historical data inconsistency: legacy order numbers lack the user ID suffix and may contain letters.

Inconsistent foreign‑key usage: extension tables link to the order table by either orderId or orderNo.

Table bloat: many non‑core fields (billing, invoicing, refunds) inflate the table.

3. Technical Selection

The article compares ordinary databases, distributed databases, cloud‑native databases, and cloud‑native distributed databases, then evaluates specific options (PolarDB partitioned tables, OceanBase, PolarDB‑X, TiDB, and traditional application‑level sharding).

Key comparison points include development team, type, application scenario, advantages, disadvantages, sharding principle, suitable scenarios, and documentation links.

4. Unique ID Solutions

Four typical ID generation methods are examined:

Database or Redis auto‑increment: simple, but a single point of failure with limited throughput; suitable for small, low‑growth systems.

Snowflake algorithm: 1‑bit sign, 41‑bit timestamp, 10‑bit machine ID, 12‑bit sequence (see the sketch after this list); high performance and roughly ordered IDs, but machine IDs must be managed, clock rollback is a risk, and the IDs carry no business semantics.

UUID / random: easy and needs no central node, but the long IDs hurt index performance.

Meituan Leaf (snowflake‑style and segment‑style): cluster‑deployed, handles clock rollback, adds batch generation and a local cache; requires ZooKeeper and extra deployment effort.
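To make the snowflake layout above concrete, here is a minimal sketch of a generator in Java, assuming the standard 41/10/12 bit split described in the list; the custom epoch, class name, and clock-rollback handling are illustrative choices, not part of the original design.

```java
public class SnowflakeIdGenerator {

    private static final long EPOCH = 1700000000000L;           // custom epoch in ms (illustrative)
    private static final long WORKER_ID_BITS = 10L;             // 10-bit machine ID
    private static final long SEQUENCE_BITS = 12L;              // 12-bit per-millisecond sequence
    private static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse to generate rather than risk duplicates.
            throw new IllegalStateException("clock moved back " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond; spin until the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // 41-bit timestamp | 10-bit worker ID | 12-bit sequence
        return ((now - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```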

A custom business ID format is also proposed:

orderType(1) + bizType(1) + timestamp(yyMMddHHmmss)(12) + random(4) + userIdSuffix(4)

This format is human‑readable and self‑describing, but because the timestamp has only second granularity and the random segment has only four digits, collisions become likely once throughput exceeds roughly 10k orders per second.
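A rough sketch of how such an ID could be assembled, following the field widths above; the class, method, and parameter names are illustrative assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;

public final class OrderNoBuilder {

    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyMMddHHmmss");

    // orderType and bizType are expected to be single digits (0-9).
    public static String build(int orderType, int bizType, long userId) {
        String timestamp = LocalDateTime.now().format(TS);                       // 12 digits, second precision
        String random = String.format("%04d", ThreadLocalRandom.current().nextInt(10_000)); // 4-digit random
        String userSuffix = String.format("%04d", userId % 10_000);              // last 4 digits of the user ID
        return "" + orderType + bizType + timestamp + random + userSuffix;       // 1+1+12+4+4 = 22 characters
    }
}
```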

5. Sharding Key and Strategy

Three typical sharding dimensions are discussed:

Range (time): natural for logs and easy to archive, but recent time ranges concentrate the load and become hotspots.

List (tenant ID): good for SaaS or order‑type partitioning; maintenance can be complex and data may skew toward large tenants.

Hash (custom business ID): uniform distribution and balanced load; however, range queries are inefficient and resharding is harder (a routing sketch follows this list).
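A minimal sketch of the hash dimension, assuming routing by the 4‑digit user ID suffix at the end of the custom order number; the table count and naming convention are illustrative.

```java
public final class OrderTableRouter {

    private static final int TABLE_COUNT = 64; // a power of two keeps later doubling cheap

    // Maps an order number to one of 64 physical tables.
    public static String route(String orderNo) {
        String userSuffix = orderNo.substring(orderNo.length() - 4); // userIdSuffix(4)
        int index = Integer.parseInt(userSuffix) % TABLE_COUNT;
        return "t_order_" + index;
    }
}
```

Because the suffix is derived from the user ID, all orders of one user land in the same table, so C‑end queries by user can be routed to a single shard instead of being broadcast.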

6. Best Practice Implementation

The chosen solution is ShardingSphere‑JDBC for sharding and distributed transaction support. The migration steps include:

Create new order tables merging orderId and orderNo into a single string primary key.

Refactor related tables to use the new key.

Strip non‑core columns from the order table.

Isolate OLTP and OLAP data sources.

Introduce ShardingSphere‑JDBC with configured sharding keys, data sources, and a transaction proxy (a configuration sketch follows this list).

Rewrite joins to include the sharding key.

Clean historical data using Flink to replace old order numbers and update related tables.
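The configuration sketch referenced above, assuming the ShardingSphere‑JDBC 4.x Java API (class names differ in 5.x); the single data source, the numeric user_suffix sharding column, connection details, and table names are illustrative assumptions rather than the article's exact setup.

```java
import com.zaxxer.hikari.HikariDataSource;
import org.apache.shardingsphere.api.config.sharding.ShardingRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.TableRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.strategy.InlineShardingStrategyConfiguration;
import org.apache.shardingsphere.shardingjdbc.api.ShardingDataSourceFactory;

import javax.sql.DataSource;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class OrderShardingConfig {

    public static DataSource buildShardingDataSource() throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        dataSourceMap.put("ds0", plainDataSource("jdbc:mysql://db0:3306/order_db"));

        // Logical table t_order is split into 64 physical tables t_order_0 .. t_order_63.
        TableRuleConfiguration orderRule =
                new TableRuleConfiguration("t_order", "ds0.t_order_${0..63}");
        // Route by a numeric user_suffix column (assumed here); a custom
        // sharding algorithm class could parse the suffix out of order_no instead.
        orderRule.setTableShardingStrategyConfig(
                new InlineShardingStrategyConfiguration("user_suffix", "t_order_${user_suffix % 64}"));

        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();
        ruleConfig.getTableRuleConfigs().add(orderRule);

        return ShardingDataSourceFactory.createDataSource(dataSourceMap, ruleConfig, new Properties());
    }

    private static DataSource plainDataSource(String url) {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl(url);
        ds.setUsername("order_app");
        ds.setPassword("***");
        return ds;
    }
}
```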

Data volume estimation suggests 64 tables (2^6) to hold roughly ten years of orders (≈3.6 billion rows), which works out to about 56 million rows per table, close to the common 50‑million‑row guideline for a single table.

7. Distributed Transaction Integration

Seata is recommended for handling distributed transactions across multiple data sources, following Spring Cloud multi‑datasource best practices.
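A minimal sketch of how a Seata global transaction might wrap a write that spans the sharded order tables and a second data source, assuming Spring and Seata's @GlobalTransactional annotation; the service, mapper, and entity names are illustrative.

```java
import io.seata.spring.annotation.GlobalTransactional;
import org.springframework.stereotype.Service;

@Service
public class OrderWriteService {

    private final OrderMapper orderMapper;     // writes to the sharded order tables (illustrative)
    private final BillingMapper billingMapper; // writes to a separate billing data source (illustrative)

    public OrderWriteService(OrderMapper orderMapper, BillingMapper billingMapper) {
        this.orderMapper = orderMapper;
        this.billingMapper = billingMapper;
    }

    // If either insert fails, or the timeout is exceeded, Seata rolls back both branches.
    @GlobalTransactional(name = "create-order", timeoutMills = 60_000)
    public void createOrder(Order order, BillingRecord billing) {
        orderMapper.insert(order);
        billingMapper.insert(billing);
    }
}
```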

8. Implementation Steps

Provision new databases and create the main order table plus sharded tables.

Generate new order numbers offline and store the old‑to‑new mapping in order_new_relation (see the sketch after these steps).

Use Flink to sync old data, replace IDs, and route records to the appropriate shards.
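A rough sketch of the offline mapping step, assuming an order_new_relation table with old_order_no / new_order_no columns and a legacy table named t_order_legacy (both names are assumptions); it reuses the OrderNoBuilder sketch from section 4, although in practice the timestamp segment would come from the original order's creation time rather than the current time.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class OrderNoBackfill {

    public static void backfill(DataSource legacyDs, DataSource mappingDs) throws SQLException {
        String insertSql = "INSERT INTO order_new_relation (old_order_no, new_order_no) VALUES (?, ?)";
        try (Connection read = legacyDs.getConnection();
             Connection write = mappingDs.getConnection();
             Statement select = read.createStatement();
             PreparedStatement insert = write.prepareStatement(insertSql);
             ResultSet rs = select.executeQuery(
                     "SELECT order_no, order_type, biz_type, user_id FROM t_order_legacy")) {
            int batch = 0;
            while (rs.next()) {
                // Build the new-format order number and record old -> new for later ID replacement.
                String newNo = OrderNoBuilder.build(
                        rs.getInt("order_type"), rs.getInt("biz_type"), rs.getLong("user_id"));
                insert.setString(1, rs.getString("order_no"));
                insert.setString(2, newNo);
                insert.addBatch();
                if (++batch % 1_000 == 0) {
                    insert.executeBatch(); // flush every 1,000 mappings
                }
            }
            insert.executeBatch();
        }
    }
}
```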

9. Risk Assessment & Mitigation

Improper sharding key: may cause query slowdown and data skew – mitigate by analyzing query patterns and choosing high‑frequency fields.

Data migration risk: possible data loss or downtime – mitigate with detailed migration plans, incremental sync, full verification, and rollback procedures.

Distributed transaction consistency: risk of inconsistency – mitigate by using mature transaction frameworks, reducing transaction scope, and adding compensation mechanisms.

Performance degradation: caused by cross‑shard queries – mitigate with SQL optimization, caching, read/write splitting, and proper indexing.

Code refactor effort: may extend development time – mitigate by phased rollout, comprehensive test cases, and gray‑release strategies.

10. Frequently Asked Questions

Q1: How to handle pagination across shards?

Use streaming queries, last‑record ID as the cursor, or delegate complex reports to an OLAP store.
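A sketch of the cursor approach: instead of a deep OFFSET, the caller passes the last order number it has seen and each page is fetched with a range predicate on the sharding key. The mapper and entity names are illustrative, and the SQL in the comment is the shape of the query rather than a quoted statement from the article.

```java
import java.util.List;

public class OrderPageService {

    private final OrderMapper orderMapper; // illustrative data-access interface

    public OrderPageService(OrderMapper orderMapper) {
        this.orderMapper = orderMapper;
    }

    public List<Order> nextPage(String lastOrderNo, int pageSize) {
        // Underlying SQL (each shard evaluates it locally, then results are merged):
        // SELECT * FROM t_order WHERE order_no > ? ORDER BY order_no LIMIT ?
        return orderMapper.findAfter(lastOrderNo, pageSize);
    }
}
```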

Q2: How to guarantee uniqueness after sharding?

Perform uniqueness checks at the business layer, employ a global ID generator, or use distributed locks.

Q3: How to scale data?

For hash sharding, double the number of tables (e.g., 64 → 128) using consistent hashing; use Flink CDC for real‑time sync.
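A quick check of the doubling property with plain power‑of‑two modulo routing (a simplification of a full consistent‑hash ring): when the table count doubles from 64 to 128, a record either stays in its current table i or moves to table i + 64, so at most about half of the rows need to be relocated.

```java
public final class ReshardingCheck {

    public static void main(String[] args) {
        int moved = 0;
        for (int suffix = 0; suffix < 10_000; suffix++) { // every possible 4-digit user suffix
            int oldTable = suffix % 64;
            int newTable = suffix % 128;  // either oldTable or oldTable + 64
            if (newTable != oldTable) {
                moved++;
            }
        }
        System.out.printf("moved %d of 10000 suffixes (%.1f%%)%n", moved, moved / 100.0);
    }
}
```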

Q4: How to handle distributed‑transaction timeouts?

Set reasonable timeouts, rely on Seata’s automatic compensation, and minimize the use of large or frequent distributed transactions.

11. Summary

The proposed plan balances future business growth, operational cost, and server expense. It contrasts partitioned tables, distributed databases, and sharding, discusses ID generation, sharding key, and algorithm choices, and outlines a concrete migration path using ShardingSphere‑JDBC. For zero‑downtime migration, refer to the Dazhong Dianping phased implementation guide.

[Figure: Snowflake algorithm structure]
[Figure: Order data flow diagram]
Tags: sharding, distributed transactions, database scaling, unique ID
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
