How to Efficiently Query Sharded Order Tables: Redundant, Index, and Gene Methods
When order_id is not the sharding key, querying a sharded order table can require scanning every shard, but three design patterns—data redundancy, an index table, and gene‑based sharding—enable fast single‑shard lookups while balancing storage and complexity.
In consumer‑facing systems the sharding key is often user_id, which makes queries that filter only by order_id expensive because the database must scan all shards and merge results with UNION ALL.
Design 1: Redundant Data Method
Insert two copies of each order: one partitioned by user_id and another by order_id. Queries that use order_id can target the shard that stores the order_id copy directly, achieving the best performance at the cost of large data duplication.
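As a sketch of how writes would be routed under this design (the class name and the modulo routing rule are illustrative assumptions, not from the source):

```java
// Hypothetical sketch of dual-write routing for the redundant-data method.
// Assumes simple modulo sharding over a fixed shard count.
public class DualWriteRouter {
    private final int shardCount;

    public DualWriteRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    // Shard holding the copy partitioned by user_id.
    public int userShard(long userId) {
        return (int) (userId % shardCount);
    }

    // Shard holding the copy partitioned by order_id.
    public int orderShard(long orderId) {
        return (int) (orderId % shardCount);
    }

    // On insert, the order is written to both shards -- typically inside a
    // distributed transaction or via an asynchronous replication pipeline.
    public int[] shardsForInsert(long userId, long orderId) {
        return new int[] { userShard(userId), orderShard(orderId) };
    }
}
```

A query filtering only by order_id then goes straight to `orderShard(orderId)`, while user-scoped queries use the user_id copy.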
Design 2: Index Table Method
Create a lightweight index table idx_orderid_userid that stores order_id and the corresponding user_id. The table’s primary key is order_id. When an order is inserted, a row is also added to this index table.
CREATE TABLE idx_orderid_userid (
order_id bigint,
user_id bigint,
PRIMARY KEY (order_id)
);

To retrieve an order, first query the index table to obtain the user_id, then query the main order table using both user_id (the sharding key) and order_id. This splits the original SQL into two shard‑key‑based statements:
# step 1
SELECT user_id FROM idx_orderid_userid WHERE order_id = 801462890610556951;
# step 2
SELECT * FROM T_ORDER WHERE user_id = ? AND order_id = 801462890610556951;

Both steps hit only a single shard, so the lookup costs a constant two queries regardless of the total shard count.
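A minimal in-memory sketch of the two-step lookup, with a HashMap standing in for the idx_orderid_userid table (class and method names are hypothetical; in production both steps would be single-shard SQL queries):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of index-table routing: order_id -> user_id -> shard.
public class IndexTableLookup {
    // Stand-in for the idx_orderid_userid table.
    private final Map<Long, Long> idxOrderIdUserId = new HashMap<>();
    private final int shardCount;

    public IndexTableLookup(int shardCount) {
        this.shardCount = shardCount;
    }

    // Mirrors the extra insert into the index table on order creation.
    public void recordOrder(long orderId, long userId) {
        idxOrderIdUserId.put(orderId, userId);
    }

    // Step 1: resolve user_id from the index table.
    // Step 2: route to the T_ORDER shard (modulo routing is an assumption).
    public int shardForOrder(long orderId) {
        Long userId = idxOrderIdUserId.get(orderId);
        if (userId == null) {
            throw new IllegalArgumentException("unknown order_id: " + orderId);
        }
        return (int) (userId % shardCount);
    }
}
```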
Design 3: Gene Sharding Method
Embed the sharding information directly into the primary key. Using user_id % 16 (the last 4 bits) as the shard gene, generate a 64‑bit order ID (e.g., via Snowflake), then replace its lowest 4 bits with the gene. The resulting order_id still uniquely identifies the order but also reveals the target shard.
Example calculation:
1. Compute user_id % 16 → the shard gene (e.g., 1001 in binary).
2. Generate a 64‑bit distributed ID and keep its upper 60 bits.
3. Splice the 4‑bit gene into the lowest bits to form the final 64‑bit order_id.
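The helpers getGen and buildGenId used by the test below are not shown in the source; one plausible implementation, assuming shardNum is a power of two, is:

```java
// Assumed implementation of the gene helpers referenced by the test.
// getGen renders user_id % shardNum as a fixed-width binary string;
// buildGenId replaces the lowest bits of the Snowflake ID with that gene.
public class GeneSharding {

    // e.g. userId=20160169, shardNum=16 -> gene 9 -> "1001".
    public static String getGen(long userId, int shardNum) {
        int bits = Integer.numberOfTrailingZeros(shardNum); // shardNum must be a power of two
        String bin = Long.toBinaryString(userId % shardNum);
        StringBuilder sb = new StringBuilder();
        // Left-pad so buildGenId can infer the bit width from the string length.
        for (int i = bin.length(); i < bits; i++) {
            sb.append('0');
        }
        return sb.append(bin).toString();
    }

    // Clear the low bits of the Snowflake ID and splice in the gene.
    public static long buildGenId(long snowId, String gen) {
        long mask = (1L << gen.length()) - 1;   // e.g. 0b1111 for a 4-bit gene
        long gene = Long.parseLong(gen, 2);
        return (snowId & ~mask) | gene;
    }
}
```

With these definitions, the IDs from the source example line up: replacing the low 4 bits of 1595662702879973377 with gene 1001 yields 1595662702879973385.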
Test code demonstrates that order_id % 16 equals user_id % 16:
@Test
public void modIdTest() {
    long userID = 20160169L;
    int shardNum = 16;
    String gen = getGen(userID, shardNum);
    log.info("Gene for userID {}: {}", userID, gen);
    long snowId = IdWorker.getId(Order.class);
    log.info("Snowflake-generated order ID: {}", snowId);
    Long orderId = buildGenId(snowId, gen);
    log.info("Order ID after gene substitution: {}", orderId);
    Assert.assertEquals(orderId % shardNum, userID % shardNum);
}

The test output shows the original Snowflake ID 1595662702879973377 and the gene‑adjusted ID 1595662702879973385, confirming identical shard placement.
An alternative, simpler approach is to make the primary key a string that concatenates order_id and user_id (e.g., "1595662702879973377-20160169"). Because the shard key is embedded in the string, a query on this column can directly route to the correct shard with a single lookup.
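A small sketch of routing on such a concatenated key (the delimiter, names, and modulo routing are assumptions drawn from the example string):

```java
// Hypothetical routing on a concatenated "orderId-userId" primary key.
public class ConcatKeyRouter {

    // Build the string key, e.g. "1595662702879973377-20160169".
    public static String buildKey(long orderId, long userId) {
        return orderId + "-" + userId;
    }

    // Extract the embedded user_id and route directly to its shard.
    public static int shardForKey(String key, int shardCount) {
        long userId = Long.parseLong(key.substring(key.indexOf('-') + 1));
        return (int) (userId % shardCount);
    }
}
```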
Real‑world systems such as Taobao use a similar technique: the order number’s trailing digits encode the user‑based shard gene, allowing one‑shard queries.
Conclusion
After sharding, always include the sharding key in queries. When that is impossible, the three patterns above—data redundancy, an index table, or embedding the shard gene in the primary key—provide practical ways to achieve single‑shard lookups, trading off storage overhead against query efficiency.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.