How to Query Sharded Orders Efficiently: Redundant, Index, and Gene Strategies

When order_id is not the sharding key, querying a sharded order table can require scanning every shard. Three techniques avoid this: data redundancy, an index table, and embedding shard information into the order_id (the gene method). Each resolves an order lookup with at most two single-shard queries.

ITPUB

Jan 29, 2023

In a previous article we concluded that for consumer‑facing (to‑C) services the sharding key should be a user attribute such as user_id. This follow‑up examines how to retrieve order details when the query uses order_id, which is not the sharding key.

If a query like SELECT * FROM T_ORDER WHERE order_id = 801462878019256325 is issued, it carries no sharding key, so ShardingSphere-JDBC cannot route it to a single shard. Instead it generates a UNION ALL across all shards, potentially executing the statement on thousands of physical tables and causing severe performance degradation.
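The routing difference can be pictured with a minimal sketch. The class OrderShardRouter, its method names, and the shard count of 16 are assumptions made for illustration; this is not ShardingSphere's API, only a model of which shards a query has to visit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical router for an order table sharded by user_id % SHARD_COUNT. */
final class OrderShardRouter {
    static final int SHARD_COUNT = 16; // assumed shard count, for illustration only

    /** With the sharding key present, exactly one shard has to be queried. */
    static List<Integer> shardsForUser(long userId) {
        int shard = (int) Math.floorMod(userId, (long) SHARD_COUNT);
        return Collections.singletonList(shard);
    }

    /** With only order_id, nothing narrows the search: every shard is a candidate. */
    static List<Integer> shardsForOrderIdOnly(long orderId) {
        List<Integer> all = new ArrayList<>();
        for (int shard = 0; shard < SHARD_COUNT; shard++) {
            all.add(shard);
        }
        return all;
    }
}
```

In this model a query by user_id visits one shard, while a query by order_id alone visits all 16, and the gap widens as the shard count grows.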

Three Design Alternatives

Redundant Data Method

Index Table Method

Gene (Shard‑Key‑Embedding) Method

All three approaches trade extra storage for reduced query time, avoiding full‑shard scans.

Design 1: Redundant Data Method

The order record is inserted twice: once partitioned by user_id and once by order_id. Queries on order_id can target the specific shard directly, delivering the best performance, but at the cost of significant data duplication.
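As a rough illustration of the double write, the sketch below models the two copies with in-memory maps. RedundantOrderStore and its methods are hypothetical names, and the String payload stands in for a full order row; a real system would write to two physical tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for two physical copies of the order table. */
final class RedundantOrderStore {
    private final int shardCount;
    // Copy 1: sharded by user_id (order_id -> payload within each shard)
    private final List<Map<Long, String>> byUserShards = new ArrayList<>();
    // Copy 2: sharded by order_id (order_id -> payload within each shard)
    private final List<Map<Long, String>> byOrderShards = new ArrayList<>();

    RedundantOrderStore(int shardCount) {
        this.shardCount = shardCount;
        for (int i = 0; i < shardCount; i++) {
            byUserShards.add(new HashMap<>());
            byOrderShards.add(new HashMap<>());
        }
    }

    private int shardOf(long key) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Every order is written twice, once per sharding scheme. */
    void insert(long userId, long orderId, String payload) {
        byUserShards.get(shardOf(userId)).put(orderId, payload);
        byOrderShards.get(shardOf(orderId)).put(orderId, payload);
    }

    /** A lookup by order_id touches only the shard that order_id maps to. */
    String findByOrderId(long orderId) {
        return byOrderShards.get(shardOf(orderId)).get(orderId);
    }
}
```

The sketch ignores the hard part of this design: in a real deployment the two writes usually land on different databases, so keeping both copies consistent needs its own mechanism.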

Design 2: Index Table Method

An auxiliary table idx_orderid_userid stores order_id and user_id:

CREATE TABLE idx_orderid_userid (
  order_id bigint,
  user_id bigint,
  PRIMARY KEY (order_id)
);

When inserting an order, a row is also added to this index table. To locate an order, first query the index table to obtain the corresponding user_id, then query the orders table using user_id as the sharding key. Example split:

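-- Step 1: look up the user_id in the index table
SELECT user_id FROM idx_orderid_userid WHERE order_id = 801462890610556951;

-- Step 2: query the orders table, binding ? to the user_id returned by step 1
SELECT * FROM T_ORDER WHERE user_id = ? AND order_id = 801462890610556951;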

Provided the index table is itself sharded by order_id, each of the two sub-queries hits exactly one shard, so a lookup touches at most two shards regardless of the total shard count.
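The two-step lookup can be sketched with in-memory maps in place of real tables. IndexedOrderStore and its methods are hypothetical, and the sketch assumes the index table is sharded by order_id while T_ORDER is sharded by user_id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of idx_orderid_userid plus T_ORDER. */
final class IndexedOrderStore {
    private final int shardCount;
    // idx_orderid_userid, assumed sharded by order_id: order_id -> user_id
    private final List<Map<Long, Long>> indexShards = new ArrayList<>();
    // T_ORDER, sharded by user_id: order_id -> order payload
    private final List<Map<Long, String>> orderShards = new ArrayList<>();

    IndexedOrderStore(int shardCount) {
        this.shardCount = shardCount;
        for (int i = 0; i < shardCount; i++) {
            indexShards.add(new HashMap<>());
            orderShards.add(new HashMap<>());
        }
    }

    private int shardOf(long key) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Inserting an order also inserts its (order_id, user_id) index row. */
    void insert(long userId, long orderId, String payload) {
        indexShards.get(shardOf(orderId)).put(orderId, userId);
        orderShards.get(shardOf(userId)).put(orderId, payload);
    }

    /** Step 1 reads one index shard; step 2 reads one order shard. */
    String findByOrderId(long orderId) {
        Long userId = indexShards.get(shardOf(orderId)).get(orderId);
        if (userId == null) {
            return null; // no index row, so the order does not exist
        }
        return orderShards.get(shardOf(userId)).get(orderId);
    }
}
```

Compared with full redundancy, only the narrow (order_id, user_id) pair is duplicated, at the price of one extra round trip per lookup.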

Design 3: Gene (Shard‑Key‑Embedding) Method

The index‑table solution still incurs extra storage. A more elegant design embeds the sharding key directly into the primary key.

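Assume sharding by user_id % 16. Because log₂16 = 4, the shard number equals the low 4 bits of user_id, and those bits become the “shard gene”. When generating a distributed order_id (e.g., using Snowflake), the high 60 bits are kept as generated and the low 4 bits are replaced with the shard gene, producing a 64-bit identifier whose suffix reveals the shard. The trick relies on the shard count being a power of two, so that the modulo is determined entirely by the low bits.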

The following Java test demonstrates the conversion and verifies that order_id % 16 == user_id % 16 (the helpers getGen and buildGenId are not shown):

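@Test
public void modIdTest() {
    long userID = 20160169L;
    int shardNum = 16;
    // Extract the shard gene (the low bits of userID) for 16 shards
    String gen = getGen(userID, shardNum);
    log.info("Gene of userID {} is {}", userID, gen);

    // Generate a plain Snowflake ID
    long snowId = IdWorker.getId(Order.class);
    log.info("Order ID generated by the Snowflake algorithm: {}", snowId);

    // Replace the low bits of the Snowflake ID with the gene
    Long orderId = buildGenId(snowId, gen);
    log.info("Order ID after gene conversion: {}", orderId);

    // Both IDs must map to the same shard
    Assert.assertEquals(orderId % shardNum, userID % shardNum);
}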

Running the test yields an original order ID 1595662702879973377 and a converted ID 1595662702879973385. The converted ID and the user ID 20160169 both leave a remainder of 9 when divided by 16, so they map to the same shard.
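The helpers used in the test are not shown, so the following is only one possible way to implement the same bit manipulation. ShardGene, geneOf, and embedGene are hypothetical names, and the gene is handled as a long rather than the String the test passes around.

```java
/** Hypothetical helpers for embedding a shard gene into the low bits of an ID. */
final class ShardGene {

    /** Number of low bits that address shardCount shards; shardCount must be a power of two. */
    static int geneBits(int shardCount) {
        if (shardCount <= 0 || Integer.bitCount(shardCount) != 1) {
            throw new IllegalArgumentException("shardCount must be a power of two");
        }
        return Integer.numberOfTrailingZeros(shardCount);
    }

    /** The gene is the low geneBits bits of user_id, equal to user_id % shardCount for non-negative IDs. */
    static long geneOf(long userId, int shardCount) {
        long mask = (1L << geneBits(shardCount)) - 1;
        return userId & mask;
    }

    /** Clears the low bits of the generated ID and writes the gene in their place. */
    static long embedGene(long generatedId, long gene, int shardCount) {
        long mask = (1L << geneBits(shardCount)) - 1;
        return (generatedId & ~mask) | (gene & mask);
    }
}
```

With the values from the test run, geneOf(20160169L, 16) is 9 and embedGene(1595662702879973377L, 9L, 16) gives 1595662702879973385. One caveat of this sketch: overwriting the low bits discards part of the generated ID, so two IDs that differed only in those bits would collide after conversion. A production ID generator would need to account for that.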

An even simpler alternative is to store the primary key as a concatenated string, order_id = order_id + "-" + user_id. Queries match the exact string, and the embedded user_id part identifies the shard directly, so the lookup touches only a single shard.
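A minimal sketch of the string variant follows. CompositeOrderKey and its methods are hypothetical, and the modulo routing on user_id is carried over from the earlier user_id % 16 assumption.

```java
/** Hypothetical helper for keys of the form "<order_id>-<user_id>". */
final class CompositeOrderKey {

    /** Builds the concatenated key stored as the primary key. */
    static String build(long orderId, long userId) {
        return orderId + "-" + userId;
    }

    /** Extracts the user_id that follows the last '-' in the key. */
    static long userIdOf(String key) {
        int dash = key.lastIndexOf('-');
        if (dash < 0 || dash == key.length() - 1) {
            throw new IllegalArgumentException("not a composite order key: " + key);
        }
        return Long.parseLong(key.substring(dash + 1));
    }

    /** Resolves the shard from the embedded user_id, assuming user_id % shardCount routing. */
    static int shardOf(String key, int shardCount) {
        return (int) Math.floorMod(userIdOf(key), (long) shardCount);
    }
}
```

For example, build(801462890610556951L, 20160169L) produces "801462890610556951-20160169", and shardOf on that key with 16 shards resolves to shard 9.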

Embedding the shard information in the key avoids the storage overhead of redundant tables at the cost of a modestly larger primary key, and insert performance stays high as long as the key remains ordered.

Real‑World Example

Taobao’s order numbers follow a similar pattern: the last six digits consistently map to the user’s shard, allowing direct shard resolution from the order ID.

Conclusion

After sharding, always include the sharding key in queries. When business constraints prevent this, the three techniques above—data redundancy, an index table, and embedding shard information into the primary key—provide viable work‑arounds, with the gene method offering the best balance of storage efficiency and query speed.
