Databases 11 min read

Mastering Sharding Pagination: 3 Proven Techniques to Avoid Performance Pitfalls

This article explains why using LIMIT on sharded tables can crash databases, then introduces three practical pagination tricks—anchor pagination, shard‑mark method, and reverse pagination—detailing their workflows, performance benefits, and common pitfalls for backend developers.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Mastering Sharding Pagination: 3 Proven Techniques to Avoid Performance Pitfalls

Hello, I'm Xiao Fu. An online issue occurred: when the operations team exported orders from the past three months and clicked to page 100, the database crashed because the code used LIMIT 9900, 100, which in a sharding scenario caused each of the eight shards to fetch 10,000 rows and sort them in memory, exhausting memory.

Sharding LIMIT is a performance killer

Single‑table pagination using LIMIT offset, size works, but after sharding data across multiple shards, a query like LIMIT 10000, 10 (page 1001) leads to two problems:

Full shard scan : each shard must execute LIMIT 0, 10010 because it doesn't know data distribution, fetching the first 10,010 rows to avoid missing data.

Memory‑exploding sort : assuming eight shards, each returns ~10,010 rows, totaling over 80,000 rows, which are all pulled into the application layer for sorting, causing memory and CPU spikes; an offset of 100,000 once took 12 seconds and triggered circuit breaking.

The following solutions avoid full‑shard scans and in‑memory sorting, but each has specific applicability. The core principle: Don’t chase performance blindly; first ensure the business scenario matches the technique.

Anchor pagination

Anchor pagination offers the best performance but only works for “load more” scenarios.

The idea is to use an ordered column (e.g., auto‑increment ID or timestamp) as an anchor instead of offset.

Paginate by ID range

Assume an order table is split into three shards by ID range:

Shard 1: ID 1‑10000 (ordered and less than Shard 2 IDs)

Shard 2: ID 10001‑20000

Shard 3: ID 20001‑30000

To fetch page 2 (10 rows per page):

First page: ORDER BY id DESC LIMIT 10 → last ID = 100 (anchor).

Second page: WHERE id < 100 ORDER BY id DESC LIMIT 10.

Third page: use the last ID of page 2 (e.g., 90) as the next anchor.

Why it’s fast

Each shard executes WHERE id < xxx LIMIT 10 and returns only ten rows, so even page 1001 retrieves just ten rows per shard, dramatically reducing network and memory overhead.

Two pitfalls

Avoid hash sharding : if IDs are distributed by mod 3, each shard’s IDs are not ordered, forcing a full scan.

Cannot jump pages : anchor pagination only supports sequential “load more”; jumping from page 1 to page 100 requires redesign, e.g., using a “load more” UI as in TikTok.

Shard‑mark method

When jump‑page is required (e.g., admin export), maintain metadata in Redis that records the data range and total rows for each database/table.

Example metadata for a two‑database, twelve‑table‑per‑database layout is shown in the table below.

DB‑Table

Start ID

End ID

Total rows

DB 1‑Table 202401

1

5000

5000

DB 1‑Table 202402

5001

12000

7000

DB 2‑Table 202401

12001

18000

6000

To query LIMIT 15000, 10 (page 1501):

Locate the shard: cumulative rows up to DB 1‑Table 202402 = 12,000 < 15,000; adding DB 2‑Table 202401 reaches 18,000 > 15,000, so the target is DB 2‑Table 202401.

Compute intra‑table offset: 15,000 − 12,000 = 3,000, then run LIMIT 3000, 10 on that table.

Return the result directly because IDs are globally ordered.

Two pitfalls

Metadata must be near‑real‑time but not strongly synchronized : update Redis on inserts/deletes using periodic full counts plus incremental logs; avoid distributed locks under high concurrency.

Cannot sort by non‑shard key : if sorting by payment time while sharding by user ID, the metadata cannot locate rows, forcing a full‑shard scan.

Reverse pagination

Reverse pagination is useful only for fetching the last page.

Instead of LIMIT 9990, 10, first query the first ten rows ( ORDER BY id ASC LIMIT 10) to obtain the smallest ID, use its maximum as a reverse anchor, then query WHERE id > 10 ORDER BY id DESC LIMIT 10 to get the final ten rows.

Why it works

LIMIT 0, 10

is about 100× faster than LIMIT 9990, 10 because each shard only scans ten rows, aggregates, and then fetches the top ten IDs.

Two pitfalls

Only applicable to the very last page; cannot fetch the 10th‑last page without additional calculations.

Large deletions break the assumption of contiguous IDs; if many trailing IDs are removed, the reverse anchor must be recomputed.

Final thoughts

The essence of sharding pagination is balancing business needs and technical constraints. Prefer “load more” with anchor pagination when possible; use the shard‑mark method for jump‑page scenarios accepting eventual consistency; resort to middleware such as ShardingSphere for global sorting only when unavoidable.

backendSQLdatabaseshardingPagination
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.