Mastering Sharding Pagination: 3 Proven Techniques to Avoid Performance Pitfalls
This article explains why LIMIT on sharded tables can take a database down, then introduces three practical pagination techniques (anchor pagination, the shard-mark method, and reverse pagination), detailing their workflows, performance benefits, and common pitfalls for backend developers.
Hello, I'm Xiao Fu. We recently hit a production incident: when the operations team exported orders from the past three months and paged to page 100, the database went down. The code used LIMIT 9900, 100, which in a sharded setup forced each of the eight shards to fetch 10,000 rows for an in-memory sort, exhausting memory.
Sharding LIMIT is a performance killer
Single-table pagination with LIMIT offset, size works fine, but once data is split across multiple shards, a query like LIMIT 10000, 10 (page 1001 at 10 rows per page) causes two problems:
Full-shard scan: no shard knows how the global ordering maps onto its local data, so each shard must execute LIMIT 0, 10010, fetching its first 10,010 rows to guarantee nothing is missed.
Memory-exploding sort: with eight shards each returning about 10,010 rows, over 80,000 rows are pulled into the application layer and sorted there, spiking memory and CPU; in one incident an offset of 100,000 took 12 seconds and tripped the circuit breaker.
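To make the cost concrete, here is a small arithmetic sketch of how a sharding middleware has to rewrite LIMIT offset, size: every shard gets LIMIT 0, offset + size, and the application merges all of it. The class and method names are illustrative, not a real middleware API.

```java
// Sketch: rows a sharding middleware must pull for LIMIT offset, size.
// No shard knows which of its rows fall in the global window, so the
// query is rewritten to LIMIT 0, offset + size on every shard.
public class ShardedLimitCost {
    static long rowsPerShard(long offset, int size) {
        return offset + size; // each shard returns up to offset + size rows
    }

    static long totalRowsMerged(int shardCount, long offset, int size) {
        return (long) shardCount * rowsPerShard(offset, size);
    }

    public static void main(String[] args) {
        // LIMIT 10000, 10 across 8 shards, as in the example above
        System.out.println(rowsPerShard(10_000, 10));       // 10010 rows per shard
        System.out.println(totalRowsMerged(8, 10_000, 10)); // 80080 rows sorted in the app layer
    }
}
```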
The following solutions avoid full‑shard scans and in‑memory sorting, but each has specific applicability. The core principle: Don’t chase performance blindly; first ensure the business scenario matches the technique.
Anchor pagination
Anchor pagination offers the best performance but only works for “load more” scenarios.
The idea is to use an ordered column (e.g., auto‑increment ID or timestamp) as an anchor instead of offset.
Paginate by ID range
Assume an order table is split into three shards by ID range:
Shard 1: ID 1-10000 (IDs are ordered, and all smaller than Shard 2's)
Shard 2: ID 10001‑20000
Shard 3: ID 20001‑30000
To page through at 10 rows per page:
First page: ORDER BY id DESC LIMIT 10 → last ID = 100 (anchor).
Second page: WHERE id < 100 ORDER BY id DESC LIMIT 10.
Third page: use the last ID of page 2 (e.g., 90) as the next anchor.
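The steps above can be sketched in plain Java. The shard contents and the nextPage helper are illustrative stand-ins for real per-shard SQL, assuming each shard executes WHERE id < anchor ORDER BY id DESC LIMIT size:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnchorPagination {
    // Simulates "WHERE id < anchor ORDER BY id DESC LIMIT size" on every shard,
    // then merges the per-shard results and keeps the global top `size` IDs.
    static List<Long> nextPage(List<List<Long>> shards, long anchor, int size) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> shard : shards) {
            shard.stream()
                 .filter(id -> id < anchor)
                 .sorted(Comparator.reverseOrder())
                 .limit(size)                 // each shard returns at most `size` rows
                 .forEach(candidates::add);
        }
        candidates.sort(Comparator.reverseOrder());
        return candidates.subList(0, Math.min(size, candidates.size()));
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(   // hypothetical shard contents
            List.of(1L, 4L, 7L, 10L),
            List.of(2L, 5L, 8L, 11L),
            List.of(3L, 6L, 9L, 12L));
        // First page: no anchor yet, so pass the maximum possible ID;
        // the last ID of each page becomes the anchor for the next one.
        List<Long> page1 = nextPage(shards, Long.MAX_VALUE, 5);          // [12, 11, 10, 9, 8]
        List<Long> page2 = nextPage(shards, page1.get(page1.size() - 1), 5); // [7, 6, 5, 4, 3]
        System.out.println(page1);
        System.out.println(page2);
    }
}
```

Whatever the anchor is, each shard contributes at most "size" rows, which is why pagination depth no longer matters.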
Why it’s fast
Each shard executes WHERE id < xxx LIMIT 10 and returns only ten rows, so even page 1001 retrieves just ten rows per shard, dramatically reducing network and memory overhead.
Two pitfalls
Avoid hash sharding: if IDs are distributed by mod 3, each shard holds an interleaved ID set rather than an ordered contiguous range, the range-based routing above breaks down, and the query degrades to a full-shard scan.
Cannot jump pages: anchor pagination only supports sequential "load more"; jumping from page 1 to page 100 requires a different design, e.g., a "load more" UI like TikTok's feed.
Shard‑mark method
When jump‑page is required (e.g., admin export), maintain metadata in Redis that records the data range and total rows for each database/table.
Example metadata for a two‑database, twelve‑table‑per‑database layout is shown in the table below.
DB-Table            Start ID   End ID   Total rows
DB 1-Table 202401   1          5000     5000
DB 1-Table 202402   5001       12000    7000
DB 2-Table 202401   12001      18000    6000
To query LIMIT 15000, 10 (page 1501):
Locate the shard: cumulative rows up to DB 1‑Table 202402 = 12,000 < 15,000; adding DB 2‑Table 202401 reaches 18,000 > 15,000, so the target is DB 2‑Table 202401.
Compute intra‑table offset: 15,000 − 12,000 = 3,000, then run LIMIT 3000, 10 on that table.
Return the result directly because IDs are globally ordered.
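The routing step can be sketched as a walk over the metadata, subtracting each table's row count until the remaining offset falls inside one table. The class, record, and table names below are illustrative; in the real setup the metadata would be read from Redis:

```java
import java.util.List;

public class ShardMarkRouter {
    // One metadata entry per physical table (kept in Redis in the real setup).
    record TableMeta(String name, long startId, long endId, long rowCount) {}

    record Location(String table, long intraOffset) {}

    // Walk the tables in global ID order, subtracting each table's row count
    // until the remaining offset lands inside a single table.
    static Location locate(List<TableMeta> metas, long globalOffset) {
        long remaining = globalOffset;
        for (TableMeta meta : metas) {
            if (remaining < meta.rowCount()) {
                return new Location(meta.name(), remaining);
            }
            remaining -= meta.rowCount();
        }
        throw new IllegalArgumentException("offset beyond total rows");
    }

    public static void main(String[] args) {
        List<TableMeta> metas = List.of(   // the metadata from the table above
            new TableMeta("DB1-202401", 1, 5_000, 5_000),
            new TableMeta("DB1-202402", 5_001, 12_000, 7_000),
            new TableMeta("DB2-202401", 12_001, 18_000, 6_000));
        Location loc = locate(metas, 15_000); // LIMIT 15000, 10
        // → run LIMIT 3000, 10 on that one table
        System.out.println(loc.table() + " offset " + loc.intraOffset());
    }
}
```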
Two pitfalls
Metadata must be near-real-time but not strongly consistent: update Redis on inserts and deletes using periodic full counts plus incremental logs; avoid distributed locks under high concurrency.
Cannot sort by a non-shard key: if the query sorts by payment time while the tables are sharded by user ID, the metadata cannot locate the rows, and the query falls back to a full-shard scan.
Reverse pagination
Reverse pagination is useful only for fetching the last page.
Instead of paging to the end with ORDER BY id ASC LIMIT 9990, 10, flip the sort direction: query ORDER BY id DESC LIMIT 10 to grab the ten largest IDs directly, then reverse those ten rows in the application so they display in the original ascending order.
Why it works
LIMIT 0, 10 is about 100× faster than LIMIT 9990, 10: each shard scans only ten rows, the application merges the candidates, and only the global top ten IDs are kept.
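Under the assumption that the listing is normally served in ascending ID order, the flip can be sketched as follows; the shard contents are made-up illustration data:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ReversePagination {
    // Last page of an ascending listing: instead of ORDER BY id ASC LIMIT (total - size), size,
    // take each shard's largest `size` IDs, keep the global largest `size`,
    // and reverse them to restore ascending order for display.
    static List<Long> lastPage(List<List<Long>> shards, int size) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> shard : shards) {
            shard.stream()
                 .sorted(Comparator.reverseOrder())
                 .limit(size)                 // ORDER BY id DESC LIMIT size per shard
                 .forEach(candidates::add);
        }
        candidates.sort(Comparator.reverseOrder());
        List<Long> page = new ArrayList<>(candidates.subList(0, Math.min(size, candidates.size())));
        Collections.reverse(page);            // back to ascending order
        return page;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(   // hypothetical shard contents
            List.of(1L, 4L, 7L, 10L),
            List.of(2L, 5L, 8L, 11L),
            List.of(3L, 6L, 9L, 12L));
        System.out.println(lastPage(shards, 4)); // [9, 10, 11, 12]
    }
}
```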
Two pitfalls
Only applicable to the very last page; cannot fetch the 10th‑last page without additional calculations.
Large deletions break the assumption of contiguous IDs; if many trailing IDs are removed, the reverse anchor must be recomputed.
Final thoughts
The essence of sharding pagination is balancing business needs and technical constraints. Prefer “load more” with anchor pagination when possible; use the shard‑mark method for jump‑page scenarios accepting eventual consistency; resort to middleware such as ShardingSphere for global sorting only when unavoidable.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!