How to Paginate Efficiently Across Sharded Databases Without OOM
This article explains why pagination only becomes complex when querying across sharded databases, compares three common solutions—including global search, sequential ID‑based paging, and the discouraged secondary query method—provides practical SQL examples, outlines pros and cons, and offers a STAR‑style interview answer.
Identify Whether Pagination Is Cross‑Shard
Pagination becomes a problem only when a query spans multiple shards. If the WHERE clause always contains the sharding key (e.g., a specific user_id or a time bucket), the query is routed to a single shard and can use ordinary LIMIT/OFFSET logic.
First step: Determine whether the query can be satisfied by a single shard.
Typical Solutions for True Cross‑Shard Pagination
When the query really needs to aggregate results from several shards, three common approaches are used:
1. Global Search (Middleware/Database)
The middleware issues the same SELECT on every shard, collects the rows, sorts them globally, and then returns the requested page. This method is correct but its cost grows linearly with the page number because each shard must scan and sort all rows up to the requested offset.
Example (ShardingSphere‑style):
SELECT * FROM user_log
WHERE ... -- executed on every shard
ORDER BY create_time DESC
LIMIT 100000, 20;For deep pages (e.g., page 100 000) the memory consumption can reach OOM and response time becomes unacceptable.
2. Sequential‑ID Paging (Recommended for Deep Pagination)
Instead of using OFFSET, keep the maximum primary‑key value of the previous page and request the next page with a “greater‑than” predicate. This turns the query into a range scan, which is O(1) with respect to page depth.
SQL pattern for the first page:
SELECT * FROM user_log
ORDER BY id ASC
LIMIT 20;SQL pattern for subsequent pages (assuming the last row of the previous page had id = 12345):
SELECT * FROM user_log
WHERE id > 12345
ORDER BY id ASC
LIMIT 20;Advantages
Each request reads only one page; performance is independent of the page number.
Uses index range scans, avoiding large offsets and in‑memory sorting.
Disadvantages
Cannot jump directly to an arbitrary page; the client must request pages sequentially.
Requires a monotonically increasing identifier (or a composite key that preserves order).
3. Secondary Query Method (Not Recommended)
This technique performs an initial query to obtain the IDs of the desired page, then issues a second query to fetch the full rows by those IDs. It is complex, sensitive to data skew, and often yields inconsistent results, so it is generally discouraged.
Practical Guidance for Interviews
When asked about cross‑shard pagination, follow a concise structure:
Confirm whether the business can limit the query to a single shard.
If cross‑shard is unavoidable, mention middleware‑level global search but note its deep‑page limitation.
Recommend delegating the problem to a dedicated search engine (Elasticsearch) or a distributed SQL engine (TiDB) when possible.
If the solution must stay inside the relational database, propose sequential‑ID paging as the performant fallback.
Summary
Only queries that span shards need special pagination logic. The three main categories of solutions are:
Middleware/database global search (simple but costly for deep pages).
Open‑source frameworks that implement similar logic (convenient but share the same performance pitfall).
Business‑level compromises, such as avoiding deep pagination or using a dedicated search system.
For most production scenarios—feeds, message lists, audit logs—the best practice is to disallow jump paging and use sequential paging based on the previous page’s maximum ID .
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
