How to Eliminate Duplicate and Missing Data in Backend Pagination
This article explains why offset‑based pagination can return duplicate or missing records when the underlying data changes, compares three practical solutions (cursor pagination with timestamp + ID, fixed‑window timestamp filtering, and Elasticsearch search_after), and gives concrete SQL and code examples along with their trade‑offs and suitable scenarios.
In backend development, pagination is a fundamental feature, but using offset‑based queries can lead to unstable anchors, causing duplicate or missing data when new rows are inserted or old rows are deleted.
Typical Problems (Cases)
Case 1: In a social feed sorted by newest first, newly posted items cause the same post to appear on multiple pages.
Case 2: When issuing coupons in bulk, newly added coupons make some users receive duplicate coupons, resulting in financial loss.
Case 3: A payment transaction list sorted by a non‑unique field shows out‑of‑order data and occasional missing rows.
Analysis of the Root Cause
Offset pagination anchors each page to a row position rather than to the data itself. Inserting or deleting rows between two page requests shifts every subsequent row's position, so the same offset now points at different rows, and records get repeated or skipped.
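The drift is easy to reproduce in a few lines of plain Java (the post names and page size here are illustrative): page 1 is read at offset 0, one new row arrives at the head, and the row that closed page 1 reappears on page 2.
import java.util.ArrayList;
import java.util.List;

public class OffsetDriftDemo {
    public static void main(String[] args) {
        // Posts sorted newest-first; index 0 is the newest row.
        List<String> posts = new ArrayList<>();
        for (int i = 20; i >= 1; i--) posts.add("post-" + i);

        int pageSize = 3;
        // Page 1 = OFFSET 0 (copy the sublist view so the later insert cannot disturb it)
        List<String> page1 = new ArrayList<>(posts.subList(0, pageSize));
        System.out.println("page 1: " + page1); // [post-20, post-19, post-18]

        // A new post lands between the two requests and shifts every row down by one.
        posts.add(0, "post-21");

        // Page 2 = OFFSET 3: post-18 shows up again.
        List<String> page2 = new ArrayList<>(posts.subList(pageSize, pageSize * 2));
        System.out.println("page 2: " + page2); // [post-18, post-17, post-16]
    }
}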
Solution 1 – Cursor Pagination with Timestamp + Unique Key
Replace the offset with the values of the last row from the previous page (e.g., create_time and id), and query the next page using a > or < condition.
Define a unique sorting key that combines a time field and a unique identifier.
When requesting a page, send the last row's create_time and id instead of an offset.
Use a WHERE clause to filter rows greater (or smaller) than the anchor.
SQL example for MySQL:
SELECT id, title, create_time
FROM posts
ORDER BY create_time DESC, id DESC
LIMIT 10;
Assume the last row of page 1 has create_time='2024-05-20 14:30:00' and id=100. The next page query becomes:
SELECT id, title, create_time
FROM posts
WHERE create_time < '2024-05-20 14:30:00'
   OR (create_time = '2024-05-20 14:30:00' AND id < 100)
ORDER BY create_time DESC, id DESC
LIMIT 10;
Note that the anchor condition must compare the (create_time, id) pair as a whole: the simpler form create_time <= '...' AND id < 100 looks equivalent but silently drops older rows whose id happens to be 100 or greater.
Advantages: eliminates duplicates and skips, performs well with a composite index on (create_time, id), and works even when the sort field alone is not unique.
Disadvantage: cannot jump directly to an arbitrary page; only next/previous or infinite scroll is supported.
Applicable scenarios: consumer‑facing (C‑end) infinite‑scroll lists, data sets from tens of thousands of rows upward, and batch jobs that need a stable full‑table traversal.
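In practice the two anchor values are often packed into a single opaque cursor string, so the client simply echoes back whatever token the previous page returned. A minimal sketch (the Cursor class name and the delimiter are illustrative, not from the original article):
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: packs the last row's sort values into one opaque token.
final class Cursor {
    static String encode(String createTime, long id) {
        String raw = createTime + "|" + id;
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static String[] decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        return raw.split("\\|", 2); // [0] = create_time, [1] = id
    }
}
The server decodes the token, binds the two values into the keyset WHERE clause above, and returns a fresh token built from the last row of the new page.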
Solution 2 – Fixed‑Window Timestamp Filtering
Record the current time as max_create_time on the first request and reuse it for subsequent pages, so newly inserted rows are excluded.
On the first page, capture the current timestamp.
All following page queries add create_time <= max_create_time to the WHERE clause.
When the user refreshes, a new timestamp is captured.
SQL example:
-- First page (assume current time is '2024-05-20 15:00:00')
SELECT id, title, create_time
FROM posts
WHERE create_time <= '2024-05-20 15:00:00'
ORDER BY create_time DESC, id DESC
LIMIT 0,10;

-- Second page (same time window)
SELECT id, title, create_time
FROM posts
WHERE create_time <= '2024-05-20 15:00:00'
ORDER BY create_time DESC, id DESC
LIMIT 10,10;
Advantages: simple to implement, low cost, and it immediately stops newly inserted rows from causing duplicates.
Disadvantages: cannot handle deletions that cause gaps, data may appear stale until a refresh, and deep pagination still suffers from performance issues.
Applicable scenarios: C‑end scroll lists where data is mostly append‑only and deep pagination is rare.
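A minimal JDBC sketch of the flow described above, assuming the posts table from the earlier SQL; the PostService class and listPage method are hypothetical names, not from the original article:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical service: pins the time window on the first request.
class PostService {
    private final Connection conn;
    PostService(Connection conn) { this.conn = conn; }

    // maxCreateTime is null on the first page; the caller stores the returned
    // value and sends it back unchanged for every following page.
    Timestamp listPage(Timestamp maxCreateTime, int offset, int size) throws Exception {
        if (maxCreateTime == null) {
            maxCreateTime = Timestamp.from(Instant.now()); // freeze the window
        }
        String sql = "SELECT id, title, create_time FROM posts "
                   + "WHERE create_time <= ? "
                   + "ORDER BY create_time DESC, id DESC LIMIT ?, ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, maxCreateTime);
            ps.setInt(2, offset);
            ps.setInt(3, size);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // render rs.getLong("id"), rs.getString("title"), ...
                }
            }
        }
        return maxCreateTime; // echo back so the client can reuse the window
    }
}
Because the window timestamp travels with every request, all pages of one browsing session share the same snapshot boundary; a pull‑to‑refresh simply starts over with maxCreateTime = null.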
Solution 3 – Elasticsearch‑Specific Optimizations
Both cursor pagination and fixed‑window filtering can be applied to Elasticsearch as well, but deep pagination there hits the "Result window is too large" error: by default, from + size may not exceed the index.max_result_window setting of 10,000. Elasticsearch provides the search_after and scroll APIs to get past this.
search_after: requires an explicit sort and uses the sort values of the last hit as the cursor; recommended for most cases.
scroll: keeps a server‑side snapshot of the search context alive between requests, which is resource‑intensive; not recommended for new development.
Java code example using search_after (Elasticsearch High Level REST Client; client is an already initialized RestHighLevelClient):
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

// First page
SearchRequest request = new SearchRequest("posts");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
sourceBuilder.sort("create_time", SortOrder.DESC);
sourceBuilder.sort("id", SortOrder.DESC); // unique tiebreaker keeps the cursor stable
sourceBuilder.size(10);
request.source(sourceBuilder);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);

// Guard against an empty page before reading the last hit.
SearchHit[] hits = response.getHits().getHits();
if (hits.length > 0) {
    Object[] lastSortValues = hits[hits.length - 1].getSortValues();

    // Second page: reuse the builder and attach the cursor from page 1
    sourceBuilder.searchAfter(lastSortValues);
    request.source(sourceBuilder);
    SearchResponse page2Response = client.search(request, RequestOptions.DEFAULT);
}
Advantages: avoids the performance penalty of large from values, works with massive ES data sets, and handles non‑unique sort fields.
Disadvantage: still does not support arbitrary page jumps.
Applicable scenarios: ES bulk export (e.g., logs, batch Excel export) and any large‑scale read‑only pagination.
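For the bulk‑export scenario, search_after extends naturally into a loop that walks the whole index with the same high‑level client used above (the index name and batch size of 1000 are illustrative):
SearchSourceBuilder source = new SearchSourceBuilder()
        .query(QueryBuilders.matchAllQuery())
        .sort("create_time", SortOrder.DESC)
        .sort("id", SortOrder.DESC) // unique tiebreaker keeps the cursor stable
        .size(1000);

Object[] cursor = null;
while (true) {
    if (cursor != null) {
        source.searchAfter(cursor); // resume from the last hit of the previous batch
    }
    SearchResponse resp = client.search(new SearchRequest("posts").source(source),
                                        RequestOptions.DEFAULT);
    SearchHit[] batch = resp.getHits().getHits();
    if (batch.length == 0) {
        break; // result set exhausted
    }
    for (SearchHit hit : batch) {
        // write hit.getSourceAsMap() to the export file
    }
    cursor = batch[batch.length - 1].getSortValues();
}
Note that plain search_after does not freeze the data set; for a strictly consistent export, pair it with a point‑in‑time (PIT) so that concurrent writes cannot shift the sort order mid‑traversal.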
Conclusion
The core issue is unstable pagination anchors caused by offset usage and non‑unique sorting; the fix is to use a data‑based cursor and a unique sort key.
Choose a solution based on whether page jumps are required, data volume, and update frequency: back‑office (B‑end) systems that need page jumps can keep offset pagination plus the fixed time window, consumer‑facing (C‑end) infinite scroll should prefer cursor pagination, and ES bulk jobs benefit from search_after.
Standardizing these techniques into engineering guidelines (requirement → implementation → testing → monitoring) turns individual experience into team‑wide best practices, ensuring stable pagination, better user experience, and reduced financial risk.
