How to Eliminate Unstable Pagination Anchors and Data Duplication in Backend Systems
This article explains why offset‑based pagination can cause duplicate or missing records in backend list queries, illustrates real‑world cases, analyzes the root causes, and presents three practical solutions (cursor pagination, timestamp windows, and Elasticsearch search_after), along with their trade‑offs and implementation details.
1. Cases
Three typical scenarios illustrate the "unstable pagination anchor" problem:
Case 1: In a social feed sorted by newest first, a newly posted item appears on multiple pages, causing duplicate entries.
Case 2: When bulk‑issuing coupons, newly added coupons cause some users to receive duplicate coupons, leading to financial loss.
Case 3: A payment transaction list sorted by a non‑unique field shows out‑of‑order data and even data loss.
All three stem from using LIMIT offset, size (MySQL) or from + size (Elasticsearch), where inserted or deleted rows shift the offset and break pagination stability.
2. Analysis
Using Case 1 as an example, the first page is fetched with:
SELECT * FROM t ORDER BY create_time DESC LIMIT 0,10;
The result set returns posts P1‑P10 (P1 newest). While the user scrolls, a new post P0 is inserted. The second page query:
SELECT * FROM t ORDER BY create_time DESC LIMIT 10,10;
should return P11‑P20, but the inserted row has shifted every existing row down by one position, so it returns P10‑P19 and P10 appears on both pages. If many rows are inserted (or deleted), the same record can appear across multiple pages or become unreachable.
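The shift is easy to reproduce in a few lines. The sketch below simulates offset pagination over an in-memory list sorted newest first; the OffsetShiftDemo class and page method are illustrative names, not from the article:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetShiftDemo {
    // Return one page via offset pagination over an already-sorted list.
    static List<String> page(List<String> posts, int offset, int size) {
        int end = Math.min(offset + size, posts.size());
        if (offset >= end) return List.of();
        return new ArrayList<>(posts.subList(offset, end));
    }

    public static void main(String[] args) {
        List<String> posts = new ArrayList<>();
        for (int i = 1; i <= 20; i++) posts.add("P" + i); // P1 is newest
        List<String> page1 = page(posts, 0, 10);  // P1..P10
        posts.add(0, "P0");                       // a new post arrives between requests
        List<String> page2 = page(posts, 10, 10); // P10..P19, not the expected P11..P20
        System.out.println(page2.get(0));         // prints "P10": duplicated across pages
    }
}
```

Because the anchor is a position rather than a value, any insert before the offset silently shifts the window.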
3. Solutions
Solution 1 – Timestamp + Unique Key Cursor Pagination
Replace the offset with the last record’s create_time and id as a cursor.
Determine a unique sorting key (e.g., create_time + id).
When paging, pass the previous page’s last create_time and id instead of an offset.
Next‑page query uses:
WHERE create_time < last_time OR (create_time = last_time AND id < last_id)
and orders by create_time DESC, id DESC.
SQL example – first page (no cursor) :
SELECT id, title, create_time FROM posts
ORDER BY create_time DESC, id DESC
LIMIT 10;
Second page (using cursor, assuming the last row of page 1 has create_time='2024-05-20 14:30:00' and id=100):
SELECT id, title, create_time FROM posts
WHERE create_time < '2024-05-20 14:30:00'
   OR (create_time = '2024-05-20 14:30:00' AND id < 100)
ORDER BY create_time DESC, id DESC
LIMIT 10;
Advantages: eliminates duplicates and skips, performs well with a composite index on (create_time, id), and works even when the ordering field is not unique. Disadvantage: cannot jump directly to an arbitrary page; only next/previous navigation or scroll loading.
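To make the keyset predicate concrete, here is a minimal in-memory sketch of the same logic; the KeysetPager and Post names are hypothetical, and a real implementation would run the SQL above against the database:

```java
import java.util.Comparator;
import java.util.List;

public class KeysetPager {
    public record Post(long id, long createTime) {}

    // Rows strictly "after" the cursor in (createTime DESC, id DESC) order,
    // mirroring: WHERE create_time < ? OR (create_time = ? AND id < ?)
    public static List<Post> nextPage(List<Post> posts, long lastTime, long lastId, int size) {
        return posts.stream()
                .filter(p -> p.createTime() < lastTime
                          || (p.createTime() == lastTime && p.id() < lastId))
                .sorted(Comparator.comparingLong(Post::createTime).reversed()
                        .thenComparing(Comparator.comparingLong(Post::id).reversed()))
                .limit(size)
                .toList();
    }
}
```

Note that id breaks ties between rows sharing the same create_time, which is exactly what makes the cursor stable under concurrent inserts.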
Solution 2 – Fixed Timestamp Window
Record the query start time as max_create_time. All subsequent pages add WHERE create_time <= max_create_time, so newly inserted rows are excluded.
SQL example – first page:
-- Assume current time is 2024-05-20 15:00:00
SELECT id, title, create_time FROM posts
WHERE create_time <= '2024-05-20 15:00:00'
ORDER BY create_time DESC, id DESC
LIMIT 0,10;
Second page (same window):
SELECT id, title, create_time FROM posts
WHERE create_time <= '2024-05-20 15:00:00'
ORDER BY create_time DESC, id DESC
LIMIT 10,10;
Advantages: simple, low cost, no server‑side state needed. Disadvantages: cannot handle deletions within the window (may skip data), data may become stale, and deep pagination still suffers from large offsets.
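The window approach can likewise be sketched in memory; the WindowPager and Post names are illustrative, with the frozen bound playing the role of max_create_time in the SQL above:

```java
import java.util.Comparator;
import java.util.List;

public class WindowPager {
    public record Post(long id, long createTime) {}

    // Offset pagination restricted to rows at or before the frozen bound,
    // mirroring: WHERE create_time <= max_create_time ... LIMIT offset, size
    public static List<Post> page(List<Post> posts, long maxCreateTime, int offset, int size) {
        return posts.stream()
                .filter(p -> p.createTime() <= maxCreateTime) // inserts after the bound are invisible
                .sorted(Comparator.comparingLong(Post::createTime).reversed()
                        .thenComparing(Comparator.comparingLong(Post::id).reversed()))
                .skip(offset)
                .limit(size)
                .toList();
    }
}
```

Rows inserted after the window opens never enter any page, which removes duplicates, but rows deleted inside the window still shift the offsets of later pages.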
Solution 3 – Elasticsearch‑Specific Optimizations
Both cursor pagination and the timestamp window can be adapted to ES, but from + size is capped by index.max_result_window (10,000 by default), so deep pagination needs search_after or scroll:
search_after: Provide the sort values of the last hit as a cursor; avoids offsets entirely and is the recommended approach.
scroll: Keeps a server‑side snapshot of the search context; usable for one‑off bulk exports, but no longer recommended for pagination (Elastic now favors search_after, optionally with a point‑in‑time).
Code example (search_after):
// First page
SearchRequest request = new SearchRequest("posts");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
sourceBuilder.sort("create_time", SortOrder.DESC);
sourceBuilder.sort("id", SortOrder.DESC);
sourceBuilder.size(10);
request.source(sourceBuilder);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// Capture the last hit's sort values (check for an empty page in production code)
SearchHit[] hits = response.getHits().getHits();
SearchHit lastHit = hits[hits.length - 1];
Object[] lastSortValues = lastHit.getSortValues();
// Second page using search_after
sourceBuilder.searchAfter(lastSortValues);
sourceBuilder.size(10);
request.source(sourceBuilder);
SearchResponse page2Response = client.search(request, RequestOptions.DEFAULT);
Advantages: works with massive ES datasets, avoids the performance hit of large offsets, and handles non‑unique ordering (id serves as the tiebreaker). Disadvantage: cannot jump to arbitrary pages.
4. Summary
Root cause: Pagination instability arises from offset‑based anchors that shift when data is inserted or deleted, and from non‑unique ordering fields.
Solution selection logic: Choose based on whether page jumps are required, data volume, and update frequency – e.g., B‑side admin panels needing jumps may use LIMIT + timestamp, C‑side infinite scroll with large data prefers cursor pagination, and ES bulk export prefers search_after.
Engineering value: Standardizing on a stable pagination strategy (unique sort key, cursor or fixed window) and embedding it into coding, testing, and monitoring guidelines turns individual fixes into team‑wide best practices, improving user experience and protecting revenue.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.