Why Elasticsearch’s from/size Pagination Struggles with Deep Paging and How to Fix It

This article analyzes Elasticsearch’s three pagination methods (from/size, scroll, and search after), explains their internal query and fetch phases and their performance trade‑offs for deep paging, and provides Java implementation examples and practical recommendations.

IT Architects Alliance

Dec 4, 2022

1. Introduction

Elasticsearch is a real‑time distributed search and analytics engine for large volumes of unstructured data. It provides fast retrieval but suffers from the same deep‑paging problem as relational databases when using the basic from + size pagination.

2. from + size pagination

The from + size method works like SQL LIMIT. Example DSL:

GET /wms_order_sku/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 20
}

2.1 Query stage

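The coordinating node creates a priority queue of size from + size and broadcasts the request to every relevant shard. Each shard runs the query locally and builds its own priority queue of from + size entries, holding only document IDs and sort values (such as _score). The shards return these queues to the coordinating node, which merges them into the global queue.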

2.2 Fetch stage

Using the IDs stored in the global queue, the coordinating node retrieves the actual documents from the shards (only the size documents needed for the current page).
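
The two stages above can be sketched with a small in-memory model. This is only an illustration, not Elasticsearch code: the class FromSizeSimulation, the ShardHit type, and the method names are hypothetical, and each "shard" is just a list of (id, score) pairs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of the from/size query and fetch stages; not Elasticsearch code. */
final class FromSizeSimulation {

    /** What a shard returns in the query stage: an id and a score, no document body. */
    static final class ShardHit {
        final String id;
        final double score;

        ShardHit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Highest score first; the id breaks ties so the order is deterministic. */
    static final Comparator<ShardHit> BY_SCORE_DESC =
        Comparator.<ShardHit>comparingDouble(h -> h.score).reversed()
                  .thenComparing(h -> h.id);

    /** Query stage on one shard: keep only the top (from + size) hits. */
    static List<ShardHit> shardTop(List<ShardHit> shardDocs, int from, int size) {
        return shardDocs.stream()
            .sorted(BY_SCORE_DESC)
            .limit((long) from + size)
            .collect(Collectors.toList());
    }

    /** Coordinator: merge all shard candidates, skip `from`, keep the `size` ids to fetch. */
    static List<String> idsToFetch(List<List<ShardHit>> shards, int from, int size) {
        List<ShardHit> merged = new ArrayList<>();
        for (List<ShardHit> shard : shards) {
            merged.addAll(shardTop(shard, from, size));
        }
        return merged.stream()
            .sorted(BY_SCORE_DESC)
            .skip(from)
            .limit(size)
            .map(h -> h.id)
            .collect(Collectors.toList());
    }
}
```

Note that every shard has to contribute from + size candidates even though only size ids survive the merge; the fetch stage would then load just those documents.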

2.3 Performance impact

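The cost grows with from + size, not with size alone. Every shard must collect and return from + size pairs of _id and _score, so the coordinating node has to merge and sort number_of_shards × (from + size) entries, only to discard all but the last size of them. With a large from, this means heavy memory consumption and sorting overhead for a page of just a few documents.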
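
To make the growth concrete, here is a back-of-the-envelope sketch. The class DeepPagingCost and the 5-shard index are assumptions chosen for illustration; the arithmetic simply multiplies the per-shard queue size by the shard count.

```java
/** Rough cost model for one from/size request; shard count and offsets are illustrative. */
final class DeepPagingCost {

    /** Entries each shard must collect and return for one page. */
    static long perShardEntries(int from, int size) {
        return (long) from + size;
    }

    /** Entries the coordinating node must merge and sort for one page. */
    static long coordinatorEntries(int shards, int from, int size) {
        return shards * perShardEntries(from, size);
    }

    public static void main(String[] args) {
        // First page versus the deepest page allowed by the default window, size = 20.
        System.out.println(coordinatorEntries(5, 0, 20));     // 100
        System.out.println(coordinatorEntries(5, 9_980, 20)); // 50000
    }
}
```

Both requests return 20 documents, but under these assumptions the deep page makes the coordinating node handle 500 times as many entries.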

2.4 Java implementation

private SearchHits getSearchHits(BoolQueryBuilder queryParam, int from, int size, String orderField) {
    SearchRequestBuilder searchRequestBuilder = this.prepareSearch();
    searchRequestBuilder.setQuery(queryParam)
                       .setFrom(from)
                       .setSize(size)
                       .setExplain(false);
    if (StringUtils.isNotBlank(orderField)) {
        searchRequestBuilder.addSort(orderField, SortOrder.DESC);
    }
    SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
    return searchResponse.getHits();
}

2.5 Summary

Elasticsearch caps the result window at 10,000 documents by default (index.max_result_window = 10000). A request whose from + size exceeds this limit is rejected with an error, and the error message points to the scroll API as the more efficient way to read large result sets, rather than to raising the limit.
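
An application can check the window itself before sending a request, so that users get a clear message instead of a search error. The following is a minimal sketch under that assumption; ResultWindowGuard and its method are hypothetical names, and the limit should be whatever index.max_result_window is actually set to on the target index.

```java
/** Client-side check mirroring the from + size <= max_result_window rule. */
final class ResultWindowGuard {

    /** The default value of index.max_result_window. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Throws if the requested page would fall outside the result window. */
    static void check(int from, int size, int maxResultWindow) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        long window = (long) from + size; // long avoids int overflow for huge offsets
        if (window > maxResultWindow) {
            throw new IllegalArgumentException(
                "from + size = " + window + " exceeds the result window of " + maxResultWindow
                    + "; use scroll or search_after for deeper pages");
        }
    }
}
```

For example, check(9_980, 20, DEFAULT_MAX_RESULT_WINDOW) passes, while check(9_981, 20, DEFAULT_MAX_RESULT_WINDOW) throws.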

3. Scroll pagination

Scroll works like a database cursor: the first request creates a snapshot and returns a scroll_id. Subsequent requests use this ID to fetch the next batch, avoiding repeated sorting of large result sets.

3.1 Execution process

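During the query stage of the first request, each shard opens a search context that freezes its view of the index, and this snapshot is kept alive for the duration given by the scroll parameter (1m in the example below). Each subsequent request reads the next size documents from that snapshot and returns a scroll_id; the client should always pass the most recently returned scroll_id to fetch the next batch.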
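
The cursor behavior can be modeled in a few lines. This is a toy sketch, not Elasticsearch code: InMemoryScroll is a hypothetical class that copies a list when the "scroll" is opened and then hands out batches in order, which is enough to show the snapshot and the forward-only loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cursor that mimics scroll semantics over an in-memory list; not Elasticsearch code. */
final class InMemoryScroll<T> {

    private final List<T> snapshot; // frozen copy taken when the scroll is opened
    private final int batchSize;
    private int position = 0;       // only ever moves forward

    InMemoryScroll(List<T> liveData, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.snapshot = new ArrayList<>(liveData);
        this.batchSize = batchSize;
    }

    /** Returns the next batch, or an empty list once the snapshot is exhausted. */
    List<T> nextBatch() {
        int end = Math.min(position + batchSize, snapshot.size());
        List<T> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }

    /** Drains the cursor the way a scroll client loops: stop at the first empty batch. */
    static <T> List<T> drain(InMemoryScroll<T> scroll) {
        List<T> all = new ArrayList<>();
        List<T> batch;
        while (!(batch = scroll.nextBatch()).isEmpty()) {
            all.addAll(batch);
        }
        return all;
    }
}
```

An element added to the live list after the cursor is created never shows up in any batch, which is the same trade-off scroll makes. A real client should also release the scroll context (the clear scroll API) once the loop finishes instead of waiting for the timeout.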

3.2 Elasticsearch example

GET /wms_order_sku2021_10/_search?scroll=1m
{
  "query": {
    "bool": {
      "must": [
        { "range": { "shipmentOrderCreateTime": { "gte": "2021-10-04 00:00:00", "lt": "2021-10-15 00:00:00" } } }
      ]
    }
  },
  "size": 20
}

3.3 Java implementation

protected Page searchPageByConditionWithScrollId(BoolQueryBuilder queryParam, Class targetClass, Page page) throws Exception {
    SearchResponse scrollResp;
    String scrollId = ContextParameterHolder.get("scrollId");
    if (scrollId != null) {
        scrollResp = getTransportClient()
            .prepareSearchScroll(scrollId)
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
    } else {
        scrollResp = this.prepareSearch()
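            // QUERY_THEN_FETCH (the default) honors the requested size;
            // QUERY_AND_FETCH returns up to size hits from every shard.
            .setSearchType(SearchType.QUERY_THEN_FETCH)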
            .setScroll(new TimeValue(60000))
            .setQuery(queryParam)
            .setSize(page.getPageSize())
            .execute().actionGet();
        ContextParameterHolder.set("scrollId", scrollResp.getScrollId());
    }
    SearchHit[] hits = scrollResp.getHits().getHits();
    List list = new ArrayList(hits.length);
    for (SearchHit hit : hits) {
        Object instance = targetClass.newInstance();
        this.convertToBean(instance, hit);
        list.add(instance);
    }
    page.setTotalRow((int) scrollResp.getHits().getTotalHits());
    page.setResult(list);
    return page;
}

3.4 Summary

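Scroll reduces query‑and‑sort overhead, but it only moves forward: each request returns the next batch, and there is no way to go back to a previous batch or jump to an arbitrary page. The snapshot also does not reflect data changes made after the scroll was opened, and each open scroll holds resources on the cluster until it expires or is cleared.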

4. Search After pagination

Introduced in Elasticsearch 5, Search After records the sort values of the last hit and uses them as a cursor for the next request, allowing real‑time data to be reflected while avoiding the snapshot limitation of scroll.

4.1 Execution process

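Each shard returns only the first size hits that sort after the supplied cursor, so the work per shard stays proportional to size no matter how deep the client has paged. The coordinating node merges and sorts these hits and returns the top size. For the next page, the client sends the sort values of the previous page’s last hit via the search_after parameter.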
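
The cursor logic can be sketched over an in-memory list. This is an illustration only: SearchAfterSimulation, Doc, and page are hypothetical names, the sort is assumed to be creation time descending with the id as a descending tiebreaker, and the cursor is modeled as the last document of the previous page rather than as an array of sort values.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of search_after paging; not Elasticsearch code. */
final class SearchAfterSimulation {

    static final class Doc {
        final String id;
        final long createTime;

        Doc(String id, long createTime) {
            this.id = id;
            this.createTime = createTime;
        }
    }

    /** Sort order: createTime descending, then id descending as the unique tiebreaker. */
    static final Comparator<Doc> ORDER =
        Comparator.<Doc>comparingLong(d -> d.createTime).reversed()
                  .thenComparing(Comparator.<Doc, String>comparing(d -> d.id).reversed());

    /** One page: the first `size` docs that sort strictly after the cursor (null = first page). */
    static List<Doc> page(List<Doc> index, Doc cursor, int size) {
        return index.stream()
            .sorted(ORDER)
            .filter(d -> cursor == null || ORDER.compare(d, cursor) > 0)
            .limit(size)
            .collect(Collectors.toList());
    }
}
```

Because every page is computed against the current contents of the list, a document added between two requests appears on a later page if it sorts after the cursor. That is the real-time behavior that distinguishes search after from scroll.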

4.2 Elasticsearch example

GET /wms_order_sku2021_10/_search
{
  "query": { "bool": { "must": [ { "range": { "shipmentOrderCreateTime": { "gte": "2021-10-12 00:00:00", "lt": "2021-10-15 00:00:00" } } } ] } },
  "size": 20,
  "sort": [ { "shipmentOrderCreateTime": { "order": "desc" } }, { "_id": { "order": "desc" } } ],
  "search_after": [ 1634077436000, "SO-460_152-1447931043809128448-100017918838" ]
}

4.3 Java implementation

public ScrollDto queryScrollDtoByParamWithSearchAfter(BoolQueryBuilder queryParam, Class targetClass, int pageSize, String afterId, List<FieldSortBuilder> fieldSortBuilders) {
    SearchResponse scrollResp;
    SearchRequestBuilder builder = this.prepareSearch();
    if (CollectionUtils.isNotEmpty(fieldSortBuilders)) {
        fieldSortBuilders.forEach(builder::addSort);
    }
    builder.addSort("_id", SortOrder.DESC);
    if (StringUtils.isBlank(afterId)) {
        scrollResp = builder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                           .setQuery(queryParam)
                           .setSize(pageSize)
                           .execute().actionGet();
    } else {
        Object[] afterIds = JSON.parseObject(afterId, Object[].class);
        scrollResp = builder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                           .setQuery(queryParam)
                           .searchAfter(afterIds)
                           .setSize(pageSize)
                           .execute().actionGet();
    }
    SearchHit[] hits = scrollResp.getHits().getHits();
    List list = Arrays.stream(hits)
        .filter(Objects::nonNull)
        .map(SearchHit::getSourceAsMap)
        .filter(Objects::nonNull)
        .map(JSON::toJSONString)
        .map(e -> JSON.parseObject(e, targetClass))
        .collect(Collectors.toList());
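    // An empty page means there is nothing left; hits[hits.length - 1] would throw here.
    // A null cursor tells the caller that there is no next page.
    String newAfterId = hits.length == 0 ? null : JSON.toJSONString(hits[hits.length - 1].getSortValues());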
    return ScrollDto.builder()
        .scrollId(newAfterId)
        .result(list)
        .totalRow((int) scrollResp.getHits().getTotalHits())
        .build();
}

4.4 Summary

Search After requires the sort to be globally unique, which is usually achieved by adding a unique field such as _id as a tiebreaker after the primary sort field (for example, a timestamp). It reflects real‑time data, avoids maintaining a scroll snapshot, and is suitable for high‑concurrency queries, though it still cannot jump to arbitrary pages.
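
The uniqueness requirement is easy to see in a small model. The sketch below is illustrative only (TiebreakerDemo and its members are hypothetical, not Elasticsearch code): it pages through the same documents twice, once sorted by timestamp alone and once with the id added as a tiebreaker, always continuing strictly after the last document of the previous page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model showing why the search_after sort needs a unique tiebreaker. */
final class TiebreakerDemo {

    static final class Doc {
        final String id;
        final long createTime;

        Doc(String id, long createTime) {
            this.id = id;
            this.createTime = createTime;
        }
    }

    /** Not unique: several docs can share one createTime. */
    static final Comparator<Doc> TIME_ONLY =
        Comparator.<Doc>comparingLong(d -> d.createTime).reversed();

    /** Unique: the id breaks ties between docs with the same createTime. */
    static final Comparator<Doc> TIME_THEN_ID = TIME_ONLY.thenComparing(d -> d.id);

    /** Walks every page, continuing strictly after the last doc of the previous page. */
    static List<String> collectAll(List<Doc> index, Comparator<Doc> order, int size) {
        List<String> seen = new ArrayList<>();
        Doc cursor = null;
        while (true) {
            final Doc after = cursor;
            List<Doc> page = index.stream()
                .sorted(order)
                .filter(d -> after == null || order.compare(d, after) > 0)
                .limit(size)
                .collect(Collectors.toList());
            if (page.isEmpty()) {
                return seen;
            }
            page.forEach(d -> seen.add(d.id));
            cursor = page.get(page.size() - 1);
        }
    }
}
```

With three documents sharing one timestamp and a page size of 2, the timestamp-only walk loses the third of them at the page boundary, while the walk with the id tiebreaker returns every document exactly once.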

5. Overall comparison and recommendations

Small data sets (<10,000 docs) – use from/size for simplicity.

Large data sets, deep paging, batch jobs – use scroll.

Large data sets, real‑time high‑concurrency queries – use search after.

Avoid deep paging with from/size whenever possible. Adjust index.max_result_window only as a last resort, because values beyond 10,000 dramatically increase memory and CPU usage.
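
The recommendations above can be condensed into a small decision helper. This is only one possible reading of them: PaginationChooser, its parameters, and the use of the default 10,000 window as the cut-off are assumptions made for this sketch.

```java
/** Encodes the article's recommendations as a simple rule; the thresholds are assumptions. */
final class PaginationChooser {

    enum Strategy { FROM_SIZE, SCROLL, SEARCH_AFTER }

    /**
     * @param deepestWindow the largest from + size the caller ever expects to request
     * @param batchJob      true for offline exports or reindex-style jobs that read everything
     */
    static Strategy choose(long deepestWindow, boolean batchJob) {
        if (deepestWindow <= 10_000) {
            return Strategy.FROM_SIZE; // stays inside the default result window
        }
        return batchJob ? Strategy.SCROLL : Strategy.SEARCH_AFTER;
    }
}
```

For example, a UI that never shows more than the first 200 results maps to FROM_SIZE, a nightly export of a million documents maps to SCROLL, and a user-facing "load more" feed over the same data maps to SEARCH_AFTER.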
