Which Elasticsearch Pagination Method Is Best? from/size, search_after, Scroll API & PIT

This guide compares Elasticsearch’s four common pagination techniques—`from/size`, `search_after`, Scroll API, and Point in Time—detailing their syntax, advantages, drawbacks, and ideal use‑cases, helping developers select the most efficient method based on pagination depth, consistency requirements, and resource constraints.

Su San Talks Tech

1. Using from and size

The from parameter defines the offset of the first hit to return, while size specifies how many hits to retrieve. This is the default pagination method supported by the Elasticsearch Search API.

GET /index/_search
{
  "from": 10,
  "size": 10,
  "query": { "match": { "field": "value" } }
}
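A minimal Python sketch of how a 1-based page number might be translated into from and size. The helper name page_to_from_size is hypothetical and not part of any Elasticsearch client; the 10,000 cap is the default of the index.max_result_window setting, beyond which Elasticsearch rejects the request.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_to_from_size(page, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Translate a 1-based page number into a from/size body fragment.

    Hypothetical helper for illustration only.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > max_result_window:
        # Elasticsearch would reject this request, so fail early on the client.
        raise ValueError(
            f"from + size = {offset + size} exceeds the window of {max_result_window}"
        )
    return {"from": offset, "size": size}
```

For example, page_to_from_size(2, 10) yields the from and size values used in the request above.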

Pros

Simple to use : intuitive syntax for most basic pagination needs.

Broad support : built‑in to the Elasticsearch Search API.

Cons

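Performance degradation : every shard must collect and sort its top from + size hits, and the coordinating node must merge them before discarding the first from results, so latency grows with the offset. By default, requests where from + size exceeds 10,000 (the index.max_result_window setting) are rejected.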

Resource consumption : large offsets consume additional memory and CPU, potentially impacting cluster stability.
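To make the cost of a large offset concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that every shard returns its top from + size candidates to the coordinating node; the function name and the shard count of 5 are purely illustrative.

```python
def candidates_to_merge(shards, from_, size):
    """Return (candidates per shard, total candidates merged by the coordinator)."""
    per_shard = from_ + size  # each shard must rank this many hits
    return per_shard, shards * per_shard


# Page 1 versus a deep page on a hypothetical 5-shard index.
shallow = candidates_to_merge(5, 0, 10)
deep = candidates_to_merge(5, 9990, 10)
```

Here shallow is (10, 50) while deep is (10000, 50000): the client still receives only 10 hits, but the cluster ranks a thousand times more candidates to produce them.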

Applicable scenarios

Shallow pagination : suitable for the first few pages (e.g., pages 1‑10).

Small datasets : when the total number of documents is modest and pagination requirements are simple.

2. Using search_after

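search_after enables deep pagination by using the sort values of the last hit from the previous page as a cursor. The client must supply these values in the next request, keep the query and sort identical across requests, and end the sort with a unique tiebreaker field so that no hits are skipped or repeated.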

GET /index/_search
{
  "size": 10,
  "query": { "match": { "field": "value" } },
  "sort": [
    { "timestamp": "asc" },
    { "_id": "asc" }
  ],
  "search_after": ["2023-01-01T00:00:00", "some_id"]
}
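The cursor handoff can be sketched in Python as a small generator. This is an illustration under stated assumptions, not a client API: search is an injected callable that takes a request body (dict) and returns a response shaped like the Search API response, so the sketch does not depend on any particular client library.

```python
def iter_search_after(search, body):
    """Yield every hit by following search_after from page to page.

    `search` is a placeholder callable (assumption): body dict -> response dict.
    """
    request = dict(body)  # shallow copy; the caller's body is left untouched
    if "sort" not in request:
        raise ValueError("search_after requires an explicit sort")
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return  # an empty page means the result set is exhausted
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        request["search_after"] = hits[-1]["sort"]
```

The only state carried between requests is the sort array of the last hit, which is why the method cannot jump to an arbitrary page.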

Pros

Efficient deep pagination : performance remains stable even for large page numbers because Elasticsearch does not need to skip documents.

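Strong deduplication : when the sort ends with a unique tiebreaker field, ties cannot cause duplicate or missing hits at page boundaries. Elasticsearch advises against sorting on _id directly, so a keyword field holding a copy of the ID is a safer tiebreaker. Without a PIT, documents indexed or deleted between requests can still shift the results.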

Cons

State management : the client must persist the previous page’s sort values, adding implementation complexity.

No random page jumps : only sequential navigation is possible; you cannot jump directly to an arbitrary page.

Applicable scenarios

Deep pagination : large result sets where consistent performance is required.

Continuous data streams : log retrieval, real‑time analytics, or any use case that processes data in order.

3. Using the Scroll API

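The Scroll API is designed for bulk extraction of large datasets. It preserves a snapshot of the index as it was at the time of the initial query and returns a _scroll_id that is passed as scroll_id to fetch subsequent batches. Elastic's documentation no longer recommends scroll for deep pagination and suggests search_after with a PIT instead.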

POST /index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}
# Retrieve next batch
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
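A hedged Python sketch of the scroll loop follows. The three callables are placeholders for whatever client calls you use (assumptions, not a real library API): start issues the initial search with ?scroll=, fetch_next calls POST /_search/scroll, and clear calls DELETE /_search/scroll.

```python
def iter_scroll(start, fetch_next, clear, keep_alive="1m"):
    """Yield hits batch by batch until the scroll is exhausted.

    Placeholder callables (assumptions):
      start(keep_alive)                 -> first response dict
      fetch_next(scroll_id, keep_alive) -> next response dict
      clear(scroll_id)                  -> releases the scroll context
    """
    response = start(keep_alive)
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = fetch_next(scroll_id, keep_alive)
            # The ID may change between calls; always use the most recent one.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        if scroll_id is not None:
            clear(scroll_id)  # free the context instead of waiting for expiry
```

Clearing the scroll in a finally block releases the context even if the consumer stops early or an error occurs midway.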

Pros

Handles large data volumes : stable performance for exporting or batch‑processing massive numbers of documents.

Stable result set : the snapshot prevents index changes made during retrieval from shifting, duplicating, or dropping results between batches.

Cons

Resource consumption : each open scroll context keeps old segments alive and holds memory and file handles until it expires or is cleared with DELETE /_search/scroll, which becomes costly under high concurrency.

Not for real‑time search : unsuitable for interactive pagination where low latency is required.

Applicable scenarios

Bulk data export : data migration, backup, or any one‑time extraction task.

Large‑scale analysis : processing a huge number of documents in a single operation.

4. Using Point in Time (PIT)

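A point in time (PIT) is a lightweight view of the index state as it existed when the PIT was opened, and it remains consistent across multiple search requests. It is opened through the _pit endpoint and is typically combined with search_after for efficient deep pagination.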

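# Open a PIT; the response contains the PIT ID
POST /index/_pit?keep_alive=1m

# First page: the PIT ID replaces the index name in the path
POST /_search
{
  "size": 10,
  "pit": { "id": "some_pit_id", "keep_alive": "1m" },
  "sort": [...],
  "query": { ... }
}

# Subsequent pages: pass the sort values of the previous page's last hit
POST /_search
{
  "size": 10,
  "pit": { "id": "some_pit_id", "keep_alive": "1m" },
  "sort": [...],
  "query": { ... },
  "search_after": [ ... ]
}

# Release the PIT when finished
DELETE /_pit
{ "id": "some_pit_id" }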
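The PIT lifecycle (open, page with search_after, close) might be wrapped as in the Python sketch below. The callables are placeholders rather than a real client API: open_pit is assumed to call POST /index/_pit?keep_alive=..., search to call POST /_search without an index in the path, and close_pit to call DELETE /_pit.

```python
def iter_with_pit(open_pit, search, close_pit, body, keep_alive="1m"):
    """Page through a consistent view using a PIT plus search_after.

    Placeholder callables (assumptions):
      open_pit(keep_alive) -> PIT ID string
      search(body)         -> response dict
      close_pit(pit_id)    -> None
    """
    pit_id = open_pit(keep_alive)
    request = dict(body)  # shallow copy; the caller's body is left untouched
    try:
        while True:
            # Each request also extends the PIT's lifetime by keep_alive.
            request["pit"] = {"id": pit_id, "keep_alive": keep_alive}
            response = search(request)
            # The response may carry an updated PIT ID; use the latest one.
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                return
            yield from hits
            request["search_after"] = hits[-1]["sort"]
    finally:
        close_pit(pit_id)  # release the PIT rather than waiting for expiry
```

Keeping keep_alive short and closing the PIT explicitly limits how long old segments must be retained.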

Pros

Consistent view : the same snapshot is used for all pages, guaranteeing data consistency even if the index changes.

Combines with search_after : improves efficiency for deep pagination while preserving consistency.

Cons

Increased complexity : developers must manage PIT lifecycles, including creation, keep‑alive settings, and explicit release.

Resource consumption : each PIT session consumes cluster resources until it expires or is cleared.

Applicable scenarios

Consistent pagination across users : multiple clients need to see the same data snapshot.

Deep pagination with consistency requirements : when both performance and a stable view are essential.

5. Choosing the appropriate pagination method

Based on pagination depth

Shallow pages (first few pages) : use from / size for simplicity.

Deep pages : prefer search_after or combine it with PIT for better performance.

Based on data‑consistency requirements

No strict consistency needed : from / size is sufficient.

Strict consistency required : use PIT together with search_after so that every page is read from the same view of the index.

Based on usage pattern

Interactive user pagination : typically from / size.

Batch processing or export : Scroll API, or search_after with PIT on Elasticsearch versions that support it.

Based on resource constraints

Limited resources / high concurrency : avoid the Scroll API; use search_after, adding a short-lived PIT only when a consistent view is needed.

Frequent deep pagination : search_after combined with PIT offers the best trade‑off between performance and resource usage.
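The criteria above can be condensed into a small decision helper. This is only one possible reading of the guidance: the function name, the parameter names, and the order in which the rules are checked are all assumptions made for illustration.

```python
def choose_pagination(deep, needs_consistent_view=False, bulk_export=False):
    """Map the criteria from this section to a method name (illustrative only)."""
    if bulk_export:
        return "scroll"  # one-off export or batch processing
    if needs_consistent_view:
        return "pit + search_after"  # every page reads the same view
    if deep:
        return "search_after"  # stable cost regardless of depth
    return "from/size"  # simplest option for the first few pages
```

For a typical interactive UI that only shows the first few pages, choose_pagination(deep=False) returns "from/size".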

6. Summary

Elasticsearch provides four main pagination strategies:

from/size : simple, suitable for shallow pagination; performance degrades with large offsets.

search_after : efficient deep pagination with deduplication; requires client‑side state and does not support random page jumps.

Scroll API : ideal for bulk extraction of large datasets; not intended for real‑time user interactions.

Point in Time (PIT) : offers a consistent snapshot across pages; best used together with search_after for deep, consistent pagination.

Select the method that aligns with your pagination depth, consistency needs, usage scenario, and available cluster resources to achieve optimal performance and user experience.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
