Which Elasticsearch Pagination Method Is Best? from/size, search_after, Scroll API & PIT
This guide compares Elasticsearch’s four common pagination techniques: `from/size`, `search_after`, the Scroll API, and Point in Time (PIT). It covers each method's syntax, advantages, drawbacks, and ideal use cases so that developers can pick the most efficient one for their pagination depth, consistency requirements, and resource constraints.
1. Using from and size
The from parameter defines the offset of the first hit to return, while size specifies how many hits to retrieve. This is the default pagination method supported by the Elasticsearch Search API.
GET /index/_search
{
  "from": 10,
  "size": 10,
  "query": { "match": { "field": "value" } }
}
Pros
Simple to use : intuitive syntax for most basic pagination needs.
Broad support : built‑in to the Elasticsearch Search API.
Cons
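Performance degradation : for a large from, every shard must collect and sort from + size hits before the first from of them are discarded, so query latency grows with the offset. By default, requests where from + size exceeds 10,000 (the index.max_result_window setting) are rejected.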
Resource consumption : large offsets consume additional memory and CPU, potentially impacting cluster stability.
Applicable scenarios
Shallow pagination : suitable for the first few pages (e.g., pages 1‑10).
Small datasets : when the total number of documents is modest and pagination requirements are simple.
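The offset arithmetic behind from/size can be sketched in Python. This is a minimal illustration rather than anything from an Elasticsearch client: the helper name `page_to_from_size` is hypothetical, and the 10,000 cap assumes the default `index.max_result_window`, which a cluster may override.

```python
MAX_RESULT_WINDOW = 10_000  # assumption: default index.max_result_window


def page_to_from_size(page: int, size: int, query: dict) -> dict:
    """Build a from/size search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    # Requests where from + size exceeds the window are rejected by default,
    # so fail early on the client side instead.
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; use search_after")
    return {"from": offset, "size": size, "query": query}


# Page 2 with 10 hits per page corresponds to the request shown above.
body = page_to_from_size(2, 10, {"match": {"field": "value"}})
print(body)
```

Failing early on the client keeps a deep-page request from ever reaching the cluster, which is usually the point at which switching to search_after is worth considering.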
2. Using search_after
search_after enables deep pagination by using the sort values of the last hit from the previous page as a cursor. The client must supply these values in the next request, together with the same query and sort.
GET /index/_search
{
  "size": 10,
  "query": { "match": { "field": "value" } },
  "sort": [
    { "timestamp": "asc" },
    { "_id": "asc" }
  ],
  "search_after": ["2023-01-01T00:00:00", "some_id"]
}
Pros
Efficient deep pagination : performance remains stable even for large page numbers because Elasticsearch does not need to skip documents.
Strong deduplication : when the sort ends with a unique tiebreaker field (e.g., _id), hits that share the same primary sort value are neither skipped nor repeated across pages.
Cons
State management : the client must persist the previous page’s sort values, adding implementation complexity.
No random page jumps : only sequential navigation is possible; you cannot jump directly to an arbitrary page.
Applicable scenarios
Deep pagination : large result sets where consistent performance is required.
Continuous data streams : log retrieval, real‑time analytics, or any use case that processes data in order.
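The client-side state that search_after requires amounts to copying the last hit's sort values into the next request. A minimal Python sketch follows; the function name `next_page_body` and the sample response are hypothetical, and the response shape assumes the standard `hits.hits[].sort` array that Elasticsearch returns for sorted searches.

```python
import copy
from typing import Optional


def next_page_body(body: dict, response: dict) -> Optional[dict]:
    """Return the request body for the following page, or None when no hits remain."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    nxt = copy.deepcopy(body)   # leave the caller's body untouched
    nxt.pop("from", None)       # an offset is not combined with search_after
    nxt["search_after"] = hits[-1]["sort"]
    return nxt


first_body = {
    "size": 10,
    "query": {"match": {"field": "value"}},
    "sort": [{"timestamp": "asc"}, {"_id": "asc"}],
}
# Hypothetical response fragment: only the fields this sketch reads.
sample_response = {"hits": {"hits": [
    {"_id": "doc-41", "sort": [1672531100000, "doc-41"]},
    {"_id": "doc-42", "sort": [1672531200000, "doc-42"]},
]}}

second_body = next_page_body(first_body, sample_response)
print(second_body["search_after"])
```

Because the cursor is just the sort array, it can be serialized into an opaque "next page" token for a UI, which is also why jumping to an arbitrary page is not possible.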
3. Using the Scroll API
The Scroll API is designed for bulk extraction of large datasets. It creates a point-in-time snapshot of the index at query time and returns a scroll ID that is used to fetch subsequent batches. The scroll=1m parameter tells Elasticsearch how long to keep the search context alive between batches.
POST /index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}
# Retrieve next batch
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
Pros
Handles large data volumes : stable performance for exporting or batch‑processing massive numbers of documents.
Stable result set : the snapshot prevents index changes made during retrieval from affecting the results.
Cons
Resource consumption : maintaining the scroll context occupies cluster resources, especially under high concurrency.
Not for real‑time search : unsuitable for interactive pagination where low latency is required.
Applicable scenarios
Bulk data export : data migration, backup, or any one‑time extraction task.
Large‑scale analysis : processing a huge number of documents in a single operation.
4. Using Point in Time (PIT)
A point in time (PIT) is a lightweight view of the index state as it existed when the PIT was opened, and it remains consistent across multiple pagination requests. It is typically combined with search_after for efficient deep pagination.
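# Open a PIT; the response contains the PIT ID
POST /index/_pit?keep_alive=1m
# First page: search with the PIT ID (no index in the request path)
POST /_search
{
  "size": 10,
  "pit": { "id": "some_pit_id", "keep_alive": "1m" },
  "sort": [...],
  "query": { ... }
}
# Subsequent request: same PIT ID plus the last hit's sort values
POST /_search
{
  "size": 10,
  "pit": { "id": "some_pit_id", "keep_alive": "1m" },
  "sort": [...],
  "query": { ... },
  "search_after": [ ... ]
}
# Release the PIT when finished
DELETE /_pit
{ "id": "some_pit_id" }
Pros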
Consistent view : the same snapshot is used for all pages, guaranteeing data consistency even if the index changes.
Combines with search_after : improves efficiency for deep pagination while preserving consistency.
Cons
Increased complexity : developers must manage PIT lifecycles, including creation, keep‑alive settings, and explicit release.
Resource consumption : each PIT session consumes cluster resources until it expires or is cleared.
Applicable scenarios
Consistent pagination across users : multiple clients need to see the same data snapshot.
Deep pagination with consistency requirements : when both performance and a stable view are essential.
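Combining PIT with search_after comes down to a loop that carries two pieces of state forward: the PIT ID and the last hit's sort values. The Python sketch below illustrates this under stated assumptions: `search` is a hypothetical callable standing in for POST /_search, `fake_search` and `DOCS` are in-memory stand-ins for a cluster, and the sort field `n` is invented for the example.

```python
from typing import Callable, Iterator


def paginate_with_pit(search: Callable[[dict], dict], pit_id: str,
                      sort: list, size: int = 10) -> Iterator[dict]:
    """Walk all pages of one PIT by feeding each page's last sort values forward."""
    body = {"size": size, "sort": sort,
            "pit": {"id": pit_id, "keep_alive": "1m"}}
    while True:
        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body["search_after"] = hits[-1]["sort"]
        # A response may carry an updated PIT ID; use it for the next request.
        body["pit"]["id"] = response.get("pit_id", body["pit"]["id"])


DOCS = [{"_id": str(n), "sort": [n]} for n in range(1, 6)]


def fake_search(body: dict) -> dict:
    """Hypothetical in-memory stand-in for POST /_search."""
    after = body.get("search_after", [0])[0]
    page = [d for d in DOCS if d["sort"][0] > after][: body["size"]]
    return {"pit_id": body["pit"]["id"], "hits": {"hits": page}}


ids = [h["_id"] for h in paginate_with_pit(fake_search, "some_pit_id",
                                           [{"n": "asc"}], size=2)]
print(ids)
```

Opening and closing the PIT are deliberately left outside the generator, so the caller controls the lifecycle mentioned under Cons.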
5. Choosing the appropriate pagination method
Based on pagination depth
Shallow pages (first few pages) : use from / size for simplicity.
Deep pages : prefer search_after or combine it with PIT for better performance.
Based on data‑consistency requirements
No strict consistency needed : from / size is sufficient.
Strict consistency required : use PIT to lock the view across requests.
Based on usage pattern
Interactive user pagination : typically from / size.
Batch processing or export : Scroll API.
Based on resource constraints
Limited resources / high concurrency : avoid Scroll API; use search_after or PIT instead.
Frequent deep pagination : search_after combined with PIT offers the best trade‑off between performance and resource usage.
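The criteria above can be condensed into a small decision helper. This is only one possible reading of the guidance: the function name `choose_pagination`, its three flags, and the precedence order (export first, then consistency, then depth) are assumptions made for illustration.

```python
def choose_pagination(deep: bool, needs_consistent_view: bool, bulk_export: bool) -> str:
    """Map the selection criteria to a pagination method name."""
    if bulk_export:
        return "scroll"              # batch processing or export
    if needs_consistent_view:
        return "pit + search_after"  # stable view across requests
    if deep:
        return "search_after"        # deep pages without strict consistency
    return "from/size"               # shallow, interactive pagination


print(choose_pagination(deep=True, needs_consistent_view=True, bulk_export=False))
```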
6. Summary
Elasticsearch provides four main pagination strategies:
from/size : simple, suitable for shallow pagination; performance degrades with large offsets.
search_after : efficient deep pagination with deduplication; requires client‑side state and does not support random page jumps.
Scroll API : ideal for bulk extraction of large datasets; not intended for real‑time user interactions.
Point in Time (PIT) : offers a consistent snapshot across pages; best used together with search_after for deep, consistent pagination.
Select the method that aligns with your pagination depth, consistency needs, usage scenario, and available cluster resources to achieve optimal performance and user experience.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
