Elasticsearch Distributed Search Mechanisms: query_then_fetch and dfs_query_then_fetch

Elasticsearch provides two search types, query_then_fetch (the default) and dfs_query_then_fetch. In both, a coordinating node distributes the query to the relevant shards, each shard scores its matches using local or global term frequencies, and the coordinating node merges the results and fetches the full documents. Each type has its own trade‑offs between relevance accuracy and cost.

System Architect Go

Nov 17, 2020

Elasticsearch Distributed Search Mechanisms

Elasticsearch has two search types, selected with the search_type request parameter:

• query_then_fetch (default)

• dfs_query_then_fetch
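
As a minimal sketch of how a search type is selected, the snippet below builds the URL for a search request with the search_type query parameter. The host http://localhost:9200 and the index name articles are assumptions made for illustration, and search_url is a hypothetical helper, not part of any client library.

```python
from urllib.parse import urlencode


def search_url(host, index, search_type=None):
    """Build a _search URL, optionally selecting a search type."""
    url = f"{host}/{index}/_search"
    if search_type is not None:
        url += "?" + urlencode({"search_type": search_type})
    return url


# Hypothetical host and index name, used only for illustration.
default_url = search_url("http://localhost:9200", "articles")
dfs_url = search_url("http://localhost:9200", "articles", "dfs_query_then_fetch")

print(default_url)
print(dfs_url)
```

Omitting the parameter leaves the choice to Elasticsearch, which uses query_then_fetch.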

query_then_fetch

The execution steps are:

The user initiates a search request, which reaches a node in the cluster. That node acts as the coordinating node for the request.

The coordinating node sends the query to one copy (primary or replica) of every relevant shard.

Each shard executes the query independently, scoring with its own local term/document frequencies, then sorts the matches and keeps its top from + size hits.

Each shard returns only metadata (e.g., _id and _score) to the coordinating node.

The coordinating node merges the shard results, re‑sorts, paginates, and selects the final result documents (metadata only).

Using that metadata, the coordinating node fetches the full documents from the shards that hold them.

The detailed document data is assembled and returned to the user.

Drawback: Because each shard scores using its own term frequencies rather than global frequencies, uneven data distribution can cause scoring bias and affect relevance.
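
The steps and the drawback above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the two toy shards, the document ids, and the simplified tf * idf score, which stands in for Lucene's actual scoring formula.

```python
import math

# Toy data: the term "elasticsearch" is common on shard0 and rare on shard1.
SHARDS = {
    "shard0": {
        "a1": "elasticsearch search",
        "a2": "elasticsearch shard",
        "a3": "elasticsearch node",
    },
    "shard1": {
        "b1": "elasticsearch search",
        "b2": "kafka queue",
        "b3": "redis cache",
    },
}


def query_phase(docs, term, size):
    """Score one shard with its LOCAL statistics; return (_id, _score) only."""
    doc_count = len(docs)
    doc_freq = sum(1 for text in docs.values() if term in text.split())
    hits = []
    for doc_id, text in docs.items():
        tf = text.split().count(term)
        if tf:
            # Simplified tf * idf, not the formula Elasticsearch really uses.
            hits.append((doc_id, tf * math.log(1 + doc_count / doc_freq)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:size]


def query_then_fetch(term, size):
    # Query phase: every shard reports metadata for its own top hits.
    merged = []
    for shard, docs in SHARDS.items():
        for doc_id, score in query_phase(docs, term, size):
            merged.append((score, shard, doc_id))
    # The coordinating node re-sorts and keeps the overall top `size`.
    merged.sort(key=lambda item: item[0], reverse=True)
    # Fetch phase: load full documents only for the selected hits.
    return [
        {"_id": doc_id, "_score": round(score, 3), "_source": SHARDS[shard][doc_id]}
        for score, shard, doc_id in merged[:size]
    ]


hits = query_then_fetch("elasticsearch", size=2)
for hit in hits:
    print(hit)
```

In this toy setup, documents a1 and b1 contain exactly the same text, yet b1 ranks first because the term is rarer on its shard and therefore gets a higher local weight.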

dfs_query_then_fetch

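The dfs_query_then_fetch mechanism is similar to query_then_fetch but differs in two key ways: a preliminary phase collects term/document frequencies from all shards, and each shard then scores with those global statistics instead of its local ones. The execution steps are:

The user initiates a search request, which reaches a node in the cluster.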

A preliminary phase gathers global term/document frequencies from all shards.

The query is then sent to all relevant shards.

Each shard executes the query independently, using the global term/document frequencies for scoring.

Shards return only metadata (e.g., _id and _score) to the coordinating node.

The coordinating node merges, sorts, paginates, and selects the final result documents (metadata only).

Based on the metadata, the coordinating node fetches the full document data from the appropriate shards.

The detailed documents are assembled and returned to the user.

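Drawback: The preliminary phase adds an extra round trip to every shard before the query itself runs, so this method consumes more resources and adds latency. It is generally not recommended unless globally consistent scoring is essential.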
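
A minimal sketch of the same flow with the preliminary statistics phase added is shown below. As before, the toy shards, document ids, and simplified tf * idf score are assumptions made for illustration rather than Elasticsearch's real implementation.

```python
import math

# Toy data: the term "elasticsearch" is common on shard0 and rare on shard1.
SHARDS = {
    "shard0": {
        "a1": "elasticsearch search",
        "a2": "elasticsearch shard",
        "a3": "elasticsearch node",
    },
    "shard1": {
        "b1": "elasticsearch search",
        "b2": "kafka queue",
        "b3": "redis cache",
    },
}


def shard_stats(docs, term):
    """Preliminary phase: a shard reports (document count, document frequency)."""
    return len(docs), sum(1 for text in docs.values() if term in text.split())


def query_phase(docs, term, size, doc_count, doc_freq):
    """Score one shard with the GLOBAL statistics passed in by the coordinator."""
    idf = math.log(1 + doc_count / doc_freq)  # simplified, not the real formula
    hits = [
        (doc_id, text.split().count(term) * idf)
        for doc_id, text in docs.items()
        if term in text.split()
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:size]


def dfs_query_then_fetch(term, size):
    # Extra round trip: gather and sum statistics from every shard.
    stats = [shard_stats(docs, term) for docs in SHARDS.values()]
    total_docs = sum(count for count, _ in stats)
    total_freq = sum(freq for _, freq in stats)
    if total_freq == 0:
        return []  # the term occurs nowhere, so there is nothing to score
    # Query phase, now scored with the shared global statistics.
    merged = []
    for shard, docs in SHARDS.items():
        for doc_id, score in query_phase(docs, term, size, total_docs, total_freq):
            merged.append((score, shard, doc_id))
    merged.sort(key=lambda item: item[0], reverse=True)
    # Fetch phase: load full documents only for the selected hits.
    return [
        {"_id": doc_id, "_score": round(score, 3), "_source": SHARDS[shard][doc_id]}
        for score, shard, doc_id in merged[:size]
    ]


hits = dfs_query_then_fetch("elasticsearch", size=4)
for hit in hits:
    print(hit)
```

Because every shard now uses the same document count and document frequency, the four matching documents receive the same score regardless of which shard holds them. The price is the additional statistics pass over all shards.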

Tips

Although Elasticsearch offers two search types, the default query_then_fetch is usually sufficient.

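For modest data volumes (e.g., 20‑50 GB), configuring a single primary shard per index can simplify searching: with only one shard, local and global term frequencies are identical, so the scoring bias described above cannot occur.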

When the document source is not needed, setting _source: false avoids unnecessary network and disk I/O.
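
As a small illustration, the request body below disables source retrieval. The field name title and the helper build_search_body are hypothetical; only the _source: false setting comes from the tip above.

```python
import json


def build_search_body(query, include_source=True):
    """Return a search request body, optionally without document sources."""
    body = {"query": query}
    if not include_source:
        # Hits will still carry metadata such as _id and _score.
        body["_source"] = False
    return body


# "title" is a hypothetical field name used only for this example.
body = build_search_body({"match": {"title": "elasticsearch"}}, include_source=False)
print(json.dumps(body))
```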

Deep pagination (e.g., using from + size to retrieve the last 10 of the first 10,000 records) can be resource‑intensive because every shard must collect and return its top from + size hits, and the coordinating node then merges and sorts all of them only to keep the final size results.
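
The cost can be made concrete with some rough arithmetic. The sketch below counts hits only; the shard count of 5 is an assumption chosen for illustration, and deep_pagination_cost is a hypothetical helper.

```python
def deep_pagination_cost(from_, size, shard_count):
    """Estimate how many hits are handled for one from + size page."""
    per_shard = from_ + size            # hits each shard must collect and return
    merged = per_shard * shard_count    # hits the coordinating node must sort
    discarded = merged - size           # hits thrown away after the merge
    return per_shard, merged, discarded


# Last 10 of the first 10,000 records, on an index with 5 shards (assumed).
per_shard, merged, discarded = deep_pagination_cost(from_=9990, size=10, shard_count=5)
print(per_shard, merged, discarded)
```

With these numbers each shard returns 10,000 hits, the coordinating node sorts 50,000, and 49,990 of them are discarded to produce a page of 10.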
