Backend Development 9 min read

How Elasticsearch Inverted Index Works: From Sharding to Real‑Time Search

This article explains Elasticsearch's inverted index structure, sharding and replica concepts, cluster node roles, and the detailed write, read, search, delete, and update processes, illustrating how data moves from memory to disk and becomes searchable in near real‑time.

IT Architects Alliance

Dec 25, 2022

How Elasticsearch Inverted Index Works: From Sharding to Real‑Time Search

Understanding Inverted Index

An inverted index consists of two main parts: the Term Dictionary, which records all terms and their association to posting lists, and the Posting List, which stores entries ( Posting) linking a term to documents. Each posting includes the document ID, term frequency TF, position Position for phrase queries, and offset Offset for highlighting.

Elasticsearch Inverted Index

In Elasticsearch, every field of a JSON document has its own inverted index. Fields can be excluded from indexing to save storage, but then they are not searchable.

Sharding and Cluster Deployment

Each index can be split into multiple shard s. A primary shard is the main copy, while replica shards hold copies for fault tolerance. A typical three‑node cluster (e.g., esnode1, esnode2, esnode3) might be configured with three primary shards and one replica each:

PUT /sku_index/_settings
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

When the master node (elected automatically, e.g., esnode2) fails, a new master is elected. If a data node fails, its primary shards are promoted from replicas.

Write Process

The client sends a request to a chosen node, which becomes the coordinating node .

The coordinating node uses the document ID to route the request to the appropriate primary shard (e.g., on Node1).

The primary shard writes the document, then forwards the operation to replica shards (e.g., Node3) and waits for acknowledgments before responding to the client.

Internally, the write involves three steps:

Data is first written to memory and recorded in the translog file.

A refresh (default every 1 s) makes the in‑memory data visible by creating a new segment in the filesystem cache.

A flush (default every 30 min or when the translog reaches 512 MB) writes segments to disk and clears the translog.

Read Process

The client contacts a node, which acts as the coordinating node.

The coordinating node routes the request to the appropriate shard (primary or replica) based on the routing table.

The shard returns the document to the coordinating node, which forwards it to the client.

If a replica is queried before it has received the latest write, the document may appear missing; querying the primary ensures the latest version.

Search Process

The coordinating node forwards the query to all relevant shards.

Each shard executes the query phase, returning matching document IDs.

The coordinating node performs the fetch phase, retrieving full documents from the shards and merging, sorting, and paginating the results.

For example, with three shards each returning a top‑10 list, the coordinating node merges 30 IDs, re‑sorts them, and returns the final top‑10.

Delete and Update Mechanics

Delete creates a .del file marking a document as deleted; searches consult this file to filter out removed docs.

Update is implemented as a delete followed by a new insert.

Underlying this is the Index Buffer which, on each refresh, generates a new segment file. Periodic merge operations combine segment files, apply deletions, and write a new commit point to finalize the state.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed architecture Elasticsearch Sharding inverted index Search Read Process Write Process

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.