Understanding Elasticsearch Inverted Index, Sharding, and Data Operations

This article explains the core concepts of Elasticsearch, including the structure and purpose of inverted indexes, how shards and replicas are organized in a cluster, and the detailed workflows for writing, reading, searching, and deleting documents within a distributed environment.

Architect

Dec 23, 2022

Like Kafka and Redis, Elasticsearch runs as a cluster that spreads data across multiple nodes. This article introduces the fundamental concepts behind that model.

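Inverted Index : Consists of a term dictionary, which maps each term to a posting list, and the posting lists themselves, which record the IDs of the documents containing the term along with term frequencies, positions, and offsets. Looking up a term therefore leads directly to the matching documents, and the stored frequencies and positions feed relevance scoring.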
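To make the structure concrete, here is a minimal Python sketch of a term dictionary and its posting lists. It assumes naive whitespace tokenization and lowercase normalization, which is far simpler than a real analyzer; the `Posting` class and `build_inverted_index` function are illustrative names, not part of Elasticsearch or Lucene.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One entry of a posting list (illustrative, not Lucene's format)."""
    doc_id: int
    term_frequency: int = 0
    positions: list = field(default_factory=list)  # token index in the doc
    offsets: list = field(default_factory=list)    # (start_char, end_char)


def build_inverted_index(docs):
    """Map each term to a posting list sorted by document ID."""
    term_dictionary = defaultdict(dict)  # term -> {doc_id: Posting}
    for doc_id, text in docs.items():
        lowered = text.lower()
        cursor = 0
        for position, token in enumerate(lowered.split()):
            start = lowered.index(token, cursor)
            end = start + len(token)
            cursor = end
            posting = term_dictionary[token].setdefault(doc_id, Posting(doc_id))
            posting.term_frequency += 1
            posting.positions.append(position)
            posting.offsets.append((start, end))
    return {
        term: [postings[d] for d in sorted(postings)]
        for term, postings in term_dictionary.items()
    }


docs = {
    1: "red apple pie",
    2: "green apple and red apple",
}
index = build_inverted_index(docs)
for posting in index["apple"]:
    print(posting)
```

Searching for `apple` is a single dictionary lookup followed by a walk over a short list, instead of a scan over every document.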

Elasticsearch Inverted Index : Each JSON field in a document has its own inverted index; fields can be excluded from indexing to save storage, though they become non-searchable.
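The per-field idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the field names `name` and `warehouse_note` are hypothetical, tokenization is a plain whitespace split, and a lookup on an unindexed field simply returns nothing here, which is a simplification rather than a description of how Elasticsearch responds.

```python
from collections import defaultdict


def build_field_indexes(docs, indexed_fields):
    """Build one term -> doc IDs index per indexed field."""
    field_indexes = {name: defaultdict(set) for name in indexed_fields}
    for doc_id, doc in docs.items():
        for field_name, value in doc.items():
            if field_name not in field_indexes:
                continue  # kept in the document, but no index is built for it
            for term in str(value).lower().split():
                field_indexes[field_name][term].add(doc_id)
    return field_indexes


def search_field(field_indexes, field_name, term):
    """Look a term up in one field's index; unindexed fields match nothing."""
    if field_name not in field_indexes:
        return set()
    return set(field_indexes[field_name].get(term.lower(), set()))


docs = {
    1: {"name": "red apple", "warehouse_note": "fragile apple crate"},
    2: {"name": "green pear", "warehouse_note": "stack with care"},
}
# Only "name" is indexed; "warehouse_note" is excluded to save space.
field_indexes = build_field_indexes(docs, indexed_fields=["name"])
print(search_field(field_indexes, "name", "apple"))
print(search_field(field_indexes, "warehouse_note", "apple"))
```

The second lookup finds nothing even though document 1 contains the word, which is the trade-off the paragraph above describes.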

Sharding : An index can be split into multiple primary shards, and each primary shard can have one or more replica shards. The shards are distributed across the nodes of the cluster (e.g., esnode1, esnode2, esnode3).

PUT /sku_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Response:
{
  "acknowledged": true
}

Cluster Roles : The master node manages metadata and shard allocation, while data nodes store the actual shards.
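As a rough illustration of what shard allocation has to achieve, the sketch below places primaries round-robin and puts each replica on a different node from its primary. It is a deliberately simplified assumption: the real allocator weighs many more factors, and `allocate_shards` is a made-up helper, not an Elasticsearch API. The sketch also raises an error when there are too few nodes, whereas a real cluster would leave such replicas unassigned.

```python
def allocate_shards(nodes, number_of_shards, number_of_replicas):
    """Place every copy of a shard on a distinct node (toy round-robin)."""
    if number_of_replicas >= len(nodes):
        # Simplification: this sketch refuses instead of leaving
        # replicas unassigned.
        raise ValueError("not enough nodes to separate all shard copies")
    allocation = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        for copy in range(number_of_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            allocation[node].append((shard, role))
    return allocation


allocation = allocate_shards(
    ["esnode1", "esnode2", "esnode3"],
    number_of_shards=3,
    number_of_replicas=1,
)
for node, shards in allocation.items():
    print(node, shards)
```

With three shards and one replica on three nodes, every node ends up holding one primary and one replica of a different shard, so losing any single node still leaves a full copy of the data.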

Write Process :

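The client sends the request to any node, which acts as the coordinating node.

The coordinating node hashes the document ID to determine the target shard and routes the request to the node holding that primary shard.

The primary shard writes the document, then forwards it to its replica shards; once all replicas acknowledge, the primary reports success to the coordinating node, which returns the result to the client.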
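The write steps above can be sketched as follows. The routing rule shown is the commonly described "hash of the ID modulo the number of primary shards"; `zlib.crc32` is used here only as a stable stand-in hash and is an assumption of this sketch, not the hash function Elasticsearch uses. Shards are modeled as plain dictionaries, and `route_to_shard` and `write_document` are hypothetical helper names.

```python
import zlib


def route_to_shard(doc_id, number_of_primary_shards):
    """Pick a shard from the document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def write_document(doc_id, doc, primaries, replicas):
    """Write to the primary first, then to every replica of that shard."""
    shard = route_to_shard(doc_id, len(primaries))
    primaries[shard][doc_id] = doc
    acknowledgements = 0
    for replica in replicas[shard]:
        replica[doc_id] = doc
        acknowledgements += 1
    return {"shard": shard, "replica_acks": acknowledgements}


# Three primary shards, each with one replica, all modeled as dicts.
primaries = [{}, {}, {}]
replicas = [[{}], [{}], [{}]]
result = write_document("sku-1001", {"name": "red apple"}, primaries, replicas)
print(result)
```

Because the shard is derived from the ID and the shard count, the same document always lands on the same shard, which is what lets a later read by ID go straight to the right place.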

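Underlying Write Mechanics : Each write goes into an in-memory buffer and is also appended to the translog. A refresh (by default every 1 s) turns the buffer into a new segment, which is what makes the documents searchable. A flush (by the defaults cited here, every 30 min or when the translog reaches 512 MB) persists the segments to disk and clears the translog.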
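The following toy model shows how buffer, translog, refresh, and flush relate to one another. It is a sketch under simplifying assumptions: everything lives in Python lists, nothing touches a disk, and the class name `ShardSketch` and its attributes are invented for illustration.

```python
class ShardSketch:
    """Toy model of a shard's write path (no real I/O)."""

    def __init__(self):
        self.memory_buffer = []      # written, but not yet searchable
        self.translog = []           # record of operations not yet flushed
        self.segments = []           # searchable segments
        self.committed_segments = 0  # segments counted as persisted

    def index(self, doc):
        self.memory_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Turn the buffer into a new searchable segment."""
        if self.memory_buffer:
            self.segments.append(list(self.memory_buffer))
            self.memory_buffer.clear()

    def flush(self):
        """Persist all segments, then clear the translog."""
        self.refresh()
        self.committed_segments = len(self.segments)
        self.translog.clear()

    def search(self, predicate):
        return [d for seg in self.segments for d in seg if predicate(d)]


def is_apple(doc):
    return "apple" in doc["name"]


shard = ShardSketch()
shard.index({"name": "red apple"})
before_refresh = shard.search(is_apple)  # buffer is not searchable yet
shard.refresh()
after_refresh = shard.search(is_apple)
translog_before_flush = len(shard.translog)
shard.flush()
print(before_refresh, after_refresh, translog_before_flush, len(shard.translog))
```

The gap between `index` and `refresh` is why a freshly written document may not appear in search results immediately, while the translog covers the gap between `refresh` and `flush` if a node fails.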

Read Process :

Client contacts a coordinating node.

The coordinating node forwards the request to relevant shards.

Each shard returns matching document IDs, which the coordinating node merges, sorts, and paginates.

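Search Process : A search runs in two phases. In the query phase, each shard returns the IDs and scores of its top‑N matching documents; in the fetch phase, the coordinating node retrieves the full documents for the final result page from the shards that hold them.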
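A compact way to see the two phases is the scatter-gather sketch below. It assumes shards are plain dictionaries, uses a trivial term-count scoring function in place of real relevance scoring, and breaks score ties by document ID; `query_phase`, `search`, and `count_apple` are illustrative names only.

```python
import heapq


def query_phase(shard, score, size):
    """Each shard returns only (score, doc_id) for its local top hits."""
    hits = []
    for doc_id, doc in shard.items():
        doc_score = score(doc)
        if doc_score > 0:
            hits.append((doc_score, doc_id))
    return heapq.nlargest(size, hits)


def search(shards, score, from_=0, size=10):
    top_n = from_ + size  # every shard must supply enough hits for the page
    candidates = []
    for shard_no, shard in enumerate(shards):
        for doc_score, doc_id in query_phase(shard, score, top_n):
            candidates.append((doc_score, doc_id, shard_no))
    # Merge and sort globally: highest score first, then by ID.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    page = candidates[from_:from_ + size]
    # Fetch phase: load full documents only for the final page.
    return [shards[shard_no][doc_id] for _, doc_id, shard_no in page]


def count_apple(doc):
    return doc["name"].split().count("apple")


shards = [
    {"a": {"name": "red apple"}, "b": {"name": "green pear"}},
    {"c": {"name": "apple apple pie"}, "d": {"name": "apple tart"}},
]
first_page = search(shards, count_apple, from_=0, size=2)
second_page = search(shards, count_apple, from_=2, size=2)
print(first_page)
print(second_page)
```

Note that each shard has to return `from_ + size` candidates, not just `size`, which is why deep pagination grows more expensive as the offset increases.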

Delete/Update : Segments are immutable, so a deletion does not remove anything right away; it records the document in a .del file that marks it as deleted. An update marks the old document as deleted and writes a new version. Merge operations consolidate segment files and permanently remove the deleted documents.
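The mark-then-merge behavior can be modeled with a short Python sketch. It assumes each segment carries a set of deleted IDs that stands in for the .del file, and that a merge combines all segments into one; the `Segment` class and the helper functions are hypothetical and greatly simplified.

```python
class Segment:
    """Toy segment: documents are never changed after creation."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.deleted = set()  # stands in for the .del marker file

    def live_docs(self):
        return {k: v for k, v in self.docs.items() if k not in self.deleted}


def delete(segments, doc_id):
    """Mark the document as deleted wherever it currently lives."""
    for segment in segments:
        if doc_id in segment.docs:
            segment.deleted.add(doc_id)


def update(segments, doc_id, new_doc):
    """An update is a delete marker plus a new version in a new segment."""
    delete(segments, doc_id)
    segments.append(Segment({doc_id: new_doc}))


def merge(segments):
    """Combine segments, keeping only documents not marked as deleted."""
    merged = {}
    for segment in segments:
        merged.update(segment.live_docs())
    return [Segment(merged)]


segments = [Segment({1: "v1", 2: "other"})]
update(segments, 1, "v2")
delete(segments, 2)
still_on_disk = 1 in segments[0].docs  # old version is only marked
merged = merge(segments)
print(still_on_disk, merged[0].docs)
```

Until the merge runs, the old versions still occupy space in their original segment, which is why heavy update workloads can temporarily inflate index size.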

The article concludes with tips, diagrams, and references for further study.
