How Elasticsearch Writes, Reads, and Searches Data: Inside the Engine

This article explains Elasticsearch's internal mechanisms for writing, reading, and searching data, covering the roles of coordinating nodes, primary and replica shards, buffers, translog, segment files, refresh cycles, commit and flush operations, as well as Lucene's inverted index and how deletions and updates are handled.

Programmer DD

Jan 28, 2021

Elasticsearch Write Process

The client sends the request to any node in the cluster; that node becomes the coordinating node for the request.

The coordinating node routes the document to the appropriate node containing the primary shard.

The primary shard processes the request and then replicates the operation to its replica shards.

Once the primary and its replicas have handled the request, the coordinating node returns the response to the client.
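The routing step above can be sketched in a few lines. In simplified form, Elasticsearch picks the shard by hashing the routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below is illustrative only: `pick_shard` is a hypothetical helper, and `zlib.crc32` stands in for the Murmur3 hash Elasticsearch uses, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (the doc ID by default) to a primary shard number."""
    # Stand-in hash: real Elasticsearch uses Murmur3, not CRC32.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_primary_shards


# The same ID always maps to the same shard for a fixed shard count.
for doc_id in ("user-1", "user-2", "user-3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard count is the divisor, changing it would send existing IDs to different shards, which is one reason the number of primary shards is chosen when the index is created.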

Elasticsearch Read Process

The client sends a request to any node, which acts as the coordinating node.

The coordinating node hashes the doc ID to identify the shard that holds the document, then picks one copy of that shard, primary or replica, using a round-robin algorithm to balance load.

The node that holds the chosen copy returns the document to the coordinating node, which then returns it to the client.
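As a rough illustration of the round-robin selection described above, the following sketch rotates read requests across the copies of a single shard. `CopySelector` and the node names are hypothetical; this models only the rotation idea, not how Elasticsearch tracks shard copies internally.

```python
from itertools import cycle


class CopySelector:
    """Rotate through the copies (primary plus replicas) of one shard."""

    def __init__(self, copies):
        # cycle() yields the copies in order, forever.
        self._ring = cycle(list(copies))

    def next_copy(self):
        return next(self._ring)


selector = CopySelector(["node-1 (primary)", "node-2 (replica)", "node-3 (replica)"])
picks = [selector.next_copy() for _ in range(4)]
print(picks)  # the fourth read wraps around to the first copy
```

Since any copy can serve a read, adding replicas spreads read load across more nodes.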

Elasticsearch Search Process

The client sends a request to a node, which acts as the coordinating node.

The coordinating node forwards the search request to one copy, primary or replica, of every relevant shard.

During the query phase, each shard returns the doc IDs of its matching documents (with their sort values) to the coordinating node, which merges, sorts, and paginates the results.

In the fetch phase, the coordinating node uses those doc IDs to retrieve the actual documents from the shards and returns them to the client.
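The two phases above can be sketched with plain dictionaries standing in for shards. Everything here is hypothetical (the `SHARDS` data, the scores, the function names); the point is only to show that shards return lightweight (score, ID) pairs first, and full documents are fetched just for the final page.

```python
import heapq

# Hypothetical in-memory "shards": doc ID -> (score, document body).
SHARDS = [
    {"a": (0.9, "doc a"), "b": (0.4, "doc b")},
    {"c": (0.7, "doc c"), "d": (0.2, "doc d")},
]


def query_phase(shard_no, shard, size):
    """A shard returns only (score, shard number, doc ID) for its top hits."""
    hits = [(score, shard_no, doc_id) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(size, hits)


def search(shards, frm=0, size=2):
    # Query phase: every shard has to return frm + size candidates,
    # because any of them might belong on the requested page.
    candidates = []
    for shard_no, shard in enumerate(shards):
        candidates.extend(query_phase(shard_no, shard, frm + size))
    # The coordinating node merges, sorts by score, and paginates.
    page = sorted(candidates, reverse=True)[frm:frm + size]
    # Fetch phase: load bodies only for the documents on the page.
    return [shards[shard_no][doc_id][1] for _, shard_no, doc_id in page]


print(search(SHARDS))                 # ['doc a', 'doc c']
print(search(SHARDS, frm=2, size=2))  # ['doc b', 'doc d']
```

Note how the per-shard request grows with `frm + size`: this is why very deep pagination becomes expensive.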

Underlying Write Mechanics

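Data is first written to an in-memory buffer and, at the same time, appended to the translog. At this point it is not yet searchable. When the buffer is nearly full or a timeout expires, a refresh writes the buffered data into a new segment file. The segment lands in the OS cache first rather than being fsynced to disk, and the data becomes searchable as soon as it is there.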

By default, Elasticsearch refreshes every second, making it near‑real‑time (NRT); data becomes searchable after about one second. A manual refresh can force the buffer into the OS cache immediately.

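Periodically, a commit (also called a flush) fsyncs the segments held in the OS cache to disk, writes a commit point, and clears the translog. Until that happens, the translog is what makes the data durable: operations that have not yet been committed can be replayed from it after a crash.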
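The interplay of buffer, translog, refresh, and flush can be modeled with a toy class. This is a simplified sketch, not Elasticsearch code: `ToyShard` is hypothetical, lists stand in for files, and the `committed` counter stands in for the fsync and commit point.

```python
class ToyShard:
    """Toy model of the write path: buffer -> segment (refresh) -> commit (flush)."""

    def __init__(self):
        self.buffer = []     # in-memory indexing buffer, not searchable
        self.translog = []   # log of operations not yet committed
        self.segments = []   # segments in the OS cache, searchable
        self.committed = 0   # segments covered by the last commit point

    def index(self, doc):
        # A write goes to the buffer and the translog together.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: persist all segments, then the translog can be cleared.
        self.refresh()
        self.committed = len(self.segments)  # stands in for fsync + commit point
        self.translog.clear()

    def search(self, predicate):
        return [d for seg in self.segments for d in seg if predicate(d)]


shard = ToyShard()
shard.index("hello")
print(shard.search(lambda d: True))  # [] : written but not yet refreshed
shard.refresh()
print(shard.search(lambda d: True))  # ['hello'] : searchable after refresh
shard.flush()
print(shard.translog)                # [] : cleared by the flush
```

The gap between `index` and `refresh` is the near-real-time delay; the gap between `refresh` and `flush` is the window that the translog covers.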

Delete/Update Mechanics

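A delete does not remove the document from its segment, because segments are immutable. Instead, the delete is recorded in a .del file that marks the document as deleted, and searches skip marked documents. An update marks the old version as deleted and writes the new version as a new document.

Each refresh creates a new segment file, so segments accumulate over time. Periodic merges combine multiple segments into a larger one, physically dropping the documents marked as deleted, and write a new commit point that references the merged segment.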
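A small model makes the mark-then-merge behavior concrete. `ToyIndex` below is a hypothetical sketch: dictionaries stand in for segments, a set of markers stands in for the .del files, and each write creates its own segment purely to keep the example short.

```python
class ToyIndex:
    """Toy model: append-only segments plus a set of deletion markers."""

    def __init__(self):
        self.segments = []    # each segment: dict of doc ID -> body, never edited
        self.deleted = set()  # (segment number, doc ID) pairs; stands in for .del files

    def index(self, doc_id, body):
        self.segments.append({doc_id: body})  # simplification: one doc per segment

    def delete(self, doc_id):
        # Mark every existing copy as deleted; the segments stay untouched.
        for seg_no, segment in enumerate(self.segments):
            if doc_id in segment:
                self.deleted.add((seg_no, doc_id))

    def update(self, doc_id, body):
        # An update is a delete of the old version plus a fresh write.
        self.delete(doc_id)
        self.index(doc_id, body)

    def live_docs(self):
        return {doc_id: body
                for seg_no, segment in enumerate(self.segments)
                for doc_id, body in segment.items()
                if (seg_no, doc_id) not in self.deleted}

    def merge(self):
        # Rewrite all live documents into one segment; deleted ones are dropped.
        merged = self.live_docs()
        self.segments = [merged] if merged else []
        self.deleted.clear()


idx = ToyIndex()
idx.index("1", "first version")
idx.index("2", "another doc")
idx.update("1", "second version")
idx.delete("2")
print(len(idx.segments), idx.live_docs())  # 3 {'1': 'second version'}
idx.merge()
print(len(idx.segments), idx.live_docs())  # 1 {'1': 'second version'}
```

Before the merge, the deleted documents still occupy space in their segments; only the merge reclaims it.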

Lucene Layer

Lucene is a Java library that provides algorithms for building inverted indexes. Elasticsearch uses Lucene's APIs to create and manage these indexes on local disk.

Inverted Index

An inverted index maps terms to the list of document IDs containing them, enabling fast full‑text search. For example, the term "Facebook" maps to all documents that include that word, allowing the search engine to quickly retrieve relevant results.
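A minimal inverted index fits in a few lines. This sketch assumes a deliberately naive analyzer (lowercasing and splitting on whitespace), which is far simpler than the analyzers Lucene provides; `build_inverted_index` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of doc IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "Facebook buys startup",
    2: "Google and Facebook compete",
    3: "Google launches phone",
}
index = build_inverted_index(docs)

print(sorted(index["facebook"]))                    # [1, 2]
print(sorted(index["facebook"] & index["google"]))  # [2] : documents with both terms
```

A lookup touches only the entries for the query terms rather than scanning every document, and multi-term queries reduce to set operations on the ID lists.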


