How Elasticsearch Writes, Reads, and Searches Data: Inside the Engine
This article explains Elasticsearch's internal mechanisms for writing, reading, and searching data, covering the roles of coordinating nodes, primary and replica shards, buffers, translog, segment files, refresh cycles, commit and flush operations, as well as Lucene's inverted index and how deletions and updates are handled.
Elasticsearch Write Process
The client sends the request to any node in the cluster; that node becomes the coordinating node for the request.
The coordinating node routes the document to the appropriate node containing the primary shard.
The primary shard processes the request and then forwards the operation to its replica shards.
After the primary and all replicas have handled the request, the coordinating node returns the response to the client.
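The routing in step 2 is commonly described as hashing a routing value (the document ID by default) and taking it modulo the number of primary shards. The following is a minimal sketch of that idea, not Elasticsearch's implementation: `route_to_shard` is a hypothetical helper, and `zlib.crc32` stands in for the real hash function.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard number for a routing key (the doc ID by default)."""
    # Assumption: crc32 is only a stand-in for the hash Elasticsearch really uses.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


# The same key always maps to the same shard, so reads can find the document later.
shard_numbers = [route_to_shard(f"doc-{i}", 5) for i in range(10)]
print(shard_numbers)
```

Because the shard number depends on the primary shard count, a scheme like this also shows why that count cannot simply be changed after documents have been indexed.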
Elasticsearch Read Process
The client sends a request to any node, which acts as the coordinating node.
The coordinating node hashes the doc ID to find the shard that holds the document, then picks one copy of that shard (the primary or a replica) in round‑robin fashion to balance load, and routes the request to the node holding that copy.
The receiving node returns the document to the coordinating node, which then returns it to the client.
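The round‑robin choice among shard copies can be sketched as follows. `ShardCopySelector` and the node names are hypothetical and only illustrate the load‑balancing idea; they are not Elasticsearch classes.

```python
from itertools import cycle


class ShardCopySelector:
    """Rotate through the copies (primary and replicas) of one shard."""

    def __init__(self, copies):
        self._copies = cycle(copies)

    def next_copy(self):
        # Each call hands out the next copy, wrapping around at the end.
        return next(self._copies)


selector = ShardCopySelector(["primary@node-1", "replica@node-2", "replica@node-3"])
picks = [selector.next_copy() for _ in range(4)]
print(picks)  # the fourth pick wraps back to primary@node-1
```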
Elasticsearch Search Process
The client sends a request to a coordinating node.
The coordinating node forwards the search request to all relevant primary or replica shards.
During the query phase, each shard returns the doc IDs of its matching documents (with their scores) to the coordinating node, which merges, sorts, and paginates the results.
In the fetch phase, the coordinating node uses those doc IDs to retrieve the actual documents from the shards and returns them to the client.
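The two phases above can be illustrated with a small in‑memory simulation. Everything here is an assumption made for the sketch: the `SHARDS` data, the precomputed scores, and the `query_phase` and `search` helpers are hypothetical, and real shards compute scores from the query.

```python
import heapq

# Hypothetical shard contents: doc_id -> (score for the current query, source)
SHARDS = {
    "shard-0": {"a": (0.9, {"title": "A"}), "b": (0.2, {"title": "B"})},
    "shard-1": {"c": (0.7, {"title": "C"}), "d": (0.5, {"title": "D"})},
}


def query_phase(shard_name, size):
    """Each shard returns only (score, doc_id, shard) for its top hits."""
    hits = [(score, doc_id, shard_name)
            for doc_id, (score, _) in SHARDS[shard_name].items()]
    return heapq.nlargest(size, hits)


def search(from_, size):
    # Query phase: every shard must supply from_ + size candidates.
    candidates = []
    for shard_name in SHARDS:
        candidates.extend(query_phase(shard_name, from_ + size))
    # The coordinating node merges, sorts, and paginates.
    page = sorted(candidates, reverse=True)[from_:from_ + size]
    # Fetch phase: load full documents only for the hits on the requested page.
    return [SHARDS[shard][doc_id][1] for _, doc_id, shard in page]


print(search(0, 2))  # [{'title': 'A'}, {'title': 'C'}]
```

Note that each shard has to return `from_ + size` candidates, which is why deep pagination gets expensive in this model.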
Underlying Write Mechanics
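Data is first written to an in‑memory buffer and to the translog; at this point it is not yet searchable. When the buffer is nearly full or a timeout elapses, a refresh writes the buffered data into a new segment file. That segment initially sits in the OS cache rather than being fsynced to disk, but it can already be searched.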
By default, Elasticsearch refreshes every second, making it near‑real‑time (NRT); data becomes searchable after about one second. A manual refresh can force the buffer into the OS cache immediately.
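Periodically, a commit (also called flush) fsyncs the segments in the OS cache to disk, writes a commit point, and clears the translog. Until that happens, the translog is what allows operations that have not yet been committed to be recovered after a crash.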
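The interplay of buffer, translog, refresh, and flush can be modeled with a toy class. `ToyShard` is a hypothetical in‑memory model written for this article; it performs no disk I/O and leaves out the timers and size thresholds that trigger refresh and flush in practice.

```python
class ToyShard:
    """In-memory model of the write path; nothing here touches a real disk."""

    def __init__(self):
        self.buffer = []             # indexing buffer: not searchable
        self.translog = []           # log of operations since the last flush
        self.segments = []           # refreshed segments: searchable
        self.committed_segments = 0  # segments covered by the last commit point

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment and empty the buffer.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Refresh, record a commit point, then clear the translog.
        self.refresh()
        self.committed_segments = len(self.segments)
        self.translog.clear()

    def search(self, predicate):
        return [d for seg in self.segments for d in seg if predicate(d)]


shard = ToyShard()
shard.index({"id": 1, "msg": "hello"})
before_refresh = shard.search(lambda d: d["id"] == 1)  # [] (still in the buffer)
shard.refresh()
after_refresh = shard.search(lambda d: d["id"] == 1)   # the document is visible
shard.flush()
```

After `flush()`, the translog in this model is empty and the commit point covers the one existing segment.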
Delete/Update Mechanics
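Delete operations do not remove data from a segment. Instead they generate a .del file that marks the documents as deleted, and searches skip any document marked this way. An update marks the old document as deleted and writes a new version.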
Regular refreshes create new segment files; merges combine multiple segments, physically removing deleted documents and writing a new segment with an updated commit point.
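A toy model makes the mark‑then‑merge behavior concrete. The `Segment` class and the `delete`, `update`, and `merge` helpers are hypothetical; the `deleted` set plays the role that the .del file plays in the description above.

```python
class Segment:
    """A segment whose documents never change; deletes are recorded separately."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> source
        self.deleted = set()    # stands in for the .del file

    def live(self):
        return {i: s for i, s in self.docs.items() if i not in self.deleted}


def delete(segments, doc_id):
    # Mark the document as deleted wherever it lives; nothing is removed yet.
    for seg in segments:
        if doc_id in seg.docs:
            seg.deleted.add(doc_id)


def update(segments, doc_id, source):
    # An update is a delete of the old version plus a write of the new one.
    delete(segments, doc_id)
    segments.append(Segment({doc_id: source}))


def merge(segments):
    # Copy only live documents into one new segment; deleted ones disappear.
    merged = {}
    for seg in segments:
        merged.update(seg.live())
    return [Segment(merged)]


segments = [Segment({1: "v1", 2: "keep"})]
update(segments, 1, "v2")
delete(segments, 2)
stored_before = sum(len(s.docs) for s in segments)  # 3 stored docs, 1 live
segments = merge(segments)
stored_after = sum(len(s.docs) for s in segments)   # 1 stored doc
```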
Lucene Layer
Lucene is a Java library that provides algorithms for building inverted indexes. Elasticsearch uses Lucene's APIs to create and manage these indexes on local disk.
Inverted Index
An inverted index maps terms to the list of document IDs containing them, enabling fast full‑text search. For example, the term "Facebook" maps to all documents that include that word, allowing the search engine to quickly retrieve relevant results.
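A minimal sketch of the data structure, assuming a simple lowercase‑and‑split tokenizer (real analyzers do much more). `build_inverted_index` and the sample documents are hypothetical and do not use Lucene.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "Facebook acquires startup",
    2: "Startup hires engineers",
    3: "Engineers leave Facebook",
}
index = build_inverted_index(docs)
print(sorted(index["facebook"]))             # [1, 3]
print(index["startup"] & index["engineers"])  # {2}
```

Looking up a term is a single dictionary access, and a query for several terms becomes a set intersection over their document lists.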
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
