How Does Elasticsearch Write and Query Data? A Deep Dive into ES Internals
This article explains the complete workflow of Elasticsearch write, read, search, delete, and update operations, covering coordinating nodes, shard routing, buffer refresh, translog, segment files, commit/flush processes, and the underlying inverted index mechanism.
Interviewers often ask about the principles behind Elasticsearch write and query operations. Understanding the internal flow, from client request to final response, is essential for answering well.
Write Operation Flow
The client sends a request to any node, which becomes the coordinating node.
The coordinating node routes the document to the appropriate primary shard.
The primary shard processes the request and replicates the data to replica shards.
After the primary shard and its replica shards have acknowledged the write, the coordinating node returns the result to the client.
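The routing step above can be sketched in a few lines. This is a minimal illustration, not Elasticsearch's actual code: `route_to_shard` and `index_document` are hypothetical names, and CRC32 stands in for the Murmur3-based hash that Elasticsearch applies to the routing value (the document ID by default).

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard number from a routing key (the document ID by default)."""
    # Stand-in hash: Elasticsearch uses a Murmur3-based hash, not CRC32.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


def index_document(doc_id: str, num_primary_shards: int, replicas_per_shard: int) -> dict:
    """Trace the write flow: route, write to the primary, replicate, acknowledge."""
    shard = route_to_shard(doc_id, num_primary_shards)
    # The primary applies the write first, then forwards it to each replica.
    acks = ["primary"] + [f"replica-{i}" for i in range(replicas_per_shard)]
    return {"shard": shard, "acknowledged_by": acks}


result = index_document("order-42", num_primary_shards=5, replicas_per_shard=1)
```

Because the shard number is the hash modulo the number of primary shards, the same ID always lands on the same shard, which is also why the primary shard count cannot simply be changed on an existing index.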
Read Operation Flow
The client issues a GET request using a document ID; the ID is hashed to determine which shard holds the document.
The coordinating node forwards the request, using round‑robin selection among primary and replica shards for load balancing.
The receiving node returns the document to the coordinating node, which then sends it back to the client.
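As a rough sketch of the round‑robin selection described above, the coordinating node can be pictured as cycling through the copies of the target shard. `ShardCopies` and the copy names are hypothetical and only illustrate the load‑balancing idea.

```python
from itertools import cycle


class ShardCopies:
    """Hands out the primary and replica copies of one shard in turn."""

    def __init__(self, copies):
        self._next_copy = cycle(copies)

    def pick(self):
        # Each call returns the next copy, wrapping around at the end.
        return next(self._next_copy)


copies = ShardCopies(["primary", "replica-0", "replica-1"])
picked = [copies.pick() for _ in range(4)]
```

Four consecutive GET requests for documents on this shard would be served by the primary, the two replicas, and then the primary again.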
Search Execution Flow
The coordinating node forwards the query to one copy (primary or replica) of every relevant shard.
Each shard performs the query phase and returns the doc IDs of its matches, together with the values needed to sort them, to the coordinating node.
The coordinating node merges, sorts, and paginates results.
In the fetch phase, the coordinating node retrieves the full documents from the shards based on doc IDs and returns them to the client.
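The two phases above can be simulated with plain dictionaries. This is a toy sketch under stated assumptions: each shard is a dict of doc ID to text, the score is a simple term count rather than Elasticsearch's real relevance scoring, and `query_phase` and `search` are hypothetical names.

```python
def query_phase(shard_docs, term, top_n):
    """Runs on each shard: return (score, doc_id) for its best top_n matches."""
    hits = [
        (text.split().count(term), doc_id)
        for doc_id, text in shard_docs.items()
        if term in text.split()
    ]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_n]


def search(shards, term, offset=0, size=10):
    """Runs on the coordinating node: scatter, merge, paginate, then fetch."""
    top_n = offset + size  # every shard must return enough hits for the page
    candidates = []
    for shard_id, docs in enumerate(shards):
        for score, doc_id in query_phase(docs, term, top_n):
            candidates.append((score, doc_id, shard_id))
    # Merge and sort the per-shard results, then cut out the requested page.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    page = candidates[offset:offset + size]
    # Fetch phase: only now are the full documents read from the shards.
    return [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id]}
        for score, doc_id, shard_id in page
    ]


shards = [
    {"1": "quick brown fox", "2": "lazy dog"},
    {"3": "fox chases fox", "4": "brown bear"},
]
hits = search(shards, "fox")
```

Note that each shard returns `offset + size` candidates, so deep pagination makes every shard and the coordinating node do more work.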
Underlying Index – Inverted Index
Search relies on an inverted index built during segment creation. When documents are indexed, their text is tokenized into terms, and each term is stored in a structure that maps it to the list of documents containing it, enabling fast full‑text search.
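A minimal sketch of such a term-to-documents mapping is shown below. The lowercase-and-split tokenizer is an assumption made for brevity; real analyzers do considerably more.

```python
from collections import defaultdict


def tokenize(text):
    """Very rough analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map every term to the sorted list of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Elasticsearch is fast",
    2: "Lucene is a search library",
    3: "fast search",
}
index = build_inverted_index(docs)
```

Looking up `index["fast"]` returns the documents containing that term directly, without scanning any document text at query time.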
Write Path Details
Incoming documents are first placed in an in‑memory buffer; they are not searchable yet.
Simultaneously, the operation is recorded in the translog file.
When the buffer fills or a periodic timer triggers (default every 1 s), the buffer is refreshed: its contents are written to a new segment file in the OS cache, which makes them searchable.
A refresh therefore produces a new segment file roughly every 1 s; if the buffer is empty, no refresh occurs.
The new segment resides in the OS cache before being flushed to disk; this near‑real‑time (NRT) behavior means data becomes searchable about one second after it is written.
When the translog grows large or after 30 minutes, a commit (flush) writes all buffered data to disk, creates a commit point, and clears the translog.
During the commit, data in the OS cache is fsynced to ensure durability.
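The buffer, translog, refresh, and flush steps above can be modeled with a small class. This is a simplified sketch, not how Lucene or Elasticsearch are implemented: `ShardWritePath` is a hypothetical name, plain lists stand in for the files, and moving a segment between lists stands in for fsync.

```python
class ShardWritePath:
    """Toy model of one shard's write path."""

    def __init__(self):
        self.buffer = []           # in-memory buffer: not searchable
        self.translog = []         # operations since the last commit
        self.cached_segments = []  # segments in OS cache: searchable, not yet fsynced
        self.disk_segments = []    # segments covered by a commit point

    def index(self, doc):
        # A write goes to the buffer and is recorded in the translog.
        self.buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # An empty buffer produces no segment.
        if not self.buffer:
            return
        self.cached_segments.append(list(self.buffer))
        self.buffer.clear()

    def flush(self):
        # Commit: refresh, persist cached segments, then clear the translog.
        self.refresh()
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def searchable_docs(self):
        segments = self.disk_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]


shard = ShardWritePath()
shard.index({"id": 1})
before_refresh = shard.searchable_docs()
shard.refresh()
after_refresh = shard.searchable_docs()
```

In this model the document is invisible to search until `refresh()` runs, and the translog keeps its entry until `flush()` has persisted the segment.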
Delete and Update Processes
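A delete does not remove the document from its segment right away; it is recorded in a .del file that marks the document as deleted, and searches skip the marked documents.
An update is implemented as a delete followed by a new write: the old version is marked as deleted and the new version is indexed as a new document.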
Segment merges periodically combine multiple segment files, physically removing documents marked as deleted and producing a new commit point.
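The delete, update, and merge behavior described above can be sketched as follows. `Segment` and the helper functions are hypothetical; the `deleted` set plays the role of the .del file.

```python
class Segment:
    """A segment's documents are never modified once it has been written."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> source
        self.deleted = set()    # doc IDs marked as deleted (the .del role)


def delete(segments, doc_id):
    # Only mark the document; its data stays in the segment for now.
    for segment in segments:
        if doc_id in segment.docs:
            segment.deleted.add(doc_id)


def update(segments, doc_id, new_source):
    # Update = mark the old version deleted + write the new version.
    delete(segments, doc_id)
    segments.append(Segment({doc_id: new_source}))


def live_docs(segments):
    # Searches skip documents that carry a deletion mark.
    return {
        doc_id: source
        for segment in segments
        for doc_id, source in segment.docs.items()
        if doc_id not in segment.deleted
    }


def merge(segments):
    # Merging copies only live documents, physically dropping deleted ones.
    return [Segment(live_docs(segments))]


segments = [Segment({"1": "v1", "2": "to be removed"})]
update(segments, "1", "v2")
delete(segments, "2")
merged = merge(segments)
```

Before the merge, the old version of document 1 and all of document 2 still occupy space in the first segment; after it, only the live version of document 1 remains.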
Key Takeaways
Data flows from client → coordinating node → appropriate shard, with replication for durability.
Buffer → refresh (1 s) → OS cache → searchable; translog provides crash recovery.
Commit/flush persists data to disk and clears translog; merges clean up deleted docs.
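Understanding these steps helps explain Elasticsearch's near‑real‑time nature and its potential data‑loss window: translog writes also pass through the OS cache, so when the translog is fsynced asynchronously (index.translog.durability set to async, with index.translog.sync_interval at its default of 5 s), up to about 5 s of acknowledged writes can be lost if the node crashes.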
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
