How Elasticsearch Powers Real-Time Search: Inverted Index, Sharding, and Write Mechanics
This article explains Elasticsearch’s core concepts—including inverted indexes, shard architecture, node roles, and the detailed write‑read‑search workflow—so readers can grasp how the system achieves near‑real‑time search and reliable data storage.
Understanding the Inverted Index
Elasticsearch stores searchable data in an inverted index (implemented by Apache Lucene), the same structure full-text search engines use to map terms to the documents that contain them.
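Forward Index vs. Inverted Index
A forward index maps each document to the terms it contains, so finding every document with a given term means scanning all documents. An inverted index reverses the mapping: it maps each term to the documents that contain it, so a term lookup goes straight to the matches.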
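To make the contrast concrete, here is a minimal Python sketch over a hypothetical three-document corpus, assuming plain whitespace tokenization (real analyzers do much more). It builds both indexes and answers the same question two ways: the forward index needs a scan over every document, while the inverted index needs a single dictionary lookup.

```python
from collections import defaultdict

# Hypothetical toy corpus: document ID -> text
docs = {
    1: "elasticsearch is a search engine",
    2: "lucene is a search library",
    3: "elasticsearch is built on lucene",
}

# Forward index: document -> the terms it contains
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> sorted list of the documents containing it
postings = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        postings[term].add(doc_id)
inverted_index = {term: sorted(ids) for term, ids in postings.items()}

# "Which documents contain 'lucene'?"
# Forward index: every document has to be scanned
scan_result = [d for d, terms in forward_index.items() if "lucene" in terms]
# Inverted index: one lookup
lookup_result = inverted_index.get("lucene", [])

print(scan_result)    # [2, 3]
print(lookup_result)  # [2, 3]
```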
Inverted Index Components
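Term Dictionary: records every term in the index and maps each term to its posting list. General-purpose inverted indexes often use B+ trees or hash tables here for fast insert and lookup; Lucene, which Elasticsearch builds on, keeps terms sorted in blocks and locates them through an FST-based term index.
Posting List: one entry per document that contains the term; each entry stores: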
Document ID
Term Frequency (TF)
Position (for phrase queries)
Offset (for highlighting)
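The sketch below models these two components in Python, assuming a simple `\w+` tokenizer with lowercasing. The `Posting` class and the `build_index` and `phrase_match` functions are hypothetical illustrations, not Lucene's data structures (a sorted Python dict stands in for the term dictionary). It shows how positions support a two-word phrase query and how offsets point back into the original text for highlighting; term frequency is simply the number of recorded positions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    doc_id: int
    positions: list = field(default_factory=list)  # token positions, for phrase queries
    offsets: list = field(default_factory=list)    # (start, end) character offsets, for highlighting

    @property
    def term_freq(self) -> int:
        # TF is the number of times the term occurs in this document
        return len(self.positions)


def build_index(docs):
    # Term dictionary: term -> {doc_id: Posting}
    term_dict = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, match in enumerate(re.finditer(r"\w+", text.lower())):
            posting = term_dict[match.group()].setdefault(doc_id, Posting(doc_id))
            posting.positions.append(pos)
            posting.offsets.append((match.start(), match.end()))
    # Terms in sorted order; each posting list sorted by document ID
    return {
        term: sorted(by_doc.values(), key=lambda posting: posting.doc_id)
        for term, by_doc in sorted(term_dict.items())
    }


def phrase_match(index, first, second):
    """Doc IDs where `second` appears immediately after `first`."""
    second_positions = {p.doc_id: set(p.positions) for p in index.get(second, [])}
    hits = []
    for posting in index.get(first, []):
        following = second_positions.get(posting.doc_id, set())
        if any(pos + 1 in following for pos in posting.positions):
            hits.append(posting.doc_id)
    return hits


docs = {1: "Quick brown fox", 2: "The fox is quick, the fox is brown", 3: "brown quick"}
index = build_index(docs)
for posting in index["fox"]:
    print(posting.doc_id, posting.term_freq, posting.positions, posting.offsets)
# 1 1 [2] [(12, 15)]
# 2 2 [1, 5] [(4, 7), (22, 25)]
print(phrase_match(index, "quick", "brown"))  # [1]
```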
Elasticsearch’s Inverted Index
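Each indexed field in a document (for example, each text or keyword field) has its own inverted index.
Indexing can be disabled per field (the "index": false mapping parameter), which saves storage, but the field can then no longer be searched through an inverted index.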
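As a rough Python illustration, assuming whitespace tokenization: the sketch keeps one inverted index per field and skips any field whose hypothetical mapping entry has `"index": False`, loosely mirroring the `"index": false` mapping parameter in Elasticsearch. The value is still present in `docs`, but no term lookup can find it. In this toy the search simply returns no hits; how real Elasticsearch treats queries on a non-indexed field depends on the field type and version.

```python
from collections import defaultdict

# Hypothetical mapping: "index": False mimics disabling indexing for a field
mapping = {
    "title": {"index": True},
    "description": {"index": True},
    "internal_note": {"index": False},  # kept in the document, but no inverted index
}


def index_documents(docs, mapping):
    # One inverted index per field: field -> term -> set of doc IDs
    field_indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field_name, value in doc.items():
            if not mapping.get(field_name, {}).get("index", True):
                continue  # indexing disabled: build nothing for this field
            for term in value.lower().split():
                field_indexes[field_name][term].add(doc_id)
    return field_indexes


def search(field_indexes, field_name, term):
    return sorted(field_indexes.get(field_name, {}).get(term.lower(), set()))


docs = {
    1: {"title": "Red Phone", "description": "a red phone", "internal_note": "supplier red"},
    2: {"title": "Blue Phone", "description": "a blue phone", "internal_note": "supplier blue"},
}
idx = index_documents(docs, mapping)
print(search(idx, "title", "phone"))        # [1, 2]
print(search(idx, "description", "red"))    # [1]
print(search(idx, "internal_note", "red"))  # [] (no inverted index for this field)
```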
Distributed Architecture Principles
Sharding
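Primary shard: an index is split into primary shards, and each document is stored in exactly one of them.
Replica shard: a copy of a primary shard, allocated on a different node from its primary, that provides redundancy and can also serve reads and searches.
Example: an ES cluster deployed on three machines (esnode1, esnode2, esnode3).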
Create an Index with 3 Shards and 1 Replica
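PUT /sku_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Response:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "sku_index"
}
Note: number_of_shards can only be set when the index is created. The PUT /sku_index/_settings endpoint changes dynamic settings such as number_of_replicas on an existing index, but not the shard count.
The cluster elects a master node (e.g., esnode2) to manage metadata and shard allocation.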
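Master node: maintains the cluster state (index metadata, node membership) and decides where shards are allocated, including promoting replicas and re-creating lost ones.
Data node: stores shard data and executes indexing, read, and search operations on it.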
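One way to picture the resulting layout for sku_index (3 primaries, 1 replica each, 3 nodes) is the Python sketch below. The round-robin placement in the hypothetical `allocate` function is an assumption for illustration; Elasticsearch's allocator uses its own balancing rules. What it does share with Elasticsearch is the constraint that a primary and its replica never sit on the same node. Where Elasticsearch would leave surplus replicas unassigned, this toy raises an error instead.

```python
def allocate(num_shards, num_replicas, nodes):
    """Round-robin placement that never puts two copies of one shard on the same node."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to keep every copy of a shard on a separate node")
    allocation = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]  # shift each copy to the next node
            role = "P" if copy == 0 else "R"           # P = primary, R = replica
            allocation[node].append(f"{role}{shard}")
    return allocation


layout = allocate(3, 1, ["esnode1", "esnode2", "esnode3"])
for node, copies in layout.items():
    print(node, copies)
# esnode1 ['P0', 'R2']
# esnode2 ['R0', 'P1']
# esnode3 ['R1', 'P2']
```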
Node Failure Scenarios
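If the master node fails, the remaining master-eligible nodes elect a new master.
If a data node fails, the replicas of its primary shards on other nodes are promoted to primaries, and the master allocates new replicas where possible to restore the configured replica count.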
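The data-node case can be sketched in Python on the three-node layout (P = primary, R = replica), using a hypothetical `fail_node` helper. It only models replica promotion; it does not simulate master election or the re-creation of lost replicas that the master would schedule afterward.

```python
def fail_node(allocation, failed):
    """Remove a node and promote a surviving replica for each primary it held (toy model)."""
    lost = allocation.pop(failed)
    for copy in lost:
        if copy.startswith("P"):
            shard = copy[1:]
            for copies in allocation.values():
                if f"R{shard}" in copies:
                    # Promote the replica in place
                    copies[copies.index(f"R{shard}")] = f"P{shard}"
                    break
            else:
                print(f"shard {shard}: no replica left, data unavailable")
    return allocation


allocation = {
    "esnode1": ["P0", "R2"],
    "esnode2": ["R0", "P1"],
    "esnode3": ["R1", "P2"],
}
result = fail_node(allocation, "esnode1")
print(result)  # {'esnode2': ['P0', 'P1'], 'esnode3': ['R1', 'P2']}
```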
Write Process
Steps to Write a Single Document
The client sends the request to a coordinating node.
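The coordinating node computes the target shard from the routing value (the document ID by default) as shard = hash(_routing) % number_of_primary_shards, and forwards the request to that shard's primary.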
The primary shard executes the write locally, then forwards the operation to its replica shards in parallel.
Once all in-sync replicas acknowledge, the primary reports success to the coordinating node, which returns a success response to the client.
Tips: A success response therefore means the write has been applied on the primary shard and on all of its in-sync replicas, not only on the primary.
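The write path can be sketched in Python as follows. `route` uses the simplified rule from the Elasticsearch documentation, shard = hash(_routing) % number_of_primary_shards, where `_routing` defaults to the document ID; Elasticsearch actually hashes with Murmur3, so `zlib.crc32` here is a stand-in and will not reproduce real shard numbers. `ShardCopy`, `ReplicationGroup`, and `coordinate_write` are hypothetical names. The formula also explains why number_of_shards is fixed at index creation: changing the divisor would send existing IDs to different shards.

```python
import zlib
from typing import Optional

NUM_PRIMARY_SHARDS = 3  # fixed when the index is created


def route(doc_id: str, routing: Optional[str] = None) -> int:
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.

    _routing defaults to the document ID. crc32 stands in for Murmur3.
    """
    key = doc_id if routing is None else routing
    return zlib.crc32(key.encode("utf-8")) % NUM_PRIMARY_SHARDS


class ShardCopy:
    def __init__(self, name: str):
        self.name = name
        self.docs: dict = {}

    def apply(self, doc_id: str, source: dict) -> bool:
        self.docs[doc_id] = source
        return True  # acknowledge the operation


class ReplicationGroup:
    """One primary plus its replicas (one replica, as in the sku_index example)."""

    def __init__(self, shard: int):
        self.primary = ShardCopy(f"P{shard}")
        self.replicas = [ShardCopy(f"R{shard}")]

    def write(self, doc_id: str, source: dict) -> bool:
        self.primary.apply(doc_id, source)  # the primary applies the write first
        # ...then forwards it to every replica (sequentially here, in parallel in ES)
        acks = [replica.apply(doc_id, source) for replica in self.replicas]
        return all(acks)  # success only once every replica has acknowledged


groups = [ReplicationGroup(s) for s in range(NUM_PRIMARY_SHARDS)]


def coordinate_write(doc_id: str, source: dict) -> dict:
    """The coordinating node's role: route, then wait for the replication group."""
    shard = route(doc_id)
    return {"_id": doc_id, "shard": shard, "success": groups[shard].write(doc_id, source)}


resp = coordinate_write("sku-1001", {"name": "phone"})
group = groups[resp["shard"]]
print(resp["success"])  # True
print("sku-1001" in group.primary.docs, "sku-1001" in group.replicas[0].docs)  # True True
```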
Underlying Write Mechanics
Writing involves three main operations:
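Write New Document: the document is added to an in-memory indexing buffer and appended to the translog file. By default the translog is fsynced before the request is acknowledged (index.translog.durability: request).
Refresh: by default every second (index.refresh_interval), the buffered documents are written to a new segment in the filesystem cache and opened for search, without an fsync to disk. This is why search is near real-time rather than immediate.
Flush: a Lucene commit fsyncs the segments held in the filesystem cache to disk, and the translog is cleared because its operations are now covered by the commit. The triggers quoted here (every 30 minutes, or when the translog reaches 512 MB) are defaults from older Elasticsearch releases and differ between versions. Between flushes, the translog records every operation so that a shard can replay them during recovery after a failure.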
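The interplay of buffer, translog, refresh, and flush can be modeled with a small Python class. `ToyShard` and its methods are hypothetical and ignore segment files, fsync calls, and timers; the point is which data is searchable at each step and which survives a simulated power loss (memory and the un-fsynced filesystem cache are lost, while committed segments and the translog remain).

```python
class ToyShard:
    """Toy model of one shard's write path (hypothetical names, not ES internals)."""

    def __init__(self):
        self.buffer = []          # in-memory indexing buffer: not searchable yet
        self.translog = []        # append-only operation log (fsynced per request by default in ES)
        self.cache_segments = []  # segments in the filesystem cache: searchable, not yet fsynced
        self.disk_segments = []   # committed segments: survive a crash

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # Turn the buffer into a new searchable segment
        if self.buffer:
            self.cache_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: make cached segments durable, then drop the covered translog entries
        self.refresh()
        self.disk_segments.extend(self.cache_segments)
        self.cache_segments.clear()
        self.translog.clear()

    def search(self):
        return [doc for seg in self.disk_segments + self.cache_segments for doc in seg]

    def crash_and_recover(self):
        # Simulated power loss: memory and un-fsynced cache are gone, translog survives
        self.buffer.clear()
        self.cache_segments.clear()
        replay = list(self.translog)
        self.translog.clear()
        for _op, doc in replay:
            self.index(doc)
        self.refresh()


shard = ToyShard()
shard.index("doc1")
print(shard.search())  # [] (written, not yet refreshed)
shard.refresh()
print(shard.search())  # ['doc1'] (visible after refresh)
shard.flush()
shard.index("doc2")
shard.refresh()
shard.crash_and_recover()
print(shard.search())  # ['doc1', 'doc2'] (doc1 from disk, doc2 replayed from the translog)
```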
Read Process
Steps to Read a Document
The client contacts a coordinating node.
The coordinating node uses the document ID to determine which shard holds the document, selects one copy of that shard (primary or replica), and forwards the request, spreading load across copies.
The shard returns the document to the coordinating node, which forwards it to the client.
While a write is still in progress, the document may already exist on the primary but not yet on a replica; a read served by that replica can then report the document as missing, while the same read on the primary succeeds.
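A minimal Python sketch of that situation, assuming two copies of one shard and simple round-robin selection between them (Elasticsearch picks copies with its own logic, such as adaptive replica selection in recent versions). The write has reached the primary but not yet the replica, so consecutive reads of the same ID disagree until replication completes. `ShardCopy` and `get` are hypothetical names.

```python
import itertools


class ShardCopy:
    def __init__(self, name):
        self.name = name
        self.docs = {}


primary, replica = ShardCopy("primary"), ShardCopy("replica")
copies = itertools.cycle([primary, replica])  # toy round-robin over the two copies


def get(doc_id):
    copy = next(copies)
    return copy.name, copy.docs.get(doc_id)


# A write that has reached the primary but not yet the replica
primary.docs["sku-1"] = {"name": "phone"}

print(get("sku-1"))  # ('primary', {'name': 'phone'})
print(get("sku-1"))  # ('replica', None): the replica has not received the write yet
replica.docs["sku-1"] = {"name": "phone"}  # replication completes
print(get("sku-1"))  # ('primary', {'name': 'phone'})
print(get("sku-1"))  # ('replica', {'name': 'phone'})
```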
Search Process
Search Data Flow
The client sends a query to a coordinating node.
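The coordinating node forwards the query to one copy (primary or replica) of every shard in the target index.
Query phase: each shard runs the query locally and returns only the IDs and sort values (for example, scores) of its top from + size matches.
The coordinating node merges and sorts these per-shard lists into a global ranking and keeps only the requested page.
Fetch phase: the coordinating node requests the full documents for those IDs from the shards that hold them and returns them to the client.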
Example: with three shards, each returns its top 10 hits; the coordinating node merges the 30 results and returns the final top 10.
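The query-then-fetch flow can be sketched in Python over three hypothetical shards with made-up scores. Each shard's query phase returns only (score, ID) pairs for its local top from + size hits; the coordinating node merges them, cuts the requested page, and only then fetches document bodies for that page. The names `query_phase` and `search` and the scoring data are assumptions for illustration.

```python
import heapq

# Hypothetical per-shard data: doc ID -> (score, source)
shards = [
    {f"s0-d{i}": (i * 0.9, {"shard": 0, "n": i}) for i in range(20)},
    {f"s1-d{i}": (i * 1.0, {"shard": 1, "n": i}) for i in range(20)},
    {f"s2-d{i}": (i * 1.1, {"shard": 2, "n": i}) for i in range(20)},
]


def query_phase(shard, k):
    # Each shard returns (score, doc ID) for its local top k only, no document bodies
    return heapq.nlargest(k, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def search(from_=0, size=10):
    k = from_ + size  # every shard must return from + size hits for correct paging
    candidates = []
    for shard_no, shard in enumerate(shards):
        candidates.extend((score, doc_id, shard_no) for score, doc_id in query_phase(shard, k))
    # Coordinating node: global sort by score, then cut the requested page
    page = sorted(candidates, reverse=True)[from_:from_ + size]
    # Fetch phase: load full documents only for the hits on this page
    return [(doc_id, score, shards[shard_no][doc_id][1]) for score, doc_id, shard_no in page]


hits = search(size=10)  # 3 shards x 10 candidates -> 30 merged -> top 10 returned
print(len(hits))   # 10
print(hits[0][0])  # s2-d19
```

Because every shard must return from + size candidates, deep pagination makes the coordinating node sort ever larger lists; Elasticsearch caps from + size with index.max_result_window (10,000 by default).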
Delete/Update Mechanics
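Delete: segments are immutable, so a delete does not remove the document right away. Instead, the document is marked as deleted in a per-segment file (a .del file in older Lucene versions, a live-docs bitset in newer ones), and searches consult it to filter out deleted docs.
Update: the old version of the document is marked as deleted in the same way, and the new version is indexed as a new document with an incremented _version.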
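The delete and update behavior can be sketched in Python. `Segment` and `ToyIndex` are hypothetical; each segment's document map is never modified after creation, and a per-segment `deleted` set stands in for the deletion file. For brevity every write creates its own segment here, whereas Elasticsearch batches writes in the indexing buffer until a refresh.

```python
class Segment:
    """Immutable documents plus a mutable per-segment deletion marker (toy model)."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc ID -> (version, source); never changed after creation
        self.deleted = set()    # stands in for the .del / live-docs file


class ToyIndex:
    def __init__(self):
        self.segments = []
        self.versions = {}

    def _mark_deleted(self, doc_id):
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def index(self, doc_id, source):
        # An update = mark the old copy deleted + write a new copy with a higher version
        self._mark_deleted(doc_id)
        version = self.versions.get(doc_id, 0) + 1
        self.versions[doc_id] = version
        self.segments.append(Segment({doc_id: (version, source)}))

    def delete(self, doc_id):
        self._mark_deleted(doc_id)

    def search(self):
        # Deleted docs are still physically present; searches filter them out
        return {d: v for seg in self.segments for d, v in seg.docs.items() if d not in seg.deleted}


idx = ToyIndex()
idx.index("sku-1", {"price": 10})
idx.index("sku-1", {"price": 12})  # update
idx.index("sku-2", {"price": 5})
idx.delete("sku-2")
print(idx.search())  # {'sku-1': (2, {'price': 12})}
print(sum(len(s.docs) for s in idx.segments))  # 3 (old and deleted versions still take space)
```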
Underlying Logic
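Each refresh that has new documents to write creates a new segment (by default once per second).
Merge operations run in the background: they combine several smaller segments into a larger one, physically drop the documents marked as deleted, write the merged segment, and record a commit point that references the new segment instead of the old ones, which are then deleted.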
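A merge can be sketched as a Python function over (docs, deleted) pairs, a hypothetical representation of segments. It keeps only live documents, so entries marked as deleted are physically dropped, and the returned list plays the role of the new commit point that references the merged segment instead of the old ones. Which segments are merged, and when, is decided by a merge policy in Lucene, which this sketch does not model.

```python
def merge(segments):
    """Combine segments into one, physically dropping deleted docs (toy model).

    Each segment is a (docs, deleted) pair: docs maps doc ID -> source,
    deleted is the set of doc IDs marked as deleted in that segment.
    """
    merged = {}
    for docs, deleted in segments:
        for doc_id, source in docs.items():
            if doc_id not in deleted:
                merged[doc_id] = source
    return (merged, set())  # fresh segment with nothing marked deleted


# Three segments, e.g. produced by three refreshes
segments = [
    ({"a": 1, "b": 2}, {"b"}),  # old version of "b" was superseded by an update
    ({"b": 3, "c": 4}, set()),
    ({"d": 5}, {"d"}),          # "d" was deleted
]
commit_point = [merge(segments)]  # the new commit lists only the merged segment
print(commit_point)  # [({'a': 1, 'b': 3, 'c': 4}, set())]
```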
Source: juejin.cn/post/7110610301669605383
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.