Understanding Elasticsearch Architecture, Indexing, and Storage Mechanisms
Elasticsearch combines Lucene's inverted index with a distributed cluster of master-eligible, data, and coordinating nodes. Zen Discovery handles master election and split-brain prevention. Writes go to primary shards and are then copied to replica shards, and data is stored as immutable segments that are periodically merged to keep search efficient.
Elasticsearch (ES) is increasingly used by enterprises to store and search unstructured data such as e‑commerce product catalogs, analytics, and logs. It complements traditional relational databases by providing capabilities like full‑text search based on Lucene’s inverted index.
Inverted Index : An inverted index maps terms to the documents that contain them, similar to a dictionary where each term points to a list of document IDs and term frequencies. This structure enables fast retrieval of all documents containing a given term.
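To make the term-to-documents mapping concrete, here is a minimal sketch in Python. It is an illustration only: the whitespace tokenizer and the function names `build_inverted_index` and `search` are assumptions for this example, not Lucene's actual data structures or analyzers.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict of {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive analysis step: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, term):
    """Return the sorted IDs of all documents containing the term."""
    return sorted(index.get(term.lower(), {}))


docs = {
    1: "quick brown fox",
    2: "quick red fox jumps",
    3: "lazy brown dog",
}
index = build_inverted_index(docs)
print(search(index, "fox"))    # [1, 2]
print(search(index, "brown"))  # [1, 3]
```

A lookup touches only the postings of the queried term, so its cost does not depend on scanning every document.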
Cluster Architecture :
• An ES cluster consists of multiple nodes, each running an ES service instance. Nodes join a cluster by sharing the same cluster.name in the configuration.
• Node roles are defined in elasticsearch.yml:
node.master: true/false
node.data: true/false
Four role combinations are possible: master‑eligible only, master‑eligible + data, data only, or neither.
• Master‑eligible nodes can participate in elections and become the master node, which manages index metadata, shard allocation, and cluster state.
• Data nodes store primary and replica shards and handle indexing, search, and aggregation operations.
• Any node can act as a coordinating node, receiving client requests, routing them to the appropriate shard, and aggregating results.
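The four combinations of the two role flags can be enumerated with a short Python sketch. The function `describe_role` is a hypothetical helper for illustration; the label "coordinating only" for a node with neither flag set reflects the fact that such a node can still receive and route client requests.

```python
from itertools import product


def describe_role(master: bool, data: bool) -> str:
    """Name the role implied by the node.master / node.data flags."""
    if master and data:
        return "master-eligible + data"
    if master:
        return "master-eligible only"
    if data:
        return "data only"
    # Neither flag set: the node only coordinates requests.
    return "coordinating only"


for master, data in product([True, False], repeat=2):
    print(f"node.master={master}, node.data={data} -> {describe_role(master, data)}")
```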
Discovery Mechanism (Zen Discovery) :
ES uses Zen Discovery for node discovery and master election. It supports unicast (default) and multicast (generally discouraged in production). Unicast hosts are configured as:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
During the discovery phase, nodes exchange heartbeat messages; the master election ensures only one master is active, using a quorum‑based approach.
Split‑Brain Prevention :
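Split-brain occurs when a network partition or an unresponsive master leads to more than one master being elected, which causes data inconsistency. Mitigation strategies include increasing the ping timeout so that a slow master is not declared dead too early, setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes, that is (number of master-eligible nodes / 2) + 1, and separating master-eligible and data roles so that heavy indexing or search load does not stall the master.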
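The majority rule behind discovery.zen.minimum_master_nodes can be checked with a few lines of Python. This is a sketch of the arithmetic only; `minimum_master_nodes` and `can_elect_master` are illustrative names, not Elasticsearch APIs.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(visible: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it can see a quorum."""
    return visible >= minimum_master_nodes(master_eligible)


# Three master-eligible nodes split by a network partition into 2 + 1.
print(minimum_master_nodes(3))  # 2
print(can_elect_master(2, 3))   # True
print(can_elect_master(1, 3))   # False
```

Because two disjoint partitions cannot both hold a strict majority, at most one side can elect a master.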
Index Write Process :
Writes are directed to primary shards and then replicated to replica shards. Routing determines the target primary shard using the formula:
shard = hash(routing) % number_of_primary_shards
The default routing value is the document’s _id, but it can be customized.
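The routing formula can be sketched in Python as follows. Note the assumption: Elasticsearch hashes the routing value with Murmur3, while this example substitutes `zlib.crc32` from the standard library purely as a deterministic stand-in, so the shard numbers it produces will not match a real cluster. The function name `route` is hypothetical.

```python
import zlib


def route(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards (crc32 as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]

# The same routing value always maps to the same shard.
print(route("doc-1", 5) == route("doc-1", 5))

# Changing the primary shard count changes where documents would be looked up.
moved = [d for d in doc_ids if route(d, 5) != route(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents map to a different shard")
```

The second check illustrates why the number of primary shards has to stay fixed for an index once documents have been routed: with a different modulus, existing documents would no longer be found on the shard the formula points to.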
The coordinating node that receives a write request computes the target shard and forwards the request to the data node holding that primary shard. The primary persists the document and then forwards the operation to its replica shards. The write is reported as successful only after the replicas acknowledge it.
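The primary-then-replica ordering can be sketched as a toy model in Python. Everything here is an assumption for illustration: `ShardCopy` and `index_document` are hypothetical names, and the model ignores failures, retries, and networking.

```python
class ShardCopy:
    """A toy shard copy that stores documents in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.docs = {}

    def write(self, doc_id: str, source: dict) -> bool:
        self.docs[doc_id] = source
        return True  # acknowledgement


def index_document(primary: ShardCopy, replicas: list, doc_id: str, source: dict) -> bool:
    """Write to the primary first, then replicate; succeed only if every replica acks."""
    if not primary.write(doc_id, source):
        return False
    acks = [replica.write(doc_id, source) for replica in replicas]
    return all(acks)


primary = ShardCopy("shard0-primary")
replicas = [ShardCopy("shard0-replica1"), ShardCopy("shard0-replica2")]
ok = index_document(primary, replicas, "1", {"title": "quick brown fox"})
print(ok)  # True
```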
Storage Principles :
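ES is built on Lucene. New documents are first indexed into an in-memory buffer and appended to a transaction log (translog). A refresh (every 1 s by default) turns the buffered documents into a new immutable segment and makes it searchable. A flush creates a commit point, persists the segments to disk, and clears the translog; by default it is triggered every 30 min or when the translog (not an individual segment) exceeds 512 MB.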
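The interplay of the in-memory buffer, the transaction log, refresh, and flush can be sketched with a toy Python class. This is a simplified model under stated assumptions, not Lucene's implementation: `ShardSketch` and its methods are hypothetical, and "disk" is just a Python list.

```python
class ShardSketch:
    """Toy model of a shard's write path."""

    def __init__(self):
        self.buffer = []    # in-memory documents, not yet searchable
        self.translog = []  # operations logged since the last flush
        self.segments = []  # immutable, searchable segments (tuples)

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        """Turn the buffer into a new immutable segment so it becomes searchable."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self) -> None:
        """Commit point: make everything searchable and clear the translog."""
        self.refresh()
        self.translog = []

    def search(self, predicate) -> list:
        return [d for seg in self.segments for d in seg if predicate(d)]


shard = ShardSketch()
shard.index({"id": 1, "title": "fox"})
print(shard.search(lambda d: d["title"] == "fox"))  # [] (not refreshed yet)
shard.refresh()
print(shard.search(lambda d: d["title"] == "fox"))  # [{'id': 1, 'title': 'fox'}]
shard.flush()
print(len(shard.translog))  # 0
```

The gap between `index` and `refresh` is why a freshly written document is not immediately visible to search.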
Segments cannot be modified in place. Additions create new segments, deletions are recorded in a .del file, and updates are implemented as a delete + add sequence.
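A small Python sketch shows how deletes and updates can work without ever modifying a segment. The class `SegmentStore` is hypothetical, and the `deleted` set only plays the role of the .del file in spirit; the real on-disk format is different.

```python
class SegmentStore:
    """Toy model: immutable segments plus a separate set of deletion markers."""

    def __init__(self):
        self.segments = []    # each segment is a tuple of (doc_id, source) pairs
        self.deleted = set()  # (segment_no, position) markers, standing in for .del

    def add(self, doc_id: str, source: str) -> None:
        # Every addition lands in a new segment; old segments are never touched.
        self.segments.append(((doc_id, source),))

    def delete(self, doc_id: str) -> None:
        # Mark every existing copy as deleted instead of removing it.
        for seg_no, segment in enumerate(self.segments):
            for pos, (existing_id, _) in enumerate(segment):
                if existing_id == doc_id:
                    self.deleted.add((seg_no, pos))

    def update(self, doc_id: str, source: str) -> None:
        # An update is a delete of the old version followed by an add.
        self.delete(doc_id)
        self.add(doc_id, source)

    def get(self, doc_id: str):
        for seg_no, segment in enumerate(self.segments):
            for pos, (existing_id, source) in enumerate(segment):
                if existing_id == doc_id and (seg_no, pos) not in self.deleted:
                    return source
        return None


store = SegmentStore()
store.add("1", "v1")
store.update("1", "v2")
print(store.get("1"))       # v2
print(len(store.segments))  # 2 (the old version is still stored, only marked)
```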
Segment Merging :
To avoid an explosion of small segments, Lucene merges them in the background, discarding deleted documents and creating larger, more efficient segments. Merging is resource‑controlled to limit impact on search performance.
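As a rough illustration of merging, the Python sketch below combines several small segments into one and drops documents marked as deleted. The function `merge_segments` and the `(docs, deleted_positions)` representation are assumptions for this example; Lucene's merge policies and scheduling are considerably more involved.

```python
def merge_segments(segments):
    """Merge (docs, deleted_positions) segments into one segment with no deletions."""
    live = tuple(
        doc
        for docs, deleted in segments
        for pos, doc in enumerate(docs)
        if pos not in deleted
    )
    # The merged segment starts with an empty set of deletion markers.
    return (live, frozenset())


small_segments = [
    ((("1", "v1"), ("2", "a")), {0}),  # doc "1" v1 was deleted by an update
    ((("1", "v2"),), set()),
    ((("3", "b"),), set()),
]
merged_docs, merged_deleted = merge_segments(small_segments)
print(merged_docs)  # (('2', 'a'), ('1', 'v2'), ('3', 'b'))
```

After the merge, the space held by the deleted version of document "1" is reclaimed and a search has one segment to visit instead of three.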
Conclusion :
The article provides an overview of ES’s distributed architecture, indexing workflow, and storage mechanisms, offering insights that can be applied to the design of other distributed systems.
vivo Internet Technology
