Master/Slave Architecture vs P2P Ring Structure and an Overview of Elasticsearch
This article explains the differences between Master‑Slave and P2P ring architectures, introduces Elasticsearch’s core concepts, internal components, master election, shard routing, indexing and search processes, and discusses how the system avoids split‑brain scenarios and ensures high availability.
The article begins by comparing the traditional Master‑Slave model, where a single master node manages slaves and can become a bottleneck and single point of failure, with a Peer‑to‑Peer (P2P) ring structure that has no central node, offering better scalability at the cost of weaker central control.
It then provides an overview of Elasticsearch (ES), describing it as a distributed, real‑time search and analytics engine built on Lucene, supporting full‑text, structured, and analytical queries.
Key ES concepts are listed, including clusters, nodes, indices, primary and replica shards, types, mappings, documents, fields, allocation, and gateways, explaining their analogues to relational database components.
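To make the relational analogy concrete, here is a minimal sketch of an index definition. The field names (`title`, `published`) and the 5‑shard/1‑replica settings are hypothetical values chosen for illustration: the index plays the role of a database/table, the mapping plays the role of a schema, and each document corresponds to a row of typed fields.

```python
# Hypothetical index definition illustrating the relational analogues:
# index ~ database/table, mapping ~ schema, document ~ row, field ~ column.
index_definition = {
    "settings": {
        "number_of_shards": 5,    # primary shards, fixed at index creation
        "number_of_replicas": 1,  # replica copies per primary shard
    },
    "mappings": {                 # analogous to a table schema
        "properties": {
            "title":     {"type": "text"},  # full-text field
            "published": {"type": "date"},  # structured field
        }
    },
}

# A document (analogous to a row) that would be indexed into this index.
doc = {"title": "Master/Slave vs P2P", "published": "2020-01-01"}
```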
The internal architecture is detailed: a gateway stores index files, a distributed Lucene layer handles indexing and searching, modules such as discovery, scripting, and plugins sit above, followed by transport and JMX layers, and finally RESTful and Java APIs for client interaction.
Service discovery and master election are handled by the ZenDiscovery module, which uses ping (multicast or unicast) to find nodes, selects a master based on lexical ordering of node IDs, and relies on the discovery.zen.minimum_master_nodes setting to prevent split‑brain situations.
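The election step described above can be sketched in a few lines. This is a simplified model, not ZenDiscovery's actual implementation: among the master‑eligible nodes discovered by ping, the node that sorts first by ID wins, but only if enough nodes are visible to satisfy `minimum_master_nodes`.

```python
def elect_master(node_ids, minimum_master_nodes):
    """Sketch of a ZenDiscovery-style election: the lexically smallest
    node ID among the visible master-eligible nodes becomes master,
    provided a quorum of nodes is visible."""
    if len(node_ids) < minimum_master_nodes:
        return None  # no quorum visible: refuse to elect, avoiding split-brain
    return min(node_ids)  # lexical ordering decides the winner
```

For example, `elect_master(["node-b", "node-a", "node-c"], 2)` elects `"node-a"`, while a partitioned node that can only see itself, `elect_master(["node-c"], 2)`, elects no one.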
Master election, split‑brain avoidance, and the role of quorum are explained, along with the impact of network partitions and the recommended quorum formula (N/2 + 1).
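The quorum formula is simple integer arithmetic, shown here as a helper for computing the recommended `discovery.zen.minimum_master_nodes` value from the count of master‑eligible nodes:

```python
def minimum_master_nodes(n_master_eligible):
    """Recommended quorum: floor(N/2) + 1 master-eligible nodes.
    A strict majority guarantees that two disjoint network partitions
    can never both elect a master."""
    return n_master_eligible // 2 + 1
```

With 3 master‑eligible nodes the setting should be 2; with 4 it should be 3, so that after a symmetric 2/2 split neither side has a quorum.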
The article describes the default "Query Then Fetch" search flow, where the query is broadcast to all shard copies, a global ranking is built, and the top documents are fetched from the relevant shards.
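The two phases can be modeled in miniature. In this sketch the "shards" are plain dictionaries invented for illustration: the query phase gathers only `(score, doc_id)` pairs from every shard, a global ranking keeps the top `size` hits, and the fetch phase retrieves full documents for those winners alone.

```python
import heapq

# Hypothetical in-memory "shards": each maps doc_id -> (score, document).
shard_0 = {"d1": (0.9, {"title": "alpha"}), "d2": (0.4, {"title": "beta"})}
shard_1 = {"d3": (0.7, {"title": "gamma"}), "d4": (0.2, {"title": "delta"})}
shards = [shard_0, shard_1]

def query_then_fetch(shards, size):
    """Sketch of Query Then Fetch: broadcast the query, rank globally by
    score, then fetch full documents only for the global top hits."""
    # Query phase: collect lightweight (score, doc_id, shard) candidates.
    candidates = [(score, doc_id, shard)
                  for shard in shards
                  for doc_id, (score, _) in shard.items()]
    top = heapq.nlargest(size, candidates, key=lambda t: t[0])
    # Fetch phase: pull the full document for each winner from its shard.
    return [shard[doc_id][1] for _, doc_id, shard in top]
```

Deferring document retrieval to the second phase keeps the broadcast phase cheap: only ids and scores cross the network until the global top‑k is known.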
Routing and replica consistency are covered: routing uses `hash(routing) % number_of_primary_shards`, and replicas provide fault tolerance. The write path involves a client request to a node, routing to the primary shard, indexing, and asynchronous replication to replica shards.
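The routing formula above can be sketched directly. Elasticsearch actually hashes the routing value (the document `_id` by default) with Murmur3; `zlib.crc32` stands in here as a deterministic substitute, since Python's built‑in `hash()` varies between runs:

```python
import zlib

def route(routing_value, number_of_primary_shards):
    """Sketch of shard routing: shard = hash(routing) % number_of_primary_shards.
    crc32 stands in for the Murmur3 hash ES uses internally."""
    return zlib.crc32(routing_value.encode()) % number_of_primary_shards
```

This formula is also why the number of primary shards cannot change after index creation: altering the modulus would send existing documents to the wrong shard.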
Shard allocation triggers include index creation, deletion, replica addition, and node changes. The relationship between shards and Lucene segments is explained, noting that segments are immutable, merged over time, and stored in various file types (.cfs, .fdt, .tim, .dvd, etc.).
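Segment immutability can be illustrated with a toy merge. Because segments are never modified in place, deletions are recorded as markers, and a merge writes a fresh combined segment that simply skips the marked documents. The segment/document structure here is invented for illustration:

```python
def merge_segments(segments):
    """Sketch of a Lucene-style merge: immutable input segments are
    rewritten into one new segment; documents marked deleted are
    dropped during the rewrite rather than erased in place."""
    live_docs = [doc
                 for seg in segments
                 for doc in seg["docs"]
                 if doc["id"] not in seg["deleted"]]
    return {"docs": live_docs, "deleted": set()}
```

Merging is how the space held by "deleted" documents is eventually reclaimed, which is why segment counts shrink over time even though individual segments never change.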
Finally, the article lists reference links for further reading.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.