Design and Implementation of a Handcrafted Distributed Cluster (MyCluster)
This article describes how to design and build a native distributed cluster called MyCluster without using any existing frameworks, covering master‑slave architecture, leader election, split‑brain handling, centralized configuration management, custom communication protocols, state transitions, and client interfaces.
The goal of this section is to manually design and develop a challenging distributed cluster named MyCluster without relying on any distributed frameworks or middleware, in order to strengthen core architectural skills.
The requirements are: build MyCluster from scratch in any language; use a master‑slave structure in which the Master node collects user tasks (MyTask) and dispatches them to slaves; handle node failures, task retries, and state recovery; and elect a new Master when the current Master fails.
MyCluster nodes share a common configuration file, with some parameters (e.g., ports, thread pool size) varying per machine; the configuration must be centrally managed and versioned.
Leader election is kept deliberately simple: each node is assigned a unique name, and the node whose name sorts first lexicographically becomes the Leader. If the Leader fails, the next smallest alive node takes over, avoiding the complexity of a full consensus algorithm.
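The naming scheme reduces election to a single `min()` over the alive node set. A minimal sketch (the node names and the notion of an "alive set" are illustrative assumptions, since the article does not fix a naming convention):

```python
def elect_leader(alive_nodes):
    """Pick the lexicographically smallest alive node name as Leader.

    `alive_nodes` is assumed to be the set of node names currently
    reachable; re-running the election after a failure promotes the
    next smallest surviving node automatically.
    """
    if not alive_nodes:
        return None
    return min(alive_nodes)

print(elect_leader({"node-b", "node-a", "node-c"}))  # node-a
# After node-a fails, the next smallest takes over:
print(elect_leader({"node-b", "node-c"}))            # node-b
```

Because every node can compute the same result locally from the same membership view, no message exchange is needed beyond agreeing on who is alive.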
In case of a split‑brain scenario, the cluster either pauses or lets the larger partition (more than half of the nodes) continue serving, simplifying decision making when the node count is fixed.
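With a fixed, known cluster size, the "larger partition serves" rule is a strict-majority check. A sketch of that quorum test:

```python
def partition_may_serve(visible_count, total_count):
    """A partition keeps serving only if it holds a strict majority
    of the fixed cluster size; otherwise it pauses to avoid
    split-brain writes. The total node count is assumed to be
    known to every node up front.
    """
    return visible_count > total_count // 2

print(partition_may_serve(3, 5))  # True: majority partition serves
print(partition_may_serve(2, 5))  # False: minority partition pauses
# An even split of an even-sized cluster pauses both sides:
print(partition_may_serve(2, 4))  # False
```

Requiring a *strict* majority is what guarantees at most one partition can ever serve at a time, even when an even-sized cluster splits exactly in half.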
Configuration management is split into a static seed file (node IDs, IPs, ports) and a dynamic shared configuration file that the Master updates and distributes, with version numbers to support rollback and reduce transmission frequency.
The update process follows a two‑phase commit: in the prepare phase, the Master pushes the new configuration to all slaves, which validate it and acknowledge; in the commit phase, the Master sends a commit command and the slaves switch over. A slave that cannot apply the update stops itself rather than continue running with an inconsistent configuration.
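The two phases can be sketched with in-memory stand-ins for the slaves (a real deployment would carry these calls over the TCP protocol described below; the `Slave` class and version check are illustrative assumptions):

```python
class Slave:
    """In-memory stand-in for a slave node."""
    def __init__(self, name):
        self.name = name
        self.applied_version = 0
        self.staged = None

    def prepare(self, version, config):
        # Phase 1: validate and stage, but do not apply yet.
        # Reject stale or duplicate versions.
        if version <= self.applied_version:
            return False
        self.staged = (version, config)
        return True

    def commit(self):
        # Phase 2: atomically switch to the staged configuration.
        self.applied_version, _ = self.staged

def update_config(slaves, version, config):
    """Commit the new config version only if every slave acks."""
    acks = [s.prepare(version, config) for s in slaves]
    if not all(acks):
        return False  # abort; a slave that cannot prepare would stop
    for s in slaves:
        s.commit()
    return True

slaves = [Slave("a"), Slave("b")]
print(update_config(slaves, 2, {"pool_size": 8}))         # True
print(all(s.applied_version == 2 for s in slaves))        # True
```

The version number doubles as the rollback handle: because slaves stage before applying, the Master can abort with no slave having switched, and an older version can be re-pushed the same way.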
Communication between nodes uses long‑lived TCP connections with binary messages. Each message includes a type identifier, sequence number, and protocol version to handle asynchronous responses and future upgrades.
Message types include HelloMessage (node ID, uptime, cluster state, current Master ID, config version), LeaderDeclare (Master ID, online node count, config version), FollowKing (node ID, whether it needs a new config), WaitInPlace (node ID, online node count) with corresponding replies, and ConfigMessage (config version, compression flag, content).
Three cluster states are defined: Leader‑Select, Working, and Standby. The Master monitors node joins and departures, triggering re‑election or state changes as needed, and uses periodic HelloMessage heartbeats that can also report resource usage (memory, CPU, disk, bandwidth).
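The states can be enforced with a small transition table. The specific set of allowed edges below is an assumption for illustration (e.g. a Leader failure drops Working back to Leader‑Select, and a minority partition falls to Standby); the article names the states but not every transition:

```python
from enum import Enum, auto

class ClusterState(Enum):
    LEADER_SELECT = auto()
    WORKING = auto()
    STANDBY = auto()

# Assumed legal transitions between cluster states.
TRANSITIONS = {
    ClusterState.LEADER_SELECT: {ClusterState.WORKING, ClusterState.STANDBY},
    ClusterState.WORKING: {ClusterState.LEADER_SELECT, ClusterState.STANDBY},
    ClusterState.STANDBY: {ClusterState.LEADER_SELECT},
}

def transition(current, target):
    """Move to `target`, rejecting transitions not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

state = ClusterState.LEADER_SELECT
state = transition(state, ClusterState.WORKING)
print(state)  # ClusterState.WORKING
```

Centralizing the table makes illegal state changes fail loudly instead of silently corrupting the cluster's view of itself.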
Management commands include node offline, topology report, and configuration update. A client can be a traditional CLI or a lightweight web server exposing REST interfaces for external integration.
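A lightweight REST front end for such management commands can be built on the standard library alone. A sketch using `http.server` (the `/topology` path and the topology payload are illustrative assumptions):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical in-memory view the Master would maintain.
TOPOLOGY = {"master": "node-a", "slaves": ["node-b", "node-c"]}

class AdminHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/topology":
            body = json.dumps(TOPOLOGY).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

server = HTTPServer(("127.0.0.1", 0), AdminHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/topology"
print(json.loads(urlopen(url).read())["master"])  # node-a
server.shutdown()
```

Exposing the same commands over HTTP that the CLI uses keeps one management code path while letting external systems integrate without a custom client.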
The complete architecture of MyCluster is illustrated in the final diagram.
IT Architects Alliance