Mastering High‑Availability Clusters: Resources, Constraints, and Failure Handling
This article explains the principles and components of high‑availability (HA) clusters, covering active/standby nodes, resource stickiness and constraints, heartbeat and quorum mechanisms, split‑brain avoidance, failure detection methods, and the minimal setup required for a reliable web‑service HA deployment.
What Is a High‑Availability (HA) Cluster?
HA clusters provide continuous service by deploying front‑end directors, back‑end real servers (RS), database servers, and shared storage as primary and backup nodes. When a primary server fails, a backup takes over its resources, keeping the service interruption as short as possible.
Node Roles
The active/primary node runs the workload, while the passive/standby node waits as a failover target. Critical components such as the Director usually have a backup; real servers and database servers (e.g., MySQL) are often set up as master‑slave pairs.
Reliability and Availability
HA clusters focus on reliability and stability. Availability is calculated as service uptime / (service uptime + downtime). Industry standards progress from 99% to 99.999% (the “five‑9s” level) for mission‑critical systems such as financial transaction platforms.
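The availability formula above translates directly into a downtime budget. The following is a minimal sketch; the function names and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def availability(uptime: float, downtime: float) -> float:
    """Return availability as uptime / (uptime + downtime)."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def max_downtime_per_year(target: float) -> float:
    """Return the allowed downtime in seconds per year for a target availability."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * SECONDS_PER_YEAR
```

Under these assumptions, a 99% target allows about 3.65 days of downtime per year, while the five‑9s level allows a little over five minutes.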
HA Resources and Their Migration
When a node fails, its resources (virtual IPs (VIP), services, isolation devices, and filesystems) must be transferred to a backup node. Each node runs one or more resources, and the cluster decides where to move them based on resource stickiness (preference for the node a resource currently runs on) and resource constraints.
Stickiness: a numeric score; higher scores mean the resource prefers to stay on its current node.
Colocation constraint ("colocation"): expresses whether two resources should run on the same node. A positive score keeps them together, a negative score keeps them apart, and -inf means they must never share a node.
Location constraint ("location"): assigns a score to a node for a given resource; the node with the highest score wins.
Order constraint: defines start/stop order for dependent resources (e.g., VIP must start before IPVS rules).
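To make the interaction between location scores and stickiness concrete, here is a simplified sketch of score-based placement. It is not the algorithm of any particular cluster manager; `choose_node` and its parameters are hypothetical names, and colocation and order constraints are left out.

```python
import math
from typing import Dict, Optional


def choose_node(
    location_scores: Dict[str, float],
    stickiness: float = 0.0,
    current_node: Optional[str] = None,
) -> Optional[str]:
    """Pick the node with the highest total score, or None if no node is allowed."""
    best_node: Optional[str] = None
    best_score = -math.inf
    for node, score in location_scores.items():
        total = score
        if node == current_node:
            total += stickiness  # bonus for staying where the resource already runs
        if total > best_score:   # a node scored -inf can never win
            best_node, best_score = node, total
    return best_node
```

With scores of 100 for node1 and 50 for node2, a resource currently on node2 stays there when its stickiness is 100 (150 beats 100) and moves back to node1 when its stickiness is 0.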
Resource Types
Primitive – runs on a single node (primary resource).
Clone – runs on every node.
Group – a set of resources that move together.
Master/Slave – runs on two nodes, one as master, the other as slave.
Detecting Failures: Heartbeat and Quorum
Backup nodes use a heartbeat to verify that peers are alive. In clusters with three or more nodes, a quorum voting mechanism decides which side of a broken cluster is legitimate: each node gets one vote (or weighted votes), and a partition (the group of nodes that can still reach each other) is considered valid only if it holds more than half of the total votes.
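The majority rule can be written in a few lines. This is a sketch under the assumption that vote weights are known to every node; `has_quorum` is a hypothetical helper, not part of any messaging layer.

```python
from typing import Dict, Iterable


def has_quorum(partition: Iterable[str], votes: Dict[str, int]) -> bool:
    """Return True if the nodes in `partition` hold more than half of all votes."""
    total = sum(votes.values())
    held = sum(votes[node] for node in set(partition))
    return 2 * held > total  # strictly more than half, without float division
```

In a three-node cluster with one vote each, a partition of two nodes has quorum and the isolated third node does not. In a two-node cluster a single surviving node holds exactly half of the votes, so it can never reach quorum on its own.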
Handling Nodes That Lose Quorum
Freeze: continue processing existing requests but reject new ones, then migrate resources.
Stop: immediately stop services and migrate resources (most common).
Ignore: keep services running; used only when exactly two nodes back each other up.
Example: Resources Required for a MySQL Service
VIP (virtual IP) for client access.
Floating IP (FIP) that can move between nodes.
MySQL service process.
Mounted filesystem for data storage.
Split‑Brain (Brain‑Split) Scenario and Prevention
If two nodes write to the same file simultaneously after a false‑positive failover, data corruption occurs. To avoid this, isolate the failed node before migrating resources:
Node isolation using a STONITH (Shoot The Other Node In The Head) device, e.g., powering off the faulty node.
Resource‑level isolation via FC‑SAN to block storage access.
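The key point of both approaches is ordering: isolation must be confirmed before any resource is started elsewhere. A minimal sketch of that rule follows; `fence` and `start` are hypothetical callables standing in for a real fencing device and a real resource agent.

```python
from typing import Callable, List


def fail_over(
    failed_node: str,
    resources: List[str],
    fence: Callable[[str], bool],
    start: Callable[[str], None],
) -> bool:
    """Fence the failed node first; start resources only if fencing succeeded."""
    if not fence(failed_node):
        # Without confirmed isolation the old node might still be writing,
        # so taking over now would risk split-brain.
        return False
    for resource in resources:
        start(resource)
    return True
```

Refusing to take over when fencing fails trades availability for data safety, which is the trade the split‑brain scenario above calls for.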
Additional Failure Detection Methods
Arbitration disk: the primary continuously writes to a shared disk; if the standby can write but the primary cannot, the primary is deemed dead.
Ping the gateway: loss of gateway connectivity indicates node failure.
Watchdog: a local process writes to a watchdog device; interruption triggers a reboot or removal from the cluster.
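The watchdog idea can be illustrated with a small in-process timer. This is a toy simulation only: it does not talk to a kernel watchdog device, and the class name and injectable `clock` parameter are assumptions made so the behavior can be tested.

```python
import time
from typing import Callable


class SoftwareWatchdog:
    """Toy watchdog: expires unless kick() is called within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Record that the monitored process is still alive."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        """Return True once more than `timeout` seconds passed since the last kick."""
        return self._clock() - self._last_kick > self._timeout
```

A supervising loop would call `expired()` periodically and trigger a reboot or cluster removal when it returns True; a hardware watchdog performs that step without relying on the possibly hung operating system.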
Messaging Layer and Cluster Resource Manager (CRM)
The Messaging Layer (UDP/694) transports heartbeats, stickiness, and constraints between nodes. The CRM decides where each resource should run and orchestrates actions via its sub‑components:
PE (Policy Engine) – evaluates policies.
TE (Transition Engine) – issues commands to nodes.
LRM (Local Resource Manager) – executes start/stop actions on a node.
RA (Resource Agent) – scripts that control individual resources (LSB, OCF, legacy heartbeat).
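The division of labor among these components can be sketched as a small pipeline. Everything below is a stand-in written for illustration: the placement policy is deliberately trivial, and none of the function names correspond to a real CRM API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Placement = Dict[str, Optional[str]]  # resource name -> node name (None = stopped)
Action = Tuple[str, str, str]         # (node, operation, resource)


def policy_engine(resources: List[str], online_nodes: List[str]) -> Placement:
    """PE stand-in: place every resource on the first online node (toy policy)."""
    target = online_nodes[0] if online_nodes else None
    return {resource: target for resource in resources}


def transition_engine(current: Placement, desired: Placement) -> List[Action]:
    """TE stand-in: turn the difference between two placements into actions."""
    actions: List[Action] = []
    for resource, new_node in desired.items():
        old_node = current.get(resource)
        if old_node == new_node:
            continue  # already where it should be
        if old_node is not None:
            actions.append((old_node, "stop", resource))
        if new_node is not None:
            actions.append((new_node, "start", resource))
    return actions


def local_resource_manager(action: Action, agents: Dict[str, Callable[[str], None]]) -> None:
    """LRM stand-in: hand one action to the resource agent of that resource."""
    _node, operation, resource = action
    agents[resource](operation)
```

Keeping the decision (PE), the plan (TE), and the execution (LRM and RA) separate means the placement policy can change without touching the scripts that actually start and stop services.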
Software Implementations
Messaging Layer options include Heartbeat (v1‑v3), Corosync, and Cman; Keepalived is a separate approach that uses VRRP for VIP management. A CRM such as Pacemaker or crm runs on top of a messaging layer, giving combinations such as pacemaker + corosync or crm + heartbeat v2.
Minimum HA Setup for a Web Service
At least two nodes are required, each running the Messaging Layer and CRM. Four essential resources are typically defined: VIP, HTTPD service, filesystem, and a STONITH device.
Configuration Tips
Node names must match the output of uname -n and be resolvable via /etc/hosts (avoid DNS reliance).
Synchronize system clocks across nodes.
Establish SSH trust so that any node can remotely manage others.
The first node boots itself and then starts services on the remaining nodes.
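The node-name tip can be checked with a short script. The sketch below parses hosts-file text and reports whether a given name appears in it; the helper names are hypothetical, and it only checks that the name is listed, not that the address is correct.

```python
import os
from typing import Set


def names_in_hosts(hosts_text: str) -> Set[str]:
    """Collect every hostname and alias listed in /etc/hosts-style text."""
    names: Set[str] = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2:
            names.update(fields[1:])  # fields[0] is the IP address
    return names


def node_name_is_listed(node_name: str, hosts_text: str) -> bool:
    """Return True if `node_name` appears as a name or alias in the hosts text."""
    return node_name in names_in_hosts(hosts_text)


def check_local_node(hosts_path: str = "/etc/hosts") -> bool:
    """Check the local node; os.uname().nodename is the value `uname -n` prints."""
    with open(hosts_path, encoding="utf-8") as handle:
        return node_name_is_listed(os.uname().nodename, handle.read())
```

Running `check_local_node()` on each node before starting the cluster is one way to catch a name mismatch early.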
MaGe Linux Operations
