Mastering High‑Availability Clusters: Concepts, Architecture, and Heartbeat Setup

This guide explains the fundamental concepts and layered architecture of high‑availability clusters, lists the software components for each layer, compares RHEL/CentOS solutions, and provides step‑by‑step instructions for configuring heartbeat2 with haresources, including node preparation, authentication, resource definitions, and service failover.

MaGe Linux Operations

Jan 14, 2016

1. Basic Concepts of High‑Availability Clusters

What is a high‑availability cluster:

A high‑availability cluster is an architecture that automatically transfers services to other hosts when a failure occurs, keeping the service running.

High‑availability cluster architecture layers:

Backend host layer: services running on physical hosts.

Message layer: transmits heartbeat information.

Cluster Resource Manager (CRM): makes resource‑level decisions (e.g., service migration) based on the membership and heartbeat information supplied by the message layer.

Local Resource Manager (LRM): carries out the CRM's decisions on the local node by invoking resource agents.

Resource Agent (RA): scripts that start/stop resources following the {start|stop|restart|status} format.
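
To make the RA contract concrete, here is a minimal sketch of an LSB‑style agent written as a shell function. The resource name `demo` and the state file are hypothetical stand‑ins; a real agent would start and stop an actual daemon. The exit codes follow the LSB convention, where `status` returns 0 for a running service and 3 for a stopped one.

```shell
#!/bin/sh
# Hypothetical demo agent: "running" is represented by a state file.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo-ra.$$.state}"

demo_ra() {
    case "$1" in
        start)
            # Starting an already-running resource must succeed (idempotent).
            : > "$STATE_FILE"
            ;;
        stop)
            # Stopping an already-stopped resource must also succeed.
            rm -f "$STATE_FILE"
            ;;
        restart)
            demo_ra stop && demo_ra start
            ;;
        status)
            if [ -e "$STATE_FILE" ]; then
                echo "demo is running"
                return 0
            fi
            echo "demo is stopped"
            return 3    # LSB: program is not running
            ;;
        *)
            echo "Usage: demo_ra {start|stop|restart|status}" >&2
            return 2    # LSB: invalid or excess argument
            ;;
    esac
}
```

The cluster relies on `start` and `stop` being idempotent and on `status` reporting the truth, because it calls these actions repeatedly while deciding where a resource should run.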

Software implementations for each layer

Message layer:

- heartbeat v1
- heartbeat v2
- heartbeat v3
- (OpenAIS) corosync
- cman

CRM layer:

- heartbeat v1: uses text‑based haresources
- heartbeat v2: uses crmd service, configured via crmsh or heartbeat‑gui
- heartbeat v3: heartbeat + pacemaker + cluster‑glue (CLI: crm/pcs, GUI: hawk, LCMC, pacemaker‑mgmt)
- cman: uses rgmanager, provides failover domain feature; can also use RHCS suite
- corosync: uses rgmanager or pacemaker

LRM layer: usually implemented as a component of CRM.

RA layer:

- heartbeat legacy: traditional type
- LSB: /etc/rc.d/init.d/* scripts
- OCF (Open Cluster Framework): agents grouped by provider, such as heartbeat, pacemaker, and linbit
- STONITH: uses hardware to force a failed node to power off

keepalived combinations (lightweight VRRP‑based solution):

- keepalived + ipvs
- keepalived + haproxy

RHEL or CentOS HA cluster solutions

RHEL(CentOS) 5:

- RHCS (cman+rgmanager) (built‑in)
- Third‑party options: corosync+pacemaker, heartbeat (v1 or v2), keepalived

RHEL(CentOS) 6:

- RHCS (cman+rgmanager)
- corosync+rgmanager
- cman+pacemaker
- heartbeat v3 + pacemaker
- keepalived

Application scenarios

- High‑availability front‑end load balancer: keepalived
- Large‑scale HA cluster: corosync (cman) + pacemaker

Resource isolation: solving resource contention

Scenario 1: Split brain, where two sub‑clusters cannot communicate and both try to claim the same backend storage, risking filesystem corruption. A voting mechanism ensures that only the sub‑cluster holding more than 50% of the votes survives.
Scenario 2: In an even‑node cluster, a split can give equal votes to both sides; an extra ping node is required for quorum.
Scenario 3: When quorum is insufficient, STONITH isolates resources by forcibly powering off the failed node via hardware or network switches.
<img src="http://mmbiz.qpic.cn/mmbiz/IP70Vic417DO0k61G40AMP5XpXibZr5Oib9b4BtkthvibELCujY2Ia3yzXI4IRjBhE3sB2lQU8lHU1IqlAImkEs6HA/0?wx_fmt=png" alt="brain split" />
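
The majority rule from the scenarios above can be sketched as a small shell check. This is only an illustration of the arithmetic, assuming one vote per node; the function name `has_quorum` is hypothetical and is not part of any cluster stack.

```shell
# has_quorum VOTES TOTAL: succeed only if VOTES is a strict majority of TOTAL.
has_quorum() {
    votes=$1
    total=$2
    [ $((votes * 2)) -gt "$total" ]
}

# 5-node cluster split 3/2: only the 3-node side keeps quorum.
has_quorum 3 5 && echo "3 of 5: quorum"
has_quorum 2 5 || echo "2 of 5: no quorum"

# 4-node cluster split 2/2: neither side has a majority, which is why
# an extra tie-breaking vote (for example a ping node) is needed.
has_quorum 2 4 || echo "2 of 4: no quorum"
```

Using a strict "greater than half" comparison is the point of the design: with an equal split, neither side can claim the storage.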

HA cluster working models

A/P (Active/Passive): two‑node primary‑secondary model, requires a ping node.

N‑M: N nodes provide M services (N>M); active nodes = M, standby = N‑M.

N‑N: N nodes, N services; each node runs one service, a failed node's services are taken over by others.

A/A (Active/Active): both nodes are active, can run different services or the same service (e.g., IPVS with DNS round‑robin).

Resource migration constraints

rgmanager:

failover domain: limits which hosts a resource may move to
priority: defines host preference within a domain

pacemaker: uses resource constraints and stickiness to limit migration.

Resource stickiness: positive value keeps a resource on its current node.
Constraint types:
- Location: prefers a node (inf, n, -n, -inf)
- Colocation: forces resources to run together (inf) or apart (-inf)
- Order: defines start/stop sequence.
Example: make VIP, httpd, and filesystem run on the same node using colocation (inf), a resource group, and order constraints (VIP → filesystem → httpd).
Symmetry vs. asymmetry: symmetric clusters allow all nodes to host resources; asymmetric clusters restrict some nodes.
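
How stickiness and location scores interact can be illustrated with a deliberately simplified model: the resource stays where it is as long as the current node's location score plus the stickiness is at least the score of the competing node. This is a sketch of the idea only, not pacemaker's actual placement algorithm, and `should_stay` is a hypothetical helper.

```shell
# should_stay STICKINESS CURRENT_SCORE OTHER_SCORE
# Succeeds if the resource stays on its current node in this simplified
# model: current location score + stickiness >= other node's score.
should_stay() {
    stickiness=$1
    current=$2
    other=$3
    [ $((current + stickiness)) -ge "$other" ]
}

# The resource failed over to node2 (score 0); node1 (score 50) is back.
if should_stay 100 0 50; then
    echo "stickiness 100: resource stays on node2"
fi
if ! should_stay 0 0 50; then
    echo "stickiness 0: resource moves back to node1"
fi
```

A stickiness larger than the location preference therefore avoids a second interruption when the preferred node returns.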

When a node leaves the cluster, how to handle its resources

stop: directly stop the service
ignore: keep the service running as is
freeze: maintain existing connections, stop accepting new requests
suicide: kill the service

Should a newly configured resource start automatically?

target‑role: defines whether the resource should be started (started) or not (stopped).

Resource agent (RA) types

heartbeat legacy: traditional type
LSB: scripts under /etc/rc.d/init.d/
OCF: Open Cluster Framework agents
STONITH: dedicated agents for resource isolation

Resource types

primitive, native: single‑instance resources
group: a set of resources managed together
clone: resources run on all nodes (e.g., STONITH, cluster filesystem, distributed lock)
  - clone‑max: maximum number of clone instances in the cluster
  - clone‑node‑max: maximum number of clone instances per node
master/slave: primary‑secondary pair; master can read/write, slave is read‑only

2. Configuring a High‑Availability Cluster with heartbeat2 and haresources

Preparation before configuration

Node names must resolve correctly; ensure /etc/hosts entries match the output of uname -n.

Synchronize time via NTP.

Set up SSH key‑based authentication between nodes.

# ssh-keygen -t rsa
# ssh-copy-id [email protected]

Configure /etc/hosts on both hosts:

192.168.253.133 node1.playground.com
192.168.253.134 node2.playground.com
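
The name‑resolution requirement can be checked before heartbeat is installed. The sketch below assumes the plain `/etc/hosts` format shown above; `hosts_has_name` and `check_this_node` are hypothetical helper names.

```shell
# hosts_has_name HOSTS_FILE NAME
# Succeeds if NAME appears as a hostname or alias on a non-comment line.
hosts_has_name() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore trailing comments
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# check_this_node [HOSTS_FILE]: verify that `uname -n` is listed.
check_this_node() {
    hosts_file=${1:-/etc/hosts}
    if hosts_has_name "$hosts_file" "$(uname -n)"; then
        echo "OK: $(uname -n) is listed in $hosts_file"
    else
        echo "MISSING: $(uname -n) is not listed in $hosts_file" >&2
        return 1
    fi
}
```

Running `check_this_node` on each node catches the common case where the hosts entry uses a short name while `uname -n` reports the fully qualified one.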

Install heartbeat2 (heartbeat‑pils is replaced by cluster‑glue on CentOS 6.5+)

Install dependencies:

# yum install perl-TimeDate net-snmp-libs libnet PyXML

Install the three heartbeat packages:

# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm \
    heartbeat-pils-2.1.4-12.el6.x86_64.rpm \
    heartbeat-stonith-2.1.4-12.el6.x86_64.rpm

Make the httpd service highly available across the two nodes

Copy example configuration files:

# cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d/ha.cf

# vim /etc/ha.d/ha.cf
logfile /var/log/ha-log
keepalive 2
deadtime 15
warntime 10
udpport 694
mcast eth0 225.0.130.1 694 1 0
auto_failback on
ping 192.168.253.2
node node1.playground.com
node node2.playground.com

# scp /etc/ha.d/ha.cf node2.playground.com:/etc/ha.d/ha.cf
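
In this file, `keepalive` is the interval between heartbeats, `warntime` is how long to wait before logging a late heartbeat, and `deadtime` is how long to wait before declaring the peer dead, so the three values should increase in that order. A small sanity check, sketched under the assumption that all three are given in whole seconds as above (the helper names are hypothetical):

```shell
# ha_cf_value FILE DIRECTIVE: print the first value of DIRECTIVE.
# Commented-out lines do not match because their first field starts with "#".
ha_cf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_ha_timing FILE: succeed if keepalive < warntime < deadtime.
check_ha_timing() {
    keepalive=$(ha_cf_value "$1" keepalive)
    warntime=$(ha_cf_value "$1" warntime)
    deadtime=$(ha_cf_value "$1" deadtime)
    if [ -z "$keepalive" ] || [ -z "$warntime" ] || [ -z "$deadtime" ]; then
        echo "missing keepalive, warntime, or deadtime in $1" >&2
        return 2
    fi
    [ "$keepalive" -lt "$warntime" ] && [ "$warntime" -lt "$deadtime" ]
}
```

For example, `check_ha_timing /etc/ha.d/ha.cf && echo "timing looks sane"` can be run on both nodes after copying the file.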

Configure resources

# cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d/
# vim /etc/ha.d/haresources
node1.playground.com 192.168.253.100/24/eth0 httpd
# scp /etc/ha.d/haresources node2.playground.com:/etc/ha.d/haresources

Configure authentication file

# cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d/authkeys
# chmod 600 /etc/ha.d/authkeys
# openssl rand -hex 10
8499636794b07630af98
# vim /etc/ha.d/authkeys
auth 2
#1 crc
2 sha1 8499636794b07630af98
#3 md5 Hello!
# scp /etc/ha.d/authkeys node2.playground.com:/etc/ha.d/authkeys
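
The manual steps for the key file (heartbeat reads it from /etc/ha.d/authkeys) can also be scripted. The following is a sketch, not a heartbeat tool: `write_authkeys` is a hypothetical helper, and it reads /dev/urandom through `od` to produce a key of the same shape as `openssl rand -hex 10`.

```shell
# write_authkeys FILE: create FILE with a random sha1 key, owner-only access.
write_authkeys() {
    # 10 random bytes rendered as 20 hex characters.
    key=$(od -An -N10 -tx1 /dev/urandom | tr -d ' \n')
    (
        umask 077    # never create the file world-readable
        printf 'auth 2\n2 sha1 %s\n' "$key" > "$1"
    )
    # Also tighten the mode if the file already existed.
    chmod 600 "$1"
}
```

Generate the file once and copy it to the other node, since both nodes must hold the same key.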

Install httpd service (ensure it does not start automatically; let heartbeat control it)

Start cluster services

# service heartbeat start
# ssh node2.playground.com 'service heartbeat start'

Share a backend NFS filesystem between the two nodes (add a third host 192.168.253.135)

# vim /etc/exports
/var/www/share 192.168.253.0/24(rw)
# service nfs start
# echo 'web from share' > /var/www/share/index.html
# chown apache /var/www/share/index.html

Update haresources to include the NFS share

# vim /etc/ha.d/haresources
node1.playground.com 192.168.253.100/24/eth0 \
Filesystem::192.168.253.135:/var/www/share::/var/www/html::nfs httpd
# scp /etc/ha.d/haresources node2.playground.com:/etc/ha.d/haresources
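
Each entry after the node name in haresources names a resource script, with `::` separating the script from its arguments; a bare IP address is shorthand for the `IPaddr` script. Heartbeat starts the entries from left to right and stops them in reverse order. The sketch below only illustrates that parsing; `expand_resource` is a hypothetical helper, not part of heartbeat.

```shell
# expand_resource SPEC: print the resource script name followed by its
# arguments for a single haresources entry.
expand_resource() {
    case "$1" in
        [0-9]*.[0-9]*.[0-9]*.[0-9]*)
            # A bare IP address is shorthand for the IPaddr resource script.
            set -- "IPaddr::$1"
            ;;
    esac
    printf '%s\n' "$1" | awk -F'::' '{
        out = $1
        for (i = 2; i <= NF; i++) out = out " " $i
        print out
    }'
}

# Expand the three resources from the entry above, in start order.
for spec in \
    192.168.253.100/24/eth0 \
    Filesystem::192.168.253.135:/var/www/share::/var/www/html::nfs \
    httpd
do
    expand_resource "$spec"
done
```

The order matters here: the virtual IP comes up first, the NFS share is mounted on /var/www/html next, and httpd starts last so that it serves the shared content.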

Restart the services and the HA cluster is ready.
