How to Build a High‑Availability Cluster with Corosync & Pacemaker
This article explains the concepts, architecture, resource management, installation steps, and a practical NFS high‑availability example for building a Corosync‑Pacemaker HA cluster on Linux, covering constraints, split‑brain handling, and monitoring.
Introduction
High‑availability (HA) clusters aim to minimize service interruption caused by server failures. A cluster is a group of computers that provide network resources as a single entity, with each computer acting as a node.
Purpose of HA Clusters
HA clusters reduce losses from hardware/software errors by automatically detecting failures and switching services to standby nodes within seconds, ensuring continuous availability. The main functions of HA software are fault detection and automated service failover.
Overall Architecture
With the rapid growth of internet services, companies require high availability to avoid costly downtime. Operations staff must therefore increase the mean time between failures (MTBF) and shorten recovery time, from both hardware and software perspectives.
Corosync is a cluster communication suite that defines message transport and protocols via simple configuration. Pacemaker provides resource management on top of it: with corosync v1 it runs as a Corosync plugin, while with corosync v2 it runs as a separate service. The cluster is configured through command-line front ends such as crmsh (the crm command) or pcs; this article uses crmsh.
Common Corosync Configurations
Typical combinations include:
heartbeat v1 + haresources
heartbeat v2 + crm
heartbeat v3 + pacemaker + crmsh
corosync v1 + pacemaker
corosync v2 + pacemaker (corosync v2 adds a voting system to avoid split-brain)
cman + rgmanager
corosync v1 + cman + pacemaker
Resource Management (CRM)
Resource types:
primitive : basic resource, runs on a single node.
group : collection of resources that constitute a service.
clone : multiple instances of the same resource running on several nodes.
multi‑state (master/slave) : special clone where instances have master‑slave relationships.
Resource agents (RA) categories:
LSB : init scripts in /etc/rc.d/init.d/, supporting start/stop/restart/reload/status/force-reload. They must not be enabled to start at boot, because the cluster starts and stops them.
OCF (Open Cluster Framework) : agents in /usr/lib/ocf/resource.d/provider/, supporting start/stop/status/monitor/meta-data.
STONITH : agents used for fencing.
systemd : unit files in /usr/lib/systemd/system/ (must be enabled for auto-boot).
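As a rough illustration of the resource types and agent classes above, the following sketch writes a crmsh configuration fragment to a file instead of applying it directly. The resource names (webip, webserver, webservice, ping_probe, ping_clone), the IP addresses, and the file name are hypothetical. It assumes the ocf:heartbeat:IPaddr2 and ocf:pacemaker:ping agents and an httpd systemd unit are present on the nodes.

```shell
#!/bin/sh
# Sketch only: resource names, addresses, and the file name are assumptions.
cat > resources.crm <<'EOF'
# primitive: a single resource instance, here from the OCF class
primitive webip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=10s timeout=20s
# primitive backed by a systemd unit
primitive webserver systemd:httpd \
    op monitor interval=30s timeout=100s
# group: members start in the listed order and stay on the same node
group webservice webip webserver
# clone: a copy of the same primitive runs on several nodes
primitive ping_probe ocf:pacemaker:ping \
    params host_list=192.168.1.1 \
    op monitor interval=15s timeout=60s
clone ping_clone ping_probe
EOF

# On a cluster node, the agents can be explored and the file loaded with:
#   crm ra classes
#   crm ra list ocf heartbeat
#   crm ra info ocf:heartbeat:IPaddr2
#   crm configure load update resources.crm
```

Keeping the definitions in a file makes them easy to review before they are loaded into the live cluster.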
Constraint Types
Location constraints – define which nodes a resource prefers to run on.
Colocation constraints – define whether resources may run on the same node.
Order constraints – define start-up ordering dependencies between resources.
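In crmsh the three kinds of constraint are written with the keywords location, colocation, and order. The sketch below writes one of each to a file. The constraint IDs, the resource names webip and webserver, and the node name node1 are hypothetical, and the resources are assumed to be defined as standalone primitives.

```shell
#!/bin/sh
# Sketch only: constraint IDs, resource names, and node1 are assumptions.
cat > constraints.crm <<'EOF'
# location: webip prefers node1 with a score of 100
location webip_prefers_node1 webip 100: node1
# colocation: webserver must run on the node where webip runs
colocation webserver_with_webip inf: webserver webip
# order: start webip first, then webserver
order webip_before_webserver Mandatory: webip webserver
EOF

# On a cluster node:
#   crm configure load update constraints.crm
#   crm configure show
```

A score of inf makes the colocation mandatory. A finite score such as 100 expresses a preference that the cluster may override.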
HA Working Models
A/P (active/passive) – two‑node primary‑backup.
A/A (active/active) – two‑node primary‑primary.
N‑M – N nodes provide M services (N > M); the remaining N − M nodes act as standby.
Split‑Brain Isolation
When a cluster splits, a partition that does not hold more than half of the votes loses quorum, and isolation mechanisms keep it away from shared resources:
STONITH – node‑level fencing by power‑off or reboot.
Resource-level fencing – isolates a node from a shared resource, for example at a network switch.
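The majority rule behind quorum can be sketched with a little shell arithmetic. The function names quorum_needed and has_quorum are made up for this illustration, and the sketch assumes every node carries exactly one vote.

```shell
#!/bin/sh
# Sketch only: assumes one vote per node; function names are hypothetical.

# quorum_needed TOTAL_VOTES: smallest vote count that is more than half
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum PARTITION_VOTES TOTAL_VOTES: exit status 0 if the partition is quorate
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

# A five-node cluster that splits: only the side with three or more votes keeps quorum.
for partition in 1 2 3; do
    if has_quorum "$partition" 5; then
        echo "partition with $partition of 5 votes: quorate"
    else
        echo "partition with $partition of 5 votes: not quorate"
    fi
done

# On a running cluster, the real vote counts can be read with:
#   corosync-quorumtool -s
```

With an even split, such as one node each in a two-node cluster, neither side reaches a majority under this rule, which is why two-node clusters need special handling.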
Installation and Configuration
Prerequisites: hostname resolution between nodes and synchronized time.
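A quick way to check the name-resolution prerequisite is to ask the resolver for every node name. The helper name check_resolves is hypothetical, and the node names you pass would be your own. The sketch assumes a glibc-based system where getent is available.

```shell
#!/bin/sh
# Sketch only: check_resolves is a hypothetical helper name.

# check_resolves HOST...: print a line per host, return non-zero if any fails
check_resolves() {
    rc=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null; then
            echo "ok: $host"
        else
            echo "cannot resolve: $host"
            rc=1
        fi
    done
    return $rc
}

# localhost is used here so the sketch runs anywhere; on a real cluster
# pass the node names instead, e.g. check_resolves node1 node2
check_resolves localhost

# Time synchronization can be inspected separately, for example with
# chrony (an assumption about the time service in use):
#   chronyc tracking
```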
Install Pacemaker (CentOS 7; Corosync is pulled in as a dependency):
yum -y install pacemaker
The Corosync configuration file contains the sections totem, logging, quorum, and nodelist. After editing it, generate the authentication key with corosync-keygen -l and copy the configuration file and the key to each node.
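A minimal two-node configuration with those four sections might look like the sketch below, which writes a sample file into the current directory. The cluster name, the network and multicast addresses, and the node names node1 and node2 are assumptions to be replaced with real values. The two_node setting is a votequorum option for two-node clusters and would be dropped for larger ones.

```shell
#!/bin/sh
# Sketch only: cluster name, addresses, and host names below are assumptions.
cat > corosync.conf.sample <<'EOF'
totem {
    version: 2
    cluster_name: hacluster
    crypto_cipher: aes256
    crypto_hash: sha1
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no
    timestamp: on
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}
EOF

# On the first node (as root), the remaining steps would look like:
#   cp corosync.conf.sample /etc/corosync/corosync.conf
#   corosync-keygen -l        # writes /etc/corosync/authkey
#   scp -p /etc/corosync/corosync.conf /etc/corosync/authkey node2:/etc/corosync/
```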
Start services:
systemctl start corosync
systemctl start pacemaker
Install the crmsh tools for resource management:
yum -y install crmsh-2.1.4-1.1.x86_64.rpm pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm
High‑Availability Example: NFS with Corosync + Pacemaker
Deploy an NFS service on a separate server, mount the same directory on both cluster nodes, and configure resources so that when one node is set to standby, the NFS resource automatically moves to the other node.
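One possible shape for this setup uses the ocf:heartbeat:Filesystem agent to mount the NFS export as a cluster resource. The resource name webstore, the NFS server address, the export path, the mount point, and the node name node1 are all hypothetical.

```shell
#!/bin/sh
# Sketch only: resource name, NFS server, paths, and node names are assumptions.
cat > nfs-ha.crm <<'EOF'
primitive webstore ocf:heartbeat:Filesystem \
    params device="192.168.1.30:/data/web" directory="/var/www/html" fstype=nfs \
    op start timeout=60s \
    op stop timeout=60s \
    op monitor interval=20s timeout=40s
EOF

# On a cluster node, load the resource and test the failover:
#   crm configure load update nfs-ha.crm
#   crm status                 # note which node runs webstore
#   crm node standby node1     # if node1 held it, webstore should move
#   crm status
#   crm node online node1      # bring node1 back into service
```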
Resource Monitoring
Define a monitor for the httpd service: if httpd stops, it is restarted; if restart fails, the resource is migrated to a healthy node.
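A monitor of this kind could be declared as in the sketch below. The resource name httpd_svc, the intervals, and the threshold of 3 are illustrative assumptions, not values from the original setup.

```shell
#!/bin/sh
# Sketch only: resource name, timings, and threshold are assumptions.
cat > httpd-monitor.crm <<'EOF'
primitive httpd_svc systemd:httpd \
    op monitor interval=30s timeout=100s on-fail=restart \
    meta migration-threshold=3
EOF

# On a cluster node:
#   crm configure load update httpd-monitor.crm
#   crm status
#   crm resource cleanup httpd_svc   # clear recorded failures after a fix
```

Here on-fail=restart asks the cluster to restart the service when a monitor check fails, and migration-threshold=3 moves the resource to another node after three failures on the same node.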
Conclusion
Corosync + Pacemaker provides a slightly more complex HA solution than LVS, but it also supports health checks for resources and can integrate with tools like ldirectord to generate IPVS rules automatically.
MaGe Linux Operations
