How to Build a High‑Availability Cluster with Corosync & Pacemaker

This article explains the concepts, architecture, resource management, installation steps, and a practical NFS high‑availability example for building a Corosync‑Pacemaker HA cluster on Linux, covering constraints, split‑brain handling, and monitoring.

MaGe Linux Operations

Introduction

High‑availability (HA) clusters aim to minimize service interruption caused by server failures. A cluster is a group of computers that provide network resources as a single entity, with each computer acting as a node.

Purpose of HA Clusters

HA clusters reduce losses from hardware/software errors by automatically detecting failures and switching services to standby nodes within seconds, ensuring continuous availability. The main functions of HA software are fault detection and automated service failover.

Overall Architecture

With the rapid growth of internet services, companies require high availability to avoid costly downtime. Operations staff must increase mean time between failures (MTBF) and reduce mean time to repair (MTTR), from both hardware and software perspectives.

Corosync is a cluster communication suite (the messaging layer) that defines message transport and protocols via simple configuration. Pacemaker provides resource management on top of it: with Corosync 1.x it runs as a Corosync plugin, and with Corosync 2.x it runs as a separate daemon. The combination is configured through command‑line tools such as crmsh (the crm command) or pcs; this article uses crm.

HA cluster framework diagram

Common Corosync Configurations

Typical combinations include:

heartbeat v1 + haresources

heartbeat v2 + crm

heartbeat v3 + pacemaker + crmsh

corosync v1 + pacemaker

corosync v2 + pacemaker (corosync v2 adds a built‑in voting mechanism, votequorum, to help avoid split‑brain)

cman + rgmanager

corosync v1 + cman + pacemaker

Detailed architecture diagram

Resource Management (CRM)

Resource types:

primitive : basic resource, runs on a single node.

group : collection of resources that constitute a service.

clone : multiple instances of the same resource running on several nodes.

multi‑state (master/slave) : special clone where instances have master‑slave relationships.
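The four resource types can be sketched in crmsh syntax as follows. This is an illustration under assumptions, not the configuration used in the original article: every resource name and the 192.0.2.x addresses are hypothetical, and the script only writes a file for review instead of touching a live cluster.

```shell
#!/bin/sh
# Sketch only: all resource names and addresses below are hypothetical.
cfg="${TMPDIR:-/tmp}/ha-resource-types.crm"

# webip, webserver : primitives, each runs on one node at a time
# webservice       : group, members start in listed order on the same node
# gwping_clone     : clone, one copy of gwping on several nodes
# demo_ms          : multi-state, a clone whose instances can be promoted
cat > "$cfg" <<'EOF'
primitive webip ocf:heartbeat:IPaddr2 params ip=192.0.2.10 cidr_netmask=24
primitive webserver systemd:httpd
group webservice webip webserver
primitive gwping ocf:pacemaker:ping params host_list=192.0.2.1
clone gwping_clone gwping
primitive demo_state ocf:pacemaker:Stateful
ms demo_ms demo_state meta master-max=1 clone-max=2
EOF

echo "Review $cfg, then apply it with: crm configure load update $cfg"
```

Writing the definitions to a file first makes it possible to review them before loading them into the cluster.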

Resource agents (RA) categories:

LSB scripts located in /etc/rc.d/init.d/*, supporting start/stop/restart/reload/status/force‑reload (must not be enabled to start at boot, because the cluster starts and stops them).

OCF (Open Cluster Framework) agents in /usr/lib/ocf/resource.d/provider/, supporting start/stop/status/monitor/meta‑data.

STONITH agents for fencing, and systemd units in /usr/lib/systemd/system/ (must be enabled for auto‑boot).

Constraint Types

Location constraints – define which nodes a resource prefers (or avoids), expressed as a score.

Colocation constraints – define whether resources must (or must not) run on the same node.

Order constraints – define start‑up and shut‑down ordering dependencies between resources.
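As a rough illustration, the three kinds of constraint (node preference, same-node placement, and start ordering) might look like this in crmsh syntax. The constraint IDs, the resource names webip and webserver, and the node name node1 are assumptions made for the example; the resources are assumed to be standalone primitives that are not already in a group.

```shell
#!/bin/sh
# Sketch only: constraint IDs, resource names and node1 are hypothetical.
constraints="${TMPDIR:-/tmp}/ha-constraints.crm"

# Line 1: webserver prefers node1 with a score of 100 (node preference).
# Line 2: webserver must run on the node where webip runs (same-node placement).
# Line 3: webip must be started before webserver (start ordering).
cat > "$constraints" <<'EOF'
location webserver_prefers_node1 webserver 100: node1
colocation webserver_with_webip inf: webserver webip
order webip_before_webserver Mandatory: webip webserver
EOF

echo "Review $constraints, then apply it with: crm configure load update $constraints"
```

A score of inf makes the placement rule mandatory, while a finite score such as 100 only expresses a preference that other scores can outweigh.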

Constraint diagram

HA Working Models

A/P (active/passive) – two‑node primary‑backup.

A/A (active/active) – two‑node primary‑primary.

N‑M – N nodes provide M services (N > M); the remaining N − M nodes act as standby.

Split‑Brain Isolation

When a cluster splits, any partition that does not hold more than half of the votes loses quorum and must be isolated so that it cannot touch shared resources. Two isolation mechanisms are used:

STONITH – node‑level fencing by power‑off or reboot.

Fencing – resource‑level isolation, for example by blocking a node's access to a shared resource at a network or storage switch.
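The majority rule behind quorum is simple arithmetic. The following sketch assumes one vote per node and uses a hypothetical helper, has_quorum, that is not part of Corosync; it only shows which side of a split keeps quorum.

```shell
#!/bin/sh
# has_quorum TOTAL_VOTES PARTITION_VOTES
# Hypothetical helper: succeeds only when the partition holds a strict
# majority (more than half) of all votes.
has_quorum() {
    total=$1
    partition=$2
    needed=$(( total / 2 + 1 ))
    [ "$partition" -ge "$needed" ]
}

# Each pair is "total votes, votes held by this partition".
for split in "3 2" "3 1" "2 1" "5 3"; do
    set -- $split
    if has_quorum "$1" "$2"; then
        echo "$2 of $1 votes: quorate"
    else
        echo "$2 of $1 votes: not quorate"
    fi
done
```

The "2 1" case shows why two‑node clusters need special handling: after an even split neither side holds a majority, which is the situation the two_node option of Corosync's votequorum provider is meant to address.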

Installation and Configuration

Prerequisites: hostname resolution between nodes and synchronized time.

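Install Pacemaker (CentOS 7); Corosync is pulled in as a dependency:

yum -y install pacemaker

The Corosync configuration file, /etc/corosync/corosync.conf, is divided into the sections totem, logging, quorum, and nodelist. After editing it, generate the authentication key with corosync-keygen -l, then copy the configuration file and the key (/etc/corosync/authkey) to every node.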
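A minimal two‑node corosync.conf with those four sections might look like the sketch below, assuming Corosync 2.x. The cluster name, the 192.0.2.0 network, the log file path, and the host names node1 and node2 are placeholders, not values from the original article. The script writes the file to a temporary location so it can be reviewed before being installed.

```shell
#!/bin/sh
# Sketch only: cluster name, network, log path and node names are placeholders.
conf="${TMPDIR:-/tmp}/corosync.conf.example"

cat > "$conf" <<'EOF'
totem {
    version: 2
    cluster_name: hacluster
    transport: udpu
    crypto_cipher: aes128
    crypto_hash: sha1
    interface {
        ringnumber: 0
        bindnetaddr: 192.0.2.0
        mcastport: 5405
    }
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no
    timestamp: on
}

quorum {
    provider: corosync_votequorum
    # Two-node special case: lets one surviving node keep quorum.
    two_node: 1
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}
EOF

echo "Wrote $conf"
echo "Next: copy it to /etc/corosync/corosync.conf, run corosync-keygen -l,"
echo "then copy corosync.conf and authkey to the other node."
```

Encryption is switched on here because an authentication key is generated in the next step; the cipher and hash shown are one possible choice, not a recommendation from the source.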

Start services:

systemctl start corosync
systemctl start pacemaker
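Once both services are running, it helps to confirm that the nodes see each other before defining any resources. The sketch below is one way to do that; run_if_present is a hypothetical wrapper added here so the script degrades gracefully on a machine without the cluster tools.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if it is installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skip: $1 not installed"
    fi
}

# Ring status of the local Corosync instance.
run_if_present corosync-cfgtool -s
# Vote and quorum summary.
run_if_present corosync-quorumtool -s
# One-shot snapshot of nodes and resources as Pacemaker sees them.
run_if_present crm_mon -1
```

On a healthy two‑node cluster, the crm_mon snapshot should list both nodes as online.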

Install crmsh tools for resource management:

yum -y install crmsh-2.1.4-1.1.x86_64.rpm pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm

High‑Availability Example: NFS with Corosync + Pacemaker

Deploy an NFS service on a separate server, mount the same directory on both cluster nodes, and configure resources so that when one node is set to standby, the NFS resource automatically moves to the other node.
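The mount itself can be modelled as a Filesystem resource. The following sketch assumes a hypothetical NFS export nfs.example.com:/data/web mounted on /var/www/html and a resource named webstore; none of these names come from the original article.

```shell
#!/bin/sh
# Sketch only: the export, mount point and resource name are hypothetical.
nfs_cfg="${TMPDIR:-/tmp}/ha-nfs.crm"

# An OCF Filesystem resource that mounts the NFS export on whichever node
# currently hosts it, with a periodic monitor operation.
cat > "$nfs_cfg" <<'EOF'
primitive webstore ocf:heartbeat:Filesystem params device="nfs.example.com:/data/web" directory="/var/www/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s
EOF

echo "Review $nfs_cfg, then apply it with: crm configure load update $nfs_cfg"
echo "To test failover (node1 is a placeholder for the active node):"
echo "  crm node standby node1   # resources move to the other node"
echo "  crm status               # confirm where webstore now runs"
echo "  crm node online node1    # bring the node back into service"
```

Putting a node in standby is a convenient way to rehearse a failover because it moves resources away without powering the node off.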

Experiment test diagram
Resource migration diagram

Resource Monitoring

Define a monitor for the httpd service: if httpd stops, it is restarted; if restart fails, the resource is migrated to a healthy node.
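A monitor of this kind could be expressed in crmsh roughly as below. The resource name webserver, the intervals, and the migration-threshold and failure-timeout values are illustrative assumptions; the original article does not state which values it used.

```shell
#!/bin/sh
# Sketch only: resource name, intervals and thresholds are hypothetical.
mon_cfg="${TMPDIR:-/tmp}/ha-monitor.crm"

# op monitor           : check httpd every 30s; on failure, restart it in place
# migration-threshold  : after 3 failures on one node, move to another node
# failure-timeout      : forget old failures after 60s without new ones
cat > "$mon_cfg" <<'EOF'
primitive webserver systemd:httpd op monitor interval=30s timeout=20s on-fail=restart meta migration-threshold=3 failure-timeout=60s
EOF

echo "Review $mon_cfg, then apply it with: crm configure load update $mon_cfg"
echo "Inspect or reset failures afterwards with:"
echo "  crm resource failcount webserver show node1"
echo "  crm resource cleanup webserver"
```

With Pacemaker's default settings, a failed start also prevents the resource from running on that node again until the failure is cleaned up, which is what moves the resource to a healthy node when the restart does not succeed.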

Monitoring diagram

Conclusion

Corosync + Pacemaker is somewhat more complex to set up than LVS, but it adds health checks for resources and can integrate with tools such as ldirectord to generate IPVS rules automatically.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
