Databases 10 min read

Build a Two‑Node PostgreSQL HA Cluster with Streaming Replication and Dual VIPs

This guide explains how to create a high‑availability PostgreSQL cluster on two hosts using streaming replication, automatic failover and failback, dual virtual IPs, heartbeat monitoring, and stress testing, with full configuration scripts and practical test results.

dbaplus Community
dbaplus Community
dbaplus Community
Build a Two‑Node PostgreSQL HA Cluster with Streaming Replication and Dual VIPs

Introduction

This guide describes a hands‑on implementation of a PostgreSQL high‑availability (HA) cluster using streaming replication on two physical hosts: one primary and one standby. The solution provides automatic failover and failback, with each node owning a dedicated virtual IP (VIP).

Prerequisites

Two Linux servers (primary and standby).

Two fence devices to avoid split‑brain during role switches.

Two VIPs, each permanently associated with a role (primary‑VIP, standby‑VIP).

Shared archive directory (e.g., NFS) for WAL files.

Architecture Overview

Each node runs PostgreSQL and a monitoring script. The node’s current role determines which VIP it activates. Applications connect to the VIPs, so they always reach the current primary regardless of which physical host holds it.

Role Detection and Startup

The script determines the node’s role by checking for recovery.conf:

If the file exists → standby.

If the file does not exist → primary.

After role detection the script mounts the shared archive, starts PostgreSQL, and activates the appropriate VIP(s).

VIP Activation Logic

Standby node : simply brings up its assigned VIP.

Primary node : checks whether the peer already owns the VIP. If the peer does, the script fences the peer, takes over the VIP, and proceeds; otherwise it brings up its own VIP directly.

Heartbeat and Monitoring Loop

Both nodes run a continuous heartbeat loop. The logic differs by role:

Primary loop checks network gateway, local PostgreSQL health, VIP ownership, and remote node health. If the local node is healthy while the remote node is not, the script fences the standby, acquires its VIPs, and switches to the primary_standby state.

Standby loop performs the same checks plus replication‑lag evaluation. When the remote primary is down and lag is acceptable, the standby fences the primary, stops PostgreSQL, backs up control files, disables restore_command, restarts PostgreSQL, activates its VIPs, and promotes itself to master_standby.

Primary‑standby loop monitors the peer’s listener. When the peer comes back online, the node releases its VIPs and reverts to the normal primary role.

Implementation Repository

Complete scripts are available at:

https://github.com/digoal/PostgreSQL_HA_with_primary_standby_2vip (two‑node HA with dual VIPs)

https://github.com/digoal/sky_postgresql_cluster (single‑VIP master‑only version)

Stress and Consistency Test

A test table and PL/pgSQL function generate write load:

CREATE TABLE test(id int primary key, info text, crt_time timestamp);

CREATE OR REPLACE FUNCTION f_test(i_id int) RETURNS void AS $$
DECLARE
BEGIN
    UPDATE test SET info = md5(random()::text), crt_time = now() WHERE id = i_id;
    IF NOT FOUND THEN
        INSERT INTO test VALUES(i_id, md5(random()::text), now());
    END IF;
EXCEPTION WHEN OTHERS THEN
    RETURN;
END;
$$ LANGUAGE plpgsql STRICT;

The load script ( test.sql) uses pgbench to call the function millions of times:

\setrandom id 1 30000000
SELECT f_test(:id);

Running pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 3000 on the primary, then disabling the primary’s network interface, triggers the failover sequence. Log excerpts show detection, fencing, promotion, checkpoint creation, and VIP reassignment.

Result Verification

After failover, both nodes execute the same aggregate query to confirm data consistency: SELECT count(*), sum(hashtext(info)) FROM test; The returned count and hash sum match (e.g., 1808909 | -680268294581), demonstrating that no rows were lost during the transition.

References

https://github.com/digoal/PostgreSQL_HA_with_primary_standby_2vip

https://github.com/digoal/sky_postgresql_cluster

http://blog.163.com/digoal@126/blog/static/163877040201481085344535/

http://blog.163.com/digoal@126/blog/static/163877040201481085624211/

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PostgreSQLFailoverHAVIPStreaming Replication
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.