PostgreSQL High‑Availability Cluster Deployment with Patroni and Etcd
This article details the design, deployment, configuration, operation, monitoring, and backup of a PostgreSQL high‑availability cluster built on Patroni, Etcd, and LVS at 360, covering hardware layout, software versions, installation steps, parameter tuning, fail‑over testing, and future outlook.
Background – An online container‑cloud image repository required a highly available PostgreSQL cluster. PostgreSQL was chosen over MySQL for its richer indexing options, built‑in full‑text search, permissive open‑source license, and compatibility with the platform's Kubernetes tooling such as kube‑bench.
Why Patroni + Etcd – Patroni and Etcd are among the most popular open‑source HA building blocks on GitHub; Patroni supports current PostgreSQL releases (including the 12.x series deployed here) and provides robust leader election, distributed configuration storage, and automatic fail‑over.
Architecture – Patroni runs on each PostgreSQL node, writes node status to an Etcd key‑value store, and uses Etcd for leader election. The cluster uses one primary and two replicas across different IDC sites, with Etcd deployed as a three‑node cluster. LVS provides load‑balancing instead of HAProxy.
System / Software Versions
CentOS 7.4
PostgreSQL 12.6
Etcd 3.2.18
Python 3.6.5
Patroni 2.1.0
Host Layout
10.16.75.17 pg12/patroni
10.16.75.15 pg12/patroni
10.16.78.53 pg12/patroni
10.24.13.9 etcd
10.24.13.10 etcd
10.24.13.11 etcd

Python Installation
yum install wget gcc make zlib-devel openssl openssl-devel
wget "https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tar.xz"
tar -xvJf Python-3.6.5.tar.xz
cd Python-3.6.5 && ./configure --prefix=/usr/local/python3
make && make install
ln -fs /usr/local/python3/bin/python3 /usr/bin/python3
ln -fs /usr/local/python3/bin/pip3 /usr/bin/pip3
pip3 install virtualenv -i https://mirrors.ustc.edu.cn/pypi/web/simple/
ln -fs /usr/local/python3/bin/virtualenv /usr/bin/virtualenv

PostgreSQL Directory and User
useradd postgres
mkdir -p /data04/pg15432/data
mkdir -p /data04/pg15432/scripts
chown -R postgres:postgres /data04/pg15432/

Etcd Deployment
yum install etcd -y
# edit /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/codis.etcd"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_NAME="node1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.24.13.9:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.24.13.9:2379"
ETCD_INITIAL_CLUSTER="node1=http://10.24.13.9:2380,node2=http://10.24.13.10:2380,node3=http://10.24.13.11:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd

Patroni Deployment
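Before installing Patroni, it is worth confirming that the three etcd members have actually formed a quorum. etcd exposes a `/health` endpoint on the client port (2379); the sketch below polls each member listed above. This is an illustrative helper, not part of etcd's own tooling, and assumes the plain‑HTTP client URLs configured earlier.

```python
import json
from urllib.request import urlopen

# Client URLs of the three etcd members deployed above
ENDPOINTS = [
    "http://10.24.13.9:2379",
    "http://10.24.13.10:2379",
    "http://10.24.13.11:2379",
]

def parse_health(body: bytes) -> bool:
    """etcd's /health endpoint returns JSON like {"health": "true"}.
    Some releases encode the value as a string, others as a bool."""
    doc = json.loads(body.decode())
    return doc.get("health") in ("true", True)

def check_cluster(endpoints=ENDPOINTS) -> dict:
    """Poll every member; returns {endpoint: healthy?}."""
    status = {}
    for ep in endpoints:
        try:
            with urlopen(ep + "/health", timeout=2) as resp:
                status[ep] = parse_health(resp.read())
        except OSError:  # connection refused, timeout, DNS failure, ...
            status[ep] = False
    return status

if __name__ == "__main__":
    print(check_cluster())
```

If any member reports unhealthy here, fix the etcd cluster first; Patroni cannot hold a leader lock without a working DCS quorum.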
cd /data04 && virtualenv venv4archery --python=python3
source venv4archery/bin/activate
pip3 install psycopg2-binary -i https://mirrors.aliyun.com/pypi/simple/
pip3 install "patroni[etcd,consul,zookeeper,kubernetes]" -i https://mirrors.aliyun.com/pypi/simple/
# start as non‑root user
patroni /etc/patroni.yml > patroni_member_1.log 2>&1 &

Systemd Service File (excerpt)
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/data04/venv4archery/bin/patroni /etc/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target

Patroni Configuration (/etc/patroni.yml)
scope: postgresql
namespace: /service/
name: postgresql1

restapi:
  listen: 10.16.75.17:8008
  connect_address: 10.16.75.17:8008

etcd:
  hosts:
    - 10.24.13.9:2379
    - 10.24.13.10:2379
    - 10.24.13.11:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        wal_level: logical
        archive_mode: "on"
        max_connections: 6000
        shared_buffers: 32GB
        archive_command: 'DATE=`date +%Y%m%d`; DIR="/data04/pg15432/arch/$DATE"; (test -d $DIR || mkdir -p $DIR) && cp %p $DIR/%f'
        hot_standby: "on"
        wal_keep_segments: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication repl 127.0.0.1/32 md5
    - host replication repl 10.16.75.17/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.16.75.17:15432
  connect_address: 10.16.75.17:15432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /data04/pg15432/data
  pgpass: /tmp/pgpass1
  authentication:
    replication:
      username: repl
      password: "1a23s6c54f"
    superuser:
      username: postgres
      password: "59687411134be622"
  parameters:
    unix_socket_directories: '.'
    synchronous_commit: "on"
    synchronous_standby_names: "*"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

Parameter Tuning (examples)
Increase shared_buffers to 32GB: patronictl -c /etc/patroni.yml edit-config -p "shared_buffers='32GB'"
Set max_connections to 6000: patronictl -c /etc/patroni.yml edit-config -p 'max_connections=6000'
Enable slow‑query logging (200 ms): patronictl -c /etc/patroni.yml edit-config -p 'log_min_duration_statement=200'
After editing, parameters such as shared_buffers and max_connections only take effect after a restart, which is applied cluster‑wide with patronictl -c /etc/patroni.yml restart postgresql.
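After a restart or switchover, each node's role can be confirmed through Patroni's REST API on the port configured above (8008): GET /master answers 200 only on the leader, and GET /replica answers 200 only on a healthy replica. A minimal sketch; `classify_role` and `node_role` are illustrative helpers built on those documented status codes, not Patroni tooling.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def probe(url: str) -> int:
    """Return the HTTP status of a Patroni health-check endpoint."""
    try:
        with urlopen(url, timeout=2) as resp:
            return resp.status
    except HTTPError as exc:   # Patroni answers 503 on a role mismatch
        return exc.code
    except URLError:
        return -1              # node unreachable

def classify_role(master_status: int, replica_status: int) -> str:
    """Map the two health-check results onto a human-readable role."""
    if master_status == 200:
        return "leader"
    if replica_status == 200:
        return "replica"
    return "unknown"

def node_role(host: str, port: int = 8008) -> str:
    base = "http://%s:%d" % (host, port)
    return classify_role(probe(base + "/master"), probe(base + "/replica"))

if __name__ == "__main__":
    for host in ("10.16.75.17", "10.16.75.15", "10.16.78.53"):
        print(host, node_role(host))
```

The same endpoints are what an LVS health check can target, so the load balancer only routes writes to whichever node currently answers 200 on /master.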
Monitoring – Prometheus scrapes PostgreSQL metrics (e.g. via postgres_exporter) and Grafana visualizes them; ready‑made dashboards can be imported from the Grafana marketplace.
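Assuming a postgres_exporter instance runs beside each PostgreSQL node on its default port 9187 (the article does not name the exporter or port, so both are assumptions), the corresponding Prometheus scrape job could look like:

```yaml
scrape_configs:
  - job_name: "postgresql"
    static_configs:
      - targets:
          - "10.16.75.17:9187"   # postgres_exporter, one per PG node (assumed port)
          - "10.16.75.15:9187"
          - "10.16.78.53:9187"
```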
Backup – pg_rman is employed for physical backups, supporting full and incremental backups, catalog verification, and cleanup.
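pg_rman selects full versus incremental runs via its --backup-mode option and requires a validate pass before a backup becomes usable. A small wrapper sketch; the catalog path and the scheduling of validate are assumptions, so check pg_rman's documentation for the exact flags on your version.

```python
import subprocess

BACKUP_CATALOG = "/data04/pg15432/backup"   # assumed backup-catalog location
PGDATA = "/data04/pg15432/data"

def rman_cmd(mode: str) -> list:
    """Build the pg_rman argv for a full or incremental backup."""
    if mode not in ("full", "incremental"):
        raise ValueError("unsupported backup mode: %s" % mode)
    return ["pg_rman", "backup",
            "--backup-mode=%s" % mode,
            "-B", BACKUP_CATALOG,
            "-D", PGDATA]

def run_backup(mode: str) -> None:
    """Take a backup, then validate the catalog so it can be restored from."""
    subprocess.check_call(rman_cmd(mode))
    subprocess.check_call(["pg_rman", "validate", "-B", BACKUP_CATALOG])
```

In practice this would run from cron on the current primary only, e.g. a full backup weekly and incrementals daily, with old backups pruned by pg_rman's own retention commands.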
Fail‑over Tests – Killing the postgres process on the primary shows that Patroni simply restarts it in place rather than triggering a fail‑over, as expected. A manual switchover is performed with patronictl -d etcd://10.24.13.9:2379 switchover postgresql, after which the cluster recovers correctly under the new leader.
Future Outlook – PostgreSQL ranks fourth on DB‑Engines, offers powerful GIS capabilities, and its HA ecosystem continues to mature, suggesting broader adoption in China.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.