Mastering PostgreSQL Backup & Replication: A Complete Enterprise Guide
An in-depth enterprise guide that explains why backup and replication are critical for PostgreSQL, compares physical backup, logical backup, and logical replication, provides step-by-step command examples, and covers high-availability architectures, automation scripts, disaster-recovery procedures, monitoring queries, and common pitfalls.
Why Backup and Replication Are Essential for PostgreSQL
Database failures such as accidental table deletions, disk crashes, network partitions, or failed upgrades can cause data loss or prolonged outages. Regular backups, recovery drills, and replication are required to achieve reliability and high availability.
Three Core PostgreSQL Data‑Protection Technologies
Physical Backup
Physical backup copies the entire PGDATA directory and relies on continuous WAL archiving for point‑in‑time recovery (PITR).
Copy the binary files of PGDATA directly.
Enable continuous WAL archiving.
Typical command: pg_basebackup with streaming WAL.
# Full physical backup compressed as tar.gz
pg_basebackup \
-h 10.10.10.1 -p 5432 \
-U backup_user \
-D /data/backup/full_$(date +%F) \
-Ft -z -P --wal-method=stream
Key postgresql.conf settings for WAL archiving:
archive_mode = on
archive_command = 'cp %p /data/wal_archive/%f'
wal_level = replica
max_wal_senders = 10
wal_keep_size = 4GB
Logical Backup
Logical backup extracts data at the database, schema, table, or DDL level, allowing fine‑grained restores.
Export a single database:
pg_dump -d mydb -Fc -f mydb_$(date +%F).dump
Parallel dump for large databases:
pg_dump -d mydb -Fd -j 8 -f backup_dir_$(date +%F)
Export all databases and roles:
pg_dumpall -U postgres > all_db_$(date +%F).sql
Logical Replication
Logical replication streams changes from a publisher to one or more subscribers.
Publisher → WAL logical decoding → Logical change → Subscriber
Typical setup:
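-- Prerequisite: the publisher must run with wal_level = logical (the replica level used for physical backup is not sufficient)
-- 1. Create a replication role on the publisher and let it read the published tables
CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD 'repl123';
GRANT SELECT ON sales, orders TO repl_user;
-- 2. Create a publication for selected tables (run on the publisher)
CREATE PUBLICATION pub_sales FOR TABLE sales, orders;
-- 3. Create a subscription on the subscriber; the tables must already exist there with matching definitions
CREATE SUBSCRIPTION sub_sales
CONNECTION 'host=10.10.10.1 port=5432 dbname=mydb user=repl_user password=repl123'
PUBLICATION pub_sales;
Choosing the Appropriate Technique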
Full‑cluster disaster recovery → Physical backup + WAL archiving.
Single‑table accidental deletion → Logical backup.
Multi‑region traffic distribution → Logical replication.
Minimize data loss (RPO≈0) → Synchronous streaming replication, backed by physical backup with continuous WAL archiving.
Prevent accidental changes from propagating to all replicas → Delayed physical replica (recovery_min_apply_delay set on the standby).
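The delayed-replica option in the last row can be sketched as a small shell helper. This is a minimal sketch, assuming PostgreSQL 12 or later, where standby settings live in postgresql.conf. recovery_min_apply_delay and hot_standby are real PostgreSQL parameters; the function name write_delayed_standby_conf, the default delay of 1h, and the file path in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: appends delayed-replay settings to a standby's config file.
# Arguments: $1 = path to the standby's postgresql.conf, $2 = delay (default 1h, an assumed value).
write_delayed_standby_conf() {
  local conf_file="$1"
  local delay="${2:-1h}"
  {
    echo "# Replay each transaction only after it is at least ${delay} old"
    echo "recovery_min_apply_delay = '${delay}'"
    echo "hot_standby = on"
  } >> "${conf_file}"
}
```

A call such as write_delayed_standby_conf /var/lib/pgsql/data/postgresql.conf 1h, followed by a configuration reload or restart of the standby, would keep that node one hour behind the primary. The delay is the window in which a bad change can still be caught before the standby replays it.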
Typical Enterprise High‑Availability Architecture
Practical Enterprise Implementation
Hybrid Backup Strategy (YAML)
backup_strategy:
  physical:
    full_backup: "daily 02:00"
    retention: "30 days"
  wal:
    enabled: true
    retention: "90 days"
  logical:
    full_backup: "Sundays 01:00"
    retention: "12 months"
replication:
  sync_node: 1
  async_read_only: 1
  delayed_node:
    delay: 1h
Automated Backup Script
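#!/bin/bash
set -euo pipefail   # abort on the first failed step
BACKUP_DIR="/backup"
DATE=$(date +%Y%m%d_%H%M%S)
# Physical backup in plain format: older pg_verifybackup releases cannot read tar-format backups
pg_basebackup -D "${BACKUP_DIR}/physical/${DATE}" \
  -Fp -P --wal-method=fetch
# Logical backup
pg_dump -d mydb -Fd -j 4 -f "${BACKUP_DIR}/logical/${DATE}"
# Verify backup integrity against the backup manifest
pg_verifybackup "${BACKUP_DIR}/physical/${DATE}"
# Clean up old backups (keep 30 days); match only top-level backup directories, never the parent itself
find "${BACKUP_DIR}/physical" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
Disaster Recovery & Point‑In‑Time Recovery (PITR)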
Recovering after an accidental DROP TABLE orders; operation:
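# 1. Stop the database
pg_ctl stop -D /var/lib/pgsql/data
# 2. Restore the physical backup (base.tar.gz is the archive that pg_basebackup -Ft -z writes into the backup directory)
rm -rf /var/lib/pgsql/data/*
tar -xzf /data/backup/full_2024-12-01/base.tar.gz -C /var/lib/pgsql/data
# 3. Configure recovery for the target time (PostgreSQL 12+ reads these settings from postgresql.conf; recovery.conf is no longer supported)
echo "restore_command = 'cp /data/wal_archive/%f %p'" >> /var/lib/pgsql/data/postgresql.conf
echo "recovery_target_time = '2024-12-20 14:02:00'" >> /var/lib/pgsql/data/postgresql.conf
echo "recovery_target_action = 'promote'" >> /var/lib/pgsql/data/postgresql.conf
touch /var/lib/pgsql/data/recovery.signal
# 4. Start the database; it replays WAL up to the target time, just before the DROP, then promotes
pg_ctl start -D /var/lib/pgsql/data
Replication Monitoring & Lag Diagnosis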
SELECT
client_addr,
state,
write_lag,
flush_lag,
replay_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_delay
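FROM pg_stat_replication;
If bytes_delay exceeds 1 GB, check the network bandwidth between primary and standby, disk I/O on the standby, and long-running queries on the standby that hold back WAL replay. Raising max_wal_senders does not reduce lag; it only limits how many replication connections the primary accepts.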
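The 1 GB threshold can be turned into an automated check. The following is a minimal sketch, not a finished monitoring tool: it assumes the lag query is run with psql in unaligned, tuples-only mode so each row arrives as client_addr|bytes_delay. The function name check_replication_lag and the output format are assumptions for illustration.

```shell
#!/bin/bash
# Lag threshold in bytes: 1 GB of WAL not yet replayed.
LAG_LIMIT_BYTES=$((1024 * 1024 * 1024))

# Reads "client_addr|bytes_delay" rows on stdin.
# Prints one "LAG <addr> <bytes>" line per standby over the limit
# and returns non-zero if at least one standby exceeds it.
check_replication_lag() {
  local limit="${1:-$LAG_LIMIT_BYTES}"
  local status=0 addr bytes
  while IFS='|' read -r addr bytes; do
    # Skip rows without a lag value (replay_lsn can be NULL while a standby starts up)
    [ -z "${bytes}" ] && continue
    if [ "${bytes}" -gt "${limit}" ]; then
      echo "LAG ${addr} ${bytes}"
      status=1
    fi
  done
  return "${status}"
}
```

A possible invocation on the primary is psql -At -F'|' -c "SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication" | check_replication_lag, with the exit status fed into whatever alerting is already in place.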
Common Pitfalls and Fixes
Relying only on logical backups – may not support full cluster recovery. Fix: Combine physical and logical backups.
Not validating backups – restores can fail. Fix: Perform quarterly restore drills.
WAL archive disk full – archive_command starts failing, unarchived segments pile up in pg_wal, and the primary shuts down once that volume fills as well. Fix: Implement automatic cleanup and off‑site cold storage (e.g., S3).
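For the archive-disk pitfall, a usage check can run before the volume fills. This is a minimal sketch built on POSIX df and awk; the function names and the 80 percent default threshold are assumptions for illustration, and the cleanup or alerting action is left to the caller.

```shell
#!/bin/bash
# Prints the used-space percentage (0-100) of the filesystem that holds the given path.
archive_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Returns 0 when usage is at or above the threshold (default 80, an assumed value).
archive_needs_attention() {
  local dir="$1"
  local threshold="${2:-80}"
  local used
  used=$(archive_usage_percent "${dir}")
  [ "${used}" -ge "${threshold}" ]
}
```

A cron job could call archive_needs_attention /data/wal_archive 80 and, when it succeeds, raise an alert or remove segments that no retained backup still needs (the pg_archivecleanup utility is one option for that step).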
Key Takeaway Formula
Enterprise‑grade data safety =
Physical backup +
WAL archiving +
Logical backup +
Real‑time replication +
Delayed replica +
Regular recovery drills
Ray's Galactic Tech