Master PostgreSQL 17: Installation, Backup, Recovery, and Performance Tuning
This comprehensive guide walks you through PostgreSQL 17 deployment: its multi‑process architecture and MVCC model, environment requirements, and the essential configuration parameters. It then covers step‑by‑step backup and PITR procedures, streaming replication setup, and best‑practice tuning, security, and monitoring tips for reliable production use.
Overview
PostgreSQL 17 is a mature, feature‑rich open‑source relational database. It differs from MySQL in three core areas: a multi‑process architecture, MVCC that stores old row versions in the heap, and a Write‑Ahead Log (WAL) that underpins crash recovery, streaming replication and logical replication.
Key Characteristics
Multi‑process architecture : each client connection spawns an independent backend process (≈5‑10 MiB). High connection counts increase memory usage, so a connection pool is mandatory.
MVCC : UPDATE inserts a new row version and marks the old one as a dead tuple; DELETE only marks rows as dead. Accumulated dead tuples cause table/index bloat, and without regular vacuuming the cluster risks transaction ID wraparound.
WAL : sequential log written before data pages; required for crash recovery, streaming replication and logical replication.
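The MVCC behaviour above can be observed directly: every row exposes the system columns xmin (the inserting transaction's ID) and xmax. A quick sketch against a scratch table (the table name is illustrative; run it in any test database):

```sql
-- Illustrative scratch table for inspecting row versions
CREATE TABLE mvcc_demo (id int PRIMARY KEY, val text);
INSERT INTO mvcc_demo VALUES (1, 'a');

-- xmin holds the inserting transaction's ID
SELECT xmin, xmax, * FROM mvcc_demo;

-- UPDATE writes a new row version; the old one becomes a dead tuple
UPDATE mvcc_demo SET val = 'b' WHERE id = 1;
SELECT xmin, xmax, * FROM mvcc_demo;  -- xmin now shows the updating XID
```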
Environment Requirements
PostgreSQL 17.x on Ubuntu 22.04 or RHEL 9 (LTS recommended).
Memory ≥ 16 GB – set shared_buffers to ~25 % of RAM.
SSD storage with separate disks for data and WAL to minimise I/O latency.
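As a rough sizing aid (a sketch, not an official formula — the 25 %/75 % ratios are the rules of thumb used throughout this guide), the main memory parameters can be derived from total RAM in a few lines of shell:

```shell
#!/bin/sh
# Derive rule-of-thumb memory settings from total RAM (in GiB).
# Adjust the ratios for your workload; these mirror the guidance above.
RAM_GB=16
SHARED_BUFFERS_GB=$((RAM_GB / 4))        # ~25 % of RAM
EFFECTIVE_CACHE_GB=$((RAM_GB * 3 / 4))   # ~75 % of RAM
echo "shared_buffers = ${SHARED_BUFFERS_GB}GB"
echo "effective_cache_size = ${EFFECTIVE_CACHE_GB}GB"
```

For the 16 GB server assumed here this prints the same 4 GB / 12 GB values used in the configuration section below.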
Core Architecture Details
The main postmaster process forks the following children:
postmaster (main process)
├── backend process × N # one per client connection
├── autovacuum launcher # schedules VACUUM/ANALYZE
├── autovacuum worker × N # performs VACUUM/ANALYZE
├── WAL writer # flushes WAL buffer to disk
├── checkpointer # writes dirty pages to data files
├── background writer # writes dirty pages asynchronously
└── logical replication launcher # starts logical replication workers
(The separate stats collector process was removed in PostgreSQL 15; cumulative statistics now live in shared memory.)
Typical memory per backend is 5‑10 MiB; 100 connections therefore consume 500 MiB‑1 GiB, which explains why max_connections must be tuned carefully.
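These processes can be seen from inside the server: pg_stat_activity tags every server process with a backend_type (a quick sketch; the exact set of rows varies with configuration):

```sql
-- One row per server process; 'client backend' rows are user connections
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type;
```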
MVCC and Dead Tuples
Dead tuples are removed by VACUUM. VACUUM FULL rewrites the whole table and requires an exclusive lock, so it should be used only when necessary.
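Bloat can be watched via pg_stat_user_tables before deciding between plain VACUUM and VACUUM FULL (a monitoring sketch; the orders table name is illustrative):

```sql
-- Dead-tuple ratio per table; a high ratio suggests vacuum is falling behind
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Plain VACUUM reclaims space for reuse without an exclusive lock
VACUUM (VERBOSE) orders;
```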
Essential Configuration Parameters
Memory‑related Settings ( /etc/postgresql/17/main/postgresql.conf )
# shared_buffers – 25 % of physical RAM
shared_buffers = 4GB
# effective_cache_size – 75 % of RAM (shared_buffers + OS cache)
effective_cache_size = 12GB
# work_mem – memory per sort/hash operation
work_mem = 64MB
# maintenance_work_mem – memory for VACUUM, CREATE INDEX, etc.
maintenance_work_mem = 512MB
# wal_buffers – usually -1 (auto: 1/32 of shared_buffers, capped at the 16 MB WAL segment size)
wal_buffers = 64MB
Connection and Concurrency
# Maximum client connections (use a connection pool)
max_connections = 200
# Reserved connections for superuser to avoid lock‑out
superuser_reserved_connections = 5
WAL and Replication Settings
# WAL level – replica enables streaming replication (default)
wal_level = replica
# Checkpoint completion target – smooth I/O
checkpoint_completion_target = 0.9
# WAL size limits
max_wal_size = 4GB
min_wal_size = 1GB
# Synchronous commit – on for safety; off boosts throughput but risks losing the latest commits on crash
synchronous_commit = on
Autovacuum Configuration
# Enable autovacuum (default)
autovacuum = on
# Number of autovacuum workers
autovacuum_max_workers = 5
# Vacuum trigger thresholds
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.02 # 2 % of table rows
# Analyze trigger thresholds
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.01 # 1 %
# Cost delay (ms) – throttle I/O pressure
autovacuum_vacuum_cost_delay = 2ms
For very large tables (hundreds of millions of rows), lower autovacuum_vacuum_scale_factor (e.g., to 0.005) to avoid bloat:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.005,
autovacuum_analyze_scale_factor = 0.002,
autovacuum_vacuum_cost_delay = 5
);
Backup and Recovery Strategies
Logical Backup with pg_dump
# Custom‑format dump with compression level 5 and 4 parallel jobs
pg_dump -h 127.0.0.1 -U postgres -d mydb \
-F c -Z 5 -j 4 -f /backup/mydb_$(date +%Y%m%d).dump
# Restore (parallel)
pg_restore -h 127.0.0.1 -U postgres -d mydb_restore -j 4 -F c /backup/mydb_20260101.dump
# Schema‑only backup
pg_dump -h 127.0.0.1 -U postgres -d mydb -s -f schema.sql
# Backup specific tables
pg_dump -h 127.0.0.1 -U postgres -d mydb -t orders -t order_items -F c -f orders_backup.dump
Physical Backup with pg_basebackup
# Full physical backup (tar format, gzip compressed)
pg_basebackup -h 127.0.0.1 -U replicator \
-D /backup/base_$(date +%Y%m%d) \
  -F tar -z -P -Xs --checkpoint=fast
Point‑In‑Time Recovery (PITR)
# Enable WAL archiving on the primary
archive_mode = on
archive_command = 'cp %p /archive/%f' # production: push to object storage
# Recovery target on standby
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-01-15 14:30:00'
recovery_target_action = 'promote'
Streaming Replication Configuration
Primary Server
# postgresql.conf additions
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
hot_standby = on
# pg_hba.conf – allow replication connections
host    replication    replicator    192.168.1.0/24    scram-sha-256
# Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplicaPass123';
Standby Server
# Initialise standby (creates standby.signal automatically)
pg_basebackup -h 192.168.1.10 -U replicator \
-D /var/lib/postgresql/17/main -P -Xs -R
# Verify generated connection info
cat /var/lib/postgresql/17/main/postgresql.auto.conf
# primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=ReplicaPass123'
# Start standby
systemctl start postgresql@17-main
# Check replication status from primary
psql -U postgres -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_state FROM pg_stat_replication;"
Performance Optimisation
Connection Pooling
PgBouncer in transaction mode is the production standard. Example pgbouncer.ini:
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
server_idle_timeout = 600
Index Strategy
-- Composite index – equality columns first, range columns later
CREATE INDEX idx_orders_user_status_created ON orders (user_id, status, created_at DESC);
-- Partial index for pending orders only
CREATE INDEX idx_orders_pending ON orders (created_at DESC) WHERE status = 'pending';
-- Create index concurrently to avoid blocking writes
CREATE INDEX CONCURRENTLY idx_orders_new ON orders (user_id);
Statistics Maintenance
-- Manual ANALYZE (non‑blocking)
ANALYZE orders;
-- View statistics freshness (pg_stat_user_tables uses relname, not tablename)
SELECT schemaname, relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
Security Hardening
-- Principle of least privilege – read‑only role
CREATE ROLE app_readonly;
GRANT CONNECT ON DATABASE mydb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
-- Read‑write role inherits read‑only and gets DML rights
CREATE ROLE app_readwrite;
GRANT app_readonly TO app_readwrite;
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_readwrite;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_readwrite;
# pg_hba.conf – enforce scram‑sha‑256, reject all other hosts
local   all   postgres   peer
local   all   all        scram-sha-256
host    all   all        127.0.0.1/32     scram-sha-256
host    all   all        192.168.1.0/24   scram-sha-256
host    all   all        0.0.0.0/0        reject
High‑Availability with Patroni
# patroni.yml (core excerpt)
scope: postgres-cluster
name: pg-node1
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.10:8008
etcd3:
  hosts: 192.168.1.20:2379,192.168.1.21:2379,192.168.1.22:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1 MiB
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.10:5432
  parameters:
    max_connections: 200
    shared_buffers: 4GB
    wal_level: replica
    hot_standby: on
Common Pitfalls and Alerts
FATAL: remaining connection slots are reserved – max_connections reached; check for connection leaks and deploy PgBouncer.
ERROR: could not serialize access due to concurrent update – serialization conflict; add retry logic or lower the isolation level.
WARNING: database "mydb" must be vacuumed within N transactions – approaching transaction ID wraparound; run VACUUM FREEZE immediately.
Sudden query slowdown – often stale planner statistics; run ANALYZE and re‑examine the plan with EXPLAIN (disabling sequential scans via enable_seqscan = off is a diagnostic aid only).
Continuous tablespace growth – autovacuum not keeping up; tune autovacuum_vacuum_scale_factor and related settings.
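The wraparound warning above can be anticipated by checking datfrozenxid age per database (a monitoring sketch; autovacuum forces a freeze at autovacuum_freeze_max_age, 200 million by default):

```sql
-- Age of the oldest unfrozen XID per database; alert well before
-- autovacuum_freeze_max_age is reached
SELECT datname,
       age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```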
Monitoring and Alerting
Key Metrics Query
psql -U postgres -c "SELECT datname,
numbackends AS connections,
xact_commit AS commits,
xact_rollback AS rollbacks,
blks_hit AS cache_hits,
blks_read AS disk_reads,
round(blks_hit::numeric / NULLIF(blks_hit + blks_read,0) * 100,2) AS cache_hit_ratio,
deadlocks,
temp_files,
pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname NOT IN ('template0','template1','postgres');"
Typical thresholds for a 16 GB server: cache hit ratio > 99 %, connections < 80 % of max_connections, deadlocks = 0, replication lag < 1 MB, dead‑tuple ratio < 10 %.
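The replication‑lag threshold above can be measured in bytes on the primary with pg_wal_lsn_diff (a sketch; returns rows only where standbys are attached):

```sql
-- Byte lag between the primary's current WAL position and each standby's replay
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```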
Prometheus Alert Rules (postgres_alerts.yml)
groups:
- name: postgresql
  rules:
  - alert: PostgreSQLCacheHitRateLow
    expr: rate(pg_stat_database_blks_hit[5m]) / (rate(pg_stat_database_blks_hit[5m]) + rate(pg_stat_database_blks_read[5m])) < 0.95
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL cache hit rate below 95%: {{ $value | humanizePercentage }}"
  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag > 300
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL replication lag exceeds 5 minutes: {{ $value }}s"
  - alert: PostgreSQLDeadlocks
    expr: rate(pg_stat_database_deadlocks[5m]) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL detected deadlocks"
Backup Automation Script
#!/bin/bash
# pg_backup.sh – daily full physical backup + WAL archiving
PGHOST="127.0.0.1"
PGUSER="postgres"
BACKUP_DIR="/data/backup/postgresql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
mkdir -p "$BACKUP_DIR"
# Full physical backup using pg_basebackup
pg_basebackup -h "$PGHOST" -U replicator \
-D "$BACKUP_DIR/base_$DATE" \
-F tar -z -P -Xs \
--checkpoint=fast
if [ $? -eq 0 ]; then
echo "Backup succeeded: $BACKUP_DIR/base_$DATE"
# Delete backups older than retention period
find "$BACKUP_DIR" -maxdepth 1 -name "base_*" -mtime +$RETENTION_DAYS -exec rm -rf {} \;
else
echo "Backup failed!" >&2
exit 1
fi
Manual Recovery Procedure
1. Stop PostgreSQL: systemctl stop postgresql@17-main
2. Remove the existing data directory: rm -rf /var/lib/postgresql/17/main/*
3. Extract the physical backup:
   tar -xzf base_20260101_120000/base.tar.gz -C /var/lib/postgresql/17/main/
4. Configure restore_command and recovery_target_time in postgresql.conf.
5. Create the signal file: touch /var/lib/postgresql/17/main/recovery.signal
6. Start PostgreSQL and monitor the logs until recovery completes.
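After the restart, recovery progress can be confirmed from psql (a verification sketch):

```sql
-- true while WAL is still being replayed; false once the target is reached
SELECT pg_is_in_recovery();

-- Last WAL location replayed and the timestamp of the last replayed commit
SELECT pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
```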
Key Takeaways
Connection pooling is mandatory; PgBouncer in transaction mode is the production standard.
Regular VACUUM and ANALYZE prevent table bloat and Transaction ID wraparound.
Set shared_buffers to ~25 % of RAM and effective_cache_size to ~75 % to give the optimizer accurate cost estimates.
Streaming replication combined with Patroni provides robust HA with automatic failover.
Further Learning
Logical replication and Change Data Capture (CDC) – enable wal_level = logical and use tools such as Debezium to stream changes to Kafka.
Declarative partitioning (range, list, hash) – consider pg_partman for automated partition lifecycle management.
PostGIS extension – spatial queries and indexing for GIS workloads.
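As a starting point for the logical‑replication path mentioned above, a minimal publication/subscription pair looks like this (a sketch: host, database and table names are illustrative, and wal_level = logical must be set on the source first):

```sql
-- On the source (requires wal_level = logical)
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the target (the table must already exist with a matching schema)
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=192.168.1.10 dbname=mydb user=replicator password=ReplicaPass123'
  PUBLICATION orders_pub;
```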
Ops Community
A leading IT operations community where professionals share and grow together.
