
Master PostgreSQL 17: Installation, Backup, Recovery, and Performance Tuning

This comprehensive guide walks you through PostgreSQL 17 deployment, explains its multi‑process architecture and MVCC model, details environment requirements, shows essential configuration parameters, provides step‑by‑step backup and PITR procedures, demonstrates streaming replication setup, and shares best‑practice tuning, security, and monitoring tips for reliable production use.

Ops Community

Overview

PostgreSQL 17 is a mature open‑source relational database. It differs from MySQL (InnoDB) in three core areas: a multi‑process rather than multi‑threaded architecture, MVCC that keeps old row versions in the table heap instead of an undo log, and a Write‑Ahead Log (WAL) that underpins crash recovery, streaming replication and logical replication.

Key Characteristics

Multi‑process architecture : each client connection spawns an independent backend process (≈5‑10 MiB). High connection counts increase memory usage, so a connection pool is mandatory.

MVCC : UPDATE inserts a new row version and marks the old one as a dead tuple; DELETE only marks rows as dead. Accumulated dead tuples cause table/index bloat, and tables that are never vacuumed eventually risk transaction ID wraparound.

WAL : sequential log written before data pages; required for crash recovery, streaming replication and logical replication.

Environment Requirements

PostgreSQL 17.x on Ubuntu 22.04 or RHEL 9 (LTS recommended).

Memory ≥ 16 GB – set shared_buffers to ~25 % of RAM.

SSD storage with separate disks for data and WAL to minimise I/O latency.

Core Architecture Details

The main postmaster process forks the following children:

postmaster (main process)
├── backend process × N      # one per client connection
├── autovacuum launcher      # schedules VACUUM/ANALYZE
├── autovacuum worker × N    # performs VACUUM/ANALYZE
├── WAL writer               # flushes WAL buffer to disk
├── checkpointer             # writes dirty pages to data files
├── background writer        # writes dirty pages asynchronously
└── logical replication launcher  # starts logical replication workers

(The separate stats collector process was removed in PostgreSQL 15; cumulative statistics now live in shared memory.)

Typical memory per backend is 5‑10 MiB; 100 connections therefore consume 500 MiB‑1 GiB, which explains why max_connections must be tuned carefully.
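As a quick sanity check, the arithmetic above can be sketched (a hypothetical helper, not part of PostgreSQL; actual per-backend usage varies with work_mem and workload):

```python
# Back-of-the-envelope estimate of backend memory usage.
# The 5-10 MiB per-backend range comes from the text above.

def backend_memory_mib(connections: int,
                       per_backend_low: float = 5.0,
                       per_backend_high: float = 10.0) -> tuple[float, float]:
    """Return (low, high) estimated MiB consumed by backend processes."""
    return (connections * per_backend_low, connections * per_backend_high)

low, high = backend_memory_mib(100)
print(f"100 connections: {low:.0f} MiB - {high / 1024:.1f} GiB")  # 500 MiB - 1.0 GiB
```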

MVCC and Dead Tuples

Dead tuples are removed by VACUUM. VACUUM FULL rewrites the whole table and requires an exclusive lock, so it should be used only when necessary.

Essential Configuration Parameters

Memory‑related Settings (/etc/postgresql/17/main/postgresql.conf)

# shared_buffers – 25 % of physical RAM
shared_buffers = 4GB

# effective_cache_size – 75 % of RAM (shared_buffers + OS cache)
effective_cache_size = 12GB

# work_mem – memory per sort/hash operation
work_mem = 64MB

# maintenance_work_mem – memory for VACUUM, CREATE INDEX, etc.
maintenance_work_mem = 512MB

# wal_buffers – default -1 auto-sizes to 1/32 of shared_buffers (capped at
# 16MB, one WAL segment); raise explicitly for write-heavy workloads
wal_buffers = 64MB
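The 25 %/75 % rules of thumb above can be turned into a small sizing helper (hypothetical, for illustration only; always validate against your workload):

```python
# Rule-of-thumb sizing from the text: shared_buffers ~25 % of RAM,
# effective_cache_size ~75 % (shared_buffers + OS page cache).

def size_memory_params(ram_gb: int) -> dict[str, str]:
    """Derive starting values for the two main memory parameters."""
    return {
        "shared_buffers": f"{ram_gb // 4}GB",            # ~25 % of RAM
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",  # ~75 % of RAM
    }

print(size_memory_params(16))  # matches the 16 GB example in this guide
```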

Connection and Concurrency

# Maximum client connections (use a connection pool)
max_connections = 200
# Reserved connections for superuser to avoid lock‑out
superuser_reserved_connections = 5

WAL and Replication Settings

# WAL level – replica enables streaming replication (default)
wal_level = replica
# Checkpoint completion target – smooth I/O
checkpoint_completion_target = 0.9
# WAL size limits
max_wal_size = 4GB
min_wal_size = 1GB
# Synchronous commit – on for durability; off can gain 30-50 % throughput
# but may lose the most recently committed transactions on a crash
synchronous_commit = on

Autovacuum Configuration

# Enable autovacuum (default)
autovacuum = on
# Number of autovacuum workers
autovacuum_max_workers = 5
# Vacuum trigger thresholds
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.02   # 2 % of table rows
# Analyze trigger thresholds
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.01   # 1 %
# Cost delay (ms) – throttle I/O pressure
autovacuum_vacuum_cost_delay = 2ms
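Autovacuum triggers a table VACUUM once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (the formula from the PostgreSQL documentation); a quick sketch:

```python
# The autovacuum trigger formula from the PostgreSQL docs:
# vacuum runs when n_dead_tup > threshold + scale_factor * reltuples.

def vacuum_trigger(reltuples: int, threshold: int = 50,
                   scale_factor: float = 0.02) -> int:
    """Dead tuples needed before autovacuum vacuums a table."""
    return int(threshold + scale_factor * reltuples)

print(vacuum_trigger(10_000))       # small table: 250 dead tuples
print(vacuum_trigger(100_000_000))  # large table: 2,000,050 dead tuples
```

The second result shows why the default scale factor is too coarse for very large tables: two million dead tuples accumulate before a single vacuum runs.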

For very large tables (hundreds of millions of rows) lower autovacuum_vacuum_scale_factor (e.g., 0.005) to avoid bloat:

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.005,
    autovacuum_analyze_scale_factor = 0.002,
    autovacuum_vacuum_cost_delay = 5
);

Backup and Recovery Strategies

Logical Backup with pg_dump

# Directory-format dump with compression level 5 and 4 parallel jobs
# (parallel dump requires the directory format, -F d)
pg_dump -h 127.0.0.1 -U postgres -d mydb \
    -F d -Z 5 -j 4 -f /backup/mydb_$(date +%Y%m%d).dump

# Restore (parallel; pg_restore detects the archive format automatically)
pg_restore -h 127.0.0.1 -U postgres -d mydb_restore -j 4 /backup/mydb_20260101.dump

# Schema‑only backup
pg_dump -h 127.0.0.1 -U postgres -d mydb -s -f schema.sql

# Backup specific tables
pg_dump -h 127.0.0.1 -U postgres -d mydb -t orders -t order_items -F c -f orders_backup.dump
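For scripted backups, the dump command can be assembled programmatically; a hypothetical Python sketch (build_dump_cmd is illustrative, not a real tool, and note that parallel dumps require the directory format, -F d):

```python
from datetime import date

def build_dump_cmd(dbname: str, backup_dir: str = "/backup",
                   jobs: int = 4, compress: int = 5) -> list[str]:
    """Build argv for a date-stamped, directory-format parallel pg_dump."""
    target = f"{backup_dir}/{dbname}_{date.today():%Y%m%d}.dump"
    return ["pg_dump", "-h", "127.0.0.1", "-U", "postgres",
            "-d", dbname, "-F", "d", "-Z", str(compress),
            "-j", str(jobs), "-f", target]

# Hand the list to subprocess.run() in a real job; printed here for inspection.
print(" ".join(build_dump_cmd("mydb")))
```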

Physical Backup with pg_basebackup

# Full physical backup (tar format, gzip compressed)
pg_basebackup -h 127.0.0.1 -U replicator \
    -D /backup/base_$(date +%Y%m%d) \
    -F tar -z -P -Xs --checkpoint=fast

Point‑In‑Time Recovery (PITR)

# Enable WAL archiving on the primary
archive_mode = on
archive_command = 'cp %p /archive/%f'   # production: push to object storage

# Recovery settings (postgresql.conf on the restored server; also create recovery.signal)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-01-15 14:30:00'
recovery_target_action = 'promote'

Streaming Replication Configuration

Primary Server

# postgresql.conf additions
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
hot_standby = on

# pg_hba.conf – allow replication connections
host    replication    replicator    192.168.1.0/24    scram-sha-256
# Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplicaPass123';

Standby Server

# Initialise standby (creates standby.signal automatically)
pg_basebackup -h 192.168.1.10 -U replicator \
    -D /var/lib/postgresql/17/main -P -Xs -R

# Verify generated connection info
cat /var/lib/postgresql/17/main/postgresql.auto.conf
# primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=ReplicaPass123'

# Start standby
systemctl start postgresql@17-main

# Check replication status from primary
psql -U postgres -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_state FROM pg_stat_replication;"
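The *_lsn columns are WAL positions written as hi/lo in hexadecimal; the byte lag between two of them is what pg_wal_lsn_diff() computes on the server. A client-side sketch of the conversion (hypothetical helper names):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string like '0/3000148' to an absolute byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replication_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent by the primary but not yet replayed on the standby."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

print(replication_lag_bytes("0/3000148", "0/3000000"))  # 328 bytes behind
```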

Performance Optimisation

Connection Pooling

PgBouncer in transaction mode is the production standard. Example pgbouncer.ini:

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
server_idle_timeout = 600
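A common starting heuristic for default_pool_size (from the PostgreSQL wiki's connection-count guidance) is cores × 2 plus the effective spindle count; a sketch, treating the formula as a starting point rather than a rule:

```python
def pool_size(cpu_cores: int, spindles: int = 1) -> int:
    """Starting-point heuristic: server connections = cores * 2 + spindles.

    SSDs behave roughly like one "spindle"; benchmark and adjust from here.
    """
    return cpu_cores * 2 + spindles

print(pool_size(8))  # 8-core server with SSD storage -> 17
```

With a pool this small, max_client_conn can safely stay high (1000 above) because PgBouncer multiplexes many client connections onto few server backends.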

Index Strategy

# Composite index – equality columns first, range columns later
CREATE INDEX idx_orders_user_status_created ON orders (user_id, status, created_at DESC);

# Partial index for pending orders only
CREATE INDEX idx_orders_pending ON orders (created_at DESC) WHERE status = 'pending';

# Create index concurrently to avoid locking
CREATE INDEX CONCURRENTLY idx_orders_new ON orders (user_id);

Statistics Maintenance

# Manual ANALYZE (non‑blocking)
ANALYZE orders;

# View statistics freshness
SELECT schemaname, tablename, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Security Hardening

# Principle of least privilege – read‑only role
CREATE ROLE app_readonly;
GRANT CONNECT ON DATABASE mydb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;

# Read‑write role inherits read‑only and gets DML rights
CREATE ROLE app_readwrite;
GRANT app_readonly TO app_readwrite;
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_readwrite;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_readwrite;

# pg_hba.conf – enforce scram‑sha‑256, reject all other hosts
local   all             postgres                                peer
local   all             all                                     scram-sha-256
host    all             all             127.0.0.1/32          scram-sha-256
host    all             all             192.168.1.0/24        scram-sha-256
host    all             all             0.0.0.0/0             reject

High‑Availability with Patroni

# patroni.yml (core excerpt)
scope: postgres-cluster
name: pg-node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.10:8008

etcd3:
  hosts: 192.168.1.20:2379,192.168.1.21:2379,192.168.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1 MiB

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.10:5432
  parameters:
    max_connections: 200
    shared_buffers: 4GB
    wal_level: replica
    hot_standby: on

Common Pitfalls and Alerts

FATAL: remaining connection slots are reserved – max_connections reached; check for connection leaks and deploy PgBouncer.

ERROR: could not serialize access due to concurrent update – serialization conflict; add retry logic or lower the isolation level.

WARNING: database "mydb" must be vacuumed within N transactions – approaching transaction ID wraparound; run VACUUM FREEZE immediately.

Sudden query slowdown – usually stale planner statistics; run ANALYZE (temporarily setting enable_seqscan = off can help confirm a bad plan, but only as a diagnostic).

Continuous tablespace growth – autovacuum not keeping up; tune autovacuum_vacuum_scale_factor and related settings.

Monitoring and Alerting

Key Metrics Query

psql -U postgres -c "SELECT datname,
       numbackends AS connections,
       xact_commit AS commits,
       xact_rollback AS rollbacks,
       blks_hit AS cache_hits,
       blks_read AS disk_reads,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read,0) * 100,2) AS cache_hit_ratio,
       deadlocks,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname NOT IN ('template0','template1','postgres');"

Typical thresholds for a 16 GB server: cache hit > 99 %, connections < 80 % of max_connections, deadlocks = 0, replication lag < 1 MB, dead‑tuple ratio < 10 %.
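The cache-hit and dead-tuple thresholds above use the same arithmetic as the SQL query; a small sketch with hypothetical helper names:

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Percentage of block reads served from shared buffers."""
    total = blks_hit + blks_read
    return round(blks_hit / total * 100, 2) if total else 0.0

def dead_tuple_ratio(n_dead: int, n_live: int) -> float:
    """Percentage of dead tuples in a table; >10 % suggests autovacuum lag."""
    total = n_dead + n_live
    return round(n_dead / total * 100, 2) if total else 0.0

print(cache_hit_ratio(995_000, 5_000))  # 99.5  -> just above the 99 % target
print(dead_tuple_ratio(8_000, 92_000))  # 8.0   -> under the 10 % alert line
```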

Prometheus Alert Rules (postgres_alerts.yml)

groups:
- name: postgresql
  rules:
  - alert: PostgreSQLCacheHitRateLow
    expr: rate(pg_stat_database_blks_hit[5m]) / (rate(pg_stat_database_blks_hit[5m]) + rate(pg_stat_database_blks_read[5m])) < 0.95
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL cache hit rate below 95%: {{ $value | humanizePercentage }}"

  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag > 300
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL replication lag exceeds 5 minutes: {{ $value }}s"

  - alert: PostgreSQLDeadlocks
    expr: rate(pg_stat_database_deadlocks[5m]) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL detected deadlocks"

Backup Automation Script

#!/bin/bash
# pg_backup.sh – daily full physical backup + WAL archiving
PGHOST="127.0.0.1"
PGUSER="postgres"
BACKUP_DIR="/data/backup/postgresql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7

mkdir -p "$BACKUP_DIR"

# Full physical backup using pg_basebackup
pg_basebackup -h "$PGHOST" -U replicator \
    -D "$BACKUP_DIR/base_$DATE" \
    -F tar -z -P -Xs \
    --checkpoint=fast

if [ $? -eq 0 ]; then
  echo "Backup succeeded: $BACKUP_DIR/base_$DATE"
  # Delete backups older than retention period
  find "$BACKUP_DIR" -maxdepth 1 -name "base_*" -mtime +$RETENTION_DAYS -exec rm -rf {} \;
else
  echo "Backup failed!" >&2
  exit 1
fi
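The retention step (the find -mtime call above) can equally be expressed in Python if the backup job outgrows a shell script; a hypothetical sketch of the same pruning logic:

```python
import shutil
import time
from pathlib import Path

def prune_backups(backup_dir: str, retention_days: int = 7) -> list[str]:
    """Delete base_* backup directories older than retention_days.

    Mirrors `find -maxdepth 1 -name "base_*" -mtime +N` from the shell script.
    """
    cutoff = time.time() - retention_days * 86400
    removed = []
    for entry in Path(backup_dir).glob("base_*"):
        if entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```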

Manual Recovery Procedure

1. Stop PostgreSQL: systemctl stop postgresql@17-main
2. Remove the existing data directory: rm -rf /var/lib/postgresql/17/main/*
3. Extract the physical backup:

tar -xzf base_20260101_120000/base.tar.gz -C /var/lib/postgresql/17/main/

4. Configure restore_command and recovery_target_time in postgresql.conf.
5. Create the signal file: touch /var/lib/postgresql/17/main/recovery.signal
6. Start PostgreSQL and monitor the logs until recovery completes.

Key Takeaways

Connection pooling is mandatory; PgBouncer in transaction mode is the production standard.

Regular VACUUM and ANALYZE prevent table bloat and Transaction ID wraparound.

Set shared_buffers to ~25 % of RAM and effective_cache_size to ~75 % to give the optimizer accurate cost estimates.

Streaming replication combined with Patroni provides robust HA with automatic failover.

Further Learning

Logical replication and Change Data Capture (CDC) – enable wal_level = logical and use tools such as Debezium to stream changes to Kafka.

Declarative partitioning (range, list, hash) – consider pg_partman for automated partition lifecycle management.

PostGIS extension – spatial queries and indexing for GIS workloads.

Tags: PostgreSQL, Backup, Replication, Performance Tuning, Database Administration, HA