Mastering rsync: From Basics to Advanced Incremental Backup Strategies
This comprehensive guide walks you through rsync’s core delta algorithm, essential options, and real‑world backup scenarios, then dives into advanced techniques like link‑dest incremental snapshots, SSH vs daemon modes, cron scheduling, inotify real‑time sync, bandwidth control, verification, monitoring, and security best practices.
Overview
Data loss caused by missing backups is a common risk for small businesses. rsync is the fundamental tool for Linux backup because it works out‑of‑the‑box, requires no extra agents or licenses, and transfers only the changed parts of files.
How rsync works
rsync uses a delta‑transfer algorithm:
The destination splits its existing copy of each file into fixed-size blocks and computes a weak rolling checksum plus a strong checksum for each block.
It sends these checksums to the source.
The source slides a window over its version of the file, uses the rolling checksums to find blocks the destination already has, and sends only the non-matching (new or changed) data plus references to matching blocks.
The destination reconstructs the new file from the received data and its existing blocks.
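To watch the delta transfer at work, --stats reports how much data was sent literally versus matched against existing blocks. Note that rsync defaults to --whole-file for purely local copies, so the flag below re-enables the delta algorithm; the paths are just examples:
# Force delta transfer for a local copy and show literal vs matched data
rsync -av --no-whole-file --stats /home/user/data/ /backup/data/
# In the output, compare the "Literal data" and "Matched data" lines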
By default rsync decides whether a file needs to be transferred based on size and modification time. Adding --checksum forces a full content hash comparison (slower but more reliable).
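For example, the default quick check versus a forced checksum pass looks like this (paths are just examples):
# Default: a file is skipped if size and modification time match
rsync -av /home/user/data/ /backup/data/
# --checksum (-c): hash every file on both sides before deciding (slower, more thorough)
rsync -avc /home/user/data/ /backup/data/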
Typical scenarios
Local backup (different disks on the same host)
Remote backup over SSH
Real‑time sync with inotify
Disaster‑recovery across data centers
Configuration distribution
Website deployment
Environment requirements
OS: Ubuntu 24.04 LTS or Rocky Linux 9.x (both must have rsync installed)
rsync ≥ 3.2 (Ubuntu 24.04 ships 3.3.0, Rocky 9 ships 3.2.3)
OpenSSH 9.x for encrypted transport
Optional: inotify-tools ≥ 4.x for real‑time sync
Installation and basic usage
# Install rsync
sudo apt install -y rsync # Ubuntu/Debian
sudo dnf install -y rsync # Rocky/CentOS
rsync --version | head -1
Basic commands:
# Local sync
rsync -av /home/user/data/ /backup/data/
# Remote push (SSH)
rsync -av /home/user/data/ user@backup-server:/backup/data/
# Remote pull (SSH)
rsync -av user@remote-server:/data/ /local/backup/
Key options
-a (archive) = -rlptgoD: recursive; preserves symlinks, permissions, modification times, group, owner, and device/special files. -v enables verbose output.
Trailing slash matters: src/ copies the *contents* of src; src copies the directory itself. Use -n (dry‑run) to preview actions, especially with --delete.
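To make the slash rule concrete (directory names are illustrative):
# Trailing slash: the contents of src land directly in dst/
rsync -av src/ dst/      # -> dst/file1, dst/file2, ...
# No trailing slash: the directory itself is created inside dst/
rsync -av src dst/       # -> dst/src/file1, dst/src/file2, ...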
Advanced parameters
Compression and transfer
# Compress over network
rsync -avz src/ remote:/dst/
# Adjust compression level (1‑9)
rsync -avz --compress-level=1 src/ remote:/dst/
Deletion and exclusion
# Delete extraneous files on target
rsync -av --delete src/ dst/
# Safer: preview first
rsync -avn --delete src/ dst/
# Exclude patterns
rsync -av --exclude='*.log' --exclude='.git' src/ dst/
Bandwidth, partials, and progress
# Limit bandwidth to 10 MB/s
rsync -av --bwlimit=10240 src/ dst/
# Show progress and keep partial files
rsync -avP src/ dst/
Dry‑run and safety
# Always dry‑run before destructive ops
rsync -avn --delete src/ dst/
SSH vs daemon mode
SSH mode uses the existing SSH infrastructure, encrypts traffic, and requires no extra service configuration. It is the default for most cross‑network backups.
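For example, a dedicated key or non-standard port can be passed through -e (the port, key path, and host below are assumptions):
# Push over SSH on port 2222 with a dedicated backup key
rsync -av -e "ssh -p 2222 -i ~/.ssh/backup_key" /home/user/data/ user@backup-server:/backup/data/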
Daemon mode runs a dedicated rsync service on port 873. It avoids encryption (higher raw throughput) but needs its own rsyncd.conf and authentication setup.
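A minimal daemon-mode setup might look like the following sketch; the module name, backup path, and account are assumptions rather than fixed values:
# /etc/rsyncd.conf
uid = root
gid = root
use chroot = yes
max connections = 4
log file = /var/log/rsyncd.log
[backup]
path = /backup
read only = no
auth users = backupuser
# /etc/rsyncd.secrets holds "backupuser:password" and must be chmod 600
secrets file = /etc/rsyncd.secrets
# Start the daemon (listens on TCP 873) and push to the module
sudo rsync --daemon
rsync -av /home/user/data/ backupuser@backup-server::backup/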
Incremental backup design
Use --link-dest to create snapshots that appear as full copies while sharing unchanged files via hard links. This provides space‑efficient, point‑in‑time backups.
# First full backup
rsync -a /var/www/html/ /backup/website/2023-01-01/
# Incremental backup based on previous snapshot
rsync -a --link-dest=/backup/website/2023-01-01/ /var/www/html/ /backup/website/2023-01-02/
A typical retention policy keeps daily increments for a week, weekly full backups for a month, and then discards older snapshots.
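You can confirm that unchanged files are shared rather than copied by comparing inode numbers and disk usage across snapshots (the file name below is illustrative):
# Identical inode numbers mean both snapshots point at the same data on disk
ls -i /backup/website/2023-01-01/index.html /backup/website/2023-01-02/index.html
# du counts hard-linked files only once across the directories it is given
du -shc /backup/website/2023-01-01/ /backup/website/2023-01-02/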
Production‑grade backup script
#!/bin/bash
set -euo pipefail
# Configuration
SOURCE="/var/www/html/"
BACKUP_ROOT="/backup/website"
KEEP_DAYS=14
LOG_FILE="/var/log/rsync_backup.log"
LOCK_FILE="/var/run/rsync_backup.lock"
# Acquire lock to avoid concurrent runs
exec 200>"$LOCK_FILE"
if ! flock -n 200; then
echo "[WARN] Another backup is running, exiting" | tee -a "$LOG_FILE"
exit 0
fi
# Determine latest snapshot for link‑dest
LATEST_LINK="$BACKUP_ROOT/latest"
if [ -L "$LATEST_LINK" ] && [ -d "$(readlink -f "$LATEST_LINK")" ]; then
LINK_DEST=$(readlink -f "$LATEST_LINK")
RSYNC_OPTS="-a --delete --stats --link-dest=$LINK_DEST"
else
RSYNC_OPTS="-a --delete --stats"
fi
# Create new snapshot directory
DATE=$(date +%Y-%m-%d)
SNAP_DIR="$BACKUP_ROOT/$DATE"
mkdir -p "$SNAP_DIR"
# Run rsync
rsync $RSYNC_OPTS "$SOURCE" "$SNAP_DIR/" >> "$LOG_FILE" 2>&1
# Update latest symlink
ln -sfn "$SNAP_DIR" "$LATEST_LINK"
# Cleanup old snapshots
find "$BACKUP_ROOT" -maxdepth 1 -type d -name '20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]' -mtime +$KEEP_DAYS -exec rm -rf {} +
echo "Backup completed: $SNAP_DIR" | tee -a "$LOG_FILE"
Scheduling with cron and flock
# Edit crontab (run as root)
crontab -e
# Daily at 02:00 – production backup
0 2 * * * /usr/local/bin/production_backup.sh >> /var/log/cron_backup.log 2>&1
# Prevent overlapping runs (alternative inline method)
0 2 * * * flock -n /var/run/rsync_backup.lock /usr/local/bin/production_backup.sh >> /var/log/cron_backup.log 2>&1
Real‑time sync with inotify
#!/bin/bash
# realtime_sync.sh – sync changes immediately
WATCH_DIR="/var/www/html/"
REMOTE="user@backup-server:/backup/www/"
SSH_OPTS="-i /root/.ssh/sync_key -o StrictHostKeyChecking=accept-new"
while inotifywait -rq -e create,delete,modify,move "$WATCH_DIR"; do
    # Without -m, inotifywait exits after the first event; the loop syncs and re-arms the watch
    rsync -az -e "ssh $SSH_OPTS" "$WATCH_DIR" "$REMOTE"
done
Deploy the script as a systemd service for automatic start‑up and restart.
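A unit file along these lines works; the unit name and script path are assumptions:
# /etc/systemd/system/realtime-sync.service
[Unit]
Description=Real-time rsync mirror driven by inotify
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/realtime_sync.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Enable and start it
sudo systemctl daemon-reload
sudo systemctl enable --now realtime-sync.service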
Best practices and cautions
Always pay attention to the trailing slash on the source path – it changes whether the directory itself or its contents are copied.
Run a dry‑run (-n) before any --delete operation.
Use --link-dest for space‑efficient incremental snapshots.
Validate backups regularly (checksum comparison, random file checks).
Monitor disk usage; apply --bwlimit during business hours.
Lock scripts with flock to avoid concurrent executions.
Never run --delete on an empty source – add a sanity check for minimum file count.
Bandwidth control
# Example: limit to 10 MB/s during daytime
HOUR=$(date +%H)
if [ $HOUR -ge 8 ] && [ $HOUR -lt 22 ]; then
BW="--bwlimit=10240"
else
BW=""
fi
rsync -avz $BW /data/ remote:/backup/
Security considerations
Using --delete can erase all target data if the source path is wrong or empty. Mitigate by:
Running a dry‑run first.
Checking that the source directory exists and contains at least a minimal number of files.
Optionally limiting deletions with --max-delete (a combined sketch of these safeguards follows).
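A minimal pre-flight guard combining these checks, assuming the website backup from earlier (the file-count threshold and --max-delete cap are illustrative):
# Refuse to run --delete against a missing or nearly empty source
SOURCE="/var/www/html/"
MIN_FILES=10
if [ ! -d "$SOURCE" ]; then
    echo "[ERROR] Source $SOURCE does not exist, aborting" >&2
    exit 1
fi
count=$(find "$SOURCE" -type f | wc -l)
if [ "$count" -lt "$MIN_FILES" ]; then
    echo "[ERROR] Only $count files in source, refusing to run --delete" >&2
    exit 1
fi
# Cap how many files a single run may delete as an extra safety net
rsync -av --delete --max-delete=1000 "$SOURCE" /backup/website/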
Backup verification
#!/bin/bash
SOURCE="/var/www/html/"
BACKUP="/backup/website/latest/"
# Dry-run with checksums; --itemize-changes prints one line per difference
rsync -rcn --delete --itemize-changes "$SOURCE" "$BACKUP" > /tmp/rsync_diff.txt
if [ ! -s /tmp/rsync_diff.txt ]; then
    echo "Verification passed: source and backup are identical"
else
    echo "Verification failed: differences found"
    head -20 /tmp/rsync_diff.txt
fi
Monitoring and alerting
#!/bin/bash
# backup_monitor.sh – emits Prometheus metrics for each backup task
METRICS="/var/lib/prometheus/node-exporter/backup_status.prom"
TMP=$(mktemp)
cat > "$TMP" <<'EOF'
# HELP backup_last_success_timestamp Last successful backup timestamp
# TYPE backup_last_success_timestamp gauge
# HELP backup_size_bytes Backup size in bytes
# TYPE backup_size_bytes gauge
# HELP backup_age_seconds Age of the latest backup in seconds
# TYPE backup_age_seconds gauge
EOF
BACKUP_ROOT="/backup"
for d in "$BACKUP_ROOT"/*/; do
task=$(basename "$d")
latest="$d/latest"
if [ -L "$latest" ] && [ -d "$(readlink -f "$latest")" ]; then
mtime=$(stat -c %Y "$(readlink -f "$latest")")
size=$(du -sb "$(readlink -f "$latest")" | awk '{print $1}')
age=$(( $(date +%s) - mtime ))
echo "backup_last_success_timestamp{task=\"$task\"} $mtime" >> "$TMP"
echo "backup_size_bytes{task=\"$task\"} $size" >> "$TMP"
echo "backup_age_seconds{task=\"$task\"} $age" >> "$TMP"
else
echo "backup_last_success_timestamp{task=\"$task\"} 0" >> "$TMP"
echo "backup_size_bytes{task=\"$task\"} 0" >> "$TMP"
echo "backup_age_seconds{task=\"$task\"} 999999" >> "$TMP"
fi
done
mv "$TMP" "$METRICS"
Prometheus alert rules (example):
groups:
  - name: backup_alerts
    rules:
      - alert: BackupMissed
        expr: backup_age_seconds > 93600  # 26 hours
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Backup {{ $labels.task }} has not run for over 26 hours"
      - alert: BackupSizeAnomaly
        expr: backup_size_bytes < (backup_size_bytes offset 1d) * 0.5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup {{ $labels.task }} size dropped more than 50% compared to yesterday"
Conclusion
rsync provides a reliable, low‑overhead foundation for backups. By mastering the delta algorithm, using --link-dest for incremental snapshots, applying safety nets (-n, source checks, flock), controlling bandwidth, and adding verification and monitoring, you can build a production‑grade backup system that is space‑efficient, resilient, and easy to maintain.