Zero Data Loss with Redis RDB+AOF Hybrid Persistence: A Practical Guide
This comprehensive guide walks you through configuring Redis RDB+AOF hybrid persistence for zero data loss, covering prerequisites, environment matrix, checklist, step‑by‑step implementation, kernel tuning, monitoring, performance testing, security hardening, common issues, rollback procedures, and best practices.
Applicable Scenarios & Prerequisites
Applicable Business : Cache + persistence, session storage, message queue, real‑time leaderboard
OS/Kernel Requirements : Linux 3.10+ (RHEL/CentOS 7+ or Ubuntu 18.04+)
Redis Version : 4.0+ (hybrid persistence) / 6.0+ (recommended, IO multithreading)
Disk Requirements : SSD (IOPS > 5000); an HDD is acceptable only for datasets under 10GB
Permission Requirements : redis user (or root)
Dependencies : redis‑server, redis‑cli, redis‑check‑aof, redis‑check‑rdb
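The version requirement above can be checked before touching any configuration. The following is a minimal sketch, assuming GNU coreutils (`sort -V`) and the usual `v=<version>` field in `redis-server --version` output; `version_ge` and `parse_server_version` are hypothetical helper names, not Redis tools.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a Redis version string meets a minimum version.
# version_ge and parse_server_version are illustrative helpers, not part of Redis.

version_ge() {
    # Usage: version_ge ACTUAL MINIMUM   (e.g. version_ge 6.2.14 4.0.0)
    # True when MINIMUM sorts first (or equal) in version order.
    local lowest
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)
    [ "$lowest" = "$2" ]
}

parse_server_version() {
    # Reads "redis-server --version" output on stdin and prints e.g. 6.2.14
    sed -n 's/.*v=\([0-9][0-9.]*\).*/\1/p'
}

# Only query the local binary when it is installed.
if command -v redis-server >/dev/null 2>&1; then
    ver=$(redis-server --version | parse_server_version)
    if version_ge "$ver" 4.0.0; then
        echo "Redis $ver supports hybrid persistence"
    else
        echo "Redis $ver is too old for aof-use-rdb-preamble"
    fi
fi
```

The same helper can gate the 6.0+ recommendation by calling `version_ge "$ver" 6.0.0`.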
Environment & Version Matrix
Component | Version/Spec | Description
OS | RHEL 8.x / Ubuntu 20.04 LTS | Kernel 4.18+ / 5.4+
Redis | 6.2.x / 7.0.x | Supports hybrid persistence (aof-use-rdb-preamble yes)
CPU | 4 cores minimum | Single-thread performance priority (high frequency > multi-core)
Memory | 16 GB minimum | Dataset + fork copy + OS cache
Disk | SSD, 5000+ IOPS | RDB/AOF write + rewrite overhead
Network | 1 Gbps | Master-slave replication bandwidth
Quick Checklist
Backup current Redis configuration and data files
Check current persistence mode and directory permissions
Configure RDB snapshot strategy (save parameters)
Enable AOF and hybrid persistence mode
Adjust AOF rewrite trigger conditions and fsync strategy
Configure kernel parameters (transparent huge pages, OOM)
Test RDB/AOF file integrity
Validate data recovery process
Configure monitoring and alerts (RDB/AOF latency, file size)
Prepare rollback plan (retain old config and data snapshots)
Implementation Steps (Core Content)
Step 1: Backup Existing Configuration and Data Files
RHEL/CentOS:
systemctl stop redis
cp /etc/redis/redis.conf /etc/redis/redis.conf.bak.$(date +%Y%m%d%H%M)
cp -r /var/lib/redis /var/lib/redis.bak.$(date +%Y%m%d%H%M)
Ubuntu/Debian:
systemctl stop redis-server
cp /etc/redis/redis.conf /etc/redis/redis.conf.bak.$(date +%Y%m%d%H%M)
cp -r /var/lib/redis /var/lib/redis.bak.$(date +%Y%m%d%H%M)
Check current persistence configuration:
grep -E '^save |^appendonly |^aof-use-rdb-preamble' /etc/redis/redis.conf
Expected default output:
save 900 1
save 300 10
save 60 10000
appendonly no
# aof-use-rdb-preamble yes (may be commented)
Key Parameter Explanation:
save 900 1: trigger an RDB snapshot if at least one write occurs within 900 seconds
appendonly no: AOF is disabled by default
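The grep in this step prints matching lines as they are written. As a small sketch, the helper below lists only the active (uncommented) values of one directive; `active_values` is a hypothetical name, and the sketch does not follow `include` directives. For single-valued directives such as appendonly, the last active line is the one Redis uses, while multiple save lines all apply.

```shell
#!/usr/bin/env bash
# Sketch: list the uncommented values of a directive in a redis.conf-style file.
# active_values is an illustrative helper; "include" files are not followed.

active_values() {
    # Usage: active_values CONF_FILE DIRECTIVE
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; found = 1 }
        END { if (!found) print "(not set)" }
    ' "$1"
}

# Path taken from this guide; skipped when the file is not readable.
CONF=/etc/redis/redis.conf
if [ -r "$CONF" ]; then
    for directive in save appendonly aof-use-rdb-preamble; do
        echo "$directive:"
        active_values "$CONF" "$directive" | sed 's/^/  /'
    done
fi
```

A result of "(not set)" means the built-in default of the installed Redis version applies.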
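Hybrid mode only takes effect when AOF is enabled. aof-use-rdb-preamble defaults to no on Redis 4.0 and to yes from Redis 5.0 onward, so set it explicitly:
aof-use-rdb-preamble yes
Step 2: Configure RDB Snapshot Strategy (Safety + Performance Balance)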
Edit Redis configuration file:
vi /etc/redis/redis.conf
RDB core settings:
# RDB snapshot trigger conditions (recommended conservative strategy)
# redis.conf does not accept trailing comments, so each comment sits on its own line
# at least 1 write in 15 minutes
save 900 1
# at least 10 writes in 5 minutes
save 300 10
# at least 10000 writes in 1 minute
save 60 10000
# RDB file configuration
dbfilename dump.rdb
dir /var/lib/redis
# RDB compression (saves disk, adds CPU overhead)
rdbcompression yes
# RDB checksum (detect file corruption, minor performance impact)
rdbchecksum yes
# Stop writes on BGSAVE error (data safety priority)
stop-writes-on-bgsave-error yes
# Incremental fsync while the child process writes the RDB file (available since Redis 5.0; smooths disk writes, it does not shorten the fork itself)
rdb-save-incremental-fsync yes
Pre-execution checks:
ls -ld /var/lib/redis
stat /var/lib/redis/dump.rdb
Expected output: owner redis; directory mode 755, file mode 644
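To make the save rules in this step concrete, here is a simplified model of how a set of `save <seconds> <changes>` rules is evaluated: a snapshot is due once enough time has passed since the last save and enough keys have changed. `snapshot_due` is a hypothetical helper for illustration; the real check runs inside the Redis server and may differ at the boundaries.

```shell
#!/usr/bin/env bash
# Simplified model of "save <seconds> <changes>" evaluation.
# snapshot_due is an illustrative helper, not a Redis command.

snapshot_due() {
    # Usage: snapshot_due ELAPSED_SECONDS CHANGES_SINCE_LAST_SAVE RULE...
    # Each RULE is "seconds:changes", for example 900:1
    local elapsed=$1 dirty=$2 rule secs changes
    shift 2
    for rule in "$@"; do
        secs=${rule%%:*}
        changes=${rule##*:}
        if [ "$elapsed" -ge "$secs" ] && [ "$dirty" -ge "$changes" ]; then
            echo "rule 'save $secs $changes' fires"
            return 0
        fi
    done
    echo "no rule fires"
    return 1
}

# 70 seconds and 12000 changes since the last save: the 60/10000 rule matches.
snapshot_due 70 12000 900:1 300:10 60:10000 || true
# 120 seconds and only 5 changes: none of the three rules matches yet.
snapshot_due 120 5 900:1 300:10 60:10000 || true
```

With the three rules above, a quiet instance still snapshots every 15 minutes as long as at least one key changed.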
Step 3: Enable AOF and Hybrid Persistence (Core)
Edit Redis configuration file:
vi /etc/redis/redis.conf
AOF core settings:
# Enable AOF persistence
appendonly yes
# AOF filename
appendfilename "appendonly.aof"
# AOF fsync strategy (key)
# always: fsync on every write (safest, lowest performance)
# everysec: fsync every second (recommended balance)
# no: OS decides when to flush (highest performance, possible data loss)
appendfsync everysec
# Keep fsyncing during a rewrite (yes would skip fsync while a rewrite runs to reduce disk I/O contention, at the cost of durability)
no-appendfsync-on-rewrite no
# AOF rewrite trigger conditions (redis.conf does not accept trailing comments)
# trigger when the AOF has grown 100% since the last rewrite
auto-aof-rewrite-percentage 100
# the AOF must be at least 64MB before an automatic rewrite is triggered
auto-aof-rewrite-min-size 64mb
# Hybrid persistence mode (Redis 4.0+, core config)
aof-use-rdb-preamble yes
# Load a truncated AOF (if the file ends with an incomplete command, load the valid part and log a warning instead of refusing to start)
aof-load-truncated yes
# fsync the rewritten AOF incrementally while it is written (avoids one large flush at the end of the rewrite)
aof-rewrite-incremental-fsync yes
Key parameter explanation:
appendfsync everysec: a compromise that risks roughly 1 second of writes on a crash
aof-use-rdb-preamble yes: a rewrite writes an RDB snapshot first, then appends incremental AOF commands (reduces file size 60-80%)
no-appendfsync-on-rewrite no: continue fsync during rewrite for data safety
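The two rewrite triggers work together: the AOF must exceed the minimum size and must have grown by the configured percentage relative to its size after the last rewrite. The sketch below models that arithmetic; `rewrite_due` is a hypothetical helper, and the sizes are example values rather than measurements. On a live server the two inputs correspond to `aof_current_size` and `aof_base_size` in `INFO persistence`.

```shell
#!/usr/bin/env bash
# Sketch: model of the automatic AOF rewrite trigger.
# rewrite_due is an illustrative helper; the real decision is made inside Redis.

rewrite_due() {
    # Usage: rewrite_due CURRENT_BYTES BASE_BYTES PERCENTAGE MIN_SIZE_BYTES
    local current=$1 base=$2 pct=$3 min=$4 growth
    [ "$base" -gt 0 ] || base=1              # avoid division by zero
    [ "$current" -gt "$min" ] || return 1    # below auto-aof-rewrite-min-size
    growth=$(( current * 100 / base - 100 )) # growth since last rewrite, in percent
    [ "$growth" -ge "$pct" ]
}

MB=1048576
# Example: base size 60MB, settings from this step (100 percent, 64mb).
for current_mb in 62 100 130; do
    if rewrite_due $(( current_mb * MB )) $(( 60 * MB )) 100 $(( 64 * MB )); then
        echo "${current_mb}MB: rewrite triggers"
    else
        echo "${current_mb}MB: no rewrite"
    fi
done
```

In this example only the 130MB case triggers: 62MB is under the minimum size, and 100MB is only about 66% growth over the 60MB base.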
Hybrid persistence workflow:
Traditional AOF: [cmd1][cmd2]...[cmdN] → large file, slow recovery
Hybrid mode: [RDB snapshot]<binary>[incremental AOF]<text> → file 70%+ smaller, recovery 5‑10× faster
Post-execution verification:
# Start Redis
systemctl start redis
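# Verify AOF file generated (Redis 6.2: a single file; Redis 7.0+: multi-part files under appendonlydir/)
ls -lh /var/lib/redis/appendonly.aof /var/lib/redis/appendonlydir/ 2>/dev/null
# Check persistence status
redis-cli INFO persistence
redis-cli CONFIG GET aof-use-rdb-preamble
Expected output: INFO persistence reports aof_enabled:1, and CONFIG GET returns yes for aof-use-rdb-preamble, indicating hybrid mode is enabled.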
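The configuration value only says hybrid mode is requested; the file itself shows whether a rewrite has already produced an RDB preamble. A minimal sketch, assuming the data directory used in this guide: it reads the first five bytes and looks for the `REDIS` magic string. `aof_format` is a hypothetical helper name, and on Redis 7.0+ the file to inspect is the base file under `appendonlydir/`.

```shell
#!/usr/bin/env bash
# Sketch: classify an AOF (or AOF base) file by its first bytes.
# aof_format is an illustrative helper, not a Redis tool.

aof_format() {
    # Usage: aof_format FILE ; prints "hybrid", "plain" or "empty"
    local magic
    magic=$(head -c 5 "$1" | tr -d '\0')
    case "$magic" in
        REDIS) echo "hybrid" ;;   # RDB preamble present
        "")    echo "empty" ;;    # nothing written yet
        *)     echo "plain" ;;    # starts with text commands such as *2
    esac
}

# Candidate paths: single-file layout and the Redis 7.0+ multi-part layout.
for f in /var/lib/redis/appendonly.aof /var/lib/redis/appendonlydir/appendonly.aof.*.base.*; do
    if [ -r "$f" ]; then
        echo "$f: $(aof_format "$f")"
    fi
done
```

A freshly enabled AOF usually reports plain or empty until the first BGREWRITEAOF completes.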
Step 4: System Kernel Parameter Optimization (Critical Performance & Stability)
Edit sysctl configuration:
vi /etc/sysctl.d/99-redis-tuning.conf
Configuration content:
# Do not reserve static huge pages (this does not disable THP; THP is disabled separately in this step)
vm.nr_hugepages = 0
# Allow memory overcommit for fork processes
vm.overcommit_memory = 1
# TCP connection queue
net.core.somaxconn = 65535
# System-wide file handle limit (skip this line if the current fs.file-max is already higher)
fs.file-max = 65535
Apply configuration:
sysctl -p /etc/sysctl.d/99-redis-tuning.conf
sysctl vm.overcommit_memory
sysctl net.core.somaxconn
Disable Transparent Huge Pages (must be executed):
# Temporary disable
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Permanent disable (RHEL/CentOS)
cat >> /etc/rc.local <<'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
chmod +x /etc/rc.d/rc.local
# Ubuntu/Debian (systemd service)
cat > /etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages (THP)
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag"
[Install]
WantedBy=multi-user.target
EOF
systemctl enable disable-thp.service
systemctl start disable-thp.service
Verify THP status:
cat /sys/kernel/mm/transparent_hugepage/enabled
# Expected output: always madvise [never]
Step 5: Configure Redis System Limits (File Descriptors)
Edit limits configuration:
vi /etc/security/limits.conf
Add the following lines:
redis soft nofile 65535
redis hard nofile 65535
redis soft nproc 65535
redis hard nproc 65535
Systemd service configuration (recommended):
mkdir -p /etc/systemd/system/redis.service.d
vi /etc/systemd/system/redis.service.d/limits.conf
Content:
[Service]
LimitNOFILE=65535
LimitNPROC=65535
Reload and restart:
systemctl daemon-reload
systemctl restart redis
Verify limits:
cat /proc/$(pidof redis-server)/limits | grep "open files"
# Expected: Max open files 65535 65535 files
Step 6: Test RDB/AOF File Integrity and Recovery
Manually trigger RDB snapshot:
redis-cli BGSAVE
redis-cli INFO persistence | grep rdb_bgsave_in_progress
Check RDB file integrity:
redis-check-rdb /var/lib/redis/dump.rdb
Manually trigger AOF rewrite:
redis-cli BGREWRITEAOF
redis-cli INFO persistence | grep aof_rewrite_in_progress
Check AOF file integrity:
redis-check-aof /var/lib/redis/appendonly.aof
# Redis 7.0+: pass the manifest instead, for example /var/lib/redis/appendonlydir/appendonly.aof.manifest
Simulate failure recovery:
# Write test data
redis-cli SET test_key "recovery_test_$(date +%s)"
redis-cli SAVE
# Stop Redis
systemctl stop redis
# Simulate data loss (test only)
mv /var/lib/redis/dump.rdb /tmp/
# Start Redis (AOF recovery)
systemctl start redis
# Verify data restored
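redis-cli GET test_key
Verify mixed file format:
xxd /var/lib/redis/appendonly.aof | head
# Expected output starts with a "REDIS" signature followed by the RDB version (for example REDIS0009 on Redis 6.2), indicating hybrid mode
# Redis 7.0+: inspect the base file under /var/lib/redis/appendonlydir/ instead
Step 7: Configure Persistence Monitoring (Prometheus + Redis Exporter)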
Install Redis Exporter:
# Download a pinned release (check the project's releases page for newer versions)
wget https://github.com/oliver006/redis_exporter/releases/download/v1.45.0/redis_exporter-v1.45.0.linux-amd64.tar.gz
tar -xzf redis_exporter-v1.45.0.linux-amd64.tar.gz -C /usr/local/bin/ --strip-components=1
# Create systemd service
cat > /etc/systemd/system/redis_exporter.service <<'EOF'
[Unit]
Description=Redis Exporter
After=network.target
[Service]
Type=simple
User=redis
ExecStart=/usr/local/bin/redis_exporter --redis.addr=localhost:6379
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl enable redis_exporter
systemctl start redis_exporter
Prometheus scrape configuration:
scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets: ['localhost:9121']
Key PromQL queries:
# Seconds since last successful RDB save
time() - redis_rdb_last_save_timestamp_seconds
# Current AOF size (bytes)
redis_aof_current_size_bytes
# AOF rewrite in progress
redis_aof_rewrite_in_progress
# AOF last write status (0=fail, 1=success)
redis_aof_last_write_status
# RDB BGSAVE in progress
redis_rdb_bgsave_in_progress
# Memory usage percentage
# (only meaningful when maxmemory is set; otherwise the denominator is 0)
redis_memory_used_bytes / redis_memory_max_bytes * 100
Monitoring & Alerts (Immediately Usable)
Linux Native Monitoring Commands
Real-time monitor RDB/AOF file changes (explicit globs, because watch runs the command through sh, which may not support brace expansion):
watch -n5 'ls -lh /var/lib/redis/*.rdb /var/lib/redis/*.aof'
Monitor Redis persistence status:
redis-cli INFO persistence | grep -E "rdb_last_save_time|aof_last_rewrite_time|aof_current_size"
Disk I/O monitoring (iostat):
iostat -x 5 | grep -E "Device|sda"
# Key metrics: %util > 80% (bottleneck), await > 20ms (high latency)
Check Redis logs for persistence errors:
tail -f /var/log/redis/redis-server.log | grep -E "BGSAVE|AOF|rewrite"
Expected healthy output example (the line prefix format varies by Redis version):
[12345] 30 Oct 10:00:15.123 * Background saving started by pid 12346
[12346] 30 Oct 10:00:17.456 * DB saved on disk
[12345] 30 Oct 10:00:17.457 * Background saving terminated with success
Performance & Capacity (Reproducible)
Benchmark Commands
Write performance test (impact of RDB+AOF):
# Pure memory mode (persistence disabled)
redis-cli CONFIG SET appendonly no
redis-cli CONFIG SET save ""
redis-benchmark -t set -n 1000000 -q
# Hybrid persistence mode
redis-cli CONFIG SET appendonly yes
redis-cli CONFIG SET aof-use-rdb-preamble yes
redis-benchmark -t set -n 1000000 -q
Expected comparison:
Pure memory mode: SET: 120000.00 requests per second
Hybrid persistence: SET: 95000.00 requests per second
Performance loss: ~20% (acceptable range)
Persistence Performance Overhead Matrix
Persistence Strategy | Write QPS | Data Safety | Recovery Speed | Disk Space
No persistence | 120k/s | All data lost on restart | N/A | 0
RDB only | 110k/s | Lose data after last snapshot | Seconds | Small
AOF (everysec) | 100k/s | Lose up to 1 second of data | Minutes | Large (3‑5×)
RDB+AOF hybrid | 95k/s | Lose up to 1 second of data | Seconds | Medium (1.5‑2×)
Tuning Parameter Matrix (by Dataset Size)
Dataset Size | RDB save strategy | AOF rewrite min | appendfsync | Disk Requirement
< 1GB | save 300 10 | 32mb | everysec | HDD acceptable
1‑10GB | save 600 100 | 64mb | everysec | SSD recommended
10‑50GB | save 900 1000 | 256mb | everysec | SSD required
> 50GB | save "" + cron | 1gb | no or everysec | NVMe SSD
Security & Compliance (Minimum Required)
File Permission Hardening
chown -R redis:redis /var/lib/redis
chmod 750 /var/lib/redis
chmod 640 /var/lib/redis/*.rdb
chmod 640 /var/lib/redis/*.aof
RDB/AOF File Encryption (Redis 6.0+)
Configure TLS for transport encryption (master‑slave replication scenario). TLS protects data in transit only; the RDB/AOF files at rest rely on the disk encryption that follows:
# redis.conf additions
tls-port 6380
tls-cert-file /etc/redis/tls/redis.crt
tls-key-file /etc/redis/tls/redis.key
tls-ca-cert-file /etc/redis/tls/ca.crt
# Use TLS on the replication link as well
tls-replication yes
Disk encryption (LUKS) example (initial setup only; luksFormat erases the target device):
# Initialize encrypted volume
cryptsetup luksFormat /dev/sdb
cryptsetup open /dev/sdb redis_data
mkfs.xfs /dev/mapper/redis_data
mount /dev/mapper/redis_data /var/lib/redis
Audit Logging
# Enable slow query log (threshold in microseconds: 10000 = 10ms; redis.conf does not accept trailing comments)
slowlog-log-slower-than 10000
slowlog-max-len 128
Query slow log:
redis-cli SLOWLOG GET 10
Common Issues & Troubleshooting
Symptom: RDB save failure
Diagnostic Command: redis-cli INFO persistence
Possible Root Cause: Insufficient disk space / permission error
Quick Fix: df -h; chown redis /var/lib/redis
Permanent Fix: Expand disk; set eviction policy

Symptom: AOF file corruption
Diagnostic Command: redis-check-aof --fix appendonly.aof
Possible Root Cause: Power loss / disk failure
Quick Fix: Run fix command
Permanent Fix: Use UPS; RAID

Symptom: High fork latency
Diagnostic Command: redis-cli INFO stats | grep fork
Possible Root Cause: THP not disabled / low memory
Quick Fix: Disable THP; free memory
Permanent Fix: Permanent THP disable; add RAM

Symptom: AOF rewrite blocking
Diagnostic Command: iostat -x 1
Possible Root Cause: Disk I/O bottleneck
Quick Fix: Increase auto-aof-rewrite-min-size
Permanent Fix: Upgrade to SSD; set no-appendfsync-on-rewrite yes

Symptom: Redis fails to start
Diagnostic Command: journalctl -u redis -n 50
Possible Root Cause: Corrupted RDB/AOF
Quick Fix: Run redis-check-aof/rdb, fix
Permanent Fix: Regular backups

Symptom: Out‑of‑Memory (OOM)
Diagnostic Command: dmesg | grep -i kill
Possible Root Cause: vm.overcommit_memory not set
Quick Fix: sysctl -w vm.overcommit_memory=1
Permanent Fix: Add setting to /etc/sysctl.conf
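For the high fork latency case, the raw counter is easier to judge once converted to milliseconds. The sketch below parses `latest_fork_usec` from `INFO stats`; `fork_latency_ms` is a hypothetical helper, and `THRESHOLD_MS` is an assumed alert threshold rather than a Redis setting.

```shell
#!/usr/bin/env bash
# Sketch: convert latest_fork_usec from "INFO stats" into milliseconds.
# fork_latency_ms is an illustrative helper; THRESHOLD_MS is an assumption.

fork_latency_ms() {
    # Reads INFO output on stdin (CRLF line endings are stripped first).
    tr -d '\r' | awk -F: '$1 == "latest_fork_usec" { printf "%d\n", $2 / 1000 }'
}

THRESHOLD_MS=1000

# Only talk to Redis when redis-cli exists and the server answers.
if command -v redis-cli >/dev/null 2>&1 && redis-cli PING >/dev/null 2>&1; then
    ms=$(redis-cli INFO stats | fork_latency_ms)
    if [ "${ms:-0}" -ge "$THRESHOLD_MS" ]; then
        echo "WARN: last fork took ${ms} ms"
    else
        echo "OK: last fork took ${ms:-0} ms"
    fi
fi
```

A value of 0 right after startup simply means no fork has happened yet; trigger a BGSAVE first to get a real measurement.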
Change & Rollback Playbook
Maintenance Window Recommendations
Low‑traffic period (02:00‑04:00)
Master‑slave architecture: upgrade slaves first, then master
Standalone instance: notify business owners in advance
Canary Strategy (Master‑Slave Replication)
Stage 1 – Slave verification (1 hour):
redis-cli -h slave1 CONFIG SET appendonly yes
redis-cli -h slave1 CONFIG SET aof-use-rdb-preamble yes
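redis-cli -h slave1 INFO persistence
Stage 2 – Master switch (10 minutes). Pause client writes first, promote the replica, then repoint the old master; running the commands in the opposite order would leave the two nodes replicating from each other:
redis-cli -h slave1 REPLICAOF NO ONE
redis-cli -h master REPLICAOF slave1 6379
# slave1 is now the new master; persist its runtime settings
redis-cli -h slave1 CONFIG REWRITE
Health Check Script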
#!/bin/bash
REDIS_CLI="redis-cli"
# Check process
if ! pgrep -x redis-server > /dev/null; then
    echo "ERROR: Redis not running"
    exit 1
fi
# Ping
if ! $REDIS_CLI PING | grep -q PONG; then
    echo "ERROR: Redis not responding"
    exit 1
fi
# Persistence status
RDB_STATUS=$($REDIS_CLI INFO persistence | grep rdb_last_bgsave_status | cut -d: -f2 | tr -d '\r')
AOF_STATUS=$($REDIS_CLI INFO persistence | grep aof_last_write_status | cut -d: -f2 | tr -d '\r')
if [ "$RDB_STATUS" != "ok" ]; then
    echo "ERROR: RDB save failed"
    exit 1
fi
if [ "$AOF_STATUS" != "ok" ]; then
    echo "ERROR: AOF write failed"
    exit 1
fi
echo "Health check passed"
Rollback Commands (within 3 minutes)
# Stop Redis
systemctl stop redis
# Restore configuration
cp /etc/redis/redis.conf.bak.YYYYMMDDHHMM /etc/redis/redis.conf
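# Restore data files (Redis 7.0+ also keeps AOF data under appendonlydir/)
rm -rf /var/lib/redis/dump.rdb /var/lib/redis/appendonly.aof /var/lib/redis/appendonlydir
cp -r /var/lib/redis.bak.YYYYMMDDHHMM/. /var/lib/redis/
# Files copied as root must be readable and writable by the redis user
chown -R redis:redis /var/lib/redis
# Start Redis
systemctl start redis
# Verify data restored
redis-cli DBSIZE
redis-cli GET test_key
Best Practices (10 Key Points)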
Hybrid mode first : enable aof-use-rdb-preamble yes (reduces size 60‑80%)
fsync strategy : appendfsync everysec (use always for finance)
Disable Transparent Huge Pages : echo never > /sys/kernel/mm/transparent_hugepage/enabled
Disk choice : dataset >10GB → SSD (IOPS >5000)
Regular validation : daily redis-check-rdb and redis-check-aof
Monitor fork latency : INFO stats → latest_fork_usec < 1000000 (the field is in microseconds, so this equals 1000 ms)
AOF rewrite threshold : set auto-aof-rewrite-min-size ≥ 50% of memory usage
RDB compression : enable rdbcompression yes when CPU is sufficient
Backup strategy : daily RDB backup to remote storage, retain 7 days
Master‑slave replication : at least one replica with RDB+AOF double‑insurance
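The rewrite threshold guideline in the list (at least 50% of memory usage) can be turned into a concrete value from `INFO memory`. This is a rough sketch: `suggest_rewrite_min_size` is a hypothetical helper, and the 64 MB floor is an assumption chosen to match the default used elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Sketch: derive an auto-aof-rewrite-min-size suggestion from used_memory.
# suggest_rewrite_min_size is illustrative; the 64 MB floor is an assumption.

suggest_rewrite_min_size() {
    # Reads "INFO memory" output on stdin; prints half of used_memory in whole MB.
    tr -d '\r' | awk -F: '
        $1 == "used_memory" {
            mb = int($2 / 2 / 1048576)
            if (mb < 64) mb = 64
            printf "%dmb\n", mb
        }'
}

# Print a suggestion only; applying it is left to the operator.
if command -v redis-cli >/dev/null 2>&1 && redis-cli PING >/dev/null 2>&1; then
    size=$(redis-cli INFO memory | suggest_rewrite_min_size)
    echo "Suggested: auto-aof-rewrite-min-size $size"
fi
```

The field is matched exactly, so related fields such as used_memory_rss do not affect the result.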
Appendix (Sample Assets)
Full redis.conf Persistence Sample
# ========== Persistence Configuration (Production) ==========
# RDB snapshot strategy
save 900 1
save 300 10
save 60 10000
# RDB file configuration
dbfilename dump.rdb
dir /var/lib/redis
rdbcompression yes
rdbchecksum yes
stop-writes-on-bgsave-error yes
rdb-save-incremental-fsync yes
# AOF persistence
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-rewrite-incremental-fsync yes
# Memory management
maxmemory 8gb
maxmemory-policy allkeys-lru
# Logging
loglevel notice
logfile /var/log/redis/redis-server.log
# Network
bind 127.0.0.1 192.168.1.100
port 6379
tcp-backlog 511
timeout 300
tcp-keepalive 300
# Slow query log
slowlog-log-slower-than 10000
slowlog-max-len 128
# Client limits
maxclients 10000
Scheduled RDB Backup Script (Cron)
#!/bin/bash
BACKUP_DIR="/backup/redis"
REDIS_DATA_DIR="/var/lib/redis"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d%H%M)
mkdir -p "$BACKUP_DIR"
# Trigger RDB snapshot
redis-cli BGSAVE
sleep 10
# Wait for BGSAVE to finish (quoted so an empty reply does not break the test)
while [ "$(redis-cli INFO persistence | grep rdb_bgsave_in_progress | cut -d: -f2 | tr -d '\r')" = "1" ]; do
    sleep 5
done
# Copy RDB file
cp "$REDIS_DATA_DIR/dump.rdb" "$BACKUP_DIR/dump_$DATE.rdb"
# Compress backup
gzip "$BACKUP_DIR/dump_$DATE.rdb"
# Delete old backups
find "$BACKUP_DIR" -name "dump_*.rdb.gz" -mtime +"$RETENTION_DAYS" -delete
echo "Backup completed: $BACKUP_DIR/dump_$DATE.rdb.gz"
Ansible Automation Task for Redis Persistence
---
- name: Configure Redis Persistence
  hosts: redis_servers
  become: yes
  vars:
    redis_conf: /etc/redis/redis.conf
    redis_data_dir: /var/lib/redis
  tasks:
    - name: Backup current Redis config
      copy:
        src: "{{ redis_conf }}"
        dest: "{{ redis_conf }}.bak.{{ ansible_date_time.epoch }}"
        remote_src: yes
    - name: Disable Transparent Huge Pages
      lineinfile:
        path: /etc/rc.local
        line: "{{ item }}"
        create: yes
      loop:
        - "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
        - "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
      notify: disable thp
    - name: Configure sysctl for Redis
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'vm.overcommit_memory', value: '1' }
        - { name: 'net.core.somaxconn', value: '65535' }
    - name: Configure Redis persistence settings
      lineinfile:
        path: "{{ redis_conf }}"
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^appendonly ', line: 'appendonly yes' }
        - { regexp: '^appendfsync ', line: 'appendfsync everysec' }
        - { regexp: '^aof-use-rdb-preamble ', line: 'aof-use-rdb-preamble yes' }
        - { regexp: '^save 900 ', line: 'save 900 1' }
      notify: restart redis
    - name: Ensure Redis data directory permissions
      file:
        path: "{{ redis_data_dir }}"
        state: directory
        owner: redis
        group: redis
        mode: '0750'
  handlers:
    - name: disable thp
      shell: |
        echo never > /sys/kernel/mm/transparent_hugepage/enabled
        echo never > /sys/kernel/mm/transparent_hugepage/defrag
    - name: restart redis
      systemd:
        name: redis
        state: restarted
Tested on 2025‑10 with Redis 6.2.14 / 7.0.15 on RHEL 8.9 / Ubuntu 20.04.6
