
Zero Data Loss with Redis RDB+AOF Hybrid Persistence: A Practical Guide

This comprehensive guide walks you through configuring Redis RDB+AOF hybrid persistence for zero data loss, covering prerequisites, environment matrix, checklist, step‑by‑step implementation, kernel tuning, monitoring, performance testing, security hardening, common issues, rollback procedures, and best practices.

Ops Community

Redis Persistence RDB+AOF Hybrid Mode Zero Data Loss Configuration Guide

Applicable Scenarios & Prerequisites

Applicable Business : Cache + persistence, session storage, message queue, real‑time leaderboard

OS/Kernel Requirements : Linux 3.10+ (RHEL/CentOS 7+ or Ubuntu 18.04+)

Redis Version : 4.0+ (hybrid persistence) / 6.0+ (recommended, IO multithreading)

Disk Requirements : SSD (IOPS > 5000) recommended; a high‑performance HDD is usable for datasets under 10GB

Permission Requirements : redis user (or root)

Dependencies : redis‑server, redis‑cli, redis‑check‑aof, redis‑check‑rdb

Environment & Version Matrix

Component   Version/Spec                 Description
OS          RHEL 8.x / Ubuntu 20.04 LTS  Kernel 4.18+ / 5.4+
Redis       6.2.x / 7.0.x                Supports hybrid persistence (aof-use-rdb-preamble yes)
CPU         4 cores minimum              Single‑thread performance priority (clock speed over core count)
Memory      16 GB minimum                Dataset + fork copy‑on‑write + OS cache
Disk        SSD, 5000+ IOPS              RDB/AOF writes plus rewrite overhead
Network     1 Gbps                       Master‑slave replication bandwidth

Quick Checklist

Backup current Redis configuration and data files

Check current persistence mode and directory permissions

Configure RDB snapshot strategy (save parameters)

Enable AOF and hybrid persistence mode

Adjust AOF rewrite trigger conditions and fsync strategy

Configure kernel parameters (transparent huge pages, OOM)

Test RDB/AOF file integrity

Validate data recovery process

Configure monitoring and alerts (RDB/AOF latency, file size)

Prepare rollback plan (retain old config and data snapshots)

Implementation Steps (Core Content)

Step 1: Backup Existing Configuration and Data Files

RHEL/CentOS:

systemctl stop redis
cp /etc/redis/redis.conf /etc/redis/redis.conf.bak.$(date +%Y%m%d%H%M)
cp -r /var/lib/redis /var/lib/redis.bak.$(date +%Y%m%d%H%M)

Ubuntu/Debian:

systemctl stop redis-server
cp /etc/redis/redis.conf /etc/redis/redis.conf.bak.$(date +%Y%m%d%H%M)
cp -r /var/lib/redis /var/lib/redis.bak.$(date +%Y%m%d%H%M)

Check current persistence configuration:

grep -E '^save |^appendonly |^aof-use-rdb-preamble' /etc/redis/redis.conf

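Typical output (older defaults shown; Redis 6.2 and later ship different default save points, and the lines may be commented out):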

save 900 1
save 300 10
save 60 10000
appendonly no
# aof-use-rdb-preamble yes (may be commented)

Key Parameter Explanation:

save 900 1: trigger an RDB snapshot if at least one write occurred within 900 seconds

appendonly no: AOF is disabled by default

Hybrid mode is enabled by default since Redis 5.0; on Redis 4.0 it must be enabled manually:

aof-use-rdb-preamble yes

Step 2: Configure RDB Snapshot Strategy (Safety + Performance Balance)

Edit Redis configuration file:

vi /etc/redis/redis.conf

RDB core settings:

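# RDB snapshot trigger conditions (recommended conservative strategy)
# Note: redis.conf does not accept trailing comments on a directive line,
# so each explanation sits on its own line.
# at least 1 write in 15 minutes
save 900 1
# at least 10 writes in 5 minutes
save 300 10
# at least 10000 writes in 1 minute
save 60 10000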

# RDB file configuration
dbfilename dump.rdb
dir /var/lib/redis

# RDB compression (saves disk, adds CPU overhead)
rdbcompression yes

# RDB checksum (detect file corruption, minor performance impact)
rdbchecksum yes

# Stop writes on BGSAVE error (data safety priority)
stop-writes-on-bgsave-error yes

# Incremental fsync while the child process writes the RDB file (Redis 5.0+; spreads disk flushes to avoid latency spikes)
rdb-save-incremental-fsync yes

Pre‑execution checks:

ls -ld /var/lib/redis
stat /var/lib/redis/dump.rdb

Expected output: owner redis, permissions 755/644
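
The save rules in this step combine as a logical OR: a snapshot becomes due as soon as any one rule has both its time window elapsed and its change count reached. The sketch below illustrates that evaluation under stated assumptions. The function name rdb_save_due is hypothetical (it is not part of Redis), and its two inputs stand in for the seconds since the last successful save and the value of rdb_changes_since_last_save.

```shell
# rdb_save_due ELAPSED_SECONDS CHANGES [RULES]
# RULES is a flat list of "seconds changes" pairs, mirroring the save directives.
# Prints "yes" if any rule is satisfied, otherwise "no".
rdb_save_due() {
  local elapsed="$1" changes="$2"
  local rules="${3:-900 1 300 10 60 10000}"
  local seconds min_changes
  # Word splitting of $rules is intentional: walk the pairs two at a time.
  set -- $rules
  while [ $# -ge 2 ]; do
    seconds="$1"; min_changes="$2"; shift 2
    # A rule fires when enough changes accumulated AND its window has passed.
    if [ "$changes" -ge "$min_changes" ] && [ "$elapsed" -gt "$seconds" ]; then
      echo yes
      return 0
    fi
  done
  echo no
}
```

For example, rdb_save_due 120 50 prints no (50 changes is enough for the 300 10 rule, but only 120 seconds have passed), while rdb_save_due 301 10 prints yes.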

Step 3: Enable AOF and Hybrid Persistence (Core)

Edit Redis configuration file:

vi /etc/redis/redis.conf

AOF core settings:

# Enable AOF persistence
appendonly yes

# AOF filename
appendfilename "appendonly.aof"

# AOF fsync strategy (key)
# always: fsync on every write (most safe, lowest performance)
# everysec: fsync every second (recommended balance)
# no: OS decides (highest performance, possible data loss)
appendfsync everysec

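# Keep fsync running during a rewrite (no = safer; yes would skip fsync while
# BGSAVE/BGREWRITEAOF runs to reduce disk I/O contention, at the cost of durability)
no-appendfsync-on-rewrite no

# AOF rewrite trigger conditions
# (comments sit on their own lines; redis.conf does not accept trailing comments)
# trigger when the AOF has grown 100% since the last rewrite
auto-aof-rewrite-percentage 100
# minimum AOF size (64MB) before an automatic rewrite is considered
auto-aof-rewrite-min-size 64mb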

# Hybrid persistence mode (Redis 4.0+, core config)
aof-use-rdb-preamble yes

# AOF load truncated files (continue loading after truncation)
aof-load-truncated yes

# Fsync the rewritten AOF incrementally while it is generated (avoids one large flush at the end)
aof-rewrite-incremental-fsync yes

Key parameter explanation:

appendfsync everysec: compromise; a crash can lose roughly the last second of writes (use always when no acknowledged write may be lost)

aof-use-rdb-preamble yes: an AOF rewrite writes an RDB snapshot first, then appends incremental AOF commands (reduces file size 60‑80%)

no-appendfsync-on-rewrite no: continue fsync during rewrite for data safety

Hybrid persistence workflow:

Traditional AOF: [cmd1][cmd2]...[cmdN] → large file, slow recovery
Hybrid mode: [RDB snapshot]<binary>[incremental AOF]<text> → file 70%+ smaller, recovery 5‑10× faster
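
How the two auto-aof-rewrite settings interact is easy to misread: the percentage is measured against the AOF size right after the previous rewrite, and nothing happens until the file exceeds the minimum size. The sketch below approximates that decision. The function name aof_rewrite_due is hypothetical, and its inputs are assumed to come from the aof_base_size and aof_current_size fields of INFO persistence (both in bytes).

```shell
# aof_rewrite_due BASE_BYTES CURRENT_BYTES [PERCENTAGE] [MIN_SIZE_BYTES]
# Prints "yes" when an automatic rewrite would be expected, otherwise "no".
aof_rewrite_due() {
  local base="$1" current="$2"
  local pct="${3:-100}"
  local min_size="${4:-$((64 * 1024 * 1024))}"
  # A percentage of 0 disables automatic rewrites; small files are never rewritten.
  if [ "$pct" -eq 0 ] || [ "$current" -le "$min_size" ]; then
    echo no
    return 0
  fi
  # Guard against a zero base size (no rewrite has happened yet).
  [ "$base" -gt 0 ] || base=1
  # Growth relative to the size after the last rewrite, in percent.
  local growth=$(( current * 100 / base - 100 ))
  if [ "$growth" -ge "$pct" ]; then echo yes; else echo no; fi
}
```

With the values used in this step, a 64MB base that has grown to 128MB yields yes (growth of 100%), while growth to 100MB yields no (growth of 56%).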

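Post‑execution verification:

Caution for instances that already hold data: with appendonly yes in the config, Redis loads the AOF at startup instead of dump.rdb, so an RDB‑only instance that is started with AOF newly enabled can come up with an empty dataset. For such instances, keep appendonly no in the file for the first start, run redis-cli CONFIG SET appendonly yes, wait until aof_rewrite_in_progress returns to 0, and only then persist appendonly yes in redis.conf.

# Start Redis (the unit is named redis-server on Ubuntu/Debian)
systemctl start redis

# Verify AOF file generated
# Redis 6.2: a single file; Redis 7.0+: base, incr and manifest files under appendonlydir/
ls -lh /var/lib/redis/appendonly.aof
ls -lh /var/lib/redis/appendonlydir/

# Check persistence status
redis-cli INFO persistence
redis-cli CONFIG GET aof-use-rdb-preamble

Expected output: INFO persistence reports aof_enabled:1, and CONFIG GET returns aof-use-rdb-preamble yes, indicating hybrid mode is enabled.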

Step 4: System Kernel Parameter Optimization (Critical Performance & Stability)

Edit sysctl configuration:

vi /etc/sysctl.d/99-redis-tuning.conf

Configuration content:

# Do not reserve static huge pages (this setting does not control THP; THP is disabled separately below)
vm.nr_hugepages = 0

# Allow memory overcommit so fork() for BGSAVE and AOF rewrite does not fail
vm.overcommit_memory = 1

# TCP connection queue
net.core.somaxconn = 65535

# System-wide file handle ceiling; check the current value with "sysctl fs.file-max" first and never lower it
fs.file-max = 65535

Apply configuration:

sysctl -p /etc/sysctl.d/99-redis-tuning.conf
sysctl vm.overcommit_memory
sysctl net.core.somaxconn

Disable Transparent Huge Pages (must be executed):

# Temporary disable
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Permanent disable (RHEL/CentOS)
cat >> /etc/rc.local <<'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
chmod +x /etc/rc.d/rc.local

# Ubuntu/Debian (systemd service)
cat > /etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages (THP)

[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target
EOF
systemctl enable disable-thp.service
systemctl start disable-thp.service

Verify THP status:

cat /sys/kernel/mm/transparent_hugepage/enabled
# Expected output: always madvise [never]
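
The active THP mode is the word inside the square brackets, which is awkward to check by eye in scripts. A small sketch of extracting it follows; the function name thp_active_mode is hypothetical, and the input is assumed to be the single line read from the sysfs file shown above.

```shell
# thp_active_mode "LINE"
# Prints the bracketed (active) value of a THP status line, e.g. "never".
thp_active_mode() {
  # Keep only the lowercase word between the square brackets.
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}
```

A deployment check could then compare thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" against never and fail when it differs.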

Step 5: Configure Redis System Limits (File Descriptors)

Edit limits configuration:

vi /etc/security/limits.conf

Add the following lines:

redis soft nofile 65535
redis hard nofile 65535
redis soft nproc 65535
redis hard nproc 65535

Systemd service configuration (recommended):

mkdir -p /etc/systemd/system/redis.service.d
vi /etc/systemd/system/redis.service.d/limits.conf

Content:

[Service]
LimitNOFILE=65535
LimitNPROC=65535

Reload and restart:

systemctl daemon-reload
systemctl restart redis

Verify limits:

cat /proc/$(pidof redis-server)/limits | grep "open files"
# Expected: Max open files 65535 65535 files

Step 6: Test RDB/AOF File Integrity and Recovery

Manually trigger RDB snapshot:

redis-cli BGSAVE
redis-cli INFO persistence | grep rdb_bgsave_in_progress

Check RDB file integrity:

redis-check-rdb /var/lib/redis/dump.rdb

Manually trigger AOF rewrite:

redis-cli BGREWRITEAOF
redis-cli INFO persistence | grep aof_rewrite_in_progress

Check AOF file integrity (on Redis 7.0+, pass the manifest file under appendonlydir/ instead):

redis-check-aof /var/lib/redis/appendonly.aof

Simulate failure recovery:

# Write test data
redis-cli SET test_key "recovery_test_$(date +%s)"
redis-cli SAVE

# Stop Redis
systemctl stop redis

# Simulate data loss (test only)
mv /var/lib/redis/dump.rdb /tmp/

# Start Redis (AOF recovery)
systemctl start redis

# Verify data restored
redis-cli GET test_key

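Verify mixed file format:

# Run after a BGREWRITEAOF; the RDB preamble is only written by a rewrite
# (Redis 7.0+: inspect the *.base.rdb file under appendonlydir/ instead)
xxd /var/lib/redis/appendonly.aof | head -n 2
# Expected output starts with the "REDIS" magic followed by the RDB version (e.g. REDIS0009 on Redis 6.2), indicating hybrid mode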
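
The signature check can also be scripted instead of read from a hex dump. The sketch below tests only the first five bytes of a file for the REDIS magic, so it works regardless of the RDB version digits that follow. The function name has_rdb_preamble is hypothetical, and the file path is supplied by the caller.

```shell
# has_rdb_preamble FILE
# Prints "hybrid" if FILE starts with the RDB magic, "plain" otherwise.
has_rdb_preamble() {
  local file="$1" magic
  if [ ! -r "$file" ]; then
    echo unreadable
    return 2
  fi
  # A hybrid AOF begins with an RDB payload, whose first bytes are "REDIS".
  magic=$(head -c 5 "$file")
  if [ "$magic" = "REDIS" ]; then echo hybrid; else echo plain; fi
}
```

A plain AOF starts with a RESP array marker such as *2, so it is reported as plain.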

Step 7: Configure Persistence Monitoring (Prometheus + Redis Exporter)

Install Redis Exporter:

# Download a pinned release (v1.45.0 shown; check the project's releases page for the current version)
wget https://github.com/oliver006/redis_exporter/releases/download/v1.45.0/redis_exporter-v1.45.0.linux-amd64.tar.gz
tar -xzf redis_exporter-v1.45.0.linux-amd64.tar.gz -C /usr/local/bin/ --strip-components=1

# Create systemd service
cat > /etc/systemd/system/redis_exporter.service <<'EOF'
[Unit]
Description=Redis Exporter
After=network.target

[Service]
Type=simple
User=redis
ExecStart=/usr/local/bin/redis_exporter --redis.addr=localhost:6379
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl enable redis_exporter
systemctl start redis_exporter

Prometheus scrape configuration:

scrape_configs:
- job_name: 'redis'
  static_configs:
  - targets: ['localhost:9121']

Key PromQL queries:

# Seconds since last successful RDB save
time() - redis_rdb_last_save_timestamp_seconds

# Current AOF size (bytes)
redis_aof_current_size_bytes

# AOF rewrite in progress
redis_aof_rewrite_in_progress

# AOF last write status (0=fail, 1=success)
redis_aof_last_write_status

# RDB BGSAVE in progress
redis_rdb_bgsave_in_progress

# Memory usage percentage
redis_memory_used_bytes / redis_memory_max_bytes * 100

Monitoring & Alerts (Immediately Usable)

Linux Native Monitoring Commands

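Real‑time monitor RDB/AOF file changes (globs are listed separately because watch runs the command through sh, which may not support brace expansion):

watch -n5 'ls -lh /var/lib/redis/*.rdb /var/lib/redis/*.aof'

Monitor Redis persistence status: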

redis-cli INFO persistence | grep -E "rdb_last_save_time|aof_last_rewrite_time|aof_current_size"
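
The raw fields are easier to alert on once converted into an age and a size. The sketch below reads INFO persistence text from standard input and prints the seconds since the last RDB save plus the AOF size in MB. The function name persistence_summary is hypothetical, and the optional argument (the current epoch time) exists only to make the calculation reproducible.

```shell
# persistence_summary [NOW_EPOCH] < info_persistence_output
persistence_summary() {
  local now="${1:-$(date +%s)}"
  # INFO lines end with \r\n; strip the carriage return before parsing.
  tr -d '\r' | awk -F: -v now="$now" '
    $1 == "rdb_last_save_time" { last_save = $2 }
    $1 == "aof_current_size"   { aof_bytes = $2 }
    END {
      printf "rdb_age_seconds=%d aof_size_mb=%.1f\n", now - last_save, aof_bytes / 1048576
    }'
}
```

Typical usage would be redis-cli INFO persistence | persistence_summary. When AOF is disabled the aof_current_size field is absent and the size is reported as 0.0.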

Disk I/O monitoring (iostat):

iostat -x 5 | grep -E "Device|sda"
# Key metrics: %util > 80% (bottleneck), await > 20ms (high latency)

Check Redis logs for persistence errors:

tail -f /var/log/redis/redis-server.log | grep --line-buffered -iE "saving|saved|AOF|rewrite"

Expected healthy output example:

12345:M 30 Oct 2025 10:00:15.123 * Background saving started by pid 12346
12346:C 30 Oct 2025 10:00:17.456 * DB saved on disk
12345:M 30 Oct 2025 10:00:17.457 * Background saving terminated with success

Performance & Capacity (Reproducible)

Benchmark Commands

Write performance test (impact of RDB+AOF):

# Pure memory mode (persistence disabled)
redis-cli CONFIG SET appendonly no
redis-cli CONFIG SET save ""
redis-benchmark -t set -n 1000000 -q

# Hybrid persistence mode
redis-cli CONFIG SET appendonly yes
redis-cli CONFIG SET aof-use-rdb-preamble yes
redis-benchmark -t set -n 1000000 -q

Expected comparison:

Pure memory mode:    SET: 120000.00 requests per second
Hybrid persistence: SET: 95000.00 requests per second
Performance loss: ~20% (acceptable range)
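
The loss figure is simply the relative drop between the two benchmark results. A minimal sketch of that arithmetic follows, assuming result lines in the "SET: N requests per second" form shown in the expected comparison; both function names are hypothetical.

```shell
# qps_from_line "SET: 120000.00 requests per second"
# Prints the numeric requests-per-second value (the second field).
qps_from_line() {
  printf '%s\n' "$1" | awk '{ print $2 }'
}

# perf_loss_pct BASELINE_QPS CANDIDATE_QPS
# Prints the relative throughput loss in percent, one decimal place.
perf_loss_pct() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f\n", (base - cand) / base * 100 }'
}
```

For the sample figures, perf_loss_pct 120000 95000 prints 20.8, which matches the roughly 20% quoted in the comparison.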

Persistence Performance Overhead Matrix

Persistence Strategy   Write QPS   Data Safety                     Recovery Speed   Disk Space
---------------------------------------------------------------------------------------------------------------------------------
No persistence          120k/s    All data lost on restart        N/A              0
RDB only                110k/s    Lose data after last snapshot  Seconds          Small
AOF (everysec)          100k/s    Lose up to 1 second of data    Minutes          Large (3‑5×)
RDB+AOF hybrid          95k/s     Lose up to 1 second of data    Seconds          Medium (1.5‑2×)

Tuning Parameter Matrix (by Dataset Size)

Dataset Size   RDB save strategy   AOF rewrite min   appendfsync   Disk Requirement
< 1GB         save 300 10        32mb              everysec      HDD acceptable
1‑10GB        save 600 100       64mb              everysec      SSD recommended
10‑50GB       save 900 1000      256mb             everysec      SSD required
> 50GB        save "" + cron    1gb               no or everysec NVMe SSD
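
The matrix can be encoded as a lookup so provisioning scripts pick a consistent tier. The sketch below is one way to do that; the function name recommend_settings is hypothetical, the input is a whole number of gigabytes, and the treatment of the boundary values (exactly 10GB and 50GB fall into the lower tier) is an assumption, since the table does not say.

```shell
# recommend_settings DATASET_GB
# Prints the tuning tier from the matrix for a dataset size in whole GB.
recommend_settings() {
  local gb="$1"
  if [ "$gb" -lt 1 ]; then
    echo 'save 300 10 | auto-aof-rewrite-min-size 32mb | appendfsync everysec'
  elif [ "$gb" -le 10 ]; then
    echo 'save 600 100 | auto-aof-rewrite-min-size 64mb | appendfsync everysec'
  elif [ "$gb" -le 50 ]; then
    echo 'save 900 1000 | auto-aof-rewrite-min-size 256mb | appendfsync everysec'
  else
    # Largest tier: snapshots are driven by cron instead of save rules.
    echo 'save "" (cron-driven BGSAVE) | auto-aof-rewrite-min-size 1gb | appendfsync everysec or no'
  fi
}
```

For example, recommend_settings 5 prints the 1‑10GB row.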

Security & Compliance (Minimum Required)

File Permission Hardening

chown -R redis:redis /var/lib/redis
chmod 750 /var/lib/redis
chmod 640 /var/lib/redis/*.rdb
chmod 640 /var/lib/redis/*.aof

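Encryption in Transit and at Rest (TLS requires Redis 6.0+)

Redis does not encrypt RDB/AOF files itself. Use TLS to protect data in transit, including replication traffic, and disk encryption to protect the files at rest.

Configure TLS for transport encryption (master‑slave replication scenario):

# redis.conf additions
tls-port 6380
tls-cert-file /etc/redis/tls/redis.crt
tls-key-file /etc/redis/tls/redis.key
tls-ca-cert-file /etc/redis/tls/ca.crt
# Replicas connect to the master over TLS
tls-replication yes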

Disk encryption (LUKS) example (initial setup only):

# Initialize encrypted volume (WARNING: luksFormat erases all existing data on /dev/sdb)
cryptsetup luksFormat /dev/sdb
cryptsetup open /dev/sdb redis_data
mkfs.xfs /dev/mapper/redis_data
mount /dev/mapper/redis_data /var/lib/redis

Audit Logging

# Enable slow query log (threshold in microseconds: 10000 = 10ms)
slowlog-log-slower-than 10000
slowlog-max-len 128

Query slow log:

redis-cli SLOWLOG GET 10

Common Issues & Troubleshooting

Symptom               Diagnostic Command                    Possible Root Cause                         Quick Fix                           Permanent Fix
RDB save failure      redis-cli INFO persistence            Insufficient disk space / permission error  df -h; chown redis /var/lib/redis   Expand disk; set eviction policy
AOF file corruption   redis-check-aof --fix appendonly.aof  Power loss / disk failure                   Run fix command                     Use UPS; RAID
High fork latency     redis-cli INFO stats | grep fork      THP not disabled / low memory               Disable THP; free memory            Permanent THP disable; add RAM
AOF rewrite blocking  iostat -x 1                           Disk I/O bottleneck                         Increase auto-aof-rewrite-min-size  Upgrade to SSD; set no-appendfsync-on-rewrite yes
Redis fails to start  journalctl -u redis -n 50             Corrupted RDB/AOF                           Run redis-check-aof/rdb, fix        Regular backups
Out‑of‑Memory (OOM)   dmesg | grep -i kill                  vm.overcommit_memory not set                sysctl -w vm.overcommit_memory=1    Add setting to /etc/sysctl.conf

Change & Rollback Playbook

Maintenance Window Recommendations

Low‑traffic period (02:00‑04:00)

Master‑slave architecture: upgrade slaves first, then master

Standalone instance: notify business owners in advance

Canary Strategy (Master‑Slave Replication)

Stage 1 – Slave verification (1 hour):

redis-cli -h slave1 CONFIG SET appendonly yes
redis-cli -h slave1 CONFIG SET aof-use-rdb-preamble yes
redis-cli -h slave1 INFO persistence

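Stage 2 – Master switch (10 minutes):

# Pause application writes first, then promote the verified slave
redis-cli -h slave1 REPLICAOF NO ONE
# Point the old master at the new master (its dataset is replaced by a full resync)
redis-cli -h master REPLICAOF slave1 6379
# Persist the runtime settings on the new master (slave1)
redis-cli -h slave1 CONFIG REWRITE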
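
Before the switch it is worth confirming that the slave is actually connected and not in the middle of a sync. The sketch below derives a simple verdict from the text of INFO replication as reported by the slave. The function name replica_ready is hypothetical; the fields role, master_link_status and master_sync_in_progress are the ones it inspects.

```shell
# replica_ready < info_replication_output
# Prints "ready" only for a connected replica with no sync in progress.
replica_ready() {
  # INFO lines end with \r\n; strip the carriage return before parsing.
  tr -d '\r' | awk -F: '
    $1 == "role"                    { role = $2 }
    $1 == "master_link_status"      { link = $2 }
    $1 == "master_sync_in_progress" { syncing = $2 }
    END {
      if (role == "slave" && link == "up" && syncing == "0") print "ready"
      else print "not-ready"
    }'
}
```

Typical usage would be redis-cli -h slave1 INFO replication | replica_ready, proceeding with the switch only when the result is ready.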

Health Check Script

#!/bin/bash
REDIS_CLI="redis-cli"
# Check process
if ! pgrep -x redis-server > /dev/null; then
  echo "ERROR: Redis not running"
  exit 1
fi
# Ping
if ! $REDIS_CLI PING | grep -q PONG; then
  echo "ERROR: Redis not responding"
  exit 1
fi
# Persistence status
RDB_STATUS=$($REDIS_CLI INFO persistence | grep rdb_last_bgsave_status | cut -d: -f2 | tr -d '\r')
AOF_STATUS=$($REDIS_CLI INFO persistence | grep aof_last_write_status | cut -d: -f2 | tr -d '\r')
if [ "$RDB_STATUS" != "ok" ]; then
  echo "ERROR: RDB save failed"
  exit 1
fi
if [ "$AOF_STATUS" != "ok" ]; then
  echo "ERROR: AOF write failed"
  exit 1
fi
echo "Health check passed"

Rollback Commands (within 3 minutes)

# Stop Redis
systemctl stop redis

# Restore configuration
cp /etc/redis/redis.conf.bak.YYYYMMDDHHMM /etc/redis/redis.conf

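# Restore data files (cp -a also copies directories such as appendonlydir/ and keeps file attributes)
rm -rf /var/lib/redis/dump.rdb /var/lib/redis/appendonly.aof /var/lib/redis/appendonlydir
cp -a /var/lib/redis.bak.YYYYMMDDHHMM/. /var/lib/redis/
chown -R redis:redis /var/lib/redis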

# Start Redis
systemctl start redis

# Verify data restored
redis-cli DBSIZE
redis-cli GET test_key
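
The YYYYMMDDHHMM placeholder has to be replaced with a real backup suffix under time pressure. Because the suffix is a fixed-width timestamp, the newest backup is simply the last name in sorted order. The sketch below picks it; the function name latest_backup is hypothetical, and the prefix is whatever path was used when the backup was taken (for example /var/lib/redis.bak).

```shell
# latest_backup PREFIX
# Prints the newest path matching PREFIX.<timestamp>, or returns 1 if none exists.
latest_backup() {
  local prefix="$1" newest="" candidate
  # Glob results are sorted, and fixed-width timestamps sort chronologically,
  # so the last existing match is the most recent backup.
  for candidate in "$prefix".*; do
    [ -e "$candidate" ] && newest="$candidate"
  done
  if [ -n "$newest" ]; then
    printf '%s\n' "$newest"
  else
    return 1
  fi
}
```

A rollback script could then use BACKUP=$(latest_backup /var/lib/redis.bak) and stop if the call fails.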

Best Practices (10 Key Points)

Hybrid mode first : enable aof-use-rdb-preamble yes (reduces size 60‑80%)

fsync strategy : appendfsync everysec (use always for finance)

Disable Transparent Huge Pages : echo never > /sys/kernel/mm/transparent_hugepage/enabled

Disk choice : dataset >10GB → SSD (IOPS >5000)

Regular validation : daily redis-check-rdb and redis-check-aof

Monitor fork latency : latest_fork_usec in INFO stats should stay below 1000 ms (1,000,000 µs)

AOF rewrite threshold : set auto-aof-rewrite-min-size ≥ 50% of memory usage

RDB compression : enable rdbcompression yes when CPU is sufficient

Backup strategy : daily RDB backup to remote storage, retain 7 days

Master‑slave replication : at least one replica with RDB+AOF double‑insurance

Appendix (Sample Assets)

Full redis.conf Persistence Sample

# ========== Persistence Configuration (Production) ==========

# RDB snapshot strategy
save 900 1
save 300 10
save 60 10000

# RDB file configuration
dbfilename dump.rdb
dir /var/lib/redis
rdbcompression yes
rdbchecksum yes
stop-writes-on-bgsave-error yes
rdb-save-incremental-fsync yes

# AOF persistence
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-rewrite-incremental-fsync yes

# Memory management
maxmemory 8gb
maxmemory-policy allkeys-lru

# Logging
loglevel notice
logfile /var/log/redis/redis-server.log

# Network
bind 127.0.0.1 192.168.1.100
port 6379
tcp-backlog 511
timeout 300
tcp-keepalive 300

# Slow query log
slowlog-log-slower-than 10000
slowlog-max-len 128

# Client limits
maxclients 10000

Scheduled RDB Backup Script (Cron)

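#!/bin/bash
BACKUP_DIR="/backup/redis"
REDIS_DATA_DIR="/var/lib/redis"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d%H%M)

# Read a single field from INFO persistence (values end with \r)
info_field() {
  redis-cli INFO persistence | tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

mkdir -p "$BACKUP_DIR"
# Trigger RDB snapshot
redis-cli BGSAVE
sleep 1
# Wait for BGSAVE to finish (string comparison tolerates an empty reply)
while [ "$(info_field rdb_bgsave_in_progress)" = "1" ]; do
  sleep 5
done
# Abort if the snapshot failed, so a stale dump.rdb is not archived as a fresh backup
if [ "$(info_field rdb_last_bgsave_status)" != "ok" ]; then
  echo "ERROR: BGSAVE failed, backup aborted" >&2
  exit 1
fi
# Copy RDB file
cp "$REDIS_DATA_DIR/dump.rdb" "$BACKUP_DIR/dump_$DATE.rdb"
# Compress backup
gzip "$BACKUP_DIR/dump_$DATE.rdb"
# Delete old backups
find "$BACKUP_DIR" -name "dump_*.rdb.gz" -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: $BACKUP_DIR/dump_$DATE.rdb.gz"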

Ansible Automation Task for Redis Persistence

---
- name: Configure Redis Persistence
  hosts: redis_servers
  become: yes

  vars:
    redis_conf: /etc/redis/redis.conf
    redis_data_dir: /var/lib/redis

  tasks:
    - name: Backup current Redis config
      copy:
        src: "{{ redis_conf }}"
        dest: "{{ redis_conf }}.bak.{{ ansible_date_time.epoch }}"
        remote_src: yes

    - name: Disable Transparent Huge Pages
      lineinfile:
        path: /etc/rc.local
        line: "{{ item }}"
        create: yes
      loop:
        - "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
        - "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
      notify: disable thp

    - name: Configure sysctl for Redis
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'vm.overcommit_memory', value: '1' }
        - { name: 'net.core.somaxconn', value: '65535' }

    - name: Configure Redis persistence settings
      lineinfile:
        path: "{{ redis_conf }}"
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^appendonly ', line: 'appendonly yes' }
        - { regexp: '^appendfsync ', line: 'appendfsync everysec' }
        - { regexp: '^aof-use-rdb-preamble ', line: 'aof-use-rdb-preamble yes' }
        - { regexp: '^save 900 ', line: 'save 900 1' }
      notify: restart redis

    - name: Ensure Redis data directory permissions
      file:
        path: "{{ redis_data_dir }}"
        state: directory
        owner: redis
        group: redis
        mode: '0750'

  handlers:
    - name: disable thp
      shell: |
        echo never > /sys/kernel/mm/transparent_hugepage/enabled
        echo never > /sys/kernel/mm/transparent_hugepage/defrag

    - name: restart redis
      systemd:
        name: redis
        state: restarted

Tested on 2025‑10 with Redis 6.2.14 / 7.0.15 on RHEL 8.9 / Ubuntu 20.04.6
