Why Did Our Redis Freeze? Uncovering AOF Risks and Recovery Strategies

A recent hardware failure left a physical server's disk read‑only, causing Redis to hang; this article explains the AOF mechanism, its potential pitfalls, log strategies, and practical steps to prevent and mitigate such issues in production environments.

Ziru Technology

Apr 14, 2022

Cause

Recently a physical machine's hard disk went offline and became read‑only, which caused Redis to stall and the operating system to report errors.

Redis Application Error

io.lettuce.core.RedisCommandExecutionException: MISCONF Errors writing to the AOF file: Read-only file system

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out

Because the file system was read‑only, AOF writes failed. Even though the AOF policy was set to everysec, the main thread was evidently still blocked, since key reads were stuck as well.

AOF Mechanism

AOF (Append‑Only File) is a write‑after log: Redis first executes commands in memory, then records each command as text in the AOF file, unlike traditional write‑ahead logs that store the modified data.

Traditional database redo logs record the changed data, while AOF records every command received by Redis.

AOF Log Content

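Each log entry is stored in the Redis protocol format: a line such as *3 gives the number of parts in the command, and each part is prefixed by $ and its length in bytes, e.g., $3 followed by set indicates the three‑byte command "set".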
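As a rough illustration of that layout, the sketch below encodes a command the way it would appear in an AOF entry: a *N line for the number of parts, then a $length line before each part. The function name encode_aof_entry and the sample key and value are made up for this example; it is not Redis's own code.

```python
def encode_aof_entry(*args: str) -> bytes:
    """Encode one command as a Redis-protocol entry (illustrative helper)."""
    # "*N" line: how many parts the command has.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # "$len" line: byte length of the part, followed by the part itself.
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


entry = encode_aof_entry("set", "testkey", "testvalue")
print(entry.decode())
```

Here the first part is announced as $3 because "set" is three bytes long, and the key and value follow with their own length prefixes.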

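Writing the log after execution has two benefits:

It avoids logging erroneous commands, because only commands that executed successfully are recorded.

It does not block the current write operation, because the log is written after the command has run.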

Potential Risks of AOF

Data loss: If a crash occurs before the log is flushed to disk, the last command may be lost.

Main‑thread blocking: Although AOF avoids blocking the current command, the log is written by the main thread; heavy disk I/O can slow down subsequent operations.

Controlling when the AOF log is flushed mitigates these risks.

Log Strategies

1. Always – Synchronously write the log to disk after each command.

2. Everysec – Buffer log entries in memory and flush to disk every second.

3. No – Let the operating system decide when to flush.

Summary:

Choose No for highest performance.

Choose Always for maximum durability.

Choose Everysec for a balance, accepting minimal data loss.
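The difference between the three policies comes down to when fsync is called after a write. The toy writer below is a minimal sketch of that idea, not how Redis implements it: the class AofWriter, its fsync_calls counter, and the injectable clock are all hypothetical, and the everysec flush runs inline here, whereas Redis hands it to a background thread.

```python
import os
import time


class AofWriter:
    """Toy append-only log illustrating the three flush policies."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = clock()
        self.fsync_calls = 0  # counter kept only so the behavior can be observed

    def append(self, entry: bytes) -> None:
        # write() only hands the data to the OS; it may still sit in the page cache.
        os.write(self.fd, entry)
        if self.policy == "always":
            self._fsync()  # flush after every command
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()  # flush at most once per second
        # policy "no": never fsync here; the OS decides when to flush.

    def _fsync(self) -> None:
        os.fsync(self.fd)
        self.last_fsync = self.clock()
        self.fsync_calls += 1

    def close(self) -> None:
        os.close(self.fd)
```

With always, every append pays the cost of a disk flush; with everysec, a crash can lose whatever was appended since the last flush; with no, the amount at risk depends on the operating system.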

Back to the Problem

Under our everysec policy, the fsync call runs in a background thread, while the main thread still performs the write to the AOF file. Because the file system was read‑only, the background thread hung, and the main thread ended up waiting on it indefinitely, which ultimately blocked all Redis operations.
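Trouble of this kind tends to show up in the persistence section of the INFO command, through fields such as aof_last_write_status, aof_pending_bio_fsync, and aof_delayed_fsync. The sketch below turns those fields into warnings. It assumes the input is a dict like the one redis-py returns from info("persistence"); the function name aof_warnings and the thresholds are illustrative choices, not recommended values.

```python
def aof_warnings(info, max_delayed_fsync=0):
    """Return human-readable warnings derived from INFO persistence fields."""
    warnings = []
    # "err" here is what makes Redis reject writes with a MISCONF error.
    if str(info.get("aof_last_write_status", "ok")) != "ok":
        warnings.append("last AOF write failed")
    # A value that stays above zero suggests the background fsync is stuck
    # (a brief non-zero reading can be normal).
    if int(info.get("aof_pending_bio_fsync", 0)) > 0:
        warnings.append("fsync still pending in the background thread")
    # Counts how often a slow fsync delayed an AOF write.
    if int(info.get("aof_delayed_fsync", 0)) > max_delayed_fsync:
        warnings.append("AOF writes have been delayed by slow fsync")
    return warnings


healthy = {"aof_last_write_status": "ok", "aof_pending_bio_fsync": 0, "aof_delayed_fsync": 0}
print(aof_warnings(healthy))
```

Note that once the main thread is fully blocked, INFO itself will not answer, so a check like this is most useful for catching the early signs.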

How to Improve?

For disk failures, enhance the Sentinel‑side health checks to verify that the instance can actually accept writes, not just respond to PING.
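A write probe along those lines could look like the sketch below. It is an external check written under assumptions, not a built-in Sentinel feature: client is expected to be a redis-py style object exposing ping() and set(name, value, ex=...), ideally created with a socket timeout so that a hung server surfaces as an error, and the key name healthcheck:write-probe is hypothetical.

```python
import time


def is_writable(client, probe_key="healthcheck:write-probe", ttl_seconds=10):
    """Return True only if the instance answers PING and accepts a write."""
    try:
        if not client.ping():
            return False
        # A short TTL keeps the probe key from accumulating.
        # On a failed AOF write, Redis rejects this with a MISCONF error.
        client.set(probe_key, str(time.time()), ex=ttl_seconds)
        return True
    except Exception:
        # Timeouts and MISCONF responses both count as "not writable".
        return False
```

A PING-only check would have passed in the incident described here as long as the server still answered, while the write probe fails as soon as the AOF can no longer be written.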

To reduce I/O pressure:

Separate high‑IO applications from the Redis host.

Set no-appendfsync-on-rewrite to yes to skip fsync during AOF rewrite, accepting possible data loss.

Schedule backups and AOF writes per instance to spread I/O load.
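Spreading the heavy I/O jobs can be as simple as giving each instance its own start offset within a time window. The helper below is a hypothetical sketch of that scheduling idea; the function name stagger_offsets, the instance names, and the 60-minute window are assumptions made for illustration.

```python
def stagger_offsets(instances, window_minutes=60):
    """Map each instance name to a start offset (in minutes) within the window."""
    if not instances:
        return {}
    step = window_minutes / len(instances)
    # Sort so that the assignment is stable from run to run.
    return {name: int(i * step) for i, name in enumerate(sorted(instances))}


print(stagger_offsets(["redis-6379", "redis-6380", "redis-6381"]))
```

With three instances and a 60-minute window, the jobs start at minutes 0, 20, and 40, so no two instances on the same host begin their backup or rewrite at the same moment.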

These measures help prevent Redis from becoming unresponsive when the underlying storage encounters issues.
