Root Cause Analysis and Data Migration for Redis Memory Overuse

This article details a production incident where Redis memory usage surged due to improper Set serialization, explains the investigation of storage structures and capacity formulas, and outlines a step‑by‑step data migration and cleanup process to restore normal operation.

Full-Stack Internet Architecture

Recently, the DBA reported that an online Redis instance had exceeded its originally designed capacity, even after two expansions, and its memory usage continued to grow. The business team was asked to verify whether the growth was normal and, if not, to provide a solution.

Problem Symptoms

The monitoring data showed a sharp increase in both memory usage and key count starting on June 1st. Initial suspicion fell on malicious traffic inflating the key count, and a code vulnerability was indeed found and patched, but request QPS did not indicate massive abuse.

Further investigation revealed that the data was stored using a Set structure. Although each Set contained only a single element, memory consumption grew to about 30 MB, far exceeding the 8.4 MB predicted by the estimation formula:

9w * 14 (key length) * 1 (element count) * 10 (element length) = 8.4 MB

Two main doubts emerged:

Whether the actual memory occupied by stored data was miscalculated.

Whether the capacity estimation formula was flawed.

Investigation of Doubt One: Data Size Miscalculation

Redis implements Set collections using either an integer set or a hash table. The integer set is used only when all elements are integers and the number of elements does not exceed 512 (configurable via set-max-intset-entries).

By this rule, the Set should have been stored as an integer set, but the MEMORY USAGE command reported a single key consuming 218 B, which was unexpected for a simple 10‑digit numeric value.

The cause turned out to be the serialization method: the value was being serialized into a hexadecimal string, forcing Redis to store the Set as a hash table instead of an integer set. After changing the serialization, a test insertion showed memory usage dropping to 72 B, roughly one‑third of the previous size.
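To make the encoding rule concrete, the sketch below approximates the check Redis applies to each Set member: only a member that is the canonical decimal form of a signed 64-bit integer can live in an integer set. The class and method names are illustrative and do not come from the original code, and rendering the value with Long.toHexString is only an assumption about what the faulty serializer produced.

```java
public class IntsetEligibility {
    // Approximates Redis's test for an integer-encodable Set member:
    // the string must be the canonical decimal form of a signed 64-bit integer.
    static boolean looksLikeRedisInteger(String member) {
        try {
            long value = Long.parseLong(member);
            // Reject forms such as "+5" or "007" that parse but are not canonical.
            return Long.toString(value).equals(member);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long id = 1234567890L; // hypothetical 10-digit numeric value
        String asDecimal = Long.toString(id);
        String asHex = Long.toHexString(id); // assumed shape of the faulty serialization
        System.out.println(asDecimal + " -> " + looksLikeRedisInteger(asDecimal));
        System.out.println(asHex + " -> " + looksLikeRedisInteger(asHex));
    }
}
```

A member that fails this check forces the whole Set out of the integer-set encoding, which matches the jump in per-key memory observed above.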

Investigation of Doubt Two: Capacity Formula Issue

The original capacity formula ignored Redis's internal memory overhead for different storage representations, leading to a severe underestimation of actual memory consumption.
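One way to avoid this kind of underestimate is to size capacity from a measured per-key figure rather than from raw payload lengths. The sketch below multiplies a key count by the per-key sizes that MEMORY USAGE reported in this incident (218 B before the serialization fix, 72 B after). The key count of 100,000 is a placeholder and not a figure from the incident, and the class and method names are hypothetical.

```java
public class CapacityEstimate {
    // Estimated bytes = number of keys * measured bytes per key (as reported by MEMORY USAGE).
    static long estimateBytes(long keyCount, long bytesPerKey) {
        return Math.multiplyExact(keyCount, bytesPerKey);
    }

    public static void main(String[] args) {
        long keyCount = 100_000L;         // placeholder key count (assumption)
        long bytesPerKeyBeforeFix = 218L; // measured per key before the serialization fix
        long bytesPerKeyAfterFix = 72L;   // measured per key after the serialization fix
        System.out.println("before fix: " + estimateBytes(keyCount, bytesPerKeyBeforeFix) + " bytes");
        System.out.println("after fix:  " + estimateBytes(keyCount, bytesPerKeyAfterFix) + " bytes");
    }
}
```

Because the measured figure already includes the overhead of the key, the value object, and its encoding, the estimate tracks real consumption more closely than a product of string lengths.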

With the root causes identified, the team decided to modify the serialization logic, redesign the storage to a simple key‑value model, and perform a full data migration ("data cleaning").

Data Migration Process

1. Deploy dual‑write logic (write to both the old and new stores).

2. Synchronize historical data from the old store to the new one.

3. Switch read operations to the new data source.

4. Monitor the online service for any anomalies.

5. Disable writes to the old store.

6. Delete the old resources.

7. Remove the old read/write code.

Choosing a New Storage Location

Option 1: Keep old data in the old store and write new data to a newly deployed store. This requires code changes and requires the DBA to deploy a new instance.

Option 2: Write both old and new data to the existing store, map old data to the new structure, and perform a full data migration within the same store. This avoids additional deployment but requires manual removal of old data after migration.

Both options are viable; the team selected the second approach.

Deploy Dual‑Write Logic

Switches (feature flags) are added to control the read and write paths. They can be toggled at runtime without a redeployment, and their state is not affected by service restarts.
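A minimal sketch of such switch-controlled paths is shown below. It is not the team's actual code: the flag names, the class name, and the in-memory maps standing in for the old Set-based layout and the new key‑value layout are all assumptions made for illustration. In a real service the flags would be backed by a configuration source so that their values survive restarts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class DualWriteRepository {
    // Hypothetical runtime switches; here they are plain in-memory flags.
    final AtomicBoolean writeOld = new AtomicBoolean(true);
    final AtomicBoolean writeNew = new AtomicBoolean(true);
    final AtomicBoolean readFromNew = new AtomicBoolean(false);

    // In-memory stand-ins for the old and new storage layouts.
    final Map<String, String> oldStore = new ConcurrentHashMap<>();
    final Map<String, String> newStore = new ConcurrentHashMap<>();

    // Write path: each store is written only while its switch is on.
    void save(String key, String value) {
        if (writeOld.get()) {
            oldStore.put(key, value);
        }
        if (writeNew.get()) {
            newStore.put(key, value);
        }
    }

    // Read path: a single switch selects the source of truth.
    String find(String key) {
        return readFromNew.get() ? newStore.get(key) : oldStore.get(key);
    }

    public static void main(String[] args) {
        DualWriteRepository repo = new DualWriteRepository();
        repo.save("user:1", "1234567890");  // dual write while both switches are on
        repo.readFromNew.set(true);         // later: switch reads to the new store
        System.out.println(repo.find("user:1"));
        repo.writeOld.set(false);           // finally: stop writing to the old store
        repo.save("user:2", "42");
        System.out.println(repo.oldStore.containsKey("user:2"));
    }
}
```

Keeping the read switch separate from the two write switches lets each migration step be rolled back independently by flipping one flag.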

Synchronize Historical Data

After deployment, export the RDB file, parse all keys, map each old value to the new structure, and write them to the new store. The following Java utility can be adapted for this purpose:

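import com.google.common.util.concurrent.RateLimiter;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixData {
    public static void main(String[] args) throws InterruptedException {
        // Defaults: input file, lines submitted per second, worker thread count.
        String fileName = "test.txt";
        int rate = 500;
        int size = 200;
        // args is never null in main, so check its length before indexing.
        if (args.length >= 3) {
            fileName = args[0];
            rate = Integer.parseInt(args[1]);
            size = Integer.parseInt(args[2]);
        }
        RateLimiter rateLimiter = RateLimiter.create(rate);
        ThreadPoolExecutor executorService = new ThreadPoolExecutor(size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        executorService.prestartAllCoreThreads();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                final String currentLine = line;
                rateLimiter.acquire(); // throttle submissions to protect Redis
                executorService.submit(() -> {
                    try {
                        // TODO: implement data processing logic for currentLine
                    } catch (Exception e) {
                        // Log here; exceptions thrown inside a submitted task are otherwise silent.
                        e.printStackTrace();
                    }
                });
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Let queued tasks finish instead of cutting them off with System.exit(0).
            executorService.shutdown();
            executorService.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}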

Switch Read Operations to the New Store

After all historical data is migrated, turn off the read‑switch for the old store, keeping the write‑switch active to maintain dual writes until the new store is fully validated.

Monitor Online Service

Observe the system for any user complaints or data anomalies after the read switch.

Disable Writes to the Old Store

If no issues arise, stop all writes to the old store.

Delete the Old Store

Remove all old keys using the same data‑cleaning tool.

Retire Old Read/Write Logic

Finally, remove the legacy code handling the old store, completing the migration.

Conclusion

The article presents a complete real‑world incident investigation and data‑migration workflow, deepening the author’s understanding of Redis’s internal storage mechanisms. It also emphasizes the importance of treating every line of code with respect, as seemingly minor mistakes can lead to disastrous outcomes.
