How a Redis Crash Slowed Our Shop to 20 s and the Multi‑Layer Caching Fix

A sudden Redis outage made an e‑commerce homepage take 20 seconds to load. The team traced the cause to unplanned data‑volume growth, reviewed the flawed fallback logic in a post‑mortem, and designed a combined local‑cache, MongoDB, and database strategy to restore fast, reliable service.

dbaplus Community

Background

A new recommendation feature was released on an e‑commerce homepage. After deployment, the page's response time rose to about 20 seconds and users began to complain.

Root Cause

The feature cached recommendation data in Redis, keyed by region and category. The actual data volume turned out to be several hundred times larger than the original design assumed, exceeding the 1 GB of memory allocated to the Alibaba Cloud Redis instance. When Redis ran out of memory it crashed, and the application fell back to querying the relational database directly, which drove response times up dramatically.
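Because every (region, category) pair becomes its own cache entry, the total size grows multiplicatively with both dimensions. The back‑of‑envelope estimator below is a minimal sketch for checking whether such a dataset fits an instance before deploying; the function names, the 70% headroom rule, and all input numbers are hypothetical and chosen only for illustration, not taken from the original system.

```python
GIB = 1024 ** 3


def estimate_cache_bytes(num_regions: int, num_categories: int,
                         items_per_key: int, bytes_per_item: int) -> int:
    """Rough payload size of a dataset keyed by (region, category).

    Ignores Redis per-key overhead, encoding and fragmentation, so the
    result is a lower bound on real memory use.
    """
    num_keys = num_regions * num_categories      # one cache entry per pair
    return num_keys * items_per_key * bytes_per_item


def fits_in(total_bytes: int, limit_bytes: int, headroom: float = 0.7) -> bool:
    """True if the payload stays under a fraction of the instance memory.

    The 0.7 headroom factor is an assumed safety margin, not a Redis rule.
    """
    return total_bytes <= limit_bytes * headroom


# Hypothetical inputs, for illustration only.
size = estimate_cache_bytes(num_regions=3_000, num_categories=100,
                            items_per_key=20, bytes_per_item=500)
print(size, round(size / GIB, 2), fits_in(size, 1 * GIB), fits_in(size, 4 * GIB))
# 3000000000 2.79 False True
```

Doubling either dimension doubles the footprint, which is why a modest change in how many regions or categories are precomputed can outgrow a fixed-size instance.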

Immediate Mitigation

The quickest remedy was to scale the Redis instance from 1 GB to 4 GB, which restored normal latency. However, this only adds headroom; it does not address the underlying design problems.

Post‑mortem Findings

**Inefficient data schema** – many fields were cached that were either null or never used by the front‑end, inflating memory consumption.

**Fallback risk** – the code unconditionally queried the relational database when Redis was unavailable, which could overload the DB under higher traffic.
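The first finding suggests trimming each record to what the front‑end renders before it is cached. Below is a minimal sketch, assuming records arrive as Python dicts and that a hypothetical whitelist `FRONTEND_FIELDS` lists the fields the homepage actually uses; the field names and sample record are invented for illustration.

```python
import json
from typing import Any, Iterable

# Hypothetical whitelist: the fields the homepage actually renders.
FRONTEND_FIELDS = ("id", "title", "price", "image_url")


def trim_for_cache(item: dict[str, Any],
                   fields: Iterable[str] = FRONTEND_FIELDS) -> dict[str, Any]:
    """Keep only whitelisted fields and drop keys whose value is None."""
    return {k: item[k] for k in fields if item.get(k) is not None}


def payload_size(items: list[dict[str, Any]]) -> int:
    """Bytes of compact JSON that would be written to the cache."""
    return len(json.dumps(items, separators=(",", ":")).encode("utf-8"))


raw = [{"id": 1, "title": "Green tea", "price": 9.9, "image_url": None,
        "supplier_note": "internal only", "legacy_score": None}]
trimmed = [trim_for_cache(item) for item in raw]
print(trimmed)  # [{'id': 1, 'title': 'Green tea', 'price': 9.9}]
print(payload_size(raw), payload_size(trimmed))
```

A whitelist is used rather than a blacklist so that fields added to the source schema later are not cached by default.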

Design Alternatives Evaluated

Static page generation – rejected for now: traffic is still low, so the benefit does not justify the extensive front‑end and back‑end changes it would require.

Local in‑process cache – adds a fast layer but may exhaust the application server’s memory if all recommendation data is cached locally.

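Replace Redis with MongoDB – MongoDB stores data on disk and keeps frequently accessed documents in memory (the legacy MMAPv1 storage engine relied on Linux mmap for this; the default WiredTiger engine uses its own internal cache plus the operating system's file cache), which makes it suitable for large document sets that do not fit entirely in RAM.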

Combine local cache with MongoDB – keep hot recommendation data in a local cache refreshed every 5 minutes; fall back to MongoDB for the remaining data.
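The combined option can be sketched as an in‑process cache that holds only hot keys and reads everything else straight through to MongoDB, which also bounds local memory use. This is a minimal, single‑threaded sketch under assumptions: `HotKeyLocalCache`, the `is_hot` predicate, and `load_from_mongo` are hypothetical names, the loader stands in for a real MongoDB query, and entries expire lazily on read after 300 s rather than being refreshed by a background job as a production version might be.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class HotKeyLocalCache(Generic[K, V]):
    """In-process cache for hot keys only; other keys go to the loader.

    Entries expire ttl_seconds after loading (300 s = 5 minutes) and are
    reloaded lazily on the next read. Not thread-safe.
    """

    def __init__(self, loader: Callable[[K], V], is_hot: Callable[[K], bool],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # e.g. a MongoDB lookup (hypothetical)
        self._is_hot = is_hot          # decides which keys may live in memory
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[K, Tuple[float, V]] = {}

    def get(self, key: K) -> V:
        if not self._is_hot(key):
            return self._loader(key)   # cold data: read through, never cached
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh local hit
        value = self._loader(key)      # missing or expired: reload
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock and a stand-in loader.
calls = []


def load_from_mongo(key: str) -> str:  # hypothetical stand-in for MongoDB
    calls.append(key)
    return f"recs for {key}"


fake_now = [0.0]
cache = HotKeyLocalCache(load_from_mongo, is_hot=lambda k: k.startswith("beijing:"),
                         clock=lambda: fake_now[0])
cache.get("beijing:tea")
cache.get("beijing:tea")               # served locally, no reload
fake_now[0] = 301.0
cache.get("beijing:tea")               # expired, reloaded
cache.get("shanghai:tea")              # cold key, straight to the loader
print(calls)  # ['beijing:tea', 'beijing:tea', 'shanghai:tea']
```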

Figure: Redis failure impact diagram

Figure: Local cache + MongoDB design

Layered Fallback Strategy

Use Apollo configuration to provide a default recommendation set when MongoDB is unavailable.

If the Apollo defaults are also unavailable, query the relational database directly.

As a last resort, read from Redis (which only holds hot items).

Introduce a secondary local cache that stores the default data for 24 hours after the first DB fetch.
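One reading of the fallback order above, sketched in Python under stated assumptions: each backend is modelled as a hypothetical function that returns a list of recommendation IDs, returns None when it has no data, or raises when it is down, and the 24‑hour secondary cache is assumed to wrap the database query for default data. The names `fetch_with_fallback`, `DayCachedDefaults`, and the stub backends are illustrative, not the original system's API.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Recs = List[str]
Source = Callable[[str], Optional[Recs]]


def fetch_with_fallback(key: str,
                        sources: Sequence[Tuple[str, Source]]) -> Tuple[str, Recs]:
    """Try each named source in order; an exception or None means 'try the next'."""
    errors = []
    for name, source in sources:
        try:
            value = source(key)
        except Exception as exc:       # a failing backend must not break the page
            errors.append(f"{name}: {exc!r}")
            continue
        if value is not None:
            return name, value
    raise LookupError(f"no source could serve {key!r}: {errors}")


class DayCachedDefaults:
    """Wraps the DB query for default data and keeps its result for 24 hours."""

    def __init__(self, db_fetch: Callable[[], Recs], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._db_fetch = db_fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Recs] = None
        self._loaded_at = 0.0

    def __call__(self, key: str) -> Optional[Recs]:
        # The key is ignored: the same default set is served for every request.
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._db_fetch()  # hits the DB at most once per TTL
            self._loaded_at = now
        return self._value


# Stub backends for illustration only.
def mongo(key: str) -> Optional[Recs]:
    raise ConnectionError("MongoDB unavailable")


def apollo_defaults(key: str) -> Optional[Recs]:
    return None                        # pretend the Apollo default set is missing


db_calls = []


def query_default_recs() -> Recs:
    db_calls.append(1)
    return ["default-1", "default-2"]


def redis_hot(key: str) -> Optional[Recs]:
    return ["hot-1"]


fake_now = [0.0]
chain = [("mongodb", mongo), ("apollo", apollo_defaults),
         ("db", DayCachedDefaults(query_default_recs, clock=lambda: fake_now[0])),
         ("redis", redis_hot)]
print(fetch_with_fallback("beijing:tea", chain))  # ('db', ['default-1', 'default-2'])
fake_now[0] = 3600.0
fetch_with_fallback("beijing:tea", chain)         # served from the 24 h cache
print(len(db_calls))  # 1
```

Caching the database result this way means a MongoDB outage costs at most one database query per day for the default set, instead of one per request.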

Final Architecture

The production solution adopts a composite caching layer:

Local in‑process cache – holds hot recommendation data, refreshed every 5 minutes.

MongoDB – persistent store for the full recommendation dataset, combining disk‑backed storage with in‑memory caching of frequently accessed data.

Default local cache – stores a static fallback dataset (e.g., Beijing Dongcheng district recommendations) for 24 hours.

Relational database – ultimate source of truth when all caches miss.

Figure: Final multi‑layer caching architecture

This layered design provides fast response times, mitigates single‑point failures, and balances memory usage across the stack.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.