How to Proactively Refresh Memcached Before Expiration and Avoid DB Thundering Herd
This article explores why Memcached expiration can cause sudden DB overload and presents five practical strategies, including periodic DB refresh, lock-based queries, dual-key schemes, and time-embedded values, for proactively updating caches and keeping backend performance stable.
Recently I noticed nginx's feature for merging concurrent back-to-origin requests, which is similar to the ideas discussed here. The difference is that nginx focuses on handling concurrent requests once a cache entry has been invalidated, rather than proactively updating entries before they expire.
When a Memcached entry expires, a sudden surge of DB queries can occur, dramatically increasing DB load. This blog examines how to refresh the cache before it expires, rather than merely preventing high‑concurrency DB queries after expiration.
1. Periodically query the DB and write back to Memcached
This approach struggles when cache keys are dynamic or when it is unclear which data should be cached, making it hard to distinguish hot and cold data.
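A minimal sketch of the periodic approach, assuming the set of keys is known in advance. The class name PeriodicRefresher, the dbLoader function, and the in-memory map standing in for Memcached are all hypothetical and only illustrate the idea:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class PeriodicRefresher {
    // Stand-in for Memcached; a real client would issue a set with a TTL instead.
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final List<String> knownKeys;            // must be enumerable in advance
    private final Function<String, String> dbLoader; // hypothetical DB query

    PeriodicRefresher(List<String> knownKeys, Function<String, String> dbLoader) {
        this.knownKeys = knownKeys;
        this.dbLoader = dbLoader;
    }

    /** Reloads every known key once, whether or not anyone is reading it. */
    void refreshOnce() {
        for (String key : knownKeys) {
            String value = dbLoader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
    }

    /** Schedules refreshOnce at a fixed rate; the caller shuts the executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Swallowed so that one failed run does not cancel later runs.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The sketch makes the weakness visible: knownKeys has to be supplied up front, and every key is reloaded on every cycle regardless of whether it is hot or cold.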
2. On a cache miss, acquire a lock and let only one thread query the DB
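This method is unreliable, especially in multi-server environments: an in-process lock only serializes threads within a single server, so several servers can still query the DB and update the cache concurrently.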
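As an illustration, here is one way a per-key lock on a cache miss might look inside a single JVM. LockedLoader and dbLoader are hypothetical names, and a map stands in for Memcached:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LockedLoader {
    // Stand-in for Memcached.
    final Map<String, String> cache = new ConcurrentHashMap<>();
    // One lock object per key; these locks exist only inside this JVM.
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    private final Function<String, String> dbLoader; // hypothetical DB query

    LockedLoader(Function<String, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value; // cache hit, no locking needed
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check: another thread may have loaded the value while we waited.
            value = cache.get(key);
            if (value == null) {
                value = dbLoader.apply(key);
                if (value != null) {
                    cache.put(key, value);
                }
            }
        }
        return value;
    }
}
```

Within one process this limits each miss to a single DB query, but the locks map is local to the JVM, so N application servers can still issue N simultaneous queries for the same key.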
3. Store the expiration timestamp together with the value
When a get returns data, check how close the entry is to expiring: if expiration_time - current_time < 5s, launch a background task to query the DB and refresh the cache. The task must ensure that only one thread updates a given key; otherwise the DB thundering herd problem persists. The drawback is that the timestamp has to be serialized together with the value, which can be inconvenient.
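The following sketch shows the timestamp check under some assumptions: a map stands in for Memcached, the clock and executor are injected so the behavior is easy to test, and an in-process set is used as the "only one thread per key" guard. TimestampedCache and all of its members are hypothetical names:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

class TimestampedCache {
    static final long REFRESH_WINDOW_MS = 5_000;

    /** The value together with its expiration timestamp. */
    static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    final Map<String, Entry> store = new ConcurrentHashMap<>(); // stand-in for Memcached
    private final Set<String> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<String, String> dbLoader; // hypothetical DB query
    private final long ttlMs;
    private final LongSupplier clock;
    private final Executor background;

    TimestampedCache(Function<String, String> dbLoader, long ttlMs,
                     LongSupplier clock, Executor background) {
        this.dbLoader = dbLoader;
        this.ttlMs = ttlMs;
        this.clock = clock;
        this.background = background;
    }

    String get(String key) {
        Entry entry = store.get(key);
        long now = clock.getAsLong();
        if (entry == null || entry.expiresAt <= now) {
            return load(key); // missing or expired: load synchronously
        }
        // Near expiry: refresh in the background, at most one task per key.
        if (entry.expiresAt - now < REFRESH_WINDOW_MS && refreshing.add(key)) {
            background.execute(() -> {
                try {
                    load(key);
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value; // callers keep getting the current value meanwhile
    }

    private String load(String key) {
        String fresh = dbLoader.apply(key);
        store.put(key, new Entry(fresh, clock.getAsLong() + ttlMs));
        return fresh;
    }
}
```

The refreshing set only coordinates threads in one process, so across several servers this guard has the same limitation as the lock in method 2.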
4. Use two keys: one for data, one for expiration marker
For example, store the data under aaa with a 30-second TTL and an auxiliary key expire_aaa with a 25-second TTL. When fetching, issue a multiget for both keys. If expire_aaa is missing, the data is in its last 5 seconds, so start a background task that attempts add expire_aaa with a short TTL (e.g., 3 seconds). If the add succeeds, query the DB, update aaa, and reset expire_aaa with a 25-second TTL. Because add only succeeds when the key does not already exist, only one process proceeds.
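The dual-key flow can be sketched as follows. The private cacheGet, cacheSet, and cacheAdd methods are in-memory stand-ins for the Memcached get, set, and add commands, not a real client API, and the synchronous load on a full miss is an assumption the text does not cover:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

class DualKeyCache {
    static final long DATA_TTL_MS = 30_000;
    static final long MARKER_TTL_MS = 25_000;
    static final long LOCK_TTL_MS = 3_000;

    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>(); // stand-in for Memcached
    private final Function<String, String> dbLoader;          // hypothetical DB query
    private final LongSupplier clock;                         // injectable for testing
    private final Executor background;

    DualKeyCache(Function<String, String> dbLoader, LongSupplier clock, Executor background) {
        this.dbLoader = dbLoader;
        this.clock = clock;
        this.background = background;
    }

    // In-memory stand-ins for the Memcached get / set / add commands.
    private synchronized String cacheGet(String key) {
        Entry e = store.get(key);
        return (e == null || e.expiresAt <= clock.getAsLong()) ? null : e.value;
    }

    private synchronized void cacheSet(String key, long ttlMs, String value) {
        store.put(key, new Entry(value, clock.getAsLong() + ttlMs));
    }

    private synchronized boolean cacheAdd(String key, long ttlMs, String value) {
        if (cacheGet(key) != null) {
            return false; // add fails when the key already exists
        }
        cacheSet(key, ttlMs, value);
        return true;
    }

    String get(String key) {
        // A real client would fetch both keys with a single multiget.
        String value = cacheGet(key);
        String marker = cacheGet("expire_" + key);
        if (value == null) {
            return reload(key); // full miss: load synchronously (assumption)
        }
        if (marker == null && cacheAdd("expire_" + key, LOCK_TTL_MS, "1")) {
            background.execute(() -> reload(key)); // only the add winner refreshes
        }
        return value;
    }

    private String reload(String key) {
        String fresh = dbLoader.apply(key);
        cacheSet(key, DATA_TTL_MS, fresh);
        cacheSet("expire_" + key, MARKER_TTL_MS, "1");
        return fresh;
    }
}
```

The short TTL on the add acts as a safety valve: if the refresh task dies before resetting the marker, the marker disappears again after 3 seconds and another reader can retry.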
5. Embed the expiration time in the value and combine with the add command
Update (2014‑06‑29): The dual‑key approach consumes extra memory, so we combine it with method 3. Store a tuple (time, value) where time is the future expiration timestamp. On get, if time - now < 5 seconds, launch a new thread that attempts add __load_{key}. If the add succeeds, load fresh data and update the cache; otherwise, return the existing value immediately.
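A rough sketch of method 5 under stated assumptions: the (time, value) tuple is modeled as a small Timed class, the cache operations are in-memory stand-ins for the Memcached get, set, and add commands, and the 3-second TTL on the __load_{key} entry is borrowed from method 4 because the text does not specify one. This is not the author's RefreshCacheManager, only an illustration of the flow:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

class EmbeddedExpiryCache {
    static final long REFRESH_WINDOW_MS = 5_000;
    static final long LOAD_LOCK_TTL_MS = 3_000; // assumed; not given in the text

    /** The (time, value) tuple; time is the future expiration timestamp. */
    private static final class Timed {
        final long expiresAt;
        final String value;
        Timed(long expiresAt, String value) {
            this.expiresAt = expiresAt;
            this.value = value;
        }
    }

    private final Map<String, Timed> store = new HashMap<>(); // stand-in for Memcached
    private final Function<String, String> dbLoader;          // hypothetical DB query
    private final long ttlMs;
    private final LongSupplier clock;
    private final Executor background;

    EmbeddedExpiryCache(Function<String, String> dbLoader, long ttlMs,
                        LongSupplier clock, Executor background) {
        this.dbLoader = dbLoader;
        this.ttlMs = ttlMs;
        this.clock = clock;
        this.background = background;
    }

    // In-memory stand-ins for the Memcached get / set / add commands.
    private synchronized Timed cacheGet(String key) {
        Timed t = store.get(key);
        return (t == null || t.expiresAt <= clock.getAsLong()) ? null : t;
    }

    private synchronized void cacheSet(String key, long ttl, String value) {
        store.put(key, new Timed(clock.getAsLong() + ttl, value));
    }

    private synchronized boolean cacheAdd(String key, long ttl, String value) {
        if (cacheGet(key) != null) {
            return false; // add fails when the key already exists
        }
        cacheSet(key, ttl, value);
        return true;
    }

    String get(String key) {
        Timed t = cacheGet(key);
        if (t == null) {
            return reload(key); // full miss: load synchronously (assumption)
        }
        boolean nearExpiry = t.expiresAt - clock.getAsLong() < REFRESH_WINDOW_MS;
        if (nearExpiry && cacheAdd("__load_" + key, LOAD_LOCK_TTL_MS, "1")) {
            background.execute(() -> reload(key)); // only the add winner refreshes
        }
        return t.value; // everyone else returns the existing value immediately
    }

    private String reload(String key) {
        String fresh = dbLoader.apply(key);
        cacheSet(key, ttlMs, fresh);
        return fresh;
    }
}
```

Compared with the dual-key sketch, the extra __load_{key} entry exists only during the last few seconds of a hot key's lifetime, and a normal read needs a single get rather than a multiget.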
Implementation using xmemcached:
    public interface DataLoader {
        public <T> T load();
    }

    public class RefreshCacheManager {
        public static <T> T tryGet(MemcachedClient memcachedClient, final String key, final int expire, final DataLoader dataLoader);
        public static <T> T autoRetryGet(MemcachedClient memcachedClient, final String key, final int expire, final DataLoader dataLoader);
    }
The autoRetryGet method retries up to four times with 500 ms intervals when a null is returned, automatically handling near‑expiration refreshes.
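The retry behavior described above might look roughly like this. The helper below is a hypothetical generic version that takes the single attempt as a Supplier instead of the xmemcached arguments, and it reads "up to four times" as four attempts in total, which is an assumption:

```java
import java.util.function.Supplier;

class RetryingGet {
    /**
     * Calls tryGet until it returns a non-null value or maxAttempts is reached,
     * sleeping intervalMillis between attempts. Returns null if every attempt
     * came back empty or the thread was interrupted while waiting.
     */
    static <T> T autoRetryGet(Supplier<T> tryGet, int maxAttempts, long intervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = tryGet.get();
            if (value != null) {
                return value;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return null;
                }
            }
        }
        return null;
    }
}
```

With the numbers from the text, a call would pass 4 and 500 as the last two arguments, so a caller waits at most about 1.5 seconds before giving up.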
Conclusion
I prefer method 5 because it is simple, intuitive, saves memory compared to the dual‑key scheme, and avoids the need for mget in clustered Memcached deployments. It naturally adapts to hot and cold data: cold data expires after its TTL without access, while hot data remains continuously refreshed.
21CTO
