Designing an Off‑Heap Disaster Recovery Cache to Keep Recommendations Fast

When the Mafengwo app's recommendation service hits database disconnections, third‑party timeouts, or network jitter, a locally deployed off‑heap cache built with OHC and Spring Boot can return previously cached results. The cache stays isolated from business logic, reduces latency, and improves the user experience during failures.

dbaplus Community

Background

The Mafengwo recommendation system must respond within an average of 10 ms and keep the 99th‑percentile latency under 1 s. Sudden database disconnections, third‑party timeouts, or network jitter cause the service to miss these targets, resulting in empty results for users.

Design and Implementation

A disaster‑recovery cache was added as a local off‑heap store using the open‑source OHC library and integrated into the existing Spring Boot application. The cache is isolated from business logic, writes are performed asynchronously, and reads are served directly from memory.

Key Technical Choices

Separation of concerns: A CacheService is placed at the end of the request flow, exposing read(key) and write(key, value) APIs while leaving the original recommendation pipeline untouched.

Asynchronous writes: Writes are delegated to a ThreadPoolExecutor backed by a bounded LinkedBlockingQueue. Because QPS is below 100, the pool is fixed at one thread, and a DiscardPolicy drops tasks when the queue is full, so the request thread is never blocked.

Off‑heap cache: Recommendation results do not require strong consistency, so serving them from a local cache is acceptable. Keeping that cache off‑heap avoids adding GC pressure and provides high‑throughput reads. OHC (originally developed for Apache Cassandra) offers low‑overhead memory management.

File‑system backup: Cached entries are periodically persisted to disk using Spring Boot scheduled tasks. On application startup, an ApplicationRunner reloads the backup file into the off‑heap store, so the cache is available again after a restart.
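The file‑system backup described above could be sketched roughly as follows. This is a minimal illustration using only the JDK, not the article's actual code: the class name CacheSnapshot, the binary file layout, and the temp‑file‑then‑move step are all assumptions. In the real application, save would presumably be called from a Spring scheduled task and load from an ApplicationRunner, with the entries copied to and from the OHC store.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists cache entries to a file and reloads them.
class CacheSnapshot {

    // Writes all entries to a temporary file, then moves it over the target,
    // so a crash in the middle of a write does not corrupt the previous backup.
    static void save(Map<String, byte[]> entries, Path target) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
            out.writeInt(entries.size());
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.writeUTF(e.getKey());          // business-scene key
                out.writeInt(e.getValue().length); // length prefix for the payload
                out.write(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        try {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads the backup file; returns an empty map if no backup exists yet.
    static Map<String, byte[]> load(Path source) {
        Map<String, byte[]> entries = new HashMap<>();
        if (!Files.exists(source)) {
            return entries;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(source))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                entries.put(key, value);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cache-snapshot", ".bin");
        Map<String, byte[]> entries = new HashMap<>();
        entries.put("home video", new byte[] {1, 2, 3});
        save(entries, file);
        System.out.println(load(file).keySet());
        Files.delete(file);
    }
}
```

The map passed to save should be a point‑in‑time copy of the cache, since the entry count is written before the entries themselves.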

Overall Architecture

The existing recommendation flow remains unchanged. After the normal processing, a CacheModule decides whether to read from the cache (on exception) or submit a cache‑population task (on successful response). The CacheService maintains the off‑heap store and a task queue for asynchronous writes.
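The task queue for asynchronous writes might look roughly like the sketch below. It uses only java.util.concurrent and a ConcurrentHashMap as a stand‑in for the off‑heap store. The class name AsyncCacheWriter and the queue capacity of 16 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single-threaded, drop-on-overflow writer in front of a cache.
class AsyncCacheWriter {
    // Stand-in for the off-heap store; the real service would delegate to OHC here.
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    private final ThreadPoolExecutor executor;

    AsyncCacheWriter(int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                1, 1,                                      // exactly one writer thread
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity),  // bounded, so the queue can fill up
                new ThreadPoolExecutor.DiscardPolicy());   // silently drop tasks when it does
    }

    // Returns immediately: the write either runs later on the writer thread or is dropped.
    void write(String key, byte[] value) {
        executor.execute(() -> store.put(key, value));
    }

    byte[] read(String key) {
        return store.get(key);
    }

    // Stops accepting new tasks and waits for queued writes to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncCacheWriter writer = new AsyncCacheWriter(16); // capacity chosen arbitrarily
        writer.write("home video", "payload".getBytes(StandardCharsets.UTF_8));
        writer.shutdownAndWait(1000);
        System.out.println(new String(writer.read("home video"), StandardCharsets.UTF_8));
    }
}
```

With DiscardPolicy, a dropped write simply means the cache keeps its previous value for that key. That is acceptable here because a later successful request will refresh it.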

Module Details

CacheModule inspects the recommendation response. If no exception occurs and the response is non‑empty, it creates a cache task with a business‑scene key (e.g., "home video") and the payload, then submits it to CacheService. If an exception occurs, it reads the cached set for the same key and returns it.
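The decision logic described above can be illustrated with a small sketch. The names CacheModuleSketch and handle are hypothetical, and an in‑memory map stands in for CacheService. In the real system the write would be submitted as an asynchronous task rather than performed inline.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the choice CacheModule makes at the end of the request flow.
class CacheModuleSketch {
    // Stand-in for CacheService; the real one is off-heap and written asynchronously.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // sceneKey identifies the business scene, e.g. "home video".
    List<String> handle(String sceneKey, Supplier<List<String>> recommender) {
        try {
            List<String> result = recommender.get();
            if (result != null && !result.isEmpty()) {
                cache.put(sceneKey, result); // success: refresh the fallback copy
            }
            return result;                   // empty results are returned as-is, never cached
        } catch (RuntimeException e) {
            // failure: serve the last good result for this scene, or an empty list
            return cache.getOrDefault(sceneKey, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        CacheModuleSketch module = new CacheModuleSketch();
        module.handle("home video", () -> Arrays.asList("video-1", "video-2"));
        List<String> fallback = module.handle("home video", () -> {
            throw new IllegalStateException("database unavailable");
        });
        System.out.println(fallback);
    }
}
```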

CacheService uses OHC for off‑heap storage. Keys represent business scenes; values are sets of screen‑level content. When the cache reaches its configured maximum size, new entries replace existing ones randomly (a placeholder eviction strategy). Capacity and maxEntrySize must be sized based on load‑testing results.

Online Performance

During a one‑hour window (18:00‑19:00), cache hits compensated for service timeouts, improving overall availability. Asynchronous writes completed within milliseconds and reads within microseconds, so the cache added negligible overhead to requests.

Pitfalls Encountered

Kryo serialization failed for some classes that do not implement Serializable and that Kryo's default serializers cannot handle, such as java.util.ArrayList$SubList. Registering the custom serializers from the kryo‑serializers repository (https://github.com/magro/kryo-serializers) resolves the issue.

Improper configuration of OHC capacity and maxEntrySize caused write failures. These parameters should be sized based on pre‑deployment load testing.

Future Optimizations

Replace random overwrite when the cache is full with an LRU or oldest‑entry eviction strategy.

Introduce finer‑grained cache keys per destination ID to avoid key collisions.

Move more MySQL‑dependent configuration data to file‑based local caches.
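As an illustration of the LRU eviction proposed in the first item above, the sketch below shows the policy using the JDK's LinkedHashMap in access order. It is an on‑heap stand‑in to show the behavior, not an off‑heap implementation, and the class name LruSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical on-heap illustration of LRU eviction; not thread-safe.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: reads move an entry to the tail
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("home video", "v1");
        cache.put("home article", "a1");
        cache.get("home video");        // touch, so "home article" becomes the eldest
        cache.put("home hotel", "h1");  // exceeds capacity and evicts "home article"
        System.out.println(cache.keySet());
    }
}
```

Unlike random overwrite, this keeps the scenes that were read or written most recently, which are the ones most likely to be requested again during an outage.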

References

Java Caching Benchmarks 2016 – Part 1: https://cruftex.net/2016/03/16/Java-Caching-Benchmarks-2016-Part-1.html

On Heap vs Off Heap Memory Usage: https://dzone.com/articles/heap-vs-heap-memory-usage

OHC – An off‑heap cache: https://github.com/snazy/ohc

Kryo‑serializers: https://github.com/magro/kryo-serializers

Spring Boot scheduling tasks guide: https://spring.io/guides/gs/scheduling-tasks/
