Designing an Off‑Heap Disaster Recovery Cache to Keep Recommendations Fast
When the recommendation service of the Mafengwo app hits database disconnections, third-party timeouts, or network jitter, a locally deployed off-heap cache built with OHC and Spring Boot can return pre-computed results. The cache is isolated from business logic, adds negligible latency, and keeps recommendations available to users during failures.
Background
The Mafengwo recommendation system must respond in 10 ms on average and keep 99th-percentile latency under 1 s. Sudden database disconnections, third-party timeouts, or network jitter cause the service to miss these targets, and users then receive empty results.
Design and Implementation
A disaster-recovery cache was added as a local off-heap store, built on the open-source OHC library and integrated into the existing Spring Boot application. The cache is isolated from business logic, writes are performed asynchronously, and reads are served directly from memory.
Key Technical Choices
Separation of concerns: A CacheService sits at the end of the request flow and exposes read(key) and write(key, value) APIs, leaving the original recommendation pipeline untouched.
Asynchronous writes: Writes are delegated to a ThreadPoolExecutor backed by a LinkedBlockingQueue. Because QPS is below 100, the pool is fixed at one thread. A DiscardPolicy drops tasks when the queue is full, so the request thread is never blocked. This only takes effect when the queue is created with a capacity bound, since an unbounded LinkedBlockingQueue never rejects tasks.
Off-heap cache: Recommendation results do not require strong consistency, so serving slightly stale data from a local cache is acceptable. Storing that data off-heap keeps it outside the garbage collector's reach, which avoids GC pauses while still providing high-throughput reads. OHC (originally developed for Apache Cassandra) offers low-overhead memory management.
File-system backup: Cached entries are periodically persisted to disk by a Spring Boot scheduled task. On application startup, an ApplicationRunner reloads the backup file into the off-heap store, so the cache is available again right after a restart.
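The file-system backup described above can be sketched with the standard library alone. This is a minimal illustration, not the article's implementation: the class name CacheBackup, the binary file layout, and the use of a plain Map as a snapshot of the cache contents are all assumptions, and the Spring wiring (the scheduled task that calls save and the ApplicationRunner that calls load) is left out.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists a snapshot of cache entries and reloads it.
final class CacheBackup {
    private final Path file;

    CacheBackup(Path file) {
        this.file = file;
    }

    // Writes to a temporary file first, then moves it over the backup,
    // so a crash in the middle of a write does not truncate the last good copy.
    void save(Map<String, byte[]> snapshot) {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
                out.writeInt(snapshot.size());
                for (Map.Entry<String, byte[]> entry : snapshot.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeInt(entry.getValue().length);
                    out.write(entry.getValue());
                }
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns an empty map when no backup exists yet (first start).
    Map<String, byte[]> load() {
        Map<String, byte[]> entries = new HashMap<>();
        if (!Files.exists(file)) {
            return entries;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                entries.put(key, value);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }
}
```

In a Spring Boot application, save would typically be called from a method annotated with @Scheduled and load from an ApplicationRunner, as the article describes; the entries returned by load would then be written back into the off-heap store.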
Overall Architecture
The existing recommendation flow remains unchanged. After normal processing, a CacheModule decides whether to read from the cache (when an exception occurred) or to submit a cache-population task (when the response succeeded). The CacheService maintains the off-heap store and a task queue for asynchronous writes.
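The task queue for asynchronous writes might look like the following sketch. It assumes a hypothetical class name (AsyncCacheWriter), uses a ConcurrentHashMap as a stand-in for the off-heap store, and takes the queue capacity as a parameter because the article does not state the value used in production.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the asynchronous write path.
final class AsyncCacheWriter {
    // Stand-in for the off-heap store (the article uses OHC here).
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    private final ThreadPoolExecutor executor;

    AsyncCacheWriter(int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                1, 1,                                       // single worker thread
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity),   // bounded, so rejection can happen
                new ThreadPoolExecutor.DiscardPolicy());    // silently drop when the queue is full
    }

    // Returns immediately; the request thread never waits for the cache write.
    void write(String key, byte[] value) {
        executor.execute(() -> store.put(key, value));
    }

    byte[] read(String key) {
        return store.get(key);
    }

    // Stops accepting tasks and waits for queued writes to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Dropping a write under load is acceptable in this design because the cache only needs a recent good result per key, not every result.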
Module Details
CacheModule inspects the recommendation response. If no exception occurs and the response is non‑empty, it creates a cache task with a business‑scene key (e.g., "home video") and the payload, then submits it to CacheService. If an exception occurs, it reads the cached set for the same key and returns it.
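The decision logic above can be sketched as follows. The method name handle, the Supplier-based pipeline argument, and the in-memory map standing in for CacheService are assumptions made for illustration; in the article the cache write is submitted asynchronously rather than performed inline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the read-on-failure, write-on-success decision.
final class CacheModule {
    // Stand-in for CacheService; keys are business scenes such as "home video".
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    List<String> handle(String sceneKey, Supplier<List<String>> pipeline) {
        try {
            List<String> result = pipeline.get();
            // Only successful, non-empty responses are worth keeping as a fallback.
            if (result != null && !result.isEmpty()) {
                cache.put(sceneKey, new ArrayList<>(result));
            }
            return result;
        } catch (RuntimeException ex) {
            // The pipeline failed: serve the last good result for this scene, if any.
            List<String> fallback = cache.get(sceneKey);
            return fallback != null ? fallback : Collections.<String>emptyList();
        }
    }
}
```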
CacheService uses OHC for off‑heap storage. Keys represent business scenes; values are sets of screen‑level content. When the cache reaches its configured maximum size, new entries replace existing ones randomly (a placeholder eviction strategy). Capacity and maxEntrySize must be sized based on load‑testing results.
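One way to ground the sizing of capacity and maxEntrySize is to measure serialized payloads during load testing. The sketch below is an assumption-laden illustration: it uses Java serialization as a stand-in for the article's Kryo serializer, the class and method names are hypothetical, and the estimate ignores the per-entry overhead (headers and keys) that an off-heap cache adds, so real settings need extra margin.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helpers for estimating cache settings from measured payloads.
final class CacheSizing {
    // Size in bytes of the value once serialized (Java serialization as a stand-in).
    static long serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.size();
    }

    // An entry larger than the configured maximum would be rejected on write.
    static boolean fits(long entryBytes, long maxEntrySize) {
        return entryBytes <= maxEntrySize;
    }

    // Rough total capacity: largest observed entry x expected entry count x headroom factor.
    static long suggestedCapacity(long largestEntryBytes, long expectedEntries, double headroom) {
        return (long) Math.ceil(largestEntryBytes * expectedEntries * headroom);
    }
}
```

For example, if the largest payload seen under load is 1,000 bytes and 10 scene keys are expected, a headroom factor of 1.5 suggests a capacity of at least 15,000 bytes before per-entry overhead.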
Online Performance
During a one-hour observation window (18:00-19:00), cache hits compensated for service timeouts, improving overall availability. Asynchronous writes completed in the millisecond range and reads in the microsecond range, so the cache added negligible overhead to requests.
Pitfalls Encountered
Serialization with Kryo failed for classes that do not implement Serializable (e.g., java.util.ArrayList$SubList). Registering the custom serializers provided by the kryo-serializers project (https://github.com/magro/kryo-serializers) resolves the issue.
Improper configuration of OHC capacity and maxEntrySize caused write failures. These parameters should be sized based on pre‑deployment load testing.
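For the first pitfall, an alternative that avoids custom serializers altogether (not the approach the article took) is to copy list views into a plain ArrayList before the payload reaches the cache. The helper name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: detaches list views such as subList() results
// from their backing list so only plain ArrayList instances are serialized.
final class CachePayloads {
    static <T> List<T> detach(List<T> items) {
        return new ArrayList<>(items);
    }
}
```

The copy costs one extra allocation per write, which is usually acceptable on an asynchronous write path, and it also protects the cached value from later changes to the original list.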
Future Optimizations
Replace random overwrite when the cache is full with an LRU or oldest‑entry eviction strategy.
Introduce finer‑grained cache keys per destination ID to avoid key collisions.
Move more MySQL‑dependent configuration data to file‑based local caches.
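The LRU policy mentioned in the first optimization can be illustrated with an on-heap LinkedHashMap in access order. This sketch shows only the eviction behavior, under the assumption of a fixed entry count; it is not thread-safe and is not how an off-heap store would implement the policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal on-heap illustration of least-recently-used eviction.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of two entries, inserting keys A and B, reading A, and then inserting C evicts B, because B is the entry that was touched least recently.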
References
Java Caching Benchmarks 2016 – Part 1: https://cruftex.net/2016/03/16/Java-Caching-Benchmarks-2016-Part-1.html
On Heap vs Off Heap Memory Usage: https://dzone.com/articles/heap-vs-heap-memory-usage
OHC – An off‑heap cache: https://github.com/snazy/ohc
Kryo‑serializers: https://github.com/magro/kryo-serializers
Spring Boot scheduling tasks guide: https://spring.io/guides/gs/scheduling-tasks/
dbaplus Community
