Boost Recommendation Engine Performance with Off‑Heap Cache (OHC) in Java
This article explains the principles and practical implementation of OHC, a Java off-heap cache framework. It covers OHC's architecture, memory allocation, serialization, and configuration, and reports real-world performance results from the MaFengWo recommendation engine, where OHC reduced latency and improved cache hit rates.
Part 1: Introduction to OHC
In recommendation systems, the engine performs recall and ranking stages that require massive data reads; fast data access is crucial for performance. Caching is widely used in enterprise web systems to reduce network latency by storing frequently accessed database results locally.
OHC (off‑heap cache) is a Java‑based key‑value cache library that runs in a single‑process, off‑heap mode. Originally developed for Apache Cassandra in 2015, it is now an independent library (https://github.com/snazy/ohc).
1. Heap vs. Off‑Heap
Java heap memory is managed by the JVM garbage collector (GC), which can pause application threads during collection. Heap‑based caches (e.g., HashMap) increase GC overhead when large. Off‑heap memory is allocated and freed by the application itself, avoiding GC impact and benefiting large caches (multi‑gigabyte scale).
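OHC manages raw off-heap memory internally, but the basic idea of storing bytes outside the GC-managed heap can be illustrated with the standard library alone. This is a minimal sketch (not OHC code): a direct ByteBuffer whose contents the garbage collector neither scans nor moves.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Direct buffers are allocated outside the Java heap; the GC
        // does not scan or relocate their contents.
        ByteBuffer buf = ByteBuffer.allocateDirect(64);

        byte[] payload = "cached-value".getBytes(StandardCharsets.UTF_8);
        buf.putInt(payload.length);   // length prefix
        buf.put(payload);             // raw bytes, stored off-heap

        buf.flip();                   // switch the buffer to read mode
        byte[] read = new byte[buf.getInt()];
        buf.get(read);
        System.out.println(new String(read, StandardCharsets.UTF_8));
    }
}
```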
2. OHC Features
Data is stored off-heap and does not add GC pressure.
Per‑entry expiration support.
Configurable eviction policies (LRU, W‑TinyLFU).
Can hold millions of entries.
Asynchronous loading.
Read/write latency in microseconds.
These characteristics make OHC suitable for the high‑throughput, low‑latency needs of a recommendation engine.
3. Usage Example
Typical steps to use OHC in a Java project:
Add OHC dependency to the Maven POM.
Implement org.caffinitas.ohc.CacheSerializer to serialize/deserialize objects.
Pass the serializer to the OHCache constructor.
Use get and put methods for cache operations.
A demo project is available at https://github.com/chebacca/ohc-example.
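The steps above can be sketched end to end. This is a hedged, minimal example assuming the snazy/ohc dependency (the Maven artifact is typically org.caffinitas.ohc:ohc-core) is on the classpath; the UTF-8 String serializer and the 64 MB capacity are illustrative choices, not the production setup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.caffinitas.ohc.CacheSerializer;
import org.caffinitas.ohc.OHCache;
import org.caffinitas.ohc.OHCacheBuilder;

public class OhcQuickStart {
    // A String serializer implementing OHC's CacheSerializer contract:
    // serializedSize must match exactly what serialize writes.
    static final CacheSerializer<String> STRING_SERIALIZER = new CacheSerializer<String>() {
        public void serialize(String s, ByteBuffer buf) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        public String deserialize(ByteBuffer buf) {
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
        public int serializedSize(String s) {
            return 4 + s.getBytes(StandardCharsets.UTF_8).length;
        }
    };

    public static void main(String[] args) throws Exception {
        try (OHCache<String, String> cache = OHCacheBuilder.<String, String>newBuilder()
                .keySerializer(STRING_SERIALIZER)
                .valueSerializer(STRING_SERIALIZER)
                .capacity(64L * 1024 * 1024)   // 64 MB of off-heap capacity
                .build()) {
            cache.put("item:42", "feature-vector");
            System.out.println(cache.get("item:42"));
        }
    }
}
```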
Part 2: OHC Implementation
1. Overall Architecture
OHC exposes the org.caffinitas.ohc.OHCache interface. Two implementations exist:
org.caffinitas.ohc.chunked.OHCacheChunkedImpl
org.caffinitas.ohc.linked.OHCacheLinkedImpl
The linked implementation stores each key‑value pair in a separate off‑heap block, suitable for medium‑to‑large entries, and is the one used in production.
2. OHCacheLinkedImpl Details
Key components:
Segment array: OffHeapLinkedMap[]
Serializer/deserializer: CacheSerializer
Operations:
Compute key hash and locate the segment.
Retrieve the off‑heap pointer for the entry.
For get, read the byte array from off‑heap and deserialize.
For put, serialize the object to a byte array and write it to the allocated off‑heap memory.
3. Segment Implementation (OffHeapLinkedMap)
Each segment contains multiple buckets; each bucket is a linked list of off‑heap pointers. Lookup proceeds by hashing to a bucket, then linearly scanning the list.
An example layout shows two buckets holding four key‑value pairs and their off‑heap addresses.
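The bucket-and-chain lookup can be simulated on-heap with a few lines of Java. This is only an illustrative sketch: in OHC each node is a raw off-heap memory address rather than a Java object, and the hash is computed by the configured algorithm rather than String.hashCode.

```java
public class SegmentSketch {
    // On-heap stand-in for an off-heap entry block.
    static final class Entry {
        final long hash; final String key; final String value; Entry next;
        Entry(long hash, String key, String value) { this.hash = hash; this.key = key; this.value = value; }
    }

    final Entry[] buckets;
    SegmentSketch(int bucketCount) { buckets = new Entry[bucketCount]; }

    void put(String key, String value) {
        long h = key.hashCode() & 0xffffffffL;   // stand-in for CRC32C/Murmur3
        int idx = (int) (h % buckets.length);
        Entry e = new Entry(h, key, value);
        e.next = buckets[idx];                    // prepend to the bucket's chain
        buckets[idx] = e;
    }

    String get(String key) {
        long h = key.hashCode() & 0xffffffffL;
        // Linear scan of the bucket's chain: compare hash first, then key.
        for (Entry e = buckets[(int) (h % buckets.length)]; e != null; e = e.next)
            if (e.hash == h && e.key.equals(key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        SegmentSketch seg = new SegmentSketch(2);  // two buckets, four entries
        seg.put("a", "1"); seg.put("b", "2"); seg.put("c", "3"); seg.put("d", "4");
        System.out.println(seg.get("c") + " " + seg.get("missing"));
    }
}
```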
4. Space Allocation
OHC provides two allocators: JNANativeAllocator (uses Native.malloc) and UnsafeAllocator (uses Unsafe.allocateMemory). Each entry occupies:
off‑heap size = aligned key size + value size + 64 bytes of metadata
Part 3: OHC in MaFengWo Recommendation Engine
1. Engine Workflow
The engine performs recall, ranking, and re‑ranking; each stage touches thousands of candidate items, with hundreds of features per item. Local caching of these features dramatically reduces network latency.
2. Data Types Stored in OHC
Offline features (e.g., daily click‑through rates) are updated hourly or daily and are ideal for OHC caching, avoiding repeated Redis reads. Real‑time features are kept in Redis with short TTLs to maintain freshness. Small hot data is cached in Guava (heap).
3. Serialization Choice
Keys are String; values are Object. Keys are serialized to UTF‑8 bytes; values use Kryo (wrapped in ThreadLocal because Kryo is not thread‑safe). Consistency between CacheSerializer#serializedSize and #serialize is essential; mismatched size estimates can waste off‑heap memory.
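The Kryo-based value serializer described above might look roughly like the following. This is a hedged sketch assuming Kryo is on the classpath; the length-prefixed framing is an illustrative choice, and a production version would typically cache the serialized bytes rather than serialize twice, since serializedSize and serialize must agree byte for byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.caffinitas.ohc.CacheSerializer;

// Kryo instances are not thread-safe, so each thread gets its own copy.
public class KryoValueSerializer implements CacheSerializer<Object> {
    private static final ThreadLocal<Kryo> KRYO = ThreadLocal.withInitial(() -> {
        Kryo kryo = new Kryo();
        kryo.setRegistrationRequired(false);  // accept arbitrary feature classes
        return kryo;
    });

    private byte[] toBytes(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Output out = new Output(bos)) {
            KRYO.get().writeClassAndObject(out, value);
        }
        return bos.toByteArray();
    }

    @Override
    public int serializedSize(Object value) {
        // Must match serialize() byte for byte; a mismatch wastes off-heap space.
        return 4 + toBytes(value).length;
    }

    @Override
    public void serialize(Object value, ByteBuffer buf) {
        byte[] bytes = toBytes(value);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    @Override
    public Object deserialize(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return KRYO.get().readClassAndObject(new Input(bytes));
    }
}
```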
4. Production Configuration
Key tuning parameters:
Total capacity: grew from ~4 GB to ~10 GB to cover hot data.
Segment count: balanced to reduce lock contention while limiting heap metadata overhead.
Hash algorithm: CRC32C chosen for low CPU usage.
Eviction policy: LRU, given stable workload and low churn.
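The tuning above might translate into builder calls roughly like this configuration fragment. Method and enum names follow snazy/ohc's OHCacheBuilder; the exact values are illustrative, and keySerializer/valueSerializer are assumed to be defined elsewhere.

```java
import org.caffinitas.ohc.Eviction;
import org.caffinitas.ohc.HashAlgorithm;
import org.caffinitas.ohc.OHCache;
import org.caffinitas.ohc.OHCacheBuilder;

OHCache<String, Object> cache = OHCacheBuilder.<String, Object>newBuilder()
        .keySerializer(keySerializer)
        .valueSerializer(valueSerializer)
        .capacity(10L * 1024 * 1024 * 1024)   // ~10 GB of off-heap capacity
        .segmentCount(64)                     // more segments, less lock contention
        .hashMode(HashAlgorithm.CRC32)        // CRC32 family; cheap on modern CPUs
        .eviction(Eviction.LRU)               // stable workload with low churn
        .maxEntrySize(1024 * 1024)            // reject oversized entries (1 MB cap)
        .build();
```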
5. Online Performance
With a 10 GB off‑heap cache, the engine stores millions of entries, achieving >95 % hit rate. Average get latency is ~20 µs, put latency ~100 µs. Entry size limits are enforced via org.caffinitas.ohc.maxEntrySize to avoid oversized objects.
6. Optimizations in Practice
(1) Asynchronous expiration removal: expired entries are queued and cleaned by a background thread instead of blocking the read path.
(2) Lock refinement: switched from TAS (test‑and‑set) to TTAS (test‑test‑and‑set) locks, reducing contention and improving throughput.
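The TTAS idea can be demonstrated with a small spin lock built on the standard library. This is a sketch of the general technique, not OHC's internal lock: waiting threads spin on a plain read (the first "test") and only attempt the expensive compare-and-set once the lock appears free.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TtasLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Test: spin on a cached read so waiting threads do not
            // hammer the bus with atomic operations.
            while (locked.get()) { Thread.onSpinWait(); }
            // Test-and-set: attempt the CAS only once the lock looked free.
            if (locked.compareAndSet(false, true)) return;
        }
    }

    void unlock() { locked.set(false); }

    public static void main(String[] args) throws InterruptedException {
        TtasLock lock = new TtasLock();
        int[] counter = {0};
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                counter[0]++;          // critical section
                lock.unlock();
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(counter[0]);  // 200000 if mutual exclusion holds
    }
}
```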
Conclusion
The article presented OHC’s design, off‑heap memory management, and its successful deployment in MaFengWo’s recommendation engine. OHC offers low latency, GC‑independent caching with configurable eviction and expiration, making it well‑suited for storing large offline feature sets while preserving real‑time data freshness through complementary Redis and Guava caches.
Mafengwo Technology
External communication platform of the Mafengwo Technology team, regularly sharing articles on advanced tech practices, tech exchange events, and recruitment.
