Backend Development · 23 min read

Cache Algorithm Optimization for SSD-Based Advertising Retrieval System

The article presents SsdEngine, a hierarchical SSD-based advertising retrieval cache that combines a thread-local ART cache with a lock-free central array, uses Flying PointerSwizzling to cut duplicate I/O, and evaluates eviction policies (LRU, DoubleClock, TinyLFU, ARC), showing that LRU excels under Zipfian locality while TinyLFU improves hit rates for small caches.


This article discusses the implementation and evaluation of cache algorithms in an advertising retrieval system built on NVMe SSD storage. The authors address tail-latency issues that impact KPIs and revenue by introducing a page-cache solution.

The content covers four main areas. First, the business necessity of caching in data-intensive scenarios: the target service is significantly slower than the cache service, and access patterns exhibit locality. Second, the fundamental nature of cache algorithms as eviction-ordering strategies, with detailed explanations of LRU (Least Recently Used), LFU (Least Frequently Used), Redis's approximate LRU, and the DoubleClock algorithm used by the Linux page cache.

Third, the solution architecture (SsdEngine) implements hierarchical caching with a TLS (Thread-Local Storage) cache and a central cache. The TLS layer uses an Adaptive Radix Tree (ART) for memory efficiency, while the central cache uses a lock-free array with CAS operations. The system also implements Flying PointerSwizzling to eliminate duplicate page fetches within a single PV, achieving a 60% reduction in disk I/O QPS. The eviction mechanism uses a two-stage approach: an initial FIFO queue (TLS) and a protected queue driven by various algorithms (DoubleClock, 2Q, TinyLFU). ARC (Adaptive Replacement Cache) is employed to self-tune between eviction strategies.

Fourth, comprehensive performance evaluation shows that under a Zipfian distribution (strong locality), LRU significantly outperforms DoubleClock in both hit ratio and 99.999th-percentile latency. Under a uniform distribution (no locality), the two algorithms perform similarly. TinyLFU provides additional improvement over LRU, especially when the cache is small relative to the key space, though it requires a longer warm-up time.

The article includes code implementations demonstrating get/put operations for LRU using std::list and std::unordered_map, LFU with frequency buckets, and Redis's approximate LRU using bit fields for tracking access time.

Tags: Performance Optimization, advertising retrieval, cache algorithm, DoubleClock, LRU eviction, NVMe SSD, SSD storage, TinyLFU
Written by Baidu Geek Talk

Follow us to discover more Baidu tech insights.
