Design and Implementation of a Cluster‑Aware Guava Cache Component for High Reliability
This article presents a cluster-aware Guava cache component built for Alibaba's Xianyu platform. The component mitigates downstream service failures by adding asynchronous reload, cluster-wide key invalidation, and per-instance cache-size reporting; together these enable automatic fallback to refreshed local data and improve latency. Future plans include a management console, tiered storage, and disk-backed caching.
Background
With the rapid evolution of Internet services, business logic has become increasingly fragmented. The backend of Alibaba's Xianyu platform now depends on many distributed services, so upstream services are exposed to failures in downstream middle-platform components such as the product-center database or the recommendation vector cluster.
Industry Practices
When a service experiences a failure, a pragmatic approach is to pre‑populate the required data and return it as a fallback. For Xianyu’s product‑stream, the required payload is about 3 MB (≈5 pages). The author surveyed common solutions and selected local caching as a primary technique.
Cache Component Design
The author evaluated several caching solutions (Guava, Caffeine, Ehcache, Cache2K, ConcurrentHashMap, Varnish, Apache Jackrabbit) and chose Guava for its generality and easy integration with internal middleware. The cluster-aware cache adds three capabilities:
Asynchronous reload of expired keys.
Cluster‑wide invalidation of a specific key.
Periodic reporting of local cache size per instance.
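Of these capabilities, cluster-wide invalidation can be illustrated with a toy pub/sub channel. The `MessageChannel` class below is a hypothetical stand-in for Xianyu's internal messaging middleware, which the article does not describe; the point is only that each instance subscribes and invalidates its own local Guava cache when a key is broadcast:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ClusterInvalidation {

    /** Minimal in-process pub/sub stand-in for the real messaging middleware. */
    static class MessageChannel {
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        void subscribe(Consumer<String> handler) { subscribers.add(handler); }
        void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
    }

    public static void main(String[] args) {
        MessageChannel channel = new MessageChannel();

        // Each instance in the cluster owns a local cache and subscribes
        // to invalidation messages for it.
        Cache<String, String> nodeA = CacheBuilder.newBuilder().build();
        Cache<String, String> nodeB = CacheBuilder.newBuilder().build();
        channel.subscribe(nodeA::invalidate);
        channel.subscribe(nodeB::invalidate);

        nodeA.put("item-1", "stale");
        nodeB.put("item-1", "stale");

        // One broadcast drops "item-1" from every node's local cache.
        channel.publish("item-1");

        System.out.println(nodeA.getIfPresent("item-1")); // null after invalidation
    }
}
```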
Implementation details include extending CacheLoader to provide async reload, a LocalCacheManager that aggregates all AbstractCacheConfig subclasses, and configuration beans that automatically gain cluster‑wide invalidation.
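Guava supports the asynchronous-reload pattern directly through `CacheLoader.asyncReloading` combined with `refreshAfterWrite`: readers keep getting the stale value while the reload runs on a background pool instead of blocking on the downstream call. The sketch below is a minimal version of that pattern, with `fetchFromDownstream` as a simulated remote call rather than Xianyu's actual loader:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncReloadDemo {

    /** Simulated downstream call; in production this would hit the real service. */
    static String fetchFromDownstream(String key) {
        return "value-for-" + key;
    }

    /**
     * Builds a cache whose expired entries are refreshed on the given pool.
     * Until the background reload finishes, reads return the old value.
     */
    static LoadingCache<String, String> buildCache(Executor reloadPool) {
        CacheLoader<String, String> loader = CacheLoader.asyncReloading(
                new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return fetchFromDownstream(key);
                    }
                }, reloadPool);
        return CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .refreshAfterWrite(60, TimeUnit.SECONDS)
                .build(loader);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(buildCache(pool).getUnchecked("item-1"));
        pool.shutdown();
    }
}
```

Note that `refreshAfterWrite` only triggers a reload on the next read after the interval, which is why the component pairs it with periodic size reporting rather than relying on eager expiry.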
Typical Use‑Case: Automatic Fallback Component
The fallback component refreshes data in a Tair store via a scheduled job, invalidates the local cache, and serves subsequent requests from the refreshed local cache, dramatically improving latency and success rate during network partitions.
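The refresh-then-invalidate flow can be sketched as follows. `TairStore` is a hypothetical interface standing in for the internal Tair client, whose real API the article does not show; only the ordering matters: write the fresh payload to Tair first, then invalidate the local entry so the next read loads the refreshed copy.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FallbackCache {

    /** Hypothetical stand-in for the internal Tair accessor. */
    interface TairStore {
        String get(String key);
        void put(String key, String value);
    }

    /** Simple in-memory TairStore used for the demo. */
    static class InMemoryTair implements TairStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
    }

    private final TairStore tair;
    private final LoadingCache<String, String> local;

    FallbackCache(TairStore tair) {
        this.tair = tair;
        // On a local miss, load the pre-populated fallback copy from Tair.
        this.local = CacheBuilder.newBuilder()
                .maximumSize(1_000)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return tair.get(key);
                    }
                });
    }

    /** Scheduled job: refresh the Tair copy, then drop the stale local entry. */
    void refresh(String key, String freshValue) {
        tair.put(key, freshValue);
        local.invalidate(key);
    }

    String get(String key) {
        return local.getUnchecked(key);
    }
}
```

Serving reads from the local cache while the scheduled job keeps Tair fresh is what lets the component survive a network partition: requests never block on the failed downstream service.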
Outlook
Future work includes a web management console for cache configuration, tiered storage for large or rarely used keys, and disk‑backed caching to avoid memory pressure.
Xianyu Technology
Official account of the Xianyu technology team