
Design and Implementation of a Cluster‑Aware Guava Cache Component for High Reliability

This article presents a cluster‑aware Guava cache component built for Alibaba's Xianyu platform. The component mitigates downstream service failures by adding asynchronous reload, cluster‑wide key invalidation, and periodic size reporting, enabling automatic fallback to refreshed local data and improving latency. Future plans include a management console, tiered storage, and disk‑backed caching.


Background

With the rapid evolution of Internet services, business logic has become increasingly fragmented. In Alibaba’s Xianyu platform, the backend now depends on many distributed services, making the stability of upstream services vulnerable to failures in downstream middle‑platform components such as the product‑center database or the recommendation vector cluster.

Industry Practices

When a downstream service fails, a pragmatic approach is to pre‑populate the required data and return it as a fallback. For Xianyu's product stream, the fallback payload is about 3 MB (roughly five pages of results). The author surveyed common industry solutions and chose local caching as the primary technique.

Cache Component Design

The author evaluated several Java caching libraries (Guava, Caffeine, Ehcache, Cache2K, ConcurrentHashMap, Varnish, JackRabbit) and chose Guava for its generality and easy integration with internal middleware. The cluster‑aware cache adds three capabilities:

Asynchronous reload of expired keys.

Cluster‑wide invalidation of a specific key.

Periodic reporting of local cache size per instance.
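The first capability maps directly onto hooks Guava already provides: `refreshAfterWrite` plus `CacheLoader.asyncReloading` serve the stale value while a background thread refreshes the entry. A minimal sketch, assuming an illustrative `fetchFromDownstream` helper and sizes not taken from the article:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncReloadCache {
    // Reloads run on this pool instead of the calling thread, so readers
    // keep getting the stale value until the background refresh finishes.
    static final ExecutorService RELOAD_POOL = Executors.newFixedThreadPool(2);

    static LoadingCache<String, String> build() {
        return CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .refreshAfterWrite(5, TimeUnit.MINUTES)
                .build(CacheLoader.asyncReloading(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return fetchFromDownstream(key);
                    }
                }, RELOAD_POOL));
    }

    // Placeholder for the real downstream call (e.g. a product-centre RPC).
    static String fetchFromDownstream(String key) {
        return "value-for-" + key;
    }
}
```

Without `asyncReloading`, the thread that triggers a refresh blocks on the reload itself, which is exactly the latency spike the component is trying to avoid.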

Implementation details include extending CacheLoader to provide async reload, a LocalCacheManager that aggregates all AbstractCacheConfig subclasses, and configuration beans that automatically gain cluster‑wide invalidation.
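A hedged sketch of how such a manager might be wired, keeping the `LocalCacheManager` and `AbstractCacheConfig` names from the article; the registration and message-handling methods shown here are assumptions, since the article does not give the actual API:

```java
import com.google.common.cache.Cache;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each cache-owning bean extends this and registers itself (assumed shape).
abstract class AbstractCacheConfig<K, V> {
    abstract String cacheName();
    abstract Cache<K, V> cache();
}

class LocalCacheManager {
    private final Map<String, Cache<?, ?>> caches = new ConcurrentHashMap<>();

    void register(AbstractCacheConfig<?, ?> config) {
        caches.put(config.cacheName(), config.cache());
    }

    // Entry point for a broadcast "invalidate(cacheName, key)" message,
    // e.g. delivered by internal messaging middleware (assumption).
    void onInvalidateMessage(String cacheName, Object key) {
        Cache<?, ?> cache = caches.get(cacheName);
        if (cache != null) {
            cache.invalidate(key);
        }
    }
}
```

Aggregating every config in one manager is what makes the other two capabilities cheap: the same registry that fans out invalidation messages can also iterate the caches to report per-instance sizes.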

Typical Use‑Case: Automatic Fallback Component

The fallback component refreshes data in a Tair store via a scheduled job, invalidates the local cache, and serves subsequent requests from the refreshed local cache, dramatically improving latency and success rate during network partitions.
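The flow can be sketched as follows, with an in-memory stand-in for the Tair store and the cluster broadcast reduced to a direct local invalidation; both simplifications, along with all class and method names here, are illustrative assumptions rather than the article's actual implementation:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FallbackComponent {
    // Stand-in for the Tair client (assumption).
    interface RemoteStore {
        String get(String key);
        void put(String key, String value);
    }

    static class InMemoryStore implements RemoteStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
    }

    private final RemoteStore store;
    private final LoadingCache<String, String> local;

    FallbackComponent(RemoteStore store) {
        this.store = store;
        // Local-cache misses fall through to the remote fallback store.
        this.local = CacheBuilder.newBuilder()
                .maximumSize(1_000)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return store.get(key);
                    }
                });
    }

    // Scheduled-job step: refresh the fallback payload, then invalidate
    // the local copy so the next read pulls the fresh data.
    void refresh(String key, String freshValue) {
        store.put(key, freshValue);
        local.invalidate(key);
    }

    String serve(String key) {
        return local.getUnchecked(key);
    }
}
```

After the first post-invalidation read, every request for that key is served from local memory, which is why the component holds up well when the network path to downstream services is degraded.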

Outlook

Future work includes a web management console for cache configuration, tiered storage for large or rarely used keys, and disk‑backed caching to avoid memory pressure.

Tags: distributed systems, backend, Java, caching, fault tolerance, Guava
Written by Xianyu Technology
Official account of the Xianyu technology team