Transparent Multilevel Cache (TMC): Architecture, Hotspot Detection, and Local Cache Implementation
The article presents the design and implementation of Transparent Multilevel Cache (TMC), a three‑layer caching solution that adds hotspot detection and local cache to reduce distributed cache pressure, explains its transparent Java integration, describes the sliding‑window hotspot discovery pipeline, and showcases performance gains in real‑world e‑commerce campaigns.
TMC (Transparent Multilevel Cache) is a comprehensive caching solution developed by Youzan PaaS to address hotspot access problems in high‑traffic e‑commerce scenarios such as flash sales and promotional events.
It extends a generic distributed cache (e.g., CodisProxy + Redis or Youzan's zanKV) with three key capabilities: application‑level hotspot detection, local cache, and cache‑hit statistics, thereby offloading hotspot traffic from the backend cache cluster.
Why TMC Is Needed
Marketing activities generate unpredictable hotspot keys that cause massive request bursts, saturating network bandwidth and destabilizing services. TMC provides an automatic, transparent mechanism to discover these hotspots and serve them from a local cache placed in the application layer.
Architecture Overview
The solution consists of three layers:
Storage Layer – provides KV storage (Codis, zanKV, Aerospike, etc.).
Proxy Layer – unified cache entry point and routing.
Application Layer – client library with built‑in hotspot detection and local cache, fully transparent to business code.
Transparent Local Cache Integration
Java applications using either spring.data.redis with RedisTemplate or youzan.framework.redis with RedisClient ultimately obtain a Jedis instance from a JedisPool. TMC modifies JedisPool and Jedis so that every request first passes through the Hermes‑SDK, which handles hotspot detection and local caching with no changes to business code.
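The wrapping idea can be sketched as a simple decorator: business code keeps calling the same `get(key)` it always did, while the substituted client consults a hotspot set and a local cache before touching the remote store. This is a minimal illustration only; `KvClient`, `RemoteKvClient`, and `TmcClient` are hypothetical names, not the real Hermes‑SDK API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// The interface business code already programs against (stands in for Jedis).
interface KvClient {
    String get(String key);
}

// Stand-in for a real remote Redis connection.
class RemoteKvClient implements KvClient {
    private final Map<String, String> remote = new ConcurrentHashMap<>();
    public void put(String k, String v) { remote.put(k, v); }
    @Override public String get(String key) { return remote.get(key); }
}

// Wrapper the modified pool hands out: intercepts get() before hitting Redis.
class TmcClient implements KvClient {
    private final KvClient delegate;
    private final Set<String> hotKeys;   // hotspot list pushed by the server cluster
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    TmcClient(KvClient delegate, Set<String> hotKeys) {
        this.delegate = delegate;
        this.hotKeys = hotKeys;
    }

    @Override public String get(String key) {
        if (hotKeys.contains(key)) {
            // Hotspot: serve from the local cache, loading it once from remote.
            return localCache.computeIfAbsent(key, delegate::get);
        }
        return delegate.get(key);        // cold key: pass straight through
    }
}
```

Because the wrapper preserves the original signature, swapping it in at the pool level is invisible to callers.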
Key Access Flow
When a key is requested, the client asks Hermes‑SDK whether the key is a hotspot.
If it is a hotspot, the value is returned directly from the local cache.
If not, the SDK forwards the request to the remote cache cluster via a Callable callback.
Every access event is asynchronously reported to the Hermes server cluster for hotspot analysis.
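The four steps above can be condensed into one method: check the hotspot list, serve hot keys locally, fall back to a `Callable` for cold keys, and report every access without blocking the caller. This is a hedged sketch; `HermesSdk` here is an illustrative stand-in, and the reporting path merely mimics the rsyslog‑to‑Kafka pipeline with an in-memory queue.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.*;

class HermesSdk {
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    // Bounded queue so reporting can never stall business threads.
    private final BlockingQueue<String> accessLog = new LinkedBlockingQueue<>(10_000);

    void markHot(String key) { hotKeys.add(key); }

    /** Hot keys are served locally; cold keys go to the remote cache via the Callable. */
    String get(String key, Callable<String> remoteLoader) throws Exception {
        report(key);                       // every access is reported, asynchronously
        if (hotKeys.contains(key)) {
            String v = localCache.get(key);
            if (v != null) return v;
            v = remoteLoader.call();       // first hit warms the local cache
            if (v != null) localCache.put(key, v);
            return v;
        }
        return remoteLoader.call();
    }

    private void report(String key) {
        // offer() drops the event when the queue is full instead of blocking.
        accessLog.offer(key);              // a real SDK would drain this to rsyslog/Kafka
    }
}
```

Note that the second access to a hot key never reaches the remote loader, which is exactly the offloading effect TMC is after.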
Hotspot Discovery Pipeline
The pipeline consists of four steps:
Data Collection : Hermes‑SDK logs key access events via rsyslog into Kafka; the server cluster consumes these events.
Sliding Window : Each key maintains a 10‑slot time wheel, each slot representing 3 seconds of access count, yielding a 30‑second sliding window.
Aggregation : Every 3 seconds a mapping task aggregates the window counts and stores <key, totalHeat> in a Redis sorted set.
Hotspot Detection : The server periodically selects the top‑N keys exceeding a heat threshold and pushes the hotspot list to all SDK instances via etcd.
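The time-wheel mechanics from step 2 can be sketched as follows: each key keeps 10 slots of 3 seconds each, slots are reused in place once their period expires, and the heat total sums only slots still inside the 30‑second window. The slot counts and window size follow the article; the class itself is an illustrative reconstruction, not Hermes server code.

```java
// Sliding window: 10 slots x 3 s per slot = 30 s of recent access counts per key.
class SlidingWindow {
    private static final int SLOTS = 10;
    private static final long SLOT_MILLIS = 3_000;
    private final long[] counts = new long[SLOTS];
    private final long[] slotStart = new long[SLOTS];

    synchronized void record(long nowMillis) {
        int idx = (int) ((nowMillis / SLOT_MILLIS) % SLOTS);
        long start = nowMillis - nowMillis % SLOT_MILLIS;
        if (slotStart[idx] != start) {    // slot belongs to an older period: reset it
            slotStart[idx] = start;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    /** Total heat over the last 30 s — the value aggregated into the sorted set. */
    synchronized long totalHeat(long nowMillis) {
        long sum = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (nowMillis - slotStart[i] < SLOTS * SLOT_MILLIS) sum += counts[i];
        }
        return sum;
    }
}
```

The aggregation step then writes each key's `totalHeat` into a Redis sorted set (e.g. via ZADD), from which the top‑N selection in step 4 is a straightforward range query by score.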
Stability and Consistency
Asynchronous reporting using rsyslog + Kafka prevents blocking business threads.
Communication module runs in an isolated thread pool with bounded queues.
Local cache size is limited to 64 MB (LRU) to avoid JVM OOM.
Hotspot invalidation is propagated via etcd to achieve strong local consistency and eventual cluster consistency.
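The bounded LRU cache mentioned above is easy to approximate on the JVM: `LinkedHashMap` in access order evicts the least-recently-used entry via `removeEldestEntry`. A minimal sketch, bounding by entry count rather than the article's 64 MB byte limit (a production cache would track serialized sizes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU map: evicts the least-recently-used entry once full.
class LocalLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LocalLruCache(int maxEntries) {
        super(16, 0.75f, true);          // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;      // called on each put(); true triggers eviction
    }
}
```

Capping the cache this way is what keeps a runaway hotspot list from turning into a JVM out-of-memory failure.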
Feature Summary
Real‑time : Hotspot detection latency ≤ 3 seconds.
Accuracy : Sliding‑window aggregation reflects recent access distribution.
Scalability : Server nodes are stateless; horizontal scaling follows Kafka partition count; mapping tasks are multithreaded per app.
Practical Results
During a Kuaishou live‑stream promotion, TMC achieved a local cache hit rate of roughly 80%, significantly reducing both remote cache load and request latency.
Similar improvements were observed during Double‑11 campaigns across core product, activity, and logistics services.
Future Outlook
TMC is already serving product, logistics, inventory, marketing, user, and gateway modules, with more applications being onboarded. Configuration flexibility (hotspot thresholds, black/white lists) allows fine‑tuning per business need.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with internet technologies. Thoughtful, sharing‑minded architects are welcome to exchange ideas and learn together.