Transparent Multilevel Cache (TMC): Architecture, Hotspot Detection, and Local Cache Implementation
The article introduces Youzan's Transparent Multilevel Cache (TMC), detailing its three‑layer architecture, hotspot detection and local caching mechanisms, integration approaches for Java applications, stability and consistency features, and performance results from real‑world e‑commerce campaigns.
TMC (Transparent Multilevel Cache) is a comprehensive caching solution developed by Youzan PaaS team to address cache hotspot issues in e‑commerce applications.
Why TMC
During promotional activities such as flash sales, unpredictable hotspot keys cause a surge of cache requests that can overwhelm the distributed cache layer and destabilize the system. TMC automatically discovers hotspot keys and serves them from an application-level local cache, so that hot requests never reach the distributed layer.
Problems with Existing Multilevel Caches
Hotspot detection: how to quickly and accurately find hot keys.
Data consistency: how to ensure consistency between local and distributed caches.
Effect verification: how to let applications view hit rates and hotspot keys.
Transparent integration: how to minimise intrusion into existing services.
Overall Architecture
The solution consists of three layers: Storage layer (kv stores such as Codis, zanKV, Aerospike), Proxy layer (unified cache entry and routing), and Application layer (client with built‑in hotspot detection and local cache, transparent to business code).
Application‑Layer Local Cache
Transparent Integration
Java services can use either spring.data.redis with RedisTemplate or youzan.framework.redis with RedisClient. TMC modifies the native JedisPool and Jedis classes to embed Hermes‑SDK, which performs hotspot detection and local caching without code changes.
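The article does not show the patched Jedis code, but the idea can be sketched as a decorator: a wrapped client consults a local hot-key map before falling through to the original remote call. All class and method names below (`KvClient`, `HermesWrappedClient`, `markHot`, `invalidate`) are hypothetical; the real TMC modifies `JedisPool` and `Jedis` in place.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for the real Jedis client; the actual TMC patch
// modifies JedisPool/Jedis directly, which is not shown in the article.
interface KvClient {
    String get(String key);
}

// Hypothetical sketch of how Hermes-SDK could intercept a get() call:
// hot keys are served from a local map, everything else falls through.
class HermesWrappedClient implements KvClient {
    private final KvClient delegate;
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    HermesWrappedClient(KvClient delegate) {
        this.delegate = delegate;
    }

    // Called when the Hermes server pushes a newly detected hot key.
    void markHot(String key, String value) {
        localCache.put(key, value);
    }

    // Called when a hot key is updated or invalidated (broadcast via etcd).
    void invalidate(String key) {
        localCache.remove(key);
    }

    @Override
    public String get(String key) {
        String local = localCache.get(key);
        if (local != null) {
            return local;          // hot key: answered from local cache
        }
        return delegate.get(key);  // cold key: original remote path
    }
}
```

Because the wrapper implements the same interface as the original client, business code keeps calling `get(key)` unchanged, which is what makes the integration transparent.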
Module Breakdown
Jedis-Client: standard Jedis interface.
Hermes-SDK: SDK that implements hotspot detection and local caching.
Hermes server cluster: collects access events, detects hotspots, pushes hot keys to SDKs.
Cache cluster: proxy + storage layers providing the distributed cache.
Infrastructure: etcd and Apollo for configuration and cluster coordination.
Key Workflow
When a key is requested, the SDK checks if it is a hotspot; if so, the value is returned from the local cache.
Non‑hot keys are fetched from the cache cluster via the original Jedis call.
Each access event is asynchronously reported to the Hermes server for hotspot analysis.
Key expiration triggers invalidation in both local cache and other SDK instances via etcd.
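The asynchronous-reporting step above can be sketched with a bounded queue drained by a dedicated worker thread, so the business thread never blocks on reporting. This is a simplified assumption of the mechanism (the real pipeline goes through rsyslog and Kafka); `AccessReporter` and its queue size are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of the async access-event report path: business
// threads enqueue an event and return immediately; one dedicated worker
// thread drains the queue toward the Hermes server (here, a sink callback).
class AccessReporter {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>(10_000);
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AccessReporter(Consumer<String> sink) {
        worker.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.accept(events.take());  // forward events to the server
                }
            } catch (InterruptedException ignored) {
                // shutdown requested
            }
        });
    }

    // Never blocks the business thread; drops the event if the queue is full.
    boolean report(String key) {
        return events.offer(key);
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

Dropping events on overflow is a deliberate trade-off: lost samples only delay hotspot detection slightly, whereas blocking would hurt every business request.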
Hotspot Detection Process
The server collects events, maintains a 10‑slot time wheel (3 s per slot, 30 s window) for each key, aggregates heat, stores results in Redis, and periodically selects the top‑N hot keys to push to SDKs.
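The per-key sliding window described above (10 slots of 3 s, a 30 s window) can be sketched as a small ring of counters where a slot is reset when it is reused for a newer time period. The slot-indexing scheme here is my own assumption, not Youzan's actual implementation.

```java
// Hypothetical sketch of a per-key time wheel: 10 slots of 3 s each
// give a 30 s sliding window; heat() sums only slots still inside it.
class SlidingWindowCounter {
    private static final int SLOTS = 10;
    private final long slotMillis;
    private final long[] slotEpoch = new long[SLOTS]; // which period each slot holds
    private final long[] counts = new long[SLOTS];    // accesses in that period

    SlidingWindowCounter(long slotMillis) {
        this.slotMillis = slotMillis;
    }

    synchronized void record(long nowMillis) {
        long epoch = nowMillis / slotMillis;
        int idx = (int) (epoch % SLOTS);
        if (slotEpoch[idx] != epoch) {  // slot still holds a stale period: reset it
            slotEpoch[idx] = epoch;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    // Total heat over the last SLOTS * slotMillis window.
    synchronized long heat(long nowMillis) {
        long epoch = nowMillis / slotMillis;
        long total = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (epoch - slotEpoch[i] < SLOTS) {  // slot is inside the window
                total += counts[i];
            }
        }
        return total;
    }
}
```

The server would keep one such counter per reported key and periodically rank keys by `heat()` to pick the top-N hot keys to push.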
Stability and Consistency
Asynchronous reporting using rsyslog + Kafka prevents blocking.
Dedicated thread pool isolates I/O from business threads.
Local cache size limited to 64 MB with LRU eviction.
Hot key updates invalidate local caches immediately and broadcast via etcd for eventual consistency.
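A bounded LRU local cache like the one described above can be sketched with `LinkedHashMap` in access order. Note one simplifying assumption: the real TMC caps memory at 64 MB, while this sketch caps the entry count instead, since byte accounting would obscure the eviction logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU local cache. The real TMC limit
// is 64 MB of memory; for simplicity this version caps entry count.
class LruLocalCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruLocalCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least-recently-used entry
    }
}
```

In production one would more likely reach for Guava's `CacheBuilder` or Caffeine, both of which support weight-based (byte-sized) limits closer to TMC's 64 MB cap.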
Performance Results
Real‑world tests during a Kuaishou live‑stream sale and the Double Eleven campaign showed local‑cache hit rates of up to 80%, significantly reduced request latency, and smoother QPS curves.
Future Outlook
TMC already serves core modules such as product, logistics, inventory, marketing, user, and gateway services, with configurable hotspot thresholds, hot‑key limits, and blacklists/whitelists to adapt to different business scenarios.