Transparent Multilevel Cache (TMC): Architecture, Hotspot Detection, and Local Cache Implementation
The article introduces Youzan's Transparent Multilevel Cache (TMC), detailing its three‑layer architecture, hotspot detection and local caching mechanisms, integration approaches for Java applications, stability and consistency features, and performance results from real‑world e‑commerce campaigns.
TMC (Transparent Multilevel Cache) is a comprehensive caching solution developed by Youzan PaaS team to address cache hotspot issues in e‑commerce applications.
Why TMC
During promotional activities such as flash sales, unpredictable hotspot keys cause a surge of cache requests that can overwhelm the distributed cache layer and destabilize the system. TMC automatically discovers hotspot keys and serves them from an application-level local cache, so that hot requests never reach the distributed layer.
Problems with Existing Multilevel Caches
Hotspot detection: how to quickly and accurately find hot keys.
Data consistency: how to ensure consistency between local and distributed caches.
Effect verification: how to let applications view hit rates and hotspot keys.
Transparent integration: how to minimise intrusion into existing services.
Overall Architecture
The solution consists of three layers: Storage layer (kv stores such as Codis, zanKV, Aerospike), Proxy layer (unified cache entry and routing), and Application layer (client with built‑in hotspot detection and local cache, transparent to business code).
Application‑Layer Local Cache
Transparent Integration
Java services can use either spring.data.redis with RedisTemplate or youzan.framework.redis with RedisClient. TMC modifies the native JedisPool and Jedis classes to embed Hermes‑SDK, which performs hotspot detection and local caching without code changes.
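The article does not show the patched Jedis code, but the idea can be sketched as a decorator: a wrapped client consults a local hot-key map before falling through to the original remote call. All class and method names below (`KvClient`, `HermesWrappedClient`, `markHot`, `invalidate`) are hypothetical; the real TMC modifies `JedisPool` and `Jedis` in place.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for the real Jedis client; the actual TMC patch
// modifies JedisPool/Jedis directly, which is not shown in the article.
interface KvClient {
    String get(String key);
}

// Hypothetical sketch of how Hermes-SDK could intercept a get() call:
// hot keys are served from a local map, everything else falls through.
class HermesWrappedClient implements KvClient {
    private final KvClient delegate;
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    HermesWrappedClient(KvClient delegate) {
        this.delegate = delegate;
    }

    // Called when the Hermes server pushes a newly detected hot key.
    void markHot(String key, String value) {
        localCache.put(key, value);
    }

    // Called when a hot key is updated or invalidated (broadcast via etcd).
    void invalidate(String key) {
        localCache.remove(key);
    }

    @Override
    public String get(String key) {
        String local = localCache.get(key);
        if (local != null) {
            return local;          // hot key: answered from local cache
        }
        return delegate.get(key);  // cold key: original remote path
    }
}
```

Because the wrapper implements the same interface as the original client, business code keeps calling `get(key)` unchanged, which is what makes the integration transparent.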
Module Breakdown
Jedis-Client: standard Jedis interface.
Hermes-SDK: SDK that implements hotspot detection and local caching.
Hermes server cluster: collects access events, detects hotspots, pushes hot keys to SDKs.
Cache cluster: proxy + storage layers providing the distributed cache.
Infrastructure: etcd and Apollo for configuration and cluster coordination.
Key Workflow
When a key is requested, the SDK checks if it is a hotspot; if so, the value is returned from the local cache.
Non‑hot keys are fetched from the cache cluster via the original Jedis call.
Each access event is asynchronously reported to the Hermes server for hotspot analysis.
Key expiration triggers invalidation in both local cache and other SDK instances via etcd.
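The asynchronous-reporting step above can be sketched with a bounded queue drained by a dedicated worker thread, so the business thread never blocks on reporting. This is a simplified assumption of the mechanism (the real pipeline goes through rsyslog and Kafka); `AccessReporter` and its queue size are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of the async access-event report path: business
// threads enqueue an event and return immediately; one dedicated worker
// thread drains the queue toward the Hermes server (here, a sink callback).
class AccessReporter {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>(10_000);
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AccessReporter(Consumer<String> sink) {
        worker.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.accept(events.take());  // forward events to the server
                }
            } catch (InterruptedException ignored) {
                // shutdown requested
            }
        });
    }

    // Never blocks the business thread; drops the event if the queue is full.
    boolean report(String key) {
        return events.offer(key);
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

Dropping events on overflow is a deliberate trade-off: lost samples only delay hotspot detection slightly, whereas blocking would hurt every business request.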
Hotspot Detection Process
The server collects events, maintains a 10‑slot time wheel (3 s per slot, 30 s window) for each key, aggregates heat, stores results in Redis, and periodically selects the top‑N hot keys to push to SDKs.
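The per-key sliding window described above (10 slots of 3 s, a 30 s window) can be sketched as a small ring of counters where a slot is reset when it is reused for a newer time period. The slot-indexing scheme here is my own assumption, not Youzan's actual implementation.

```java
// Hypothetical sketch of a per-key time wheel: 10 slots of 3 s each
// give a 30 s sliding window; heat() sums only slots still inside it.
class SlidingWindowCounter {
    private static final int SLOTS = 10;
    private final long slotMillis;
    private final long[] slotEpoch = new long[SLOTS]; // which period each slot holds
    private final long[] counts = new long[SLOTS];    // accesses in that period

    SlidingWindowCounter(long slotMillis) {
        this.slotMillis = slotMillis;
    }

    synchronized void record(long nowMillis) {
        long epoch = nowMillis / slotMillis;
        int idx = (int) (epoch % SLOTS);
        if (slotEpoch[idx] != epoch) {  // slot still holds a stale period: reset it
            slotEpoch[idx] = epoch;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    // Total heat over the last SLOTS * slotMillis window.
    synchronized long heat(long nowMillis) {
        long epoch = nowMillis / slotMillis;
        long total = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (epoch - slotEpoch[i] < SLOTS) {  // slot is inside the window
                total += counts[i];
            }
        }
        return total;
    }
}
```

The server would keep one such counter per reported key and periodically rank keys by `heat()` to pick the top-N hot keys to push.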
Stability and Consistency
Asynchronous reporting using rsyslog + Kafka prevents blocking.
Dedicated thread pool isolates I/O from business threads.
Local cache size limited to 64 MB with LRU eviction.
Hot key updates invalidate local caches immediately and broadcast via etcd for eventual consistency.
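A bounded LRU local cache like the one described above can be sketched with `LinkedHashMap` in access order. Note one simplifying assumption: the real TMC caps memory at 64 MB, while this sketch caps the entry count instead, since byte accounting would obscure the eviction logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU local cache. The real TMC limit
// is 64 MB of memory; for simplicity this version caps entry count.
class LruLocalCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruLocalCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least-recently-used entry
    }
}
```

In production one would more likely reach for Guava's `CacheBuilder` or Caffeine, both of which support weight-based (byte-sized) limits closer to TMC's 64 MB cap.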
Performance Results
Real‑world tests during a Kuaishou live‑stream sale and the Double Eleven campaign showed local‑cache hit rates of up to 80%, significantly reduced request latency, and smoother QPS curves.
Future Outlook
TMC already serves core modules such as product, logistics, inventory, marketing, user, and gateway services, with configurable hotspot thresholds, hot‑key limits, and blacklists/whitelists to adapt to different business scenarios.