How Transparent Multilevel Cache (TMC) Eliminates Hotspot Bottlenecks in Java Services

The article introduces Youzan's Transparent Multilevel Cache (TMC), explains why hotspot cache access harms e‑commerce applications, describes its three‑layer architecture, details the Java client integration with Hermes‑SDK for automatic hotspot detection and local caching, and presents real‑world performance gains during large‑scale promotional events.

21CTO

Why Build TMC

Many Youzan merchants run flash‑sale or promotion activities that cause unpredictable cache‑hotspot access in marketing, product‑detail and order‑placement flows. Hotspot keys generate massive request bursts, consume internal bandwidth and jeopardize service stability. TMC was created to discover hotspots automatically and to serve requests for those keys from an application‑layer local cache before they reach the distributed cache.

Pain Points of Multilevel Cache Solutions

Hotspot detection – how to quickly and accurately find hotspot keys.

Data consistency – how to keep the local cache consistent with the distributed cache.

Effect verification – how to let the application view local‑cache hit rates and hotspot keys.

Transparent integration – how to add the solution with minimal intrusion.

TMC focuses on these pain points, providing hotspot detection and local caching to reduce pressure on downstream distributed cache services.

TMC Overall Architecture

The architecture consists of three layers:

Storage layer – basic KV stores (Codis, zanKV, Aerospike) selected per business scenario.

Proxy layer – a unified cache entry and routing for the application layer.

Application layer – a unified client offering hotspot detection and local caching transparently to business services.

This article concentrates on the application‑layer client.

TMC Local Cache – Transparency

Java services can use either the spring.data.redis package with RedisTemplate or the youzan.framework.redis package with RedisClient. Both ultimately create a JedisPool which produces Jedis objects that communicate with the proxy layer.

TMC modifies the native JedisPool and Jedis classes via the Hermes‑SDK to embed hotspot detection and local caching. Applications only need to depend on a specific version of the jedis‑jar; no code changes are required.
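The idea of keeping the client API unchanged while adding a local cache behind it can be illustrated with a small decorator. This is a minimal sketch, not the actual Hermes‑SDK or Jedis code: `KvClient`, `HotspotAwareClient` and `markHot` are hypothetical names introduced only for illustration, and the local cache here is an unbounded map.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the small part of a Jedis-like API used here. */
interface KvClient {
    String get(String key);
}

/** Hypothetical wrapper: same interface as the remote client, plus a local cache for hot keys. */
final class HotspotAwareClient implements KvClient {
    private final KvClient remote;  // the client that talks to the proxy layer
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    HotspotAwareClient(KvClient remote) {
        this.remote = remote;
    }

    /** Assumed hook: called when the detection service pushes a hotspot key. */
    void markHot(String key) {
        hotKeys.add(key);
    }

    @Override
    public String get(String key) {
        if (!hotKeys.contains(key)) {
            return remote.get(key);  // non-hot keys always go to the cache cluster
        }
        // Hot key: serve from the local cache, loading from remote on first access.
        return localCache.computeIfAbsent(key, remote::get);
    }
}

final class TransparencyDemo {
    public static void main(String[] args) {
        int[] remoteCalls = {0};
        KvClient remote = key -> { remoteCalls[0]++; return "value-of-" + key; };
        HotspotAwareClient client = new HotspotAwareClient(remote);

        client.get("sku:1");
        client.get("sku:1");      // both calls hit the remote client
        client.markHot("sku:1");
        client.get("sku:1");      // loads once from remote, then caches locally
        client.get("sku:1");      // served from the local cache
        System.out.println(remoteCalls[0]);  // prints 3
    }
}
```

Because the wrapper implements the same interface as the remote client, calling code does not change when the wrapper is swapped in, which is the property the dependency‑only integration relies on.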

Overall Structure

Jedis‑Client : Direct entry for Java applications to interact with the cache service; API identical to native Jedis.

Hermes‑SDK : Encapsulates the self‑developed hotspot detection and local cache functions; Jedis‑Client interacts with it.

Hermes Server Cluster : Receives reports from Hermes‑SDK, performs hotspot detection, and pushes hotspot keys back to the SDK.

Cache Cluster : Consists of proxy and storage layers, providing a unified distributed cache endpoint.

Basic Components : etcd cluster and Apollo configuration center supply cluster push and unified configuration capabilities.

Basic Process

Key Retrieval : When a Java application calls the Jedis‑Client to get a key, the client asks Hermes‑SDK whether the key is a hotspot. If it is, the value is returned from the local cache; otherwise the request is forwarded to the cache cluster via a callback.

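Key Expiration : Calls to set(), del() or expire() trigger Hermes‑SDK.invalid(), which invalidates the local cache entry and broadcasts the event through etcd so that other SDK nodes drop their copies too. The calling node is therefore strongly consistent for hotspot keys, while the cluster as a whole converges with eventual consistency.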

Hotspot Detection : Hermes‑SDK reports key‑access events (via rsyslog → Kafka) to the Hermes server cluster. The server stores events in an in‑memory Map<appName, Map<key, Event>> and maintains a 10‑slot time wheel in which each slot covers 3 seconds, giving a 30‑second sliding window. A mapping task runs every 3 seconds to aggregate heat over that window and stores the aggregated scores in a Redis sorted set. The server then selects the top‑N keys whose heat exceeds a threshold and pushes those hotspot keys to SDK instances through etcd.
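The sliding‑window heat calculation described above can be sketched as follows. This is a simplified, single‑threaded illustration under stated assumptions: one wheel per key, in‑memory ranking instead of a Redis sorted set, and hypothetical names (`HeatWheel`, `record`, `tick`, `topHot`). In a real deployment `tick()` would be driven by a scheduler every 3 seconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a 10-slot time wheel; each slot stands for 3 seconds. */
final class HeatWheel {
    static final int SLOTS = 10;  // 10 slots x 3 s = 30 s window
    private final Map<String, long[]> wheels = new HashMap<>();  // assumption: one wheel per key
    private int current = 0;

    /** Record one access event for a key in the current slot. */
    void record(String key) {
        wheels.computeIfAbsent(key, k -> new long[SLOTS])[current]++;
    }

    /** Advance to the next slot and clear it, dropping counts older than the window. */
    void tick() {
        current = (current + 1) % SLOTS;
        for (long[] wheel : wheels.values()) {
            wheel[current] = 0;
        }
    }

    /** Heat of a key = sum of all slots, i.e. accesses within the sliding window. */
    long heat(String key) {
        long[] wheel = wheels.get(key);
        if (wheel == null) {
            return 0;
        }
        long sum = 0;
        for (long count : wheel) {
            sum += count;
        }
        return sum;
    }

    /** Up to n keys whose heat exceeds the threshold, hottest first. */
    List<String> topHot(int n, long threshold) {
        List<String> keys = new ArrayList<>();
        for (String key : wheels.keySet()) {
            if (heat(key) > threshold) {
                keys.add(key);
            }
        }
        keys.sort(Comparator.comparingLong((String k) -> heat(k)).reversed());
        return keys.size() > n ? new ArrayList<>(keys.subList(0, n)) : keys;
    }
}
```

Clearing the slot that is about to be reused is what makes the window slide: after ten ticks, every count recorded before the first tick has been overwritten.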

Features Summary

Real‑time : Hermes‑SDK reports events in real time via rsyslog + Kafka; mapping tasks run every 3 seconds, so a newly formed hotspot is picked up by the next run, within 3 seconds.

Accuracy : Sliding‑window aggregation based on the time wheel reflects recent access distribution accurately.

Scalability : Hermes server nodes are stateless and scale horizontally in step with the Kafka partition count. Mapping tasks run in multiple threads, split by app.

Stability : Event reporting is asynchronous; the communication module runs in an isolated thread pool; the local cache is capped at 64 MB with LRU eviction to avoid JVM OOM.

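Consistency : Only hotspot keys are cached locally; non‑hotspot data resides in the distributed cache. When a hotspot key changes, the node making the change invalidates its own local entry, so that node stays strongly consistent. The invalidation event is then broadcast via etcd to the other nodes, which gives eventual consistency across the cluster.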
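The bounded local cache and the invalidation broadcast can be sketched together. This is an illustration under several assumptions, not the Hermes‑SDK implementation: the cache is bounded by entry count rather than by 64 MB, `InvalidationBus` is a hypothetical in‑process stand‑in for the etcd broadcast, and the broadcast here is synchronous for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-process stand-in for the etcd broadcast channel. */
final class InvalidationBus {
    private final List<LocalHotCache> nodes = new ArrayList<>();

    void register(LocalHotCache node) {
        nodes.add(node);
    }

    /** Tell every node except the origin to drop its copy of the key. */
    void broadcast(LocalHotCache origin, String key) {
        for (LocalHotCache node : nodes) {
            if (node != origin) {
                node.evictLocal(key);
            }
        }
    }
}

/** Hypothetical LRU-bounded local cache for one application node. */
final class LocalHotCache {
    private final Map<String, String> lru;
    private final InvalidationBus bus;

    LocalHotCache(int maxEntries, InvalidationBus bus) {
        this.bus = bus;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;  // evict the least recently used entry
            }
        };
        bus.register(this);
    }

    synchronized void put(String key, String value) {
        lru.put(key, value);
    }

    synchronized String get(String key) {
        return lru.get(key);
    }

    synchronized void evictLocal(String key) {
        lru.remove(key);
    }

    /** Mirrors the invalid() step: drop the local entry first, then notify peers. */
    void invalid(String key) {
        evictLocal(key);
        bus.broadcast(this, key);
    }
}
```

Evicting locally before notifying peers is what keeps the writing node strongly consistent; peers only catch up once the broadcast reaches them, which is where the eventual consistency comes from.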

Performance Results

During a Kuaishou merchant flash sale, TMC recorded a local‑cache hit rate of nearly 80%, and latency decreased even as request volume clearly increased. Similar improvements were observed during Double‑11, with QPS rising and response time dropping across core product, logistics, inventory, marketing, and user services.

TMC now serves product, logistics, inventory, marketing, user, gateway and messaging modules, and continues to expand. Configuration options such as hotspot thresholds, detection counts, and black/white lists allow fine‑tuning per business needs.

Source: Youzan Technology