How Transparent Multilevel Cache (TMC) Boosts Performance with Hotspot Detection and Local Caching

The Transparent Multilevel Cache (TMC) solution adds application‑level hotspot detection, local caching, and hit‑rate statistics to a standard distributed cache stack, enabling automatic hotspot discovery, reducing load on backend cache clusters, and improving system stability and latency during traffic spikes.

Java Interview Crash Guide

What Is TMC?

TMC (Transparent Multilevel Cache) is a comprehensive caching solution provided by Youzan PaaS for internal applications.

Built on a generic distributed cache (e.g., CodisProxy + Redis or Youzan's own zanKV), TMC adds:

Application‑level hotspot detection

Application‑level local cache

Application‑level cache‑hit statistics

These features help solve hotspot access problems at the application layer.

Why Build TMC?

E‑commerce merchants on Youzan run unpredictable promotional activities (flash sales, product pushes, order processing) that create sudden cache‑hotspot traffic, overwhelming distributed cache systems and affecting application stability.

TMC automatically discovers hotspot keys and serves requests for them from an application‑local cache, reducing pressure on the downstream cache service.

Pain Points of Multilevel Cache Solutions

Hotspot detection: How to quickly and accurately find hotspot keys?

Data consistency: How to ensure consistency between the local cache and the distributed cache?

Effect verification: How can applications view local‑cache hit rates and hotspot keys to verify effectiveness?

Transparent integration: How to minimize intrusion and achieve smooth, fast adoption?

TMC focuses on these issues, providing hotspot detection and local caching to reduce impact on downstream cache services.

TMC Overall Architecture

The architecture consists of three layers:

Storage layer: Provides basic KV storage, using different services (Codis, zanKV, Aerospike) per business scenario.

Proxy layer: Offers a unified cache entry and protocol for applications, handling routing after horizontal sharding.

Application layer: Supplies a unified client with built‑in hotspot detection and local caching, transparent to business logic.

This article focuses on the application‑layer client’s hotspot detection and local caching.

Application‑Layer Local Cache

Transparent Integration

Java services can use either the spring.data.redis package with RedisTemplate or the youzan.framework.redis package with RedisClient. Both ultimately create a Jedis object via JedisPool that talks to the proxy layer.

TMC modifies the native JedisPool and Jedis classes so that, during initialization, they embed the hotspot discovery and local caching logic from the Hermes‑SDK. By depending on a specific version of the jedis‑jar package, applications gain hotspot detection and local caching without code changes.

Overall Structure

Module Breakdown

Jedis‑Client: Direct entry for Java applications to communicate with the cache service.

Hermes‑SDK: Encapsulates hotspot discovery and local caching.

Hermes Server Cluster: Receives access events, detects hotspots, and pushes hotspot keys to the SDK.

Cache Cluster: Consists of proxy and storage layers, providing a unified distributed cache endpoint.

Base Components: Etcd cluster and Apollo configuration center for cluster push and unified configuration.

Basic Workflow

Key Retrieval

When a Java app requests a key via Jedis‑Client, the SDK checks if the key is a hotspot.

Hotspot keys are served from the local cache, bypassing the cache cluster.

Non‑hotspot keys are fetched from the cache cluster via a callable callback.

Each request is asynchronously reported to the Hermes server for hotspot analysis.
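The retrieval steps above can be sketched roughly as follows. This is a minimal illustration, not the Hermes‑SDK API: the class HotspotAwareReader, its methods, and the use of plain in‑memory maps are all assumptions, and the access reporter is invoked inline here whereas the article describes it as asynchronous.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of the read path; none of these names come from Hermes-SDK.
final class HotspotAwareReader {
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Consumer<String> accessReporter;

    HotspotAwareReader(Consumer<String> accessReporter) {
        this.accessReporter = accessReporter;
    }

    // Stand-in for the server pushing a hotspot key to this node.
    void markHot(String key) {
        hotKeys.add(key);
    }

    String get(String key, Callable<String> remoteLoader) {
        // Every access is reported for hotspot analysis (asynchronous in a real SDK).
        accessReporter.accept(key);
        if (!hotKeys.contains(key)) {
            // Non-hotspot key: always go to the cache cluster through the callback.
            return load(remoteLoader);
        }
        String cached = localCache.get(key);
        if (cached != null) {
            return cached; // Hotspot key served locally, bypassing the cluster.
        }
        String loaded = load(remoteLoader);
        if (loaded != null) {
            localCache.put(key, loaded); // Populate the local cache on first miss.
        }
        return loaded;
    }

    private static String load(Callable<String> loader) {
        try {
            return loader.call();
        } catch (Exception e) {
            throw new IllegalStateException("remote cache read failed", e);
        }
    }
}
```

In this sketch a hotspot key costs one remote read to fill the local cache, after which repeated reads never reach the cache cluster until the entry is invalidated.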

Key Expiration

Calls to set(), del(), or expire() trigger an invalid() call in the SDK.

For hotspot keys, the local cache entry on the node that performed the write is invalidated immediately, so that node stays strongly consistent with the cache cluster.

The event is broadcast via Etcd to other SDK nodes, which also invalidate their local copies, achieving eventual consistency.
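A minimal sketch of this invalidation flow, under stated assumptions: InvalidationBus is a hypothetical in‑memory stand‑in for the Etcd broadcast, and CacheNode is an illustrative class rather than anything in the SDK. Only the method name invalid() is taken from the text.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for the Etcd broadcast channel.
final class InvalidationBus {
    private final List<CacheNode> nodes = new CopyOnWriteArrayList<>();

    void register(CacheNode node) {
        nodes.add(node);
    }

    // Deliver the invalidation to every node except the one that originated it.
    void publish(CacheNode source, String key) {
        for (CacheNode node : nodes) {
            if (node != source) {
                node.onRemoteInvalidation(key);
            }
        }
    }
}

// Hypothetical SDK node holding a local cache of hotspot keys.
final class CacheNode {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final InvalidationBus bus;

    private CacheNode(InvalidationBus bus) {
        this.bus = bus;
    }

    static CacheNode join(InvalidationBus bus) {
        CacheNode node = new CacheNode(bus);
        bus.register(node);
        return node;
    }

    void cacheLocally(String key, String value) {
        localCache.put(key, value);
    }

    String peekLocal(String key) {
        return localCache.get(key);
    }

    // Called after a write such as set(), del(), or expire() on this node.
    void invalid(String key) {
        localCache.remove(key); // Immediate local invalidation.
        bus.publish(this, key); // Other nodes catch up when the broadcast arrives.
    }

    void onRemoteInvalidation(String key) {
        localCache.remove(key);
    }
}
```

The in‑memory bus delivers synchronously, so it hides the delay that makes cross‑node consistency only eventual in a real deployment.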

Hotspot Discovery

The Hermes server continuously collects access events and, every 3 seconds, computes a sliding‑window heat for each key.

Keys whose heat exceeds a threshold are ranked, and the Top N are pushed to SDK nodes as hotspot keys.

Configuration Reading

Both SDK and server nodes read runtime configuration (e.g., enable/disable flags, black‑white lists, Etcd addresses) from Apollo.

Stability

Asynchronous data reporting: Hermes‑SDK uses rsyslog to report events without blocking business threads.

Thread‑isolated communication module: Separate thread pool with bounded queue isolates I/O from business execution.

Cache size control: Local cache size is limited to 64 MB (LRU) to prevent JVM heap overflow.
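To make the cache size control concrete, here is a minimal sketch of a byte‑bounded LRU cache. It is an illustration only: ByteBoundedLruCache is a hypothetical class, it counts value bytes only (keys and object overhead are ignored), and the article does not say how the SDK actually measures or enforces its 64 MB limit.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical byte-bounded LRU cache; not the Hermes-SDK implementation.
final class ByteBoundedLruCache {
    private final long maxBytes;
    private long usedBytes;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized void put(String key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous != null) {
            usedBytes -= previous.length;
        }
        usedBytes += value.length;
        evictUntilWithinBudget();
    }

    synchronized byte[] get(String key) {
        return entries.get(key); // Also marks the entry as most recently used.
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    // Remove least recently used entries until the byte budget is respected.
    private void evictUntilWithinBudget() {
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }
}
```

A cache matching the limit in the text would be created as new ByteBoundedLruCache(64L * 1024 * 1024).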

Consistency

Only hotspot keys are cached locally; the majority of data resides in the cache cluster.

When a hotspot key changes, the SDK on the node that made the change invalidates its local entry immediately, guaranteeing strong consistency on that node.

Invalidations are broadcast via Etcd, ensuring eventual consistency across all application instances.

Hotspot Discovery Process

Overall Flow

The process consists of four steps:

Data collection: SDK reports key access events to Kafka.

Heat sliding window: A time wheel records access counts for each key over a 30‑second window.

Heat aggregation: Aggregated heat values are stored in Redis as sorted sets.

Hotspot detection: The server selects the Top N keys exceeding the heat threshold and pushes them to SDK nodes.

Data Collection

SDK sends events (appName, uniqueKey, sendTime, weight) to Kafka; server nodes consume them in real time.
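One way to picture the event and a non‑blocking hand‑off from business threads is sketched below. The four field names come from the text; everything else (AccessEvent as a class, BoundedEventBuffer, dropping events when the buffer is full) is an assumption. The sketch does not use rsyslog or Kafka; it only shows how a bounded buffer keeps reporting from blocking callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Event fields as listed in the text; the class itself is hypothetical.
final class AccessEvent {
    final String appName;
    final String uniqueKey;
    final long sendTime;
    final int weight;

    AccessEvent(String appName, String uniqueKey, long sendTime, int weight) {
        this.appName = appName;
        this.uniqueKey = uniqueKey;
        this.sendTime = sendTime;
        this.weight = weight;
    }
}

// Hypothetical bounded buffer between business threads and a sender thread.
final class BoundedEventBuffer {
    private final BlockingQueue<AccessEvent> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Never blocks the caller: when the buffer is full the event is dropped.
    boolean report(AccessEvent event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Intended for a background sender thread that forwards batches downstream.
    List<AccessEvent> drain(int maxBatch) {
        List<AccessEvent> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Dropping on overflow trades a small loss of heat accuracy for the guarantee that reporting never slows a business request; whether the SDK makes the same trade is not stated in the article.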

Heat Sliding Window

Each key maintains a wheel of 10 slots, each representing a 3‑second interval; the sum gives the total accesses in the last 30 seconds.
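The wheel described above could look roughly like this for a single key. It is a sketch under assumptions: HeatWindow and its methods are hypothetical names, slots are reused by taking the 3‑second interval number modulo 10, and late events for an interval that has already been overwritten are ignored.

```java
import java.util.Arrays;

// Hypothetical per-key time wheel: 10 slots of 3 seconds cover the last 30 seconds.
final class HeatWindow {
    static final int SLOTS = 10;
    static final long SLOT_MILLIS = 3_000L;

    private final long[] counts = new long[SLOTS];
    // Which 3-second interval (timestamp / SLOT_MILLIS) each slot currently holds.
    private final long[] slotInterval = new long[SLOTS];

    HeatWindow() {
        Arrays.fill(slotInterval, Long.MIN_VALUE);
    }

    synchronized void record(long timestampMillis, long weight) {
        long interval = timestampMillis / SLOT_MILLIS;
        int index = (int) (interval % SLOTS);
        if (interval < slotInterval[index]) {
            return; // Late event for an interval that was already overwritten.
        }
        if (interval > slotInterval[index]) {
            slotInterval[index] = interval; // Slot rolls over to a new interval.
            counts[index] = 0;
        }
        counts[index] += weight;
    }

    // Sum of all slots whose interval still lies inside the 30-second window.
    synchronized long heat(long nowMillis) {
        long current = nowMillis / SLOT_MILLIS;
        long sum = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (slotInterval[i] > current - SLOTS && slotInterval[i] <= current) {
                sum += counts[i];
            }
        }
        return sum;
    }
}
```

Tagging each slot with its interval number means stale slots are excluded at read time, so no background thread is needed to clear the wheel for keys that stop receiving traffic.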

Heat Aggregation

After sliding‑window calculation, the server aggregates heat per key and stores <key, totalHeat> in Redis.

Hotspot Detection

Periodically, the server extracts keys whose heat exceeds the configured threshold, selects the Top N, and pushes the list to SDK nodes.
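The selection step can be illustrated with a small sketch. The article stores aggregated heat in Redis sorted sets; the version below uses an in‑memory map instead so it stays self‑contained, and HotspotSelector, selectHotKeys, and the tie‑breaking by key are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical selector: keeps keys above the threshold, then takes the N hottest.
final class HotspotSelector {
    private HotspotSelector() {
    }

    static List<String> selectHotKeys(Map<String, Long> heatByKey, long threshold, int topN) {
        return heatByKey.entrySet().stream()
                .filter(entry -> entry.getValue() > threshold)
                .sorted((a, b) -> {
                    // Hottest first; ties are broken by key so the result is deterministic.
                    int byHeat = Long.compare(b.getValue(), a.getValue());
                    return byHeat != 0 ? byHeat : a.getKey().compareTo(b.getKey());
                })
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Applying the threshold before the Top N cut means a quiet period produces a short (possibly empty) hotspot list rather than promoting lukewarm keys just to fill N places.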

Feature Summary

Real‑time

Events are reported via rsyslog + Kafka; the sliding‑window and aggregation run every 3 seconds, detecting hotspots within at most 3 seconds.

Accuracy

The time‑wheel sliding window provides a precise view of recent access distribution.

Scalability

Server nodes are stateless and can scale horizontally based on Kafka partitions; the sliding‑window and aggregation are multithreaded per app.

Real‑World Impact

Fast‑Shop Merchant Promotion

During a short‑term promotion, cache request volume and local‑cache hit rate both rose sharply, with local‑cache hit rate reaching ~80%.

[Figure: cache request and hit curves, showing the increase]

[Figure: local‑cache hit‑rate curve]

QPS and Latency Improvements

During the event, request QPS grew while response time (RT) decreased thanks to local caching.

Future Outlook

TMC already serves product, logistics, inventory, marketing, user, gateway, and messaging modules, with more applications being onboarded. Users can tune hotspot thresholds, detection counts, and black‑/white‑list settings to optimize performance.
