How Transparent Multilevel Cache (TMC) Eliminates Hotspot Bottlenecks in High‑Traffic E‑Commerce

This article explains Youzan's Transparent Multilevel Cache (TMC), detailing its architecture, transparent Java integration, hotspot detection and local caching mechanisms, and demonstrates its real‑world performance gains during flash‑sale events and large‑scale marketing campaigns.

Java Backend Technology

TMC (Transparent Multilevel Cache) is a comprehensive caching solution developed by the Youzan PaaS team for the company's internal applications.

It builds on generic distributed cache solutions (such as CodisProxy + Redis or Youzan's zanKV) and adds application‑level hotspot detection, local caching, and hit statistics to relieve hotspot pressure on downstream cache services.

Why TMC

During marketing activities such as flash sales, e‑commerce merchants generate unpredictable cache hotspot traffic: a few hot keys flood the distributed cache, consume bandwidth, and threaten system stability. TMC automatically discovers these hotspots and serves requests for them from an application‑level local cache placed in front of the distributed cache.

Pain Points of Multilevel Cache Solutions

Hotspot detection: quickly and accurately identify hot keys.

Data consistency: keep the application‑level local cache consistent with the distributed cache.

Effect verification: let applications view local cache hit rates and hot key data.

Transparent integration: minimize intrusion into application code.

TMC Overall Architecture

TMC architecture diagram

The architecture consists of three layers:

Storage layer: provides basic KV storage, using services such as Codis, zanKV, or Aerospike.

Proxy layer: offers a unified cache entry and routing for the application layer.

Application layer: supplies a unified client with built‑in hotspot detection and local cache, transparent to business logic.

Application‑Layer Local Cache

Transparent Integration

Java services can use either spring.data.redis with RedisTemplate or youzan.framework.redis with RedisClient. Both ultimately create a Jedis object via JedisPool. TMC modifies JedisPool and Jedis so that they interact with Hermes‑SDK, which adds hotspot detection and local caching without any changes to application code.

Overall Structure

Module diagram

Modules:

Jedis‑Client : direct entry for Java applications to communicate with the cache server.

Hermes‑SDK : SDK encapsulating hotspot detection and local cache.

Hermes server cluster : receives access events, performs hotspot detection, and pushes hot keys to SDK.

Cache cluster : composed of proxy and storage layers, providing a unified distributed cache service.

Base components : etcd cluster and Apollo configuration center for cluster push and unified configuration.

Basic Flow

Key retrieval : When an application calls the Jedis client, it first asks Hermes‑SDK whether the key is a hotspot. If it is, the value is returned from the local cache; otherwise the request is forwarded to the cache cluster.

Key expiration : Calls to set(), del(), or expire() trigger Hermes‑SDK.invalid() to invalidate the local cache and broadcast the event via etcd to other SDK nodes.

Hotspot discovery : Hermes servers collect key access events, maintain a 30‑second sliding window (10 slots of 3 seconds), aggregate heat, and push hot‑key lists to SDKs.

Configuration reading : Both SDK and server read runtime configuration (thresholds, black/white lists, etc.) from Apollo.
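The retrieval and invalidation paths above can be sketched roughly as follows. This is a minimal illustration, not Hermes‑SDK's actual code: RemoteCache, InMemoryRemoteCache, HotAwareClient, and onHotKeysPushed are hypothetical names, the local cache is filled lazily on the first read of a hot key (an assumption), and the etcd broadcast to other nodes is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the distributed cache cluster (proxy + storage).
interface RemoteCache {
    String get(String key);
    void set(String key, String value);
}

// In-memory fake, present only so the sketch runs without a real cache server.
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public String get(String key) { return data.get(key); }
    public void set(String key, String value) { data.put(key, value); }
}

// Hypothetical client wrapper: hot keys are served locally, everything else remotely.
class HotAwareClient {
    private final RemoteCache remote;
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, String> local = new ConcurrentHashMap<>();
    // Plain counters for illustration only; they are not thread-safe.
    int localHits = 0;
    int remoteReads = 0;

    HotAwareClient(RemoteCache remote) { this.remote = remote; }

    // Called when the server pushes a new hot-key list.
    void onHotKeysPushed(Set<String> pushed) {
        hotKeys.retainAll(pushed);
        hotKeys.addAll(pushed);
        local.keySet().retainAll(pushed); // drop values for keys that cooled down
    }

    String get(String key) {
        if (hotKeys.contains(key)) {
            String cached = local.get(key);
            if (cached != null) {
                localHits++;
                return cached;
            }
        }
        remoteReads++;
        String value = remote.get(key);
        // Assumption: a hot key is cached locally on its first remote read.
        if (value != null && hotKeys.contains(key)) {
            local.put(key, value);
        }
        return value;
    }

    void set(String key, String value) {
        remote.set(key, value);
        local.remove(key); // invalidate the local copy so the next read refetches
    }
}
```

In this sketch, the ratio of localHits to the total number of get calls plays the role of the local‑cache hit rate that TMC reports.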

Stability

Asynchronous reporting: Hermes‑SDK uses rsyslog to report events without blocking business threads.

Thread isolation: Communication module runs in a dedicated thread pool with bounded queues.

Cache size control: Local cache is limited to 64 MB (LRU) to avoid JVM heap overflow.
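One way to picture the cache size control above is an LRU map with a byte budget. The source only states the 64 MB limit and the LRU policy; the class name ByteBoundedLruCache is hypothetical, and counting only the value bytes is an assumption made for this sketch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical byte-bounded LRU cache; only value sizes are counted (an assumption).
class ByteBoundedLruCache {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedLruCache(long maxBytes) { this.maxBytes = maxBytes; }

    synchronized byte[] get(String key) {
        return entries.get(key); // a hit also marks the entry as recently used
    }

    synchronized void put(String key, byte[] value) {
        byte[] old = entries.put(key, value);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += value.length;
        // Evict least recently used entries until the budget is respected.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    synchronized void remove(String key) {
        byte[] old = entries.remove(key);
        if (old != null) {
            usedBytes -= old.length;
        }
    }

    synchronized long usedBytes() { return usedBytes; }
}
```

With a budget of 64L * 1024 * 1024 bytes, the cache would never hold more than 64 MB of values, whatever the number of hot keys.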

Consistency

Only hotspot keys are cached locally; the majority of keys remain in the distributed cache.

When a hotspot key changes on a node, Hermes‑SDK invalidates that node's local entry synchronously, so the node that made the change does not read its own stale value (strong consistency locally).

Invalidation events are broadcast via etcd to other SDK nodes, achieving eventual consistency across the cluster.
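The difference between the two guarantees above can be made concrete with a small simulation. InvalidationBus is a hypothetical in‑process stand‑in for the etcd broadcast, and CacheNode stands in for one SDK instance; neither reflects the real Hermes‑SDK or etcd API. Delivery is deliberately deferred until deliverAll() is called so that the stale window on other nodes is visible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the etcd broadcast channel: events are queued,
// then delivered later, which models propagation delay.
class InvalidationBus {
    private final List<CacheNode> nodes = new ArrayList<>();
    private final Queue<String[]> pending = new ArrayDeque<>(); // {originId, key}

    void register(CacheNode node) { nodes.add(node); }

    void publish(String originId, String key) {
        pending.add(new String[] {originId, key});
    }

    // Deliver every queued event to all nodes except the one that sent it.
    void deliverAll() {
        String[] event;
        while ((event = pending.poll()) != null) {
            for (CacheNode node : nodes) {
                if (!node.id.equals(event[0])) {
                    node.dropLocal(event[1]);
                }
            }
        }
    }
}

// Hypothetical SDK instance holding a local cache of hot keys.
class CacheNode {
    final String id;
    private final Map<String, String> local = new HashMap<>();
    private final InvalidationBus bus;

    CacheNode(String id, InvalidationBus bus) {
        this.id = id;
        this.bus = bus;
        bus.register(this);
    }

    void cacheLocally(String key, String value) { local.put(key, value); }

    String localValue(String key) { return local.get(key); }

    // Called when this node changes a key: drop locally first, then broadcast.
    void onKeyChanged(String key) {
        dropLocal(key);       // synchronous: this node is consistent immediately
        bus.publish(id, key); // asynchronous: other nodes catch up later
    }

    void dropLocal(String key) { local.remove(key); }
}
```

Between onKeyChanged on one node and deliverAll on the bus, other nodes may still return the old value; that gap is what "eventual consistency across the cluster" allows.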

Hotspot Discovery

Overall Process

Hotspot discovery flow

The process consists of four steps:

Data collection : Hermes‑SDK logs key access events via rsyslog, sends them to Kafka, and Hermes servers consume the messages.

Heat sliding window : Each key maintains a 10‑slot time wheel, each slot counting accesses in a 3‑second interval, representing a 30‑second window.

Heat aggregation : Every 3 seconds, a mapping task sums the slots for each key and stores the resulting <key, window‑heat> pair in a Redis sorted set.

Hotspot detection : Periodically, the server selects the top‑N keys whose heat exceeds a configured threshold and pushes the list to SDKs via etcd.
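The sliding window, aggregation, and top‑N selection described above can be sketched as below. The 10 slots of 3 seconds come from the text; everything else is an assumption for illustration: the class names HeatWindow and HotspotDetector are hypothetical, heat is summed on demand instead of by a periodic mapping task, and an in‑memory sort replaces the Redis sorted set.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Time wheel for one key: 10 slots x 3 s = 30 s window.
class HeatWindow {
    static final int SLOTS = 10;
    static final long SLOT_MILLIS = 3_000;
    private final long[] counts = new long[SLOTS];
    private final long[] slotEpoch = new long[SLOTS]; // 3 s interval index held by each slot

    void record(long nowMillis) {
        long epoch = nowMillis / SLOT_MILLIS;
        int i = (int) (epoch % SLOTS);
        if (slotEpoch[i] != epoch) { // slot still holds an older interval: reuse it
            slotEpoch[i] = epoch;
            counts[i] = 0;
        }
        counts[i]++;
    }

    // Sum of all slots that still fall inside the 30 s window ending now.
    long heat(long nowMillis) {
        long epoch = nowMillis / SLOT_MILLIS;
        long sum = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (epoch - slotEpoch[i] < SLOTS) {
                sum += counts[i];
            }
        }
        return sum;
    }
}

class HotspotDetector {
    private final Map<String, HeatWindow> windows = new HashMap<>();

    void onAccess(String key, long nowMillis) {
        windows.computeIfAbsent(key, k -> new HeatWindow()).record(nowMillis);
    }

    // Keys whose heat exceeds the threshold, hottest first, at most topN of them.
    List<String> hotKeys(long nowMillis, long threshold, int topN) {
        List<Map.Entry<String, Long>> heats = new ArrayList<>();
        for (Map.Entry<String, HeatWindow> e : windows.entrySet()) {
            long heat = e.getValue().heat(nowMillis);
            if (heat > threshold) {
                heats.add(new AbstractMap.SimpleEntry<>(e.getKey(), heat));
            }
        }
        heats.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, heats.size()); i++) {
            result.add(heats.get(i).getKey());
        }
        return result;
    }
}
```

Because each slot remembers which 3‑second interval it holds, counts older than 30 seconds stop contributing to the heat without a separate cleanup pass.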

Features

Real‑time

Events are reported in real time; the mapping task runs every 3 seconds, so hotspots can be detected within 3 seconds of appearance.

Accuracy

The sliding‑window aggregation accurately reflects recent access distribution.

Scalability

Hermes servers are stateless and can be horizontally scaled with Kafka partitions; mapping tasks are multithreaded per application.

Practical Results

Kuaishou merchant campaign

Cache request and hit curves

The blue line shows total cache get calls; the green line shows local‑cache hits. The local‑cache hit rate reached nearly 80% during the event.

QPS curve
RT curve

During the campaign, request volume increased sharply, but response time decreased thanks to local caching.

Double‑11 application results

Product domain core app
Activity domain core app 1
Activity domain core app 2

Future Outlook

TMC already serves product, logistics, inventory, marketing, user, gateway & messaging modules, and more applications are being onboarded. Configuration options allow fine‑tuning of hotspot thresholds, hot‑key count, and black/white lists for optimal performance.
