How Transparent Multilevel Cache (TMC) Eliminates Hotspot Bottlenecks in High‑Traffic E‑Commerce
This article explains Youzan's Transparent Multilevel Cache (TMC): its architecture, its transparent Java integration, and its hotspot detection and local caching mechanisms. It also shows the performance gains TMC delivered during flash‑sale events and large‑scale marketing campaigns.
TMC (Transparent Multilevel Cache) is a comprehensive cache solution developed by the Youzan PaaS team for internal applications.
It builds on generic distributed cache solutions (such as CodisProxy + Redis or Youzan's zanKV) and adds application‑level hotspot detection, local caching, and hit statistics to relieve hotspot pressure on downstream cache services.
Why TMC
During marketing activities such as flash sales, e‑commerce merchants generate unpredictable cache hotspot traffic: a few hot keys flood the distributed cache, consume bandwidth, and threaten system stability. TMC automatically discovers these hot keys and serves requests for them from an application‑level local cache that sits in front of the distributed cache.
Pain Points of Multilevel Cache Solutions
Hotspot detection: quickly and accurately identify hot keys.
Data consistency: keep the application‑level local cache consistent with the distributed cache.
Effect verification: let applications view local cache hit rates and hot key data.
Transparent integration: minimize intrusion into application code.
TMC Overall Architecture
The architecture consists of three layers:
Storage layer: provides basic KV storage, using services such as Codis, zanKV, or Aerospike.
Proxy layer: offers a unified cache entry and routing for the application layer.
Application layer: supplies a unified client with built‑in hotspot detection and local cache, transparent to business logic.
Application‑Layer Local Cache
Transparent Integration
Java services can use either spring.data.redis with RedisTemplate or youzan.framework.redis with RedisClient. Both ultimately obtain a Jedis object from a JedisPool. TMC modifies JedisPool and Jedis so that they interact with Hermes‑SDK, which adds hotspot detection and local caching without any changes to application code.
Overall Structure
Modules:
Jedis‑Client: direct entry for Java applications to communicate with the cache server.
Hermes‑SDK: SDK encapsulating hotspot detection and local cache.
Hermes server cluster: receives access events, performs hotspot detection, and pushes hot keys to SDK.
Cache cluster: composed of proxy and storage layers, providing a unified distributed cache service.
Base components: etcd cluster and Apollo configuration center for cluster push and unified configuration.
Basic Flow
Key retrieval: When an application calls the Jedis client, it first asks Hermes‑SDK whether the key is a hotspot. If it is, the value is returned from the local cache; otherwise the request is forwarded to the cache cluster.
Key expiration: Calls to set(), del(), or expire() trigger Hermes‑SDK.invalid() to invalidate the local cache and broadcast the event via etcd to other SDK nodes.
Hotspot discovery: Hermes servers collect key access events, maintain a 30‑second sliding window (10 slots of 3 seconds), aggregate heat, and push hot‑key lists to SDKs.
Configuration reading: Both SDK and server read runtime configuration (thresholds, black/white lists, etc.) from Apollo.
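The key retrieval step can be sketched in a few lines of Java. This is an illustration only, not the Hermes‑SDK API: the class HotKeyAwareReader, the method onHotKeysPushed, and the remoteGet function are hypothetical names, and filling the local cache on the first miss for a hot key is an assumption that the flow above does not spell out.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the hotspot-aware read path; not the real Hermes-SDK API.
class HotKeyAwareReader {
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteGet; // stands in for a call to the cache cluster
    final AtomicInteger remoteCalls = new AtomicInteger(); // exposed only to observe the sketch

    HotKeyAwareReader(Function<String, String> remoteGet) {
        this.remoteGet = remoteGet;
    }

    // Called when the server side pushes a new hot-key list.
    void onHotKeysPushed(Set<String> pushed) {
        hotKeys.retainAll(pushed);
        hotKeys.addAll(pushed);
        localCache.keySet().retainAll(pushed); // drop values for keys that are no longer hot
    }

    String get(String key) {
        if (!hotKeys.contains(key)) {
            return fetchRemote(key); // ordinary keys always go to the cache cluster
        }
        String cached = localCache.get(key);
        if (cached != null) {
            return cached; // hot key served locally
        }
        // Assumption: the first read of a hot key fills the local cache.
        String value = fetchRemote(key);
        if (value != null) {
            localCache.put(key, value);
        }
        return value;
    }

    private String fetchRemote(String key) {
        remoteCalls.incrementAndGet();
        return remoteGet.apply(key);
    }
}
```

With a plain Map standing in for the cache cluster, new HotKeyAwareReader(remote::get) reaches the remote side once for a hot key no matter how many times it is read, while every read of a non‑hot key goes to the cluster.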
Stability
Asynchronous reporting: Hermes‑SDK reports events through rsyslog, so reporting does not block business threads.
Thread isolation: The communication module runs in a dedicated thread pool with bounded queues.
Cache size control: The local cache is capped at 64 MB with LRU eviction so that it cannot exhaust the JVM heap.
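To make the cache size control concrete, here is a minimal sketch of an LRU cache bounded by a byte budget. It is not TMC's implementation: the class ByteBoundedLruCache is hypothetical, and it counts only value bytes, ignoring keys and JVM object overhead.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical byte-bounded LRU cache; the real TMC local cache may be built differently.
class ByteBoundedLruCache {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized byte[] get(String key) {
        return entries.get(key); // a successful get marks the entry as most recently used
    }

    synchronized void put(String key, byte[] value) {
        if (value.length > maxBytes) {
            remove(key); // larger than the whole budget: do not cache it
            return;
        }
        byte[] old = entries.put(key, value);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += value.length;
        // Evict from the least recently used end until the budget is respected.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    synchronized void remove(String key) {
        byte[] old = entries.remove(key);
        if (old != null) {
            usedBytes -= old.length;
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

A 64 MB budget would be new ByteBoundedLruCache(64L * 1024 * 1024). A production cache would also need to account for key size and per‑entry overhead, which this sketch leaves out.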
Consistency
Only hotspot keys are cached locally; the majority of keys remain in the distributed cache.
When a change to a hotspot key invalidates its value, Hermes‑SDK invalidates the local entry synchronously, so the node that made the change stays strongly consistent.
Invalidation events are broadcast via etcd to other SDK nodes, achieving eventual consistency across the cluster.
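The difference between the two guarantees can be shown with a small simulation. Everything here is hypothetical: InvalidationBus is an in‑process stand‑in for the etcd channel, LocalNode stands in for an application instance with Hermes‑SDK, and delivery is triggered manually with deliverPending() to make the propagation delay visible. For brevity every key is treated as a hot key.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-process stand-in for the etcd channel carrying invalidation events.
class InvalidationBus {
    private final List<LocalNode> nodes = new CopyOnWriteArrayList<>();
    private final Queue<String[]> pending = new ConcurrentLinkedQueue<>(); // {originId, key}

    void register(LocalNode node) {
        nodes.add(node);
    }

    void publish(String originId, String key) {
        pending.add(new String[] {originId, key});
    }

    // Simulates the asynchronous arrival of events at the other nodes.
    void deliverPending() {
        String[] event;
        while ((event = pending.poll()) != null) {
            for (LocalNode node : nodes) {
                if (!node.id.equals(event[0])) {
                    node.invalidateLocal(event[1]);
                }
            }
        }
    }
}

// Hypothetical application node with a local cache in front of a shared remote cache.
class LocalNode {
    final String id;
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> remote; // stands in for the distributed cache
    private final InvalidationBus bus;

    LocalNode(String id, Map<String, String> remote, InvalidationBus bus) {
        this.id = id;
        this.remote = remote;
        this.bus = bus;
    }

    String get(String key) {
        return localCache.computeIfAbsent(key, remote::get);
    }

    void set(String key, String value) {
        remote.put(key, value);
        invalidateLocal(key); // synchronous: this node never serves its own stale value
        bus.publish(id, key); // other nodes catch up once the event is delivered
    }

    void invalidateLocal(String key) {
        localCache.remove(key);
    }
}
```

After node A calls set(), A reads the new value immediately, while node B can still return its cached value until the event reaches it. That window is the eventual consistency described above.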
Hotspot Discovery
Overall Process
The process consists of four steps:
Data collection: Hermes‑SDK reports key access events through rsyslog to Kafka, and Hermes servers consume the messages.
Heat sliding window: The server maintains a 10‑slot time wheel for each key. Each slot counts accesses in a 3‑second interval, so the wheel represents a 30‑second window.
Heat aggregation: Every 3 seconds, a mapping task sums the slots for each key and stores <key, window‑heat> in a Redis sorted set.
Hotspot detection: Periodically, the server selects the top‑N keys whose heat exceeds a configured threshold and pushes the list to SDKs via etcd.
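The window and detection steps above can be sketched as follows. This is an in‑memory illustration under stated assumptions, not the Hermes server code: the class HeatWindow and its methods are hypothetical, timestamps are passed in explicitly so the behavior is deterministic, and an in‑memory sort replaces the Redis sorted set.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heat tracker: 10 slots of 3 seconds each, giving a 30-second window.
class HeatWindow {
    static final int SLOTS = 10;
    static final long SLOT_MILLIS = 3_000;

    private static final class Wheel {
        final long[] counts = new long[SLOTS];
        final long[] epochs = new long[SLOTS]; // absolute slot number each counter belongs to

        Wheel() {
            Arrays.fill(epochs, -1);
        }
    }

    private final Map<String, Wheel> wheels = new HashMap<>();

    synchronized void record(String key, long nowMillis) {
        long epoch = nowMillis / SLOT_MILLIS;
        int i = (int) (epoch % SLOTS);
        Wheel w = wheels.computeIfAbsent(key, k -> new Wheel());
        if (w.epochs[i] != epoch) { // the wheel wrapped around: reuse the slot
            w.epochs[i] = epoch;
            w.counts[i] = 0;
        }
        w.counts[i]++;
    }

    // Sum of all slots that still fall inside the 30-second window.
    synchronized long heat(String key, long nowMillis) {
        Wheel w = wheels.get(key);
        if (w == null) {
            return 0;
        }
        long epoch = nowMillis / SLOT_MILLIS;
        long sum = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (w.epochs[i] > epoch - SLOTS && w.epochs[i] <= epoch) {
                sum += w.counts[i];
            }
        }
        return sum;
    }

    // Top-N keys whose window heat exceeds the threshold, hottest first.
    synchronized List<String> hotKeys(long nowMillis, long threshold, int topN) {
        List<Map.Entry<String, Long>> heats = new ArrayList<>();
        for (String key : wheels.keySet()) {
            long h = heat(key, nowMillis);
            if (h > threshold) {
                heats.add(new AbstractMap.SimpleEntry<String, Long>(key, h));
            }
        }
        heats.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, heats.size()); i++) {
            result.add(heats.get(i).getKey());
        }
        return result;
    }
}
```

Tagging each slot with its absolute slot number lets stale counters be ignored without a background cleanup task. A real deployment would also need to drop wheels for keys that are no longer accessed.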
Features
Real‑time
Events are reported in real time; the mapping task runs every 3 seconds, so hotspots can be detected within 3 seconds of appearance.
Accuracy
The sliding‑window aggregation accurately reflects recent access distribution.
Scalability
Hermes servers are stateless and can be horizontally scaled with Kafka partitions; mapping tasks are multithreaded per application.
Practical Results
Kuaishou merchant campaign
The blue line shows total cache get calls; the green line shows local‑cache hits. The local‑cache hit rate reached nearly 80% during the event.
During the campaign, request volume increased sharply, but response time decreased thanks to local caching.
Double‑11 application results
Future Outlook
TMC already serves product, logistics, inventory, marketing, user, gateway & messaging modules, and more applications are being onboarded. Configuration options allow fine‑tuning of hotspot thresholds, hot‑key count, and black/white lists for optimal performance.
