How Transparent Multilevel Cache (TMC) Supercharges Java Application Performance
The article explains Youzan's Transparent Multilevel Cache (TMC), a solution that automatically detects cache hotspots, adds an application‑level local cache, and provides hit statistics to reduce load on distributed caches, improve consistency, and boost performance for high‑traffic e‑commerce scenarios.
What Is TMC?
TMC (Transparent Multilevel Cache) is a cache solution created by Youzan's PaaS team for internal applications. It builds on generic distributed cache systems such as CodisProxy + Redis or the self‑developed zanKV, and adds three key features: application‑layer hotspot detection, local cache, and cache‑hit statistics.
Why Build TMC?
E‑commerce merchants frequently launch flash‑sale or promotion activities that cause unpredictable cache‑hotspot traffic in modules like "product flash sale", "product detail", and "order transaction". During hotspot periods, a small number of hot keys generate massive cache requests, overwhelming the distributed cache, consuming bandwidth, and threatening application stability. TMC was created to discover such hotspots automatically and serve the corresponding requests from a local cache inside the application, shielding the distributed cache and downstream services.
Multilevel Cache Pain Points Addressed
Hotspot detection – quickly and accurately identify hot keys.
Data consistency – ensure local cache stays consistent with the distributed cache.
Effect verification – allow applications to view local‑cache hit rates and hotspot keys.
Transparent integration – minimise intrusion into existing application code.
TMC Overall Architecture
The architecture consists of three layers:
Storage layer – provides basic KV storage (Codis, zanKV, Aerospike, etc.).
Proxy layer – offers a unified cache entry point and routing for horizontally sharded data.
Application layer – supplies a unified client with built‑in hotspot detection and local cache, completely transparent to business logic.
Transparent Integration in Java
Java services can use either the spring.data.redis package with RedisTemplate or the youzan.framework.redis package with RedisClient. Either way, each request is ultimately issued through a Jedis object obtained from a JedisPool. TMC modifies the native JedisPool and Jedis classes so that they interact with the Hermes‑SDK, which implements hotspot detection and local caching; as a result, applications get the integration without any code changes.
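TMC achieves transparency by patching JedisPool and Jedis themselves. The sketch below illustrates the same idea with a different, swapped‑in technique, a JDK dynamic proxy, so the effect fits in a few lines: calling code keeps using the same interface while hot keys are answered from an in‑process map. `KvClient`, `RemoteClient` and `wrap` are hypothetical names, not part of Jedis, Spring Data Redis or TMC.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TransparentProxyDemo {

    // Hypothetical interface standing in for the client calls business code already makes.
    public interface KvClient {
        String get(String key);
    }

    // Hypothetical remote client; counts how many reads reach the cache cluster.
    public static class RemoteClient implements KvClient {
        public final Map<String, String> store = new ConcurrentHashMap<>();
        public int remoteCalls = 0;

        @Override
        public String get(String key) {
            remoteCalls++;
            return store.get(key);
        }
    }

    // Returns a KvClient that looks identical to the caller but serves hot keys locally.
    public static KvClient wrap(KvClient target, Set<String> hotKeys, Map<String, String> localCache) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("get") && args != null && args.length == 1
                    && hotKeys.contains((String) args[0])) {
                // The first read of a hot key goes remote; later reads stay in-process.
                return localCache.computeIfAbsent((String) args[0], target::get);
            }
            try {
                return method.invoke(target, args); // everything else passes straight through
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception, unwrapped
            }
        };
        return (KvClient) Proxy.newProxyInstance(
                KvClient.class.getClassLoader(), new Class<?>[] {KvClient.class}, handler);
    }

    public static void main(String[] args) {
        RemoteClient remote = new RemoteClient();
        remote.store.put("sku:1", "flash-sale item");
        remote.store.put("sku:2", "regular item");
        Set<String> hot = ConcurrentHashMap.newKeySet();
        hot.add("sku:1");

        KvClient client = wrap(remote, hot, new ConcurrentHashMap<>());
        // Business code is unchanged: it still just calls client.get(...).
        for (int i = 0; i < 3; i++) {
            client.get("sku:1");
            client.get("sku:2");
        }
        System.out.println(remote.remoteCalls); // 4: once for sku:1, three times for sku:2
    }
}
```

JDK dynamic proxies can only implement interfaces, so they cannot stand in for a concrete class such as Jedis; modifying the classes directly, as TMC does, avoids that restriction.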
Component Overview
Jedis‑Client – the direct entry for Java applications to communicate with the cache server.
Hermes‑SDK – the self‑developed SDK that encapsulates hotspot detection and local cache.
Hermes Server Cluster – receives access events from the SDK, performs hotspot analysis, and pushes hotspot keys to SDK instances.
Cache Cluster – composed of proxy and storage layers, providing a unified distributed cache service.
Basic Components – etcd cluster and Apollo configuration centre for cluster push and unified configuration.
Basic Workflow
Key Retrieval
When a Java application calls the Jedis‑Client to get a key, the client asks Hermes‑SDK whether the key is a hotspot.
If it is a hotspot, the value is returned directly from the SDK’s local cache, bypassing the cache cluster.
If it is not a hotspot, the SDK invokes the original Jedis interface (via a Callable) to fetch the value from the cache cluster.
Each access event is reported asynchronously to the Hermes Server Cluster for hotspot analysis.
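The retrieval flow above can be sketched as follows. This is a minimal illustration, not the Hermes‑SDK API: `HermesSdkSketch`, `markHot` and `pendingEvents` are hypothetical, the `Callable` stands for the original Jedis call, and a bounded in‑memory queue stands in for asynchronous event reporting (the background sender that would ship events is omitted). What happens on a local‑cache miss for a hot key is not described above; here it is assumed to load from the cache cluster and fill the local cache.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class HermesSdkSketch {

    // Hot-key list; in TMC it is pushed to the SDK by the Hermes servers via etcd.
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, Object> localCache = new ConcurrentHashMap<>();
    // Bounded buffer of access events; a background sender (omitted) would drain it.
    private final BlockingQueue<String> accessEvents = new LinkedBlockingQueue<>(100_000);

    public void markHot(String key) {
        hotKeys.add(key);
    }

    // remoteLoader wraps the original Jedis call, e.g. () -> jedis.get(key).
    public Object get(String key, Callable<Object> remoteLoader) {
        accessEvents.offer(key); // never blocks: under pressure the event is dropped, not the read
        if (!hotKeys.contains(key)) {
            return load(remoteLoader); // not hot: go to the cache cluster as before
        }
        Object cached = localCache.get(key);
        if (cached != null) {
            return cached; // hot and cached: the cache cluster is bypassed
        }
        Object value = load(remoteLoader); // hot but not cached yet (assumed: load and fill)
        if (value != null) {
            localCache.put(key, value);
        }
        return value;
    }

    public int pendingEvents() {
        return accessEvents.size();
    }

    private static Object load(Callable<Object> loader) {
        try {
            return loader.call();
        } catch (Exception e) {
            throw new IllegalStateException("remote load failed", e);
        }
    }

    public static void main(String[] args) {
        HermesSdkSketch sdk = new HermesSdkSketch();
        int[] remoteCalls = {0};
        Callable<Object> loader = () -> {
            remoteCalls[0]++;
            return "value";
        };
        sdk.markHot("hot");
        for (int i = 0; i < 5; i++) {
            sdk.get("hot", loader);
            sdk.get("cold", loader);
        }
        System.out.println(remoteCalls[0]);      // 6: once for "hot", five times for "cold"
        System.out.println(sdk.pendingEvents()); // 10 queued access events
    }
}
```

Using a non‑blocking `offer()` keeps reporting off the request path: if the buffer fills up, an access event is lost rather than the read being delayed.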
Key Expiration
Operations such as set(), del(), or expire() trigger the SDK’s invalid() method to mark the key as expired locally.
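For hotspot keys, the SDK first invalidates the value in its own local cache, so the instance that performed the write never serves the stale value (local strong consistency), and then broadcasts the expiration event through etcd to the other SDK nodes, which invalidate their copies and bring the cluster to eventual consistency.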
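The two‑step invalidation can be illustrated with a small in‑process model. The `Bus` class is a hypothetical stand‑in for etcd and delivers events synchronously, whereas real etcd delivery is asynchronous, which is why peers are only eventually consistent. `Bus`, `Node` and `onRemoteInvalid` are illustrative names; only `invalid` mirrors the method named above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class InvalidationSketch {

    // Hypothetical stand-in for the etcd broadcast channel; delivery here is synchronous.
    public static class Bus {
        private final List<Node> nodes = new CopyOnWriteArrayList<>();

        public Node join() {
            Node node = new Node(this);
            nodes.add(node);
            return node;
        }

        void publish(Node sender, String key) {
            for (Node node : nodes) {
                if (node != sender) {
                    node.onRemoteInvalid(key);
                }
            }
        }
    }

    // One application instance with its own local cache.
    public static class Node {
        public final Map<String, String> localCache = new ConcurrentHashMap<>();
        private final Bus bus;

        private Node(Bus bus) {
            this.bus = bus;
        }

        // What set()/del()/expire() would trigger for a hot key.
        public void invalid(String key) {
            localCache.remove(key); // 1. this instance drops its copy immediately
            bus.publish(this, key); // 2. peers drop theirs when the event reaches them
        }

        void onRemoteInvalid(String key) {
            localCache.remove(key);
        }
    }

    public static void main(String[] args) {
        Bus bus = new Bus();
        Node a = bus.join();
        Node b = bus.join();
        a.localCache.put("sku:1", "old price");
        b.localCache.put("sku:1", "old price");

        a.invalid("sku:1"); // e.g. triggered when instance a calls set("sku:1", ...)
        System.out.println(a.localCache.containsKey("sku:1")); // false
        System.out.println(b.localCache.containsKey("sku:1")); // false
    }
}
```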
Hotspot Discovery Process
Data Collection: Hermes‑SDK uses rsyslog to send key‑access events to Kafka; each Hermes server consumes these events in real time.
Sliding Window: For each key, a time wheel with 10 slots (each representing a 3‑second slice) records the number of accesses, covering a 30‑second window.
Aggregation: The total count across the 10 slots yields the windowed hotness, which is stored in a Redis sorted set.
Hotspot Detection: Every 3 seconds the server selects the top‑N keys whose hotness exceeds a configured threshold and pushes the list to SDK instances via etcd.
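The sliding window, aggregation and top‑N steps above can be sketched as a per‑key time wheel. Assumptions: `HotspotDetector`, `record`, `hotness` and `tick` are hypothetical names, the 3‑second clock is driven by explicit `tick()` calls so the example is deterministic, and an in‑memory sort replaces both the Redis sorted set and the etcd push.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotspotDetector {

    private static final int SLOTS = 10; // 10 slices x 3 s each = 30 s window

    private final Map<String, long[]> wheels = new HashMap<>();
    private int cursor = 0; // slot that receives events for the current 3 s slice

    // Called for every access event consumed from the event stream.
    public synchronized void record(String key) {
        wheels.computeIfAbsent(key, k -> new long[SLOTS])[cursor]++;
    }

    // Windowed hotness: the sum of all slots of the key's wheel.
    public synchronized long hotness(String key) {
        long[] wheel = wheels.get(key);
        return wheel == null ? 0 : Arrays.stream(wheel).sum();
    }

    // Called once per 3 s cycle: rank keys above the threshold, then rotate the wheel.
    public synchronized List<String> tick(long threshold, int topN) {
        List<Map.Entry<String, Long>> ranked = new ArrayList<>();
        for (String key : wheels.keySet()) {
            long h = hotness(key);
            if (h > threshold) {
                ranked.add(Map.entry(key, h));
            }
        }
        // Highest hotness first; ties broken by key so the result is deterministic.
        ranked.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<String> hot = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, ranked.size()); i++) {
            hot.add(ranked.get(i).getKey());
        }
        // Rotate: the next slot holds the oldest slice (30 s ago), so clear it for reuse.
        cursor = (cursor + 1) % SLOTS;
        for (long[] wheel : wheels.values()) {
            wheel[cursor] = 0;
        }
        wheels.values().removeIf(wheel -> Arrays.stream(wheel).sum() == 0); // forget idle keys
        return hot;
    }

    public static void main(String[] args) {
        HotspotDetector detector = new HotspotDetector();
        for (int i = 0; i < 500; i++) {
            detector.record("sku:flash");
        }
        for (int i = 0; i < 120; i++) {
            detector.record("sku:detail");
        }
        for (int i = 0; i < 5; i++) {
            detector.record("sku:cold");
        }
        System.out.println(detector.tick(100, 10)); // [sku:flash, sku:detail]
        System.out.println(detector.tick(100, 1));  // [sku:flash]: counts are still in the window
    }
}
```

Because a slot is only cleared when the cursor comes back around to it, each count contributes to ten consecutive ticks, so a key that stops being accessed drops out of the list after roughly 30 seconds.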
Key Features
Real‑time: Hotness is recomputed every 3 seconds, so a newly hot key is picked up within about one cycle of becoming hot.
Accuracy: Because hotness is summed over a sliding 30‑second window, it reflects the recent access distribution rather than long‑term totals.
Scalability: Hermes servers are stateless, so the node count can scale with the number of Kafka partitions, and the sliding‑window computation runs on multiple threads, partitioned by application.
Practical Impact
During a flash sale run by a Kuaishou merchant, TMC recorded a local‑cache hit rate close to 80%, significantly reducing latency even as request volume surged. Similar improvements were observed during the Double‑11 shopping festival across multiple core services, with QPS increasing and response time decreasing. TMC now serves the product, logistics, inventory, marketing, user, gateway, and messaging modules, and its adoption continues to expand. Configuration options such as hotspot thresholds, hotspot count, and black/white lists allow performance to be tuned for each business scenario.
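The configuration options mentioned above could be applied to the ranked candidates roughly as follows. The exact semantics are not specified here, so this sketch assumes that blacklisted keys are never served locally, whitelisted keys always are (unless also blacklisted), and the hotspot count caps the list. `HotspotPolicy` and its fields are hypothetical names, not TMC configuration keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HotspotPolicy {

    private final long threshold;        // minimum windowed hotness
    private final int maxHotKeys;        // cap on keys served locally
    private final Set<String> whitelist; // assumed meaning: always served locally
    private final Set<String> blacklist; // assumed meaning: never served locally

    public HotspotPolicy(long threshold, int maxHotKeys, Set<String> whitelist, Set<String> blacklist) {
        this.threshold = threshold;
        this.maxHotKeys = maxHotKeys;
        this.whitelist = new LinkedHashSet<>(whitelist);
        this.blacklist = new LinkedHashSet<>(blacklist);
    }

    // rankedCandidates must already be sorted by hotness, highest first.
    public List<String> select(List<Map.Entry<String, Long>> rankedCandidates) {
        Set<String> selected = new LinkedHashSet<>();
        for (String key : whitelist) {
            if (!blacklist.contains(key)) {
                selected.add(key); // pinned keys count toward the cap but are never cut
            }
        }
        for (Map.Entry<String, Long> candidate : rankedCandidates) {
            if (selected.size() >= maxHotKeys) {
                break;
            }
            String key = candidate.getKey();
            if (candidate.getValue() > threshold && !blacklist.contains(key)) {
                selected.add(key);
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        HotspotPolicy policy = new HotspotPolicy(100, 2, Set.of("config:global"), Set.of("sku:huge-blob"));
        List<Map.Entry<String, Long>> ranked = List.of(
                Map.entry("sku:huge-blob", 900L),
                Map.entry("sku:flash", 500L),
                Map.entry("sku:detail", 120L));
        System.out.println(policy.select(ranked)); // [config:global, sku:flash]
    }
}
```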
Java Backend Technology
