How a Two‑Level Cache Boosted High‑Concurrency Container Performance

By redesigning the caching layer as a two‑level architecture that combines local and distributed caches, the author dramatically reduced CPU usage, lowered response times, and increased system capacity under high‑QPS workloads. The article also weighs the trade‑offs among cache strategies, pre‑warming, refresh mechanisms, and operational concerns.

Alibaba Cloud Developer

With Kubernetes becoming the new interface to the cloud, containerized services are increasingly expected to handle high‑concurrency traffic. To improve elasticity under such load, the author designed a two‑level cache architecture that delivered significant performance gains.

Optimization Results

After multiple stress tests, the two‑level cache dramatically reduced CPU usage at the same QPS, lowered average response time (RT), and increased system capacity. The drop in resource consumption and the RT improvements are illustrated below.

System Bottleneck

CPU flame‑graph profiling under high QPS showed that over 50% of CPU time was spent deserializing large values read from Redis (fastjson arrays) and parsing them into POJOs; this work becomes a severe bottleneck when the data volume is large.

Data Attributes and Cache Selection

The data follows a key‑value access pattern, is read‑heavy, is written infrequently, changes slowly, and has relaxed consistency requirements: an ideal scenario for caching. Three options were considered: a local cache, a distributed cache, and a two‑level cache.

Local Cache

Fast access speed

Reduces network overhead (no Redis traffic)

Lower availability: cached data is lost when an instance restarts

Bounded by instance memory, so unsuitable for very large data sets
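Because a local cache lives in each instance's heap, it is normally size‑bounded. In Guava this is `CacheBuilder.maximumSize`; the sketch below is a hypothetical pure‑Python stand‑in that uses `collections.OrderedDict` for least‑recently‑used eviction. Guava's own eviction policy is not exactly LRU, so treat this only as an illustration of the memory limit.

```python
from collections import OrderedDict


class BoundedLocalCache:
    """Size-bounded in-process cache with least-recently-used (LRU) eviction.

    A rough analogue of a Guava cache built with maximumSize(...); it is an
    illustrative sketch, not Guava's actual eviction algorithm.
    """

    def __init__(self, max_size):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._data = OrderedDict()

    def get(self, key):
        """Return the cached value or None; a hit marks the key as most recently used."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        """Insert or update a value, evicting the least recently used entry if full."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # drop the oldest (LRU) entry

    def __len__(self):
        return len(self._data)
```

Everything in this object lives in process memory, so it also shows the availability drawback: restarting the instance discards the whole cache.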

Distributed Cache

High availability: data is shared across instances

Scalable storage capacity

CPU bottleneck from (de)serializing large payloads

Network overhead due to Redis traffic

Local + Distributed Two‑Level Cache

Combines the advantages of both layers: most reads are served from local memory, with Redis as a shared second layer behind it

More complex to implement, and consumes additional resources
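The hierarchical lookup can be sketched as a read‑through chain with backfill. This is an illustrative assumption of how the layers interact, not the author's code: `remote` is a dict standing in for Redis, and `load_from_downstream`, `serialize`, and `deserialize` are caller‑supplied placeholders.

```python
from typing import Callable, Dict, Optional


class TwoLevelCache:
    """Read-through lookup: local cache -> distributed cache -> downstream service.

    `remote` stands in for Redis (a dict of serialized strings here) and
    `load_from_downstream` stands in for the downstream (e.g. Dubbo) call.
    Both are hypothetical placeholders for this sketch.
    """

    def __init__(self, remote: Dict[str, str],
                 load_from_downstream: Callable[[str], object],
                 serialize: Callable[[object], str],
                 deserialize: Callable[[str], object]):
        self._local: Dict[str, object] = {}
        self._remote = remote
        self._load = load_from_downstream
        self._serialize = serialize
        self._deserialize = deserialize

    def get(self, key: str) -> Optional[object]:
        # 1. Local hit: no network and no deserialization.
        if key in self._local:
            return self._local[key]
        # 2. Distributed hit: deserialize once, then backfill the local layer.
        raw = self._remote.get(key)
        if raw is not None:
            value = self._deserialize(raw)
            self._local[key] = value
            return value
        # 3. Miss on both layers: load from downstream and backfill both layers.
        value = self._load(key)
        if value is not None:
            self._remote[key] = self._serialize(value)
            self._local[key] = value
        return value
```

A second instance sharing the same `remote` dict would find the key in Redis and skip the downstream call, while still paying one deserialization to fill its own local layer.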

Cache Scheme Comparison

Three solutions were evaluated based on architecture, pre‑warming, cache‑miss handling, refresh strategy, and operational impact.

Solution 1: Local Cache with Guava refreshAfterWrite

Uses Guava’s built‑in refreshAfterWrite with an asynchronous reload. Pros: no Redis CPU or network cost, and it reuses the Guava loader. Cons: every key is refreshed with its own downstream request, which puts heavy pressure on the downstream service; this option was discarded because it doubled downstream load.
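A simplified model shows why per‑key refresh multiplies downstream traffic: once an entry is older than the refresh interval, the next read reloads that one key, so N hot keys cost roughly N downstream calls per interval, and each instance refreshes its own local copy. This sketch reloads synchronously inside `get` for clarity; Guava can instead reload asynchronously (for example via `CacheLoader.asyncReloading`) and keep serving the old value meanwhile. The class name and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class RefreshAfterWriteCache:
    """Per-key refresh-after-write, loosely modeled on Guava's refreshAfterWrite.

    Simplification: the reload happens synchronously inside get(); Guava can
    reload asynchronously and serve the old value while the reload runs.
    """

    def __init__(self, loader: Callable[[str], object], refresh_after: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._refresh_after = refresh_after
        self._clock = clock
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, write time)

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._refresh_after:
            # Each missing or stale key triggers its own downstream call.
            value = self._loader(key)
            self._entries[key] = (value, now)
            return value
        return entry[0]
```

With five hot keys and a 60‑second interval, every minute costs five more loader calls per instance, on top of normal traffic.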

Solution 2: Two‑Level Cache + Refresh Job

Combines a local cache (entries never expire) with a distributed Redis cache. A read checks the local cache first, then Redis, then the downstream service. A Spring single‑node job periodically refreshes the cache. Pros: high local hit rate, and the Redis deserialization bottleneck is avoided. Cons: introduces an additional job framework, so scheduled tasks are no longer managed uniformly.
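The refresh side of Solution 2 can be sketched as a local map that never expires plus a job that reloads everything on a fixed interval. In the original design a Spring single‑node job does the scheduling; here a daemon thread stands in for it. `load_all` is a hypothetical function returning every entry to cache, and building a new map and swapping the reference is one assumed way to publish a refresh without blocking readers.

```python
import threading
from typing import Callable, Dict, Optional


class RefreshedLocalCache:
    """Local cache whose entries never expire; a scheduled job refreshes them in full.

    `load_all` is a hypothetical function returning every key/value pair to
    cache (for example read from Redis or the downstream service).
    """

    def __init__(self, load_all: Callable[[], Dict[str, object]]):
        self._load_all = load_all
        self._data: Dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[object]:
        # Reads never trigger a load, so request latency is unaffected by refreshes.
        return self._data.get(key)

    def refresh_all(self) -> None:
        """Full refresh: build a fresh map off to the side, then swap it in."""
        with self._lock:  # one refresh at a time; readers never take this lock
            fresh = dict(self._load_all())
            self._data = fresh  # readers see either the old or the new snapshot


def start_refresh_job(cache: RefreshedLocalCache, interval_s: float,
                      stop: threading.Event) -> threading.Thread:
    """Run cache.refresh_all() every interval_s seconds until `stop` is set."""
    def loop():
        while not stop.wait(interval_s):
            cache.refresh_all()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Reads in this sketch never trigger a load; a key missing from the local map would fall back to Redis and then the downstream service, as the paragraph above describes.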

Solution 3: Two‑Level Cache + Guava expireAfterAccess

Local entries expire after a period without access, so no refresh job is needed. Pros: no extra job and fewer operational dependencies. Cons: risk of a cache avalanche around startup pre‑warming, and the Redis deserialization CPU bottleneck remains because expired entries must be reloaded from Redis.
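A minimal model of expire‑after‑access with a fake clock shows how both listed drawbacks can arise: entries loaded together and then left idle expire in the same window, and every expired key must be fetched from Redis, and deserialized, again. `loader` stands in for that Redis lookup; the class name and parameters are hypothetical, loosely modeled on Guava's `expireAfterAccess`.

```python
import time
from typing import Callable, Dict, Tuple


class ExpireAfterAccessCache:
    """Entries expire once they have gone `ttl` seconds without being read or written.

    Loosely modeled on Guava's expireAfterAccess; `loader` stands in for the
    Redis lookup (including deserialization) that a miss would trigger.
    """

    def __init__(self, loader: Callable[[str], object], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, last access)

    def put(self, key: str, value: object) -> None:
        self._entries[key] = (value, self._clock())

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            self._entries[key] = (entry[0], now)  # each access extends the entry's life
            return entry[0]
        value = self._loader(key)  # expired or absent: go back to Redis
        self._entries[key] = (value, now)
        return value
```

If a large batch is pre‑warmed at startup and most of it goes unread for one TTL, those keys all miss at once and their reloads hit Redis together.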

Implementation Plan

Based on the analysis, Solution 2 was selected for implementation.

Architecture: Local + distributed two‑level cache, with local cache as the primary fast layer.

Cache‑miss handling: Business code follows the existing flow: localCache → Redis → Dubbo service.

Pre‑warming: Full pre‑warm before the service starts handling requests.

Refresh: SpringCronJob performs a periodic full refresh.

Operations: Local caches are managed per host through exposed Dubbo/HTTP services; Redis is managed centrally.

Cache wrapper: A unified wrapper stores all local caches in a Map<${prefix}, Cache<${key}, ${value}>>, giving every cache a consistent interface and making it easier to reuse.
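The unified wrapper might look roughly like the following registry, keyed by prefix. Everything here is hypothetical: plain dicts stand in for Guava `Cache` instances, and `register`, `prewarm`, `invalidate`, and `stats` are illustrative names for the pre‑warming and per‑host operations the plan describes, which the real system exposes through Dubbo/HTTP services.

```python
from typing import Callable, Dict, Optional


class LocalCacheRegistry:
    """All local caches in one place: Map<prefix, Cache<key, value>>.

    A hypothetical sketch of the unified wrapper; plain dicts stand in for
    Guava Cache instances.
    """

    def __init__(self):
        self._caches: Dict[str, Dict[str, object]] = {}
        self._warmers: Dict[str, Callable[[], Dict[str, object]]] = {}

    def register(self, prefix: str, warmer: Callable[[], Dict[str, object]]) -> None:
        """Create the cache for `prefix` and remember how to fully load it."""
        self._caches.setdefault(prefix, {})
        self._warmers[prefix] = warmer

    def get(self, prefix: str, key: str) -> Optional[object]:
        """Look up a key; raises KeyError for an unregistered prefix."""
        return self._caches[prefix].get(key)

    def put(self, prefix: str, key: str, value: object) -> None:
        self._caches[prefix][key] = value

    def prewarm(self) -> None:
        """Fully load every registered cache; call before accepting traffic."""
        for prefix, warmer in self._warmers.items():
            self._caches[prefix].update(warmer())

    def invalidate(self, prefix: str, key: Optional[str] = None) -> None:
        """Per-host maintenance hook: drop one key, or the whole prefix if key is None."""
        if key is None:
            self._caches[prefix].clear()
        else:
            self._caches[prefix].pop(key, None)

    def stats(self) -> Dict[str, int]:
        """Entry count per prefix, e.g. for an ops endpoint."""
        return {prefix: len(cache) for prefix, cache in self._caches.items()}
```

Calling `prewarm()` during startup, before the instance starts accepting traffic, matches the plan's rule of a full pre‑warm before serving requests.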

Conclusion

The two‑level cache implementation dramatically improved system performance under high concurrency and large data volumes, reducing CPU usage and response time while increasing capacity. Future work includes more diverse load‑testing scenarios, bulk refresh capabilities, cluster‑wide invalidation tools, and unified management of distributed and single‑node jobs.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
