How a Two‑Level Cache Boosted High‑Concurrency Performance in a Kubernetes System
This article describes how a two-level cache architecture, combining a local cache with a distributed cache, reduced CPU usage and response time and raised system capacity under high-QPS workloads in a Kubernetes-based container environment. It also evaluates the trade-offs of several caching strategies.
Introduction
With Kubernetes becoming the standard interface to the cloud, containerized services can scale elastically, but high concurrency and large data volumes still expose performance bottlenecks. The author introduced a two-level cache design to improve performance, reduce resource consumption, and raise capacity for containerized workloads.
Optimization Results
Stress tests before and after applying the cache optimizations showed a significant drop in CPU usage at the same QPS, a noticeable reduction in average response time, and a larger system capacity ceiling.
Where the Bottleneck Lies
CPU profiling revealed that over 50% of CPU time was spent deserializing large business data sets retrieved from Redis, specifically fastjson array deserialization of PojoList and Alibaba's cachejson POJO parsing. In high-concurrency scenarios this cost is paid on every request, so the CPU spent deserializing distributed-cache responses became the primary bottleneck.
Data Characteristics and Cache Selection
The workload fits classic cache‑friendly patterns: key‑value access, read‑heavy, write‑light, low change frequency, and relaxed consistency requirements. Based on these attributes, three cache options were compared:
Local cache – fast access, no network overhead, but limited by single‑machine memory and loses data on restart.
Distributed cache (Redis) – high availability and scalable storage, but incurs serialization CPU cost and network traffic.
Two‑level cache (local + distributed) – combines fast local hits with Redis fallback, at the expense of added complexity and extra resource usage.
Cache Scheme Comparison
Scheme 1: Local Cache with Guava refreshAfterWrite
Architecture: Pure local cache with TTL‑based expiration.
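Cache miss handling & refresh: Guava's native refreshAfterWrite uses the cache loader to pull a fresh <key, value> from the upstream service on the first read after the refresh interval has elapsed. While an asynchronous reload is in flight, readers continue to receive the old value. The reload is asynchronous only if CacheLoader.reload is overridden (for example with CacheLoader.asyncReloading); by default it runs on the calling thread.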
Warm‑up: Full pre‑warm at startup.
Pros: No Redis CPU or network cost; uses Guava loader directly.
Cons: High downstream pressure because each cache miss may generate many individual requests; batch‑query capability is unavailable.
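The refresh-on-read behavior of Scheme 1 can be approximated with the JDK alone. The sketch below is not Guava's implementation: RefreshingCache, its loader function, the injected clock, and the reload executor are hypothetical names chosen for illustration. It only shows the property that matters here, namely that a stale entry keeps being served while a single reload per key goes to the upstream service.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical refresh-after-write cache built on the JDK; not Guava's implementation. */
class RefreshingCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAtMillis;
        Entry(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, Boolean> reloading = new ConcurrentHashMap<>();
    private final Function<K, V> loader;       // assumption: stands in for the upstream call
    private final long refreshAfterMillis;
    private final LongSupplier clock;          // injected so the behavior is testable
    private final Executor reloadExecutor;     // where reloads run

    RefreshingCache(Function<K, V> loader, long refreshAfterMillis,
                    LongSupplier clock, Executor reloadExecutor) {
        this.loader = loader;
        this.refreshAfterMillis = refreshAfterMillis;
        this.clock = clock;
        this.reloadExecutor = reloadExecutor;
    }

    V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            // Cold miss: load on the calling thread, once per key.
            entry = store.computeIfAbsent(
                    key, k -> new Entry<>(loader.apply(k), clock.getAsLong()));
            return entry.value;
        }
        boolean stale = clock.getAsLong() - entry.writtenAtMillis >= refreshAfterMillis;
        if (stale && reloading.putIfAbsent(key, Boolean.TRUE) == null) {
            // First stale read schedules exactly one reload for this key.
            reloadExecutor.execute(() -> {
                try {
                    store.put(key, new Entry<>(loader.apply(key), clock.getAsLong()));
                } finally {
                    reloading.remove(key);
                }
            });
        }
        // Readers get the current (possibly stale) value without waiting.
        return entry.value;
    }
}
```

Each reload in this sketch fetches one key, which mirrors the downside noted for Scheme 1: many stale keys translate into many individual upstream requests.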
Scheme 2: Two‑Level Cache with Refresh Job
Architecture: Local cache backed by Redis; local TTL is effectively infinite.
Cache miss handling: On local miss, query Redis; if Redis also misses, call downstream service.
Refresh: A single-node Spring cron job periodically refreshes the entire cache, so freshness does not depend on Guava's loader-driven refresh.
Warm‑up: Full pre‑warm at startup.
Pros: High local hit rate eliminates most Redis bottlenecks; leverages existing Guava loader.
Cons: Requires an additional single-node job framework, which breaks uniform task management.
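The warm-up and periodic full refresh in Scheme 2 can be sketched as a snapshot that is rebuilt and swapped in atomically. This is a minimal illustration under assumptions: FullRefreshCache and its fullLoader supplier are hypothetical, and a JDK ScheduledExecutorService stands in for the Spring cron job, which would simply call refreshAll() on its own schedule.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical local cache whose contents are replaced wholesale by a periodic job. */
class FullRefreshCache<K, V> {
    // Readers always see one complete snapshot; entries never expire on their own.
    private volatile Map<K, V> snapshot = Collections.emptyMap();
    // Assumption: stands in for a bulk read from Redis or the downstream service.
    private final Supplier<Map<K, V>> fullLoader;

    FullRefreshCache(Supplier<Map<K, V>> fullLoader) {
        this.fullLoader = fullLoader;
    }

    /** Loads the full data set and swaps it in; used for warm-up and for every refresh. */
    void refreshAll() {
        Map<K, V> fresh = new HashMap<>(fullLoader.get());
        snapshot = Collections.unmodifiableMap(fresh);
    }

    V get(K key) {
        return snapshot.get(key);
    }

    /** Starts a fixed-rate refresh on a daemon thread (a stand-in for the cron trigger). */
    ScheduledExecutorService startRefreshJob(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-refresh");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refreshAll();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; an uncaught exception would
                // cancel all later runs. A real job would log the failure here.
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

Calling refreshAll() once during startup gives the full pre-warm; after that, reads never block on a reload because the old snapshot stays visible until the new one is complete.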
Scheme 3: Two‑Level Cache with Guava expireAfterAccess
Architecture: Same two-level layout, but local entries expire a fixed time after their last access.
Refresh: No dedicated refresh job; relies on access‑based expiration.
Pros: No extra refresh job; less operational dependency.
Cons: Risk of cache avalanche on restart; Redis serialization CPU bottleneck still present.
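Access-based expiration, as used for the local tier in Scheme 3, can be illustrated with a small JDK-only sketch. AccessExpiringCache and its injected clock are hypothetical names, and this is not Guava's implementation; it only shows that each read extends an entry's lifetime and that idle entries are dropped.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical expire-after-access cache built on the JDK; not Guava's implementation. */
class AccessExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        volatile long lastAccessMillis;
        Entry(T value, long lastAccessMillis) {
            this.value = value;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long expireAfterAccessMillis;
    private final LongSupplier clock; // injected so the behavior is testable

    AccessExpiringCache(long expireAfterAccessMillis, LongSupplier clock) {
        this.expireAfterAccessMillis = expireAfterAccessMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the value, or null if the key is absent or has been idle too long. */
    V getIfPresent(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - entry.lastAccessMillis >= expireAfterAccessMillis) {
            store.remove(key, entry); // lazy eviction on read
            return null;
        }
        entry.lastAccessMillis = now; // every read resets the idle timer
        return entry.value;
    }
}
```

Because reads reset the timer, a frequently read key stays in the local tier without being reloaded, while after a restart every key is absent at once and falls through to Redis, which is the avalanche risk listed above.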
Implementation Plan
Based on the trade‑offs, Scheme 2 was selected for production.
Architecture: Local + distributed two‑level cache, with local cache as the primary fast path.
Cache miss handling: Business code routes localCache → Redis → Dubbo to keep the existing call chain.
Warm‑up: Full pre‑warm during application startup.
Refresh: A Spring cron job performs a periodic full refresh.
Operations: Local caches are managed per host through exposed Dubbo/HTTP services; Redis is managed centrally.
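The localCache → Redis → Dubbo lookup order in the plan above can be sketched as follows. Everything here is an illustrative assumption: TwoLevelCache is a hypothetical class, a plain Map of strings stands in for Redis, a Function stands in for the Dubbo call, and the serializer pair stands in for the JSON library. The counter makes the point of the design visible: a local hit returns an already-built object, so deserialization is paid only when the local tier misses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical two-level lookup: local objects, then serialized remote entries, then the origin. */
class TwoLevelCache<V> {
    private final Map<String, V> local = new ConcurrentHashMap<>();
    private final Map<String, String> remote;       // assumption: stands in for Redis
    private final Function<String, V> origin;       // assumption: stands in for the Dubbo call
    private final Function<V, String> serializer;
    private final Function<String, V> deserializer;
    final AtomicInteger deserializations = new AtomicInteger();

    TwoLevelCache(Map<String, String> remote, Function<String, V> origin,
                  Function<V, String> serializer, Function<String, V> deserializer) {
        this.remote = remote;
        this.origin = origin;
        this.serializer = serializer;
        this.deserializer = deserializer;
    }

    V get(String key) {
        // Level 1: in-process hit, no network and no parsing.
        V value = local.get(key);
        if (value != null) {
            return value;
        }
        // Level 2: remote hit costs one round trip plus one deserialization.
        String payload = remote.get(key);
        if (payload != null) {
            deserializations.incrementAndGet();
            value = deserializer.apply(payload);
        } else {
            // Both tiers missed: call the origin and backfill the remote tier.
            value = origin.apply(key);
            if (value == null) {
                return null;
            }
            remote.put(key, serializer.apply(value));
        }
        local.put(key, value);
        return value;
    }
}
```

Concurrent misses for the same key may each reach the remote tier in this sketch; a production version would likely coalesce them, which is left out to keep the call chain readable.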
Additionally, a generic cache wrapper was built to store all local caches in a map Map<${prefix}, Cache<${key}, ${value}>>, unifying interaction and improving code reuse.
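One possible shape for such a wrapper is a registry keyed by prefix, sketched below. The source does not show the actual wrapper, so LocalCacheRegistry and its methods are hypothetical, and a ConcurrentHashMap stands in for the per-prefix Cache instance.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry holding one local cache per business prefix. */
class LocalCacheRegistry {
    // Outer key: prefix. Inner map: stands in for Cache<key, value>.
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    private Map<Object, Object> cacheFor(String prefix) {
        return caches.computeIfAbsent(prefix, p -> new ConcurrentHashMap<>());
    }

    void put(String prefix, Object key, Object value) {
        cacheFor(prefix).put(key, value);
    }

    /** Returns the cached value cast to the requested type, or null if absent. */
    <V> V get(String prefix, Object key, Class<V> type) {
        Map<Object, Object> cache = caches.get(prefix);
        Object value = cache == null ? null : cache.get(key);
        return value == null ? null : type.cast(value);
    }

    /** Clears one prefix without touching the others, e.g. from a per-host admin endpoint. */
    void invalidateAll(String prefix) {
        Map<Object, Object> cache = caches.get(prefix);
        if (cache != null) {
            cache.clear();
        }
    }

    Set<String> prefixes() {
        return caches.keySet();
    }
}
```

Keeping every cache behind one registry means warm-up, refresh, and invalidation code can iterate over prefixes instead of being written once per business cache.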
Conclusion and Future Work
The two‑level cache implementation dramatically improved CPU usage, response time, and system capacity under high QPS. Remaining work includes expanding load‑testing patterns, adding bulk‑refresh capabilities (e.g., Caffeine’s Bulk Refresh), implementing cluster‑wide invalidation, and consolidating job management frameworks across distributed and single‑node tasks.
dbaplus Community
