How a Two‑Level Cache Boosted High‑Concurrency Performance in a Kubernetes System

This article describes how a two-level cache architecture, combining a local cache with a distributed cache, reduced CPU usage and response time and raised system capacity under high-QPS workloads in a Kubernetes-based container environment. It also weighs the trade-offs of several caching strategies.

dbaplus Community

Introduction

As Kubernetes becomes the standard interface to the cloud, containerized services gain elasticity, but high concurrency and large data volumes still expose performance bottlenecks. The author introduced a two-level cache design to improve performance, reduce resource consumption, and raise capacity for containerized workloads.

Optimization Results

Stress tests before and after applying the cache optimizations showed a significant drop in CPU usage at the same QPS, a noticeable reduction in average response time, and a larger system capacity ceiling.

Where the Bottleneck Lies

CPU profiling revealed that over 50% of CPU time was spent deserializing large business data sets retrieved from Redis, specifically fastjson array deserialization of PojoList and Alibaba's cachejson POJO parsing. Under high concurrency these costs grow with request volume, making the CPU cost of reading from the distributed cache the primary bottleneck.

Data Characteristics and Cache Selection

The workload fits classic cache‑friendly patterns: key‑value access, read‑heavy, write‑light, low change frequency, and relaxed consistency requirements. Based on these attributes, three cache options were compared:

Local cache – fast access, no network overhead, but limited by single‑machine memory and loses data on restart.

Distributed cache (Redis) – high availability and scalable storage, but incurs serialization CPU cost and network traffic.

Two‑level cache (local + distributed) – combines fast local hits with Redis fallback, at the expense of added complexity and extra resource usage.

Cache Scheme Comparison

Scheme 1: Local Cache with Guava refreshAfterWrite

Architecture: Pure local cache with TTL‑based expiration.

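Cache miss handling & refresh: Guava's built-in refreshAfterWrite reloads an entry through the cache loader on the first read after the refresh interval has elapsed, pulling a fresh <key, value> from the downstream service. The reload runs asynchronously only when the loader's reload method is implemented that way.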

Warm‑up: Full pre‑warm at startup.

Pros: No Redis CPU or network cost; uses Guava loader directly.

Cons: High downstream pressure because each cache miss may generate many individual requests; batch‑query capability is unavailable.
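
The downstream-pressure concern can be illustrated with a small sketch. It uses only the Java standard library rather than Guava, and CountingDownstream, fetchOne, and fetchMany are hypothetical names that do not come from the original system. It only counts how many remote calls a per-key reload issues compared with a single batch query.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical downstream client that counts the remote calls it receives.
class CountingDownstream {
    final AtomicInteger calls = new AtomicInteger();

    // One remote call per key, as a per-key cache loader would issue.
    String fetchOne(String key) {
        calls.incrementAndGet();
        return "value-of-" + key;
    }

    // One remote call covering many keys, as a batch query would issue.
    Map<String, String> fetchMany(List<String> keys) {
        calls.incrementAndGet();
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            result.put(key, "value-of-" + key);
        }
        return result;
    }
}

class PerKeyVsBatchDemo {
    // Reload every key individually and return the number of downstream calls.
    static int refreshPerKey(List<String> keys) {
        CountingDownstream downstream = new CountingDownstream();
        Map<String, String> cache = new HashMap<>();
        for (String key : keys) {
            cache.put(key, downstream.fetchOne(key));
        }
        return downstream.calls.get();
    }

    // Reload all keys with one batch query and return the number of downstream calls.
    static int refreshInBatch(List<String> keys) {
        CountingDownstream downstream = new CountingDownstream();
        Map<String, String> cache = new HashMap<>(downstream.fetchMany(keys));
        return downstream.calls.get();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        System.out.println("per-key calls: " + refreshPerKey(keys));
        System.out.println("batch calls:   " + refreshInBatch(keys));
    }
}
```

With four keys, the per-key path issues four downstream calls and the batch path issues one. The gap widens linearly with the number of cached keys.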

Scheme 2: Two‑Level Cache with Refresh Job

Architecture: Local cache backed by Redis; local TTL is effectively infinite.

Cache miss handling: On local miss, query Redis; if Redis also misses, call downstream service.

Refresh: A Spring single-node cron job periodically refreshes the entire cache, keeping data fresh without relying on the Guava loader.

Warm‑up: Full pre‑warm at startup.

Pros: High local hit rate eliminates most Redis bottlenecks; leverages existing Guava loader.

Cons: Requires an additional single-node job framework, which breaks uniform task management.
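
The lookup order described for this scheme (local cache, then Redis, then the downstream service) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: RemoteCache and InMemoryRemoteCache are hypothetical stand-ins for a Redis client, the downstream service is modeled as a plain Function, and a ConcurrentHashMap takes the place of the Guava local cache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a Redis client; a real one would also (de)serialize values.
interface RemoteCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
}

// In-memory implementation so the sketch runs without a Redis server.
class InMemoryRemoteCache<K, V> implements RemoteCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    public Optional<V> get(K key) {
        return Optional.ofNullable(store.get(key));
    }

    public void put(K key, V value) {
        store.put(key, value);
    }
}

class TwoLevelCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final RemoteCache<K, V> remote;
    private final Function<K, V> downstream;

    TwoLevelCache(RemoteCache<K, V> remote, Function<K, V> downstream) {
        this.remote = remote;
        this.downstream = downstream;
    }

    // Look up local first, then the remote cache, then the downstream service.
    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value; // level-1 hit: no network, no deserialization
        }
        Optional<V> fromRemote = remote.get(key);
        if (fromRemote.isPresent()) {
            value = fromRemote.get(); // level-2 hit
        } else {
            value = downstream.apply(key); // both levels missed
            if (value == null) {
                return null; // nothing to cache
            }
            remote.put(key, value);
        }
        local.put(key, value); // back-fill level 1
        return value;
    }
}

class TwoLevelCacheDemo {
    public static void main(String[] args) {
        AtomicInteger downstreamCalls = new AtomicInteger();
        TwoLevelCache<String, String> cache = new TwoLevelCache<>(
                new InMemoryRemoteCache<>(),
                key -> {
                    downstreamCalls.incrementAndGet();
                    return "value-of-" + key;
                });
        cache.get("user:1");
        cache.get("user:1"); // served from the local level
        System.out.println("downstream calls: " + downstreamCalls.get());
    }
}
```

The sketch does not coalesce concurrent misses, so two threads missing the same key at the same moment may both call downstream. A production cache would typically guard against that.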

Scheme 3: Two‑Level Cache with Guava expireAfterAccess

Architecture: Same two‑level layout, local cache expires after access.

Refresh: No dedicated refresh job; relies on access‑based expiration.

Pros: No extra refresh job; less operational dependency.

Cons: Risk of cache avalanche on restart; Redis serialization CPU bottleneck still present.
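
Access-based expiration can be sketched with the standard library alone. The class below is a simplified stand-in for Guava's expireAfterAccess, not its actual implementation: AccessExpiringCache and the injected clock are hypothetical names chosen for illustration, and expired entries are only evicted lazily on read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class AccessExpiringCache<K, V> {
    // Value plus the time it was last written or read.
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccessMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so tests can control time

    AccessExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns the value and resets its idle timer, or null if absent or idle too long.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - entry.lastAccessMillis >= ttlMillis) {
            entries.remove(key, entry); // lazily evict the expired entry
            return null;
        }
        entry.lastAccessMillis = now;
        return entry.value;
    }
}

class AccessExpiringCacheDemo {
    public static void main(String[] args) {
        long[] now = {0L};
        AccessExpiringCache<String, String> cache =
                new AccessExpiringCache<>(100, () -> now[0]);
        cache.put("k", "v");
        now[0] = 60;
        System.out.println(cache.get("k")); // still live; idle timer resets
        now[0] = 200;
        System.out.println(cache.get("k")); // idle for 140 ms, so expired
    }
}
```

Because nothing repopulates entries proactively, a freshly restarted instance starts empty and every first read falls through to Redis or downstream, which is the restart risk listed above.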

Implementation Plan

Based on the trade‑offs, Scheme 2 was selected for production.

Architecture: Local + distributed two‑level cache, with local cache as the primary fast path.

Cache miss handling: Business code routes localCache → Redis → Dubbo to keep the existing call chain.

Warm‑up: Full pre‑warm during application startup.

Refresh: A Spring cron job performs a periodic full refresh.

Operations: Per‑host management exposing Dubbo/HTTP services; Redis managed centrally.
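
The warm-up and periodic full refresh in this plan can be sketched as below. The original uses a Spring cron job; this sketch substitutes a plain ScheduledExecutorService so that it runs without Spring. RefreshingLocalCache and its fullLoader are hypothetical names, and the loader is assumed to return the complete data set on each call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class RefreshingLocalCache<K, V> {
    private final Supplier<Map<K, V>> fullLoader;
    // Readers always see one complete snapshot; refresh swaps it in a single write.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    RefreshingLocalCache(Supplier<Map<K, V>> fullLoader) {
        this.fullLoader = fullLoader;
    }

    // Load the full data set and replace the current snapshot.
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(fullLoader.get()));
    }

    V get(K key) {
        return snapshot.get(key);
    }

    // Warm up synchronously, then refresh on a fixed period.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        refresh(); // full pre-warm before serving traffic
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; an uncaught exception
                // would cancel all later runs of this scheduled task.
            }
        }, period, period, unit);
        return scheduler;
    }
}

class RefreshingLocalCacheDemo {
    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("config:a", "1");
        RefreshingLocalCache<String, String> cache =
                new RefreshingLocalCache<>(() -> source);
        ScheduledExecutorService scheduler = cache.start(1, TimeUnit.MINUTES);
        System.out.println(cache.get("config:a")); // available right after warm-up
        scheduler.shutdownNow(); // let the JVM exit
    }
}
```

Swapping an immutable snapshot keeps reads lock-free and means readers never observe a half-refreshed cache, at the cost of holding two copies in memory briefly during each refresh.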

Additionally, a generic cache wrapper was built that keeps all local caches in a single map of the form Map<${prefix}, Cache<${key}, ${value}>>, giving callers a uniform interface and improving code reuse.
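
One possible shape for such a wrapper is sketched below, assuming string prefixes and using a ConcurrentHashMap in place of a Guava Cache so the example needs no dependencies. LocalCacheRegistry and its methods are hypothetical names. The original wrapper's API is not described in the source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LocalCacheRegistry {
    // prefix -> (key -> value); each prefix owns an independent local cache.
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    // Create the cache for a prefix on first use.
    private Map<Object, Object> cacheFor(String prefix) {
        return caches.computeIfAbsent(prefix, p -> new ConcurrentHashMap<>());
    }

    void put(String prefix, Object key, Object value) {
        cacheFor(prefix).put(key, value);
    }

    // Typed read; returns null when the prefix or key is unknown.
    <V> V get(String prefix, Object key, Class<V> type) {
        Map<Object, Object> cache = caches.get(prefix);
        return cache == null ? null : type.cast(cache.get(key));
    }

    // Drop every entry under one prefix without touching the others.
    void invalidate(String prefix) {
        caches.remove(prefix);
    }
}

class LocalCacheRegistryDemo {
    public static void main(String[] args) {
        LocalCacheRegistry registry = new LocalCacheRegistry();
        registry.put("user", "42", "alice");
        registry.put("order", "42", 1999);
        System.out.println(registry.get("user", "42", String.class));
        System.out.println(registry.get("order", "42", Integer.class));
        registry.invalidate("user");
        System.out.println(registry.get("user", "42", String.class));
    }
}
```

Keying the outer map by prefix lets the same business key live in several caches without collision and gives operational tooling a single place to list or invalidate caches.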

Conclusion and Future Work

The two-level cache implementation reduced CPU usage and response time and raised system capacity under high QPS. Remaining work includes broadening the load-test scenarios, adding bulk-refresh capabilities (e.g., Caffeine's Bulk Refresh), implementing cluster-wide invalidation, and consolidating the job-management frameworks used for distributed and single-node tasks.
