Cut API Latency 10× with Spring Boot 3 and a Local Cache Pyramid
The article demonstrates how to achieve a ten‑fold reduction in API response time by building a three‑level cache pyramid (Caffeine L1, Redis L2, DB L3) in Spring Boot 3, covering dependencies, configuration, core template code, warm‑up, monitoring, load‑test results and common high‑concurrency pitfalls.
Even when a remote Redis cache is added, latency can remain high because each request still incurs a network round‑trip, CPU context switches and serialization overhead. The author outlines a typical optimization path: cut database I/O with caching, cut network I/O with a local cache, and eliminate serialization with zero‑copy. A remote Redis call of 1‑2 ms can balloon to 5‑10 ms under high concurrency, while a local cache hit costs only tens of nanoseconds.
The solution is a "three‑level pyramid" built with Spring Boot 3: L1 Caffeine (in‑process), L2 Redis (remote), and L3 the underlying database. This model aligns with data‑hotness distribution; the author notes that at 10 k QPS, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
Only three Maven dependencies are required:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

The Spring configuration enables both caches simultaneously:
spring:
  cache:
    type: caffeine        # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:                # moved under spring.data in Spring Boot 3
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64

A generic CacheTemplate<K,V> component implements the pyramid logic. Its get method checks L1 first, then L2, and finally falls back to a supplied DB loader, back‑filling L1 and L2 on a cache miss. set performs a dual write to L1 and L2, while evict removes entries from both layers. A scheduled task prints the L1 hit rate using Caffeine's built‑in statistics.
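The L1→L2→DB lookup with back‑fill can be sketched as below. This is a minimal stand‑in, not the article's actual code: plain ConcurrentHashMaps take the place of Caffeine (L1) and RedisTemplate (L2), and the class and method names mirror the summary's get/set/evict description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of the pyramid lookup: L1 -> L2 -> DB loader, with back-fill.
// Plain maps stand in for Caffeine (L1) and Redis (L2).
class CacheTemplate<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // Caffeine in the article
    private final Map<K, V> l2 = new ConcurrentHashMap<>(); // Redis in the article

    public V get(K key, Supplier<V> dbLoader) {
        V v = l1.get(key);
        if (v != null) return v;          // L1 hit: in-process, nanoseconds
        v = l2.get(key);
        if (v != null) {
            l1.put(key, v);               // back-fill L1 on an L2 hit
            return v;
        }
        v = dbLoader.get();               // L3: fall back to the database
        if (v != null) set(key, v);       // back-fill both cache layers
        return v;
    }

    public void set(K key, V value) {     // dual write to L1 and L2
        l1.put(key, value);
        l2.put(key, value);
    }

    public void evict(K key) {            // remove from both layers
        l1.remove(key);
        l2.remove(key);
    }
}
```

Swapping the maps for a Caffeine Cache and a RedisTemplate preserves the same control flow; only the get/put calls change.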
Business code can use the cache with a single line. In ItemController,
cache.get(id, () -> itemRepository.findById(id).orElse(null)) retrieves an item, cache.set(...) stores a newly created item, and cache.evict(...) removes a deleted item. After startup, logs show hit rates such as L1 hit 0.83, L2 hit 0.15, DB hit 0.02, and the API response time drops from 28 ms to 2 ms with a 35 % CPU reduction.
Four common high‑concurrency pitfalls are discussed, including cache stampede and large keys. The author provides a randomTTL helper that adds a random offset (0‑5 min) to the base TTL, and a back‑pressure mechanism that asynchronously pre‑warms hot keys on application start using @EventListener(ApplicationReadyEvent.class) and parallelStream to control concurrency.
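The randomTTL idea can be sketched as follows; the class name and exact bounds are illustrative (the summary only states a 0‑5 min random offset on the base TTL, added so that keys written together do not all expire together).

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the randomTTL stampede guard: jitter the base TTL by 0-5 minutes
// so a batch of keys cached at the same moment expires spread out over time.
class TtlJitter {
    private static final long MAX_JITTER_SECONDS = Duration.ofMinutes(5).toSeconds();

    static Duration randomTTL(Duration base) {
        // nextLong(bound) is exclusive, so +1 makes the full 5 minutes reachable
        long jitter = ThreadLocalRandom.current().nextLong(MAX_JITTER_SECONDS + 1);
        return base.plusSeconds(jitter);
    }
}
```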
Load testing with wrk2 -R 5000 -d 60s -c 50 on a Mac M2 (8 GB, 4 threads) confirms the performance gains. Monitoring is integrated via Micrometer: Caffeine metrics are bound to a Prometheus registry, and Grafana alerts are set for low L1 hit‑rate, eviction spikes, and Redis keyspace hit‑rate drops.
Because Spring Cache only supports a single cache out of the box, the article shows how to create a custom @MultiCacheable annotation that lists multiple cache names (e.g., {"l1","l2"}) and lets an AOP interceptor apply the L1→L2→DB lookup order without any code intrusion.
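The annotation side of this can be sketched as below. Only the usage form {"l1","l2"} comes from the article; the attribute name value and the sample service are assumptions, and the AOP interceptor that walks the listed caches in L1→L2→DB order is omitted here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the custom annotation; an AOP interceptor (not shown) reads
// value() at runtime and tries the named caches in the listed order.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MultiCacheable {
    String[] value(); // cache names, checked in order, e.g. {"l1", "l2"}
}

class ItemService {
    @MultiCacheable({"l1", "l2"})
    public String findItem(String id) {
        return "item-" + id; // stands in for the DB fallback
    }
}
```

Because the interceptor discovers the cache list reflectively, business methods stay untouched, which is the "no code intrusion" property the article claims.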
In conclusion, the three‑level pyramid, combined with random TTL, warm‑up, back‑pressure, and observability, is what delivers the claimed ten‑fold API speedup.
java1234
Former senior programmer at a Fortune Global 500 company, dedicated to sharing Java expertise. Visit Feng's site: Java Knowledge Sharing, www.java1234.com