Cut API Latency 10× with Spring Boot 3 and a Local Cache Pyramid
The article demonstrates how to achieve a ten‑fold reduction in API response time by building a three‑level cache pyramid (Caffeine L1, Redis L2, DB L3) in Spring Boot 3, covering dependencies, configuration, core template code, warm‑up, monitoring, load‑test results and common high‑concurrency pitfalls.
Even when a remote Redis cache is added, latency can remain high because each request still incurs a network round‑trip, CPU context switches and serialization overhead. The author outlines a typical optimization path: cut database I/O with caching, cut network I/O with a local cache, and eliminate serialization with zero‑copy. A remote Redis call of 1‑2 ms can balloon to 5‑10 ms under high concurrency, while a local cache hit costs only tens of nanoseconds.
The solution is a "three‑level pyramid" built with Spring Boot 3: L1 Caffeine (in‑process), L2 Redis (remote), and L3 the underlying database. This model aligns with data‑hotness distribution; the author notes that at 10 k QPS, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
Only three Maven dependencies are required:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

The Spring configuration enables both caches simultaneously:
spring:
  cache:
    type: caffeine        # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:                # moved under spring.data in Spring Boot 3
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64

A generic CacheTemplate<K,V> component implements the pyramid logic. Its get method checks L1 first, then L2, and finally falls back to a supplied DB loader, back‑filling L1 and L2 on a cache miss. set performs a dual write to L1 and L2, while evict removes entries from both layers. A scheduled task prints the L1 hit rate using Caffeine's built‑in statistics.
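The L1→L2→DB lookup with back‑fill can be sketched as below. This is a minimal stand‑in, not the article's actual code: plain ConcurrentHashMaps take the place of Caffeine (L1) and RedisTemplate (L2), and the class and method names mirror the summary's get/set/evict description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of the pyramid lookup: L1 -> L2 -> DB loader, with back-fill.
// Plain maps stand in for Caffeine (L1) and Redis (L2).
class CacheTemplate<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // Caffeine in the article
    private final Map<K, V> l2 = new ConcurrentHashMap<>(); // Redis in the article

    public V get(K key, Supplier<V> dbLoader) {
        V v = l1.get(key);
        if (v != null) return v;          // L1 hit: in-process, nanoseconds
        v = l2.get(key);
        if (v != null) {
            l1.put(key, v);               // back-fill L1 on an L2 hit
            return v;
        }
        v = dbLoader.get();               // L3: fall back to the database
        if (v != null) set(key, v);       // back-fill both cache layers
        return v;
    }

    public void set(K key, V value) {     // dual write to L1 and L2
        l1.put(key, value);
        l2.put(key, value);
    }

    public void evict(K key) {            // remove from both layers
        l1.remove(key);
        l2.remove(key);
    }
}
```

Swapping the maps for a Caffeine Cache and a RedisTemplate preserves the same control flow; only the get/put calls change.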
Business code can use the cache with a single line. In ItemController,
cache.get(id, () -> itemRepository.findById(id).orElse(null)) retrieves an item, cache.set(...) stores a newly created item, and cache.evict(...) removes a deleted item. After startup, logs show hit rates such as L1 hit 0.83, L2 hit 0.15, DB hit 0.02, and the API response time drops from 28 ms to 2 ms with a 35 % CPU reduction.
Four common high‑concurrency pitfalls are discussed, including cache stampede and large keys. The author provides a randomTTL helper that adds a random offset (0‑5 min) to the base TTL, and a back‑pressure mechanism that asynchronously pre‑warms hot keys on application start using @EventListener(ApplicationReadyEvent.class) and parallelStream to control concurrency.
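The randomTTL idea can be sketched as follows; the class name and exact bounds are illustrative (the summary only states a 0‑5 min random offset on the base TTL, added so that keys written together do not all expire together).

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the randomTTL stampede guard: jitter the base TTL by 0-5 minutes
// so a batch of keys cached at the same moment expires spread out over time.
class TtlJitter {
    private static final long MAX_JITTER_SECONDS = Duration.ofMinutes(5).toSeconds();

    static Duration randomTTL(Duration base) {
        // nextLong(bound) is exclusive, so +1 makes the full 5 minutes reachable
        long jitter = ThreadLocalRandom.current().nextLong(MAX_JITTER_SECONDS + 1);
        return base.plusSeconds(jitter);
    }
}
```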
Load testing with wrk2 -R 5000 -d 60s -c 50 on a Mac M2 (8 GB, 4 threads) confirms the performance gains. Monitoring is integrated via Micrometer: Caffeine metrics are bound to a Prometheus registry, and Grafana alerts are set for low L1 hit‑rate, eviction spikes, and Redis keyspace hit‑rate drops.
Because Spring Cache only supports a single cache out of the box, the article shows how to create a custom @MultiCacheable annotation that lists multiple cache names (e.g., {"l1","l2"}) and lets an AOP interceptor apply the L1→L2→DB lookup order without any code intrusion.
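The annotation side of this can be sketched as below. Only the usage form {"l1","l2"} comes from the article; the attribute name value and the sample service are assumptions, and the AOP interceptor that walks the listed caches in L1→L2→DB order is omitted here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the custom annotation; an AOP interceptor (not shown) reads
// value() at runtime and tries the named caches in the listed order.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MultiCacheable {
    String[] value(); // cache names, checked in order, e.g. {"l1", "l2"}
}

class ItemService {
    @MultiCacheable({"l1", "l2"})
    public String findItem(String id) {
        return "item-" + id; // stands in for the DB fallback
    }
}
```

Because the interceptor discovers the cache list reflectively, business methods stay untouched, which is the "no code intrusion" property the article claims.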
In conclusion, the three‑level pyramid, combined with random TTL, warm‑up, back‑pressure, and observability, is what delivers the claimed ten‑fold API speedup.
java1234
Former senior programmer at a Fortune Global 500 company, dedicated to sharing Java expertise. Visit Feng's site: Java Knowledge Sharing, www.java1234.com