Achieve 10× Faster APIs with Spring Boot 3’s Three‑Level Cache Pyramid
This article shows how to combine Spring Boot 3, a Caffeine local cache, and Redis into a three‑level cache pyramid, cutting API response time from 28 ms to 2 ms and CPU usage by 35 %. It covers configuration, code examples, performance benchmarks, and mitigations for common high‑concurrency pitfalls.
Why Redis May Still Be Slow
The typical "API RT 300 ms → 30 ms" optimization path first removes database I/O with a remote cache, then removes network I/O with a local cache, and finally eliminates serialization with zero‑copy access. A 1–2 ms Redis round‑trip can stretch to 5–10 ms under high concurrency because of CPU context switches, serialization, and network jitter, while a local cache hit costs only tens of nanoseconds.
Three‑Level Pyramid Model & Data Hotness Distribution
L1 Caffeine (local) → L2 Redis (remote) → L3 MySQL (DB)
L1 Caffeine : latency 50 ns, capacity 10 MB, hit‑rate target 80 %, in‑process, zero network.
L2 Redis : latency 1 ms, capacity 100 GB, hit‑rate target 15 %, cluster horizontal scaling.
L3 MySQL : latency 10 ms+, capacity TB, hit‑rate target 5 %, eventual consistency.
Experience: at 10 k QPS, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
Environment & Dependencies (only three)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

No extra components are required; the application can be started with java -jar.
Configuration: Enabling Caffeine and Redis together
spring:
  cache:
    type: caffeine              # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:                      # Spring Boot 3 moved these properties under spring.data
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64

Core Wrapper: Three‑Level Cache Template
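The wrapper below injects a RedisTemplate<K, V>. Spring Boot's auto‑configured template uses JDK serialization, so a JSON‑serializing bean is usually defined alongside it; the sketch below is an assumption of what that bean might look like, not code from the original article:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

@Configuration
public class RedisConfig {

    /** JSON-serialized values, plain-string keys (readable in redis-cli). */
    @Bean
    public RedisTemplate<Object, Object> redisTemplate(RedisConnectionFactory cf) {
        RedisTemplate<Object, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(cf);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}
```

String keys and JSON values are one common choice; binary serializers trade readability for smaller payloads.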
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.util.function.Supplier;

@Component
@Slf4j
public class CacheTemplate<K, V> {

    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid lookup: L1 → L2 → DB, back-filling upper levels on the way out. */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v); // back-fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v); // double write
        }
        return v;
    }

    /** Double write (L1 + L2) */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Delete (L1 + L2) */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    @Scheduled(fixedDelay = 30_000) // requires @EnableScheduling on a configuration class
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}

Business Usage: One‑Line Cache Calls
@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {

    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

Observations
L1 hit 0.83
L2 hit 0.15
DB hit 0.02

API response time drops from 28 ms to 2 ms, and CPU usage falls by 35 %.
Common Pitfalls under High Concurrency
Cache penetration: concurrent queries for missing keys overload the DB. Solution: cache null values for 5 seconds in get().
Hot key: a single hot key can saturate one Redis node's command thread. Solution: the local cache absorbs ~80 % of the traffic.
Large key: a 5 MB value exhausts network bandwidth. Solution: split into hash shards or compress.
Cache avalanche: mass expiration after 60 s causes a thundering herd. Solution: apply a random TTL to both Caffeine and Redis.
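The null-caching mitigation for penetration can be illustrated without any framework. This is a minimal sketch using a ConcurrentHashMap stand-in for the cache and a sentinel object marking cached misses; all names are illustrative, not from the article:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative null-caching guard: misses are cached as a sentinel for a short TTL. */
public class NullCachingDemo {
    private static final Object NULL_SENTINEL = new Object();

    static final class Entry {
        final Object value;    // NULL_SENTINEL marks a cached miss
        final long expiresAt;  // epoch millis
        Entry(Object value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long nullTtlMillis;
    public int dbCalls = 0; // exposed so the demo can count backend hits

    public NullCachingDemo(long nullTtlMillis) { this.nullTtlMillis = nullTtlMillis; }

    public Object get(String key, Function<String, Object> db) {
        Entry e = cache.get(key);
        long now = System.currentTimeMillis();
        if (e != null && e.expiresAt > now) {
            return e.value == NULL_SENTINEL ? null : e.value; // hit, possibly a cached miss
        }
        dbCalls++;
        Object v = db.apply(key);
        // Cache a sentinel on miss so repeated lookups skip the DB for nullTtlMillis.
        cache.put(key, new Entry(v == null ? NULL_SENTINEL : v, now + nullTtlMillis));
        return v;
    }

    public static void main(String[] args) {
        NullCachingDemo demo = new NullCachingDemo(5_000);
        demo.get("missing", k -> null);
        demo.get("missing", k -> null); // served from the cached sentinel
        System.out.println(demo.dbCalls); // prints 1 — second lookup never reached the "DB"
    }
}
```

The same idea carries over to the article's get(): store a marker value with a short TTL instead of skipping the write when dbFallback returns null.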
Random TTL Utility
private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // jitter of 0–5 min
    return Duration.ofSeconds(baseSec + delta);
}

Local Warm‑up & Back‑pressure
Asynchronously warm hot keys at startup to avoid cold‑cache penetration:
@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream().forEach(id ->
            cache.set(id, itemRepository.findById(id).orElse(null)));
}

Note that parallelStream() runs on the shared ForkJoinPool.commonPool(), which offers no real back‑pressure; a dedicated bounded executor gives stricter control over how many loads run at once.
Benchmark Results
Environment : Mac M2 8 GB, 4 concurrent threads, 60 s.
Tool : wrk2 -R 5000 -d 60s -c 50.
Average RT : DB 28 ms, L2 5.1 ms, L1+Caffeine 1.9 ms (14× improvement).
P99 RT : DB 120 ms, L2 18 ms, L1+Caffeine 4 ms (30× improvement).
CPU usage : DB 65 %, L2 40 %, L1+Caffeine 25 % (↓ 60 %).
Network outflow : DB 180 MB/s, L2 12 MB/s, L1+Caffeine 0.8 MB/s (↓ 99 %).
Monitoring & Alerting
MeterBinder caffeineMetrics = registry ->
    CaffeineCacheMetrics.monitor(registry, local, "l1_cache");

Alert rules:
l1_cache_hit_rate < 70 % → alert.
l1_cache_eviction_count spikes → capacity issue.
Redis keyspace_hits / (keyspace_hits + keyspace_misses) < 50 % → large key or penetration.
Extension: Multi‑Cacheable Annotation
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // e.g. {"l1", "l2"}
    String key();
}

An AOP interceptor processes the caches in order L1 → L2 → DB, keeping business code untouched.
Conclusion
The pyramid model partitions data by hotness, allowing L1, L2, and L3 layers to serve appropriate traffic.
Back‑pressure and random TTL protect against cache avalanche.
Warm‑up and comprehensive monitoring make the system observable.
When all three practices are applied, a ten‑fold API speedup is the baseline.