Boost API Speed 14× with a 3‑Level Cache Pyramid in Spring Boot
By combining a local Caffeine cache, a remote Redis layer, and a MySQL database into a three‑tier cache pyramid, this guide shows how to cut API response time from 28 ms to 2 ms and reduce CPU usage by roughly 35 % — a gain of up to 14× — complete with configuration, code, and monitoring tips.
1. Introduction
A typical optimization path runs: DB I/O → cache → network I/O → local cache → zero‑copy serialization. A remote Redis call adds 1–2 ms of latency; under high concurrency, CPU context switches, serialization, and network jitter can amplify this to 5–10 ms, while a local cache hit costs only tens of nanoseconds.
2. Pyramid Model & Data Hotness Distribution
L1 Caffeine – 50 ns latency, 10 MB capacity, target hit‑rate 80 % (in‑process, zero network).
L2 Redis – 1 ms latency, 100 GB capacity, target hit‑rate 15 % (clustered, horizontally scalable).
L3 MySQL – 10 ms+ latency, TB capacity, target hit‑rate 5 % (eventually consistent).
Empirical observation: at 10 k QPS, each 1 % increase in L1 hit‑rate reduces CPU usage by ~3 %.
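Under these hit‑rate targets, the expected per‑request latency is simply a weighted average of the three tiers. A quick back‑of‑the‑envelope calculation using the figures above (the class name is illustrative):

```java
public class PyramidLatency {

    // Expected latency (ms) = sum over tiers of hitRate_i * latency_i
    public static double expectedLatencyMs(double l1Hit, double l2Hit, double l3Hit) {
        double l1Ms = 50e-6;   // 50 ns expressed in milliseconds
        double l2Ms = 1.0;     // Redis round trip
        double l3Ms = 10.0;    // MySQL query
        return l1Hit * l1Ms + l2Hit * l2Ms + l3Hit * l3Ms;
    }

    public static void main(String[] args) {
        // 80 % / 15 % / 5 % distribution from the pyramid model
        System.out.printf("expected latency = %.2f ms%n",
                expectedLatencyMs(0.80, 0.15, 0.05));
    }
}
```

With the 80/15/5 split this works out to about 0.65 ms, dominated by the small fraction of requests that fall through to MySQL — which is why every extra percentage point of L1 hit‑rate is worth chasing.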
3. Environment & Dependencies
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

No extra components are required; the application can be started with java -jar.
4. Configuration: Enabling Caffeine and Redis Together
spring:
  cache:
    type: caffeine          # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  redis:
    host: 127.0.0.1
    port: 6379
    timeout: 200ms
    lettuce:
      pool:
        max-active: 64

5. Core Wrapper: Three‑Level Cache Template
@Component
@Slf4j
public class CacheTemplate<K, V> {

    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid lookup */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v); // back-fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v); // double write
        }
        return v;
    }

    /** Double write (L1 + L2) */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Evict (L1 + L2) */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}

6. Business Usage: One‑Line Cache Call
@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {

    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

After startup, logs typically show L1 hitRate ≈ 0.83, L2 hitRate ≈ 0.15, DB hitRate ≈ 0.02. Response time drops from 28 ms to 2 ms and CPU usage falls by ~35 %.
7. Common Pitfalls Under High Concurrency
Cache Penetration: concurrent queries for missing keys overload the DB. Solution: cache null values with a short TTL (e.g., 5 s).
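A framework-free sketch of null-value caching (a plain ConcurrentHashMap stands in for the real L1/L2 tiers; the NULL_MARKER sentinel and TTL handling are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class NullCachingLookup {

    private static final Object NULL_MARKER = new Object(); // sentinel for "key known to be absent"

    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final Map<String, Long> expiry = new ConcurrentHashMap<>();
    private final long nullTtlMillis;

    public NullCachingLookup(long nullTtlMillis) {
        this.nullTtlMillis = nullTtlMillis;
    }

    /** Look up a key; misses are cached as NULL_MARKER so repeated misses skip the DB. */
    public Object get(String key, Function<String, Object> dbLoader) {
        Long deadline = expiry.get(key);
        if (deadline != null && deadline < System.currentTimeMillis()) {
            cache.remove(key);   // expired cached miss: drop and reload
            expiry.remove(key);
        }
        Object v = cache.get(key);
        if (v != null) {
            return v == NULL_MARKER ? null : v; // cached miss returns null without touching the DB
        }
        v = dbLoader.apply(key);
        if (v == null) {
            cache.put(key, NULL_MARKER);        // short TTL keeps stale misses bounded
            expiry.put(key, System.currentTimeMillis() + nullTtlMillis);
        } else {
            cache.put(key, v);
        }
        return v;
    }
}
```

With a 5 s TTL, a burst of lookups for a missing key reaches the database only once per TTL window instead of once per request.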
Hot Key: a single hot key can saturate a thread. Solution: the local cache absorbs ~80 % of the traffic before it reaches Redis.
Large Key: values around 5 MB exhaust network bandwidth. Solution: split into hash shards or compress the value.
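For large values, compressing before the Redis write is often enough. A self-contained sketch using java.util.zip (the class name is illustrative; exceptions are wrapped as unchecked for brevity):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ValueCompressor {

    /** GZIP-compress a value before storing it in Redis. */
    public static byte[] compress(String value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompress after reading the value back. */
    public static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Repetitive JSON payloads typically compress well, but measure before adopting: compression trades CPU for bandwidth.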
Cache Avalanche: mass expiration (e.g., everything expiring on the same 60 s boundary) causes a thundering herd. Solution: apply a random TTL to both Caffeine and Redis.
private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // 0–5 min of jitter
    return Duration.ofSeconds(baseSec + delta);
}

8. Warm‑up & Back‑pressure
During application startup, asynchronously preload hot keys to avoid cold‑cache spikes:
@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream()
          .forEach(id -> itemRepository.findById(id)
                  .ifPresent(item -> cache.set(id, item))); // skip missing ids: Caffeine rejects null values
}

By default, parallelStream() runs on ForkJoinPool.commonPool(), which bounds the preload concurrency.
9. Benchmark Results
Environment: Mac M2 8 GB, 4 concurrent threads, 60 s test.
Tool: wrk2 -R 5000 -d 60s -c 50.
Average RT : DB 28 ms → Redis 5.1 ms → L1 1.9 ms (≈14× improvement).
P99 RT : DB 120 ms → Redis 18 ms → L1 4 ms (≈30×).
CPU usage : 65 % → 40 % → 25 % (↓ 60 %).
Network outbound : 180 MB/s → 12 MB/s → 0.8 MB/s (↓ 99 %).
10. Monitoring & Alerts
Caffeine provides built‑in statistics; integrate with Micrometer and Prometheus:
MeterBinder caffeineMetrics = registry ->
        CaffeineCacheMetrics.monitor(registry, local, "l1_cache");

Alert when l1_cache_hit_rate < 70 %.
Alert when l1_cache_eviction_count spikes (capacity issue).
Alert when Redis keyspace_hits / (hits+misses) < 50% (large key or penetration).
11. Extension: Multi‑Level Annotation
Spring's @Cacheable has no notion of tier ordering. Define a custom @MultiCacheable annotation and drive the L1 → L2 → DB chain via AOP:
@Target(METHOD)
@Retention(RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // e.g., {"l1", "l2"}
    String key();
}

The interceptor checks L1, then L2, then falls back to the DB, keeping business code untouched.
12. Conclusion
The layered pyramid model separates data by hotness.
Back‑pressure and random TTLs prevent cache avalanches.
Warm‑up and observability keep operation reliable.
When these practices are applied, API latency can improve tenfold or more, providing a solid performance foundation for high‑traffic services.