Boost API Performance 10× with a Three‑Tier Cache Pyramid in Spring Boot 3
This article explains how to design and implement a three‑level cache pyramid (Caffeine → Redis → MySQL) in Spring Boot 3, covering configuration, a reusable CacheTemplate, hot‑key handling, random TTL, warm‑up, monitoring, and load‑test results that show latency dropping from tens of milliseconds to a few milliseconds while cutting CPU and network usage dramatically.
1. Introduction – Why Is It Still Slow After Adding Redis?
Typical optimization paths for reducing an interface response time from 300 ms to 30 ms involve cutting database I/O with a cache, eliminating network I/O with a local cache, and removing serialization with zero‑copy techniques. A remote Redis call adds 1–2 ms latency, which can balloon to 5–10 ms under high concurrency due to CPU context switches, serialization, and network jitter, whereas a local cache hit costs only tens of nanoseconds.
2. Three‑Tier Cache Pyramid
L1 Caffeine (local) → L2 Redis (remote) → L3 MySQL (DB)
The pyramid provides a complete solution for back‑pressure, warm‑up, hot‑key dispersion, and large‑key handling without extra dependencies; just copy and run.
Data Heat Distribution
Level        Latency  Capacity  Hit-Rate Target  Description
L1 Caffeine  50 ns    10 MB     80%              In-process, zero network
L2 Redis     1 ms     100 GB    15%              Horizontal scaling across clusters
L3 MySQL     10 ms+   1 TB      5%               Eventual consistency
Experience: at a single-machine QPS of 10 k, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
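The hit-rate targets above imply a blended read latency that can be estimated as a weighted average of the per-tier costs. A quick sketch (the figures are the table's targets, not measurements):

```java
import java.util.Locale;

public class BlendedLatency {
    public static void main(String[] args) {
        // Hit-rate targets and per-tier latencies from the table above
        double l1Hit = 0.80, l2Hit = 0.15, l3Hit = 0.05;
        double l1Ns = 50, l2Ns = 1_000_000, l3Ns = 10_000_000; // 50 ns, 1 ms, 10 ms

        double blendedNs = l1Hit * l1Ns + l2Hit * l2Ns + l3Hit * l3Ns;
        // Prints "Blended read latency ≈ 0.650 ms"
        System.out.printf(Locale.ROOT, "Blended read latency ≈ %.3f ms%n", blendedNs / 1_000_000);
    }
}
```

Even at only an 80 % L1 hit rate, the L3 term dominates the average, which is why shaving the last few percent off DB hits matters more than micro-optimizing L1.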
3. Environment & Dependencies (Only Three)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
No extra components are required; the application can be started directly with java -jar.
4. Configuration – Enable Caffeine and Redis Simultaneously
spring:
  cache:
    type: caffeine        # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:                # Spring Boot 3 moved Redis settings under spring.data.redis
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64
5. Core Wrapper – Three‑Level Cache Template
import java.time.Duration;
import java.util.function.Supplier;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class CacheTemplate<K, V> {

    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid query: L1 → L2 → L3. */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v);          // back-fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v);                // double write
        }
        return v;
    }

    /** Double write (L1 + L2). */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Delete (L1 + L2). */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    /** Requires @EnableScheduling on a configuration class. */
    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}
6. Business Usage – One‑Line Cache Call
@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {

    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}
After startup, the logs show:
L1 hit 0.83
L2 hit 0.15
DB hit 0.02
Interface response time drops from 28 ms to 2 ms, and CPU usage falls by about 35 %.
7. High‑Concurrency Pitfalls and Solutions
Cache Penetration : Concurrent requests for missing keys hammer the DB. Solution: Cache null values for a short period (e.g., 5 s).
Hot Key : A single hot key can saturate a thread. Solution: Let the local cache absorb ~80 % of traffic.
Large Key : Values of several megabytes cause network saturation. Solution: Split into hash shards or compress.
Cache Avalanche : Simultaneous expiration (e.g., after 60 s) leads to a thundering herd. Solution: Apply random TTL to both Caffeine and Redis.
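The penetration fix deserves a concrete shape, since the CacheTemplate above only writes non-null values. The idea is to cache a sentinel for "this key does not exist" so repeated misses never reach the DB. A minimal, Spring-free sketch, with a plain map standing in for the L1/L2 tiers (in Redis the sentinel entry would get a short TTL such as 5 s):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of null-value caching to stop cache penetration. */
public class NullCachingLookup<K, V> {
    private static final Object NULL_SENTINEL = new Object(); // cached "no such row"
    private final Map<K, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public V get(K key, Function<K, V> dbLoader) {
        Object cached = cache.get(key);
        if (cached != null) {
            // Hit, including a cached miss: return null without touching the DB
            return cached == NULL_SENTINEL ? null : (V) cached;
        }
        V v = dbLoader.apply(key);
        // Cache the miss too; ConcurrentHashMap forbids null values, hence the sentinel
        cache.put(key, v != null ? v : NULL_SENTINEL);
        return v;
    }
}
```

A second lookup for a missing id returns null from the cache instead of invoking the loader again, which is exactly the behavior that stops a flood of requests for nonexistent keys.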
Random TTL Utility
private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // jitter: 0–5 min
    return Duration.ofSeconds(baseSec + delta);
}
8. Warm‑Up & Back‑Pressure
@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream().forEach(id ->
            cache.set(id, itemRepository.findById(id).orElse(null)));
}
Parallel streams run on the shared ForkJoinPool.commonPool(), whose parallelism defaults to the number of CPU cores minus one, so warm-up concurrency is capped implicitly. For stricter back-pressure, submit the warm-up tasks to a dedicated bounded executor instead.
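If you want the warm-up concurrency to be explicit rather than inherited from the common pool, a fixed-size executor gives you a hard cap. A minimal sketch under stated assumptions: the map stands in for CacheTemplate.set, the string loader stands in for the repository lookup, and the thread count of 4 is illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedWarmup {
    /** Load each hot id on a fixed-size pool; the pool size is the back-pressure knob. */
    static Map<Long, String> warm(List<Long> hotIds, int threads) throws InterruptedException {
        Map<Long, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads); // explicit concurrency cap
        for (Long id : hotIds) {
            pool.submit(() -> cache.put(id, "item-" + id)); // stand-in for DB load + cache.set
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return cache;
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints "warmed 5 keys"
        System.out.println("warmed " + warm(List.of(1L, 2L, 3L, 4L, 5L), 4).size() + " keys");
    }
}
```

Unlike a parallel stream, this approach also isolates warm-up from other commonPool users, so a slow repository cannot starve unrelated parallel work at startup.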
9. Load‑Test Results
Environment: Mac M2 8 GB, 4 concurrent threads, 60 s test.
Tool:
wrk2 -R 5000 -d 60s -c 50

Metric       Pure DB   L2 Redis  L1+Caffeine  Improvement
Average RT   28 ms     5.1 ms    1.9 ms       14×
P99 RT       120 ms    18 ms     4 ms         30×
CPU usage    65%       40%       25%          ↓60%
Network out  180 MB/s  12 MB/s   0.8 MB/s     ↓99%
10. Monitoring & Alerting
Caffeine provides built‑in statistics; combine with Micrometer to expose metrics to Prometheus:
MeterBinder caffeineMetrics = registry ->
        CaffeineCacheMetrics.monitor(registry, local, "l1_cache");
Grafana alerts:
l1_cache_hit_rate < 70% → alarm.
l1_cache_eviction_count spikes → capacity issue.
Redis keyspace_hits / (hits + misses) < 50% → large key or penetration.
11. Extension – Multi‑Cacheable Annotation
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@Target(METHOD)
@Retention(RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames();  // e.g. {"l1", "l2"}
    String key();
}
An AOP interceptor processes L1 → L2 → DB in order, keeping business code untouched.
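The interceptor's core loop can be sketched independently of Spring AOP: walk the named tiers in order, return on the first hit, and back-fill every tier that missed above it. The Map-backed tiers and the tier names here are illustrative assumptions standing in for Caffeine and Redis:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Sketch of the multi-tier lookup an @MultiCacheable interceptor would perform. */
public class TieredLookup {
    // Ordered tiers, e.g. "l1" (Caffeine) then "l2" (Redis), each modeled as a map
    private final Map<String, Map<String, Object>> tiers = new LinkedHashMap<>();

    public TieredLookup(List<String> cacheNames) {
        cacheNames.forEach(n -> tiers.put(n, new ConcurrentHashMap<>()));
    }

    /** L1 → L2 → DB: return on first hit, back-fill the tiers that missed. */
    public Object get(String key, Supplier<Object> dbFallback) {
        for (Map.Entry<String, Map<String, Object>> tier : tiers.entrySet()) {
            Object v = tier.getValue().get(key);
            if (v != null) {
                backFill(key, v, tier.getKey());
                return v;
            }
        }
        Object v = dbFallback.get();
        if (v != null) backFill(key, v, null);  // DB hit: fill every cache tier
        return v;
    }

    private void backFill(String key, Object value, String hitTier) {
        for (String name : tiers.keySet()) {
            if (name.equals(hitTier)) break;    // only the tiers above the hit missed
            tiers.get(name).put(key, value);
        }
    }
}
```

Wrapping this loop in an @Around advice keyed off the annotation's cacheNames() and key() attributes is what keeps the business methods one line long.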
12. Conclusion
Use a pyramid model to separate data by heat.
Apply back‑pressure and random TTL to resist cache avalanche.
Warm‑up and observability make the system reliable.
When these three steps are completed, achieving a ten‑fold API speedup becomes the baseline.
