Boost API Performance 10× with a Three‑Tier Cache Pyramid in Spring Boot 3
This article explains how to design and implement a three‑level cache pyramid (Caffeine → Redis → MySQL) in Spring Boot 3, covering configuration, a reusable CacheTemplate, hot‑key handling, random TTL, warm‑up, monitoring, and load‑test results that show latency dropping from tens of milliseconds to a few milliseconds while cutting CPU and network usage dramatically.
1. Introduction – Why Is It Still Slow After Adding Redis?
Typical optimization paths for reducing an interface response time from 300 ms to 30 ms involve cutting database I/O with a cache, eliminating network I/O with a local cache, and removing serialization with zero‑copy techniques. A remote Redis call adds 1–2 ms latency, which can balloon to 5–10 ms under high concurrency due to CPU context switches, serialization, and network jitter, whereas a local cache hit costs only tens of nanoseconds.
2. Three‑Tier Cache Pyramid
L1 Caffeine (local) → L2 Redis (remote) → L3 MySQL (DB)
The pyramid provides a complete solution for back‑pressure, warm‑up, hot‑key dispersion, and large‑key handling without extra dependencies; just copy and run.
Data Heat Distribution
Level        Latency  Capacity  Hit-Rate Target  Description
L1 Caffeine  50 ns    10 MB     80%              In-process, zero network
L2 Redis     1 ms     100 GB    15%              Horizontal scaling across clusters
L3 MySQL     10 ms+   1 TB      5%               Eventual consistency
Experience: at a single-machine QPS of 10 k, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
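The hit-rate targets above imply a blended read latency that can be estimated as a weighted average of the per-tier costs. A quick sketch (the figures are the table's targets, not measurements):

```java
import java.util.Locale;

public class BlendedLatency {
    public static void main(String[] args) {
        // Hit-rate targets and per-tier latencies from the table above
        double l1Hit = 0.80, l2Hit = 0.15, l3Hit = 0.05;
        double l1Ns = 50, l2Ns = 1_000_000, l3Ns = 10_000_000; // 50 ns, 1 ms, 10 ms

        double blendedNs = l1Hit * l1Ns + l2Hit * l2Ns + l3Hit * l3Ns;
        // Prints "Blended read latency ≈ 0.650 ms"
        System.out.printf(Locale.ROOT, "Blended read latency ≈ %.3f ms%n", blendedNs / 1_000_000);
    }
}
```

Even at only an 80 % L1 hit rate, the L3 term dominates the average, which is why shaving the last few percent off DB hits matters more than micro-optimizing L1.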
3. Environment & Dependencies (Only Three)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
No extra components are required; the application can be started directly with java -jar.
4. Configuration – Enable Caffeine and Redis Simultaneously
spring:
  cache:
    type: caffeine        # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:                # Spring Boot 3 moved Redis settings under spring.data.redis
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64
5. Core Wrapper – Three‑Level Cache Template
import java.time.Duration;
import java.util.function.Supplier;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class CacheTemplate<K, V> {

    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid query: L1 → L2 → L3. */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v);          // back-fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v);                // double write
        }
        return v;
    }

    /** Double write (L1 + L2). */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Delete (L1 + L2). */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    /** Requires @EnableScheduling on a configuration class. */
    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}
6. Business Usage – One‑Line Cache Call
@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {

    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}
After startup, the logs show:
L1 hit 0.83
L2 hit 0.15
DB hit 0.02
Interface response time drops from 28 ms to 2 ms, and CPU usage falls by about 35 %.
7. High‑Concurrency Pitfalls and Solutions
Cache Penetration : Concurrent requests for missing keys hammer the DB. Solution: Cache null values for a short period (e.g., 5 s).
Hot Key : A single hot key can saturate a thread. Solution: Let the local cache absorb ~80 % of traffic.
Large Key : Values of several megabytes cause network saturation. Solution: Split into hash shards or compress.
Cache Avalanche : Simultaneous expiration (e.g., after 60 s) leads to a thundering herd. Solution: Apply random TTL to both Caffeine and Redis.
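The penetration fix deserves a concrete shape, since the CacheTemplate above only writes non-null values. The idea is to cache a sentinel for "this key does not exist" so repeated misses never reach the DB. A minimal, Spring-free sketch, with a plain map standing in for the L1/L2 tiers (in Redis the sentinel entry would get a short TTL such as 5 s):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of null-value caching to stop cache penetration. */
public class NullCachingLookup<K, V> {
    private static final Object NULL_SENTINEL = new Object(); // cached "no such row"
    private final Map<K, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public V get(K key, Function<K, V> dbLoader) {
        Object cached = cache.get(key);
        if (cached != null) {
            // Hit, including a cached miss: return null without touching the DB
            return cached == NULL_SENTINEL ? null : (V) cached;
        }
        V v = dbLoader.apply(key);
        // Cache the miss too; ConcurrentHashMap forbids null values, hence the sentinel
        cache.put(key, v != null ? v : NULL_SENTINEL);
        return v;
    }
}
```

A second lookup for a missing id returns null from the cache instead of invoking the loader again, which is exactly the behavior that stops a flood of requests for nonexistent keys.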
Random TTL Utility
private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // jitter: 0–5 min
    return Duration.ofSeconds(baseSec + delta);
}
8. Warm‑Up & Back‑Pressure
@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream().forEach(id ->
            cache.set(id, itemRepository.findById(id).orElse(null)));
}
Parallel streams run on the shared ForkJoinPool.commonPool(), whose parallelism defaults to the number of CPU cores minus one, so warm-up concurrency is capped implicitly. For stricter back-pressure, submit the warm-up tasks to a dedicated bounded executor instead.
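If you want the warm-up concurrency to be explicit rather than inherited from the common pool, a fixed-size executor gives you a hard cap. A minimal sketch under stated assumptions: the map stands in for CacheTemplate.set, the string loader stands in for the repository lookup, and the thread count of 4 is illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedWarmup {
    /** Load each hot id on a fixed-size pool; the pool size is the back-pressure knob. */
    static Map<Long, String> warm(List<Long> hotIds, int threads) throws InterruptedException {
        Map<Long, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads); // explicit concurrency cap
        for (Long id : hotIds) {
            pool.submit(() -> cache.put(id, "item-" + id)); // stand-in for DB load + cache.set
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return cache;
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints "warmed 5 keys"
        System.out.println("warmed " + warm(List.of(1L, 2L, 3L, 4L, 5L), 4).size() + " keys");
    }
}
```

Unlike a parallel stream, this approach also isolates warm-up from other commonPool users, so a slow repository cannot starve unrelated parallel work at startup.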
9. Load‑Test Results
Environment: Mac M2 8 GB, 4 concurrent threads, 60 s test.
Tool:
wrk2 -R 5000 -d 60s -c 50

Metric       Pure DB   L2 Redis  L1+Caffeine  Improvement
Average RT   28 ms     5.1 ms    1.9 ms       14×
P99 RT       120 ms    18 ms     4 ms         30×
CPU usage    65%       40%       25%          ↓60%
Network out  180 MB/s  12 MB/s   0.8 MB/s     ↓99%
10. Monitoring & Alerting
Caffeine provides built‑in statistics; combine with Micrometer to expose metrics to Prometheus:
MeterBinder caffeineMetrics = registry ->
        CaffeineCacheMetrics.monitor(registry, local, "l1_cache");
Grafana alerts:
l1_cache_hit_rate < 70% → alarm.
l1_cache_eviction_count spikes → capacity issue.
Redis keyspace_hits / (hits + misses) < 50% → large key or penetration.
11. Extension – Multi‑Cacheable Annotation
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@Target(METHOD)
@Retention(RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames();  // e.g. {"l1", "l2"}
    String key();
}
An AOP interceptor processes L1 → L2 → DB in order, keeping business code untouched.
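The interceptor's core loop can be sketched independently of Spring AOP: walk the named tiers in order, return on the first hit, and back-fill every tier that missed above it. The Map-backed tiers and the tier names here are illustrative assumptions standing in for Caffeine and Redis:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Sketch of the multi-tier lookup an @MultiCacheable interceptor would perform. */
public class TieredLookup {
    // Ordered tiers, e.g. "l1" (Caffeine) then "l2" (Redis), each modeled as a map
    private final Map<String, Map<String, Object>> tiers = new LinkedHashMap<>();

    public TieredLookup(List<String> cacheNames) {
        cacheNames.forEach(n -> tiers.put(n, new ConcurrentHashMap<>()));
    }

    /** L1 → L2 → DB: return on first hit, back-fill the tiers that missed. */
    public Object get(String key, Supplier<Object> dbFallback) {
        for (Map.Entry<String, Map<String, Object>> tier : tiers.entrySet()) {
            Object v = tier.getValue().get(key);
            if (v != null) {
                backFill(key, v, tier.getKey());
                return v;
            }
        }
        Object v = dbFallback.get();
        if (v != null) backFill(key, v, null);  // DB hit: fill every cache tier
        return v;
    }

    private void backFill(String key, Object value, String hitTier) {
        for (String name : tiers.keySet()) {
            if (name.equals(hitTier)) break;    // only the tiers above the hit missed
            tiers.get(name).put(key, value);
        }
    }
}
```

Wrapping this loop in an @Around advice keyed off the annotation's cacheNames() and key() attributes is what keeps the business methods one line long.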
12. Conclusion
Use a pyramid model to separate data by heat.
Apply back‑pressure and random TTL to resist cache avalanche.
Warm‑up and observability make the system reliable.
When these three steps are completed, achieving a ten‑fold API speedup becomes the baseline.
