Achieve 10× Faster APIs with Spring Boot 3’s Three‑Level Cache Pyramid

This article shows how to combine Spring Boot 3, a Caffeine local cache, and Redis into a three‑level cache pyramid that cuts API response time from 28 ms to 2 ms and CPU usage by 35 %. It walks through configuration, code examples, performance benchmarks, and mitigation strategies for common high‑concurrency pitfalls.

Java Companion

Why Redis may still be slow

The typical optimization path from 300 ms to 30 ms API response time first removes database I/O with a remote cache, then removes network I/O with a local cache, and finally eliminates serialization overhead with zero‑copy reads. A remote Redis round‑trip that costs 1‑2 ms in isolation can stretch to 5‑10 ms under high concurrency because of CPU context switches, serialization, and network jitter, while a local cache hit costs only tens of nanoseconds.
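The nanosecond claim is easy to sanity‑check with plain JDK code. A simplified measurement of an in‑process map hit (no framework; exact timings vary by machine, but they stay far below a network round‑trip):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalHitCost {
    /** Average cost of one in-process lookup, in nanoseconds. */
    public static long averageLookupNanos(int iterations) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("item:42", "cached-value");
        // Warm up the JIT before measuring
        for (int i = 0; i < 100_000; i++) cache.get("item:42");
        long start = System.nanoTime();
        String v = null;
        for (int i = 0; i < iterations; i++) v = cache.get("item:42");
        long elapsed = System.nanoTime() - start;
        if (v == null) throw new IllegalStateException("unexpected miss");
        return elapsed / iterations; // typically double-digit nanoseconds
    }

    public static void main(String[] args) {
        System.out.println("avg ns per local hit: " + averageLookupNanos(1_000_000));
    }
}
```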

Three‑Level Pyramid Model & Data Hotness Distribution

L1 Caffeine (local) → L2 Redis (remote) → L3 MySQL (DB)

L1 Caffeine : latency 50 ns, capacity 10 MB, hit‑rate target 80 %, in‑process, zero network.

L2 Redis : latency 1 ms, capacity 100 GB, hit‑rate target 15 %, cluster horizontal scaling.

L3 MySQL : latency 10 ms+, capacity TB, hit‑rate target 5 %, eventual consistency.

Experience: at 10 k QPS, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.
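The hit‑rate targets translate directly into an expected lookup cost. A back‑of‑envelope calculation using the latencies from the table above (real numbers depend on hardware and payload size):

```java
public class PyramidMath {
    // Latencies from the pyramid table, in nanoseconds
    static final double L1_NS = 50;           // Caffeine local hit
    static final double L2_NS = 1_000_000;    // Redis round-trip (~1 ms)
    static final double L3_NS = 10_000_000;   // MySQL query (~10 ms)

    /** Expected lookup latency given the hit-rate split across the three levels. */
    public static double expectedNanos(double l1Hit, double l2Hit, double l3Hit) {
        return l1Hit * L1_NS + l2Hit * L2_NS + l3Hit * L3_NS;
    }

    public static void main(String[] args) {
        // 80 / 15 / 5 split from the pyramid targets -> ~0.65 ms on average
        System.out.printf("expected: %.2f ms%n", expectedNanos(0.80, 0.15, 0.05) / 1_000_000);
    }
}
```

Note how the average is dominated by the small fraction of requests that fall through: pushing L1 from 80 % to 85 % removes a third of the Redis traffic, which is why the hit rate is the metric to watch.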

Environment & Dependencies (only three)

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>com.github.ben-manes.caffeine</groupId>
  <artifactId>caffeine</artifactId>
  <version>3.1.8</version>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

No extra components are required; the application can be started with java -jar.

Configuration: Enabling Caffeine and Redis together

spring:
  cache:
    type: caffeine # Spring's cache abstraction uses Caffeine (L1); Redis is wired separately
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis: # connection properties live under spring.data.redis in Spring Boot 3
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64

Core Wrapper: Three‑Level Cache Template

@Component
@Slf4j
public class CacheTemplate<K, V> {
    // L1: in-process Caffeine cache; recordStats() is required for the hit-rate logging below
    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    // L2: a RedisTemplate<K, V> bean with matching serializers must be defined in configuration
    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid lookup */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v); // back‑fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v); // double write
        }
        return v;
    }

    /** Double write (L1 + L2); consider jittering the fixed TTL (see the avalanche section) */
    public void set(K key, V value) {
        local.put(key, value); // Caffeine rejects null values; callers must not pass null
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Delete (L1 + L2) */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    // Requires @EnableScheduling on a configuration class
    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}

Business Usage: One‑Line Cache Calls

@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {
    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

Observations

L1 hit 0.83
L2 hit 0.15
DB  hit 0.02

API response time drops from 28 ms to 2 ms, and CPU usage falls by 35 %.

Common Pitfalls under High Concurrency

Cache penetration : concurrent queries for keys that do not exist bypass both cache levels and overload the DB. Solution : cache null values for 5 seconds in get().
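A minimal sketch of the null‑value guard, using a plain map in place of the real caches (the sentinel object and the 5‑second miss TTL are the assumptions here; the same idea slots into CacheTemplate.get()):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class NullGuardCache {
    private static final Object NULL_SENTINEL = new Object(); // marks "key known to be absent"
    private static final long NULL_TTL_MS = 5_000;            // short TTL for cached misses
    private static final long VALUE_TTL_MS = 60_000;          // normal TTL for real values

    private record Entry(Object value, long expiresAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    int dbCalls = 0; // exposed for the sketch only

    @SuppressWarnings("unchecked")
    <V> V get(String key, Supplier<V> dbFallback) {
        Entry e = cache.get(key);
        if (e != null && e.expiresAt() > System.currentTimeMillis()) {
            return e.value() == NULL_SENTINEL ? null : (V) e.value();
        }
        dbCalls++;
        V v = dbFallback.get();
        // Cache the miss briefly so repeated lookups for a missing key skip the DB
        Object stored = (v == null) ? NULL_SENTINEL : v;
        long ttl = (v == null) ? NULL_TTL_MS : VALUE_TTL_MS;
        cache.put(key, new Entry(stored, System.currentTimeMillis() + ttl));
        return v;
    }

    public static void main(String[] args) {
        NullGuardCache c = new NullGuardCache();
        c.get("missing", () -> null); // first lookup hits the DB fallback
        c.get("missing", () -> null); // second lookup is served by the cached sentinel
        System.out.println("db calls: " + c.dbCalls);
    }
}
```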

Hot key : all traffic for a single key lands on one Redis node, saturating its single worker thread. Solution : the local cache absorbs ~80 % of that traffic.

Large key : a 5 MB value exhausts network bandwidth. Solution : split into hash shards or compress.
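For oversized values, compression is often the quickest win. A JDK‑only sketch using GZIP (whether to shard into hash fields instead depends on whether readers need the whole value or only parts of it):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ValueCompressor {
    /** Compress a value before writing it to Redis; pair with GZIPInputStream on read. */
    public static byte[] compress(String value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String bigValue = "item-description ".repeat(300_000); // ~5 MB of repetitive text
        byte[] packed = compress(bigValue);
        System.out.printf("%d bytes -> %d bytes%n", bigValue.length(), packed.length);
    }
}
```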

Cache avalanche : mass expiration after 60 s causes thundering herd. Solution : apply random TTL to both Caffeine and Redis.

Random TTL Utility

private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // 0‑5 min
    return Duration.ofSeconds(baseSec + delta);
}
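The jitter must be applied on every write, to both levels, or entries written in the same burst still expire together. A quick, framework‑free self‑check of the bounds:

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

public class TtlJitter {
    /** Base TTL plus 0-300 s of random jitter, mirroring the utility above. */
    public static Duration randomTTL(long baseSec) {
        long delta = ThreadLocalRandom.current().nextLong(0, 300);
        return Duration.ofSeconds(baseSec + delta);
    }

    /** True if a sampled TTL falls inside [baseSec, baseSec + 300). */
    public static boolean withinBounds(long baseSec) {
        long s = randomTTL(baseSec).getSeconds();
        return s >= baseSec && s < baseSec + 300;
    }

    public static void main(String[] args) {
        System.out.println("sample TTL: " + randomTTL(300).getSeconds() + "s");
    }
}
```

Note that Caffeine's builder‑level expireAfterWrite is fixed per cache; per‑entry jitter there requires the Expiry interface, while on the Redis side the jittered Duration can be passed straight to opsForValue().set().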

Local Warm‑up & Back‑pressure

Asynchronously warm hot keys at startup to avoid cold‑cache penetration:

@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    // Skip absent ids: Caffeine's put() rejects null values
    hotIds.parallelStream().forEach(id ->
        itemRepository.findById(id).ifPresent(item -> cache.set(id, item)));
}

parallelStream runs on the shared ForkJoinPool.commonPool(), whose parallelism defaults to CPU cores minus one. That loosely bounds concurrent DB reads, but it competes with every other parallel stream in the JVM, so it is not true back‑pressure.
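For stricter back‑pressure, a bounded executor caps concurrent loads during warm‑up. A framework‑free sketch (the pool size of 4 is an assumption; tune it to the DB's spare capacity):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;

public class BoundedWarmup {
    /** Warm the given ids with at most poolSize concurrent loads. */
    public static int warm(List<Long> hotIds, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger warmed = new AtomicInteger();
        for (Long id : hotIds) {
            pool.submit(() -> {
                // Stand-in for itemRepository.findById(id) followed by cache.set(...)
                warmed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return warmed.get();
    }

    public static void main(String[] args) {
        List<Long> hotIds = LongStream.rangeClosed(1, 200).boxed().toList();
        System.out.println("warmed: " + warm(hotIds, 4));
    }
}
```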

Benchmark Results

Environment : Mac M2 8 GB, 4 concurrent threads, 60 s.

Tool : wrk2 -R 5000 -d 60s -c 50.

Average RT : DB 28 ms, L2 5.1 ms, L1+Caffeine 1.9 ms (14× improvement).

P99 RT : DB 120 ms, L2 18 ms, L1+Caffeine 4 ms (30× improvement).

CPU usage : DB 65 %, L2 40 %, L1+Caffeine 25 % (↓ 60 %).

Network outflow : DB 180 MB/s, L2 12 MB/s, L1+Caffeine 0.8 MB/s (↓ 99 %).

Monitoring & Alerting

MeterBinder caffeineMetrics = registry ->
        CaffeineCacheMetrics.monitor(registry, local, "l1_cache");

l1_cache hit rate below 70 % → alert. Eviction count spiking on l1_cache → capacity issue. Redis keyspace_hits / (keyspace_hits + keyspace_misses) below 50 % → suspect large keys or penetration.
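As a concrete starting point, the first rule can be expressed as a Prometheus alert. This is a sketch: the cache_gets_total counter and its cache / result tags follow Micrometer's Prometheus naming for the binder above, and thresholds should be adjusted to your traffic:

```yaml
groups:
  - name: cache-alerts
    rules:
      - alert: L1HitRateLow
        # hit / (hit + miss) over 5 minutes, from Micrometer's cache_gets_total counter
        expr: |
          sum(rate(cache_gets_total{cache="l1_cache", result="hit"}[5m]))
            / sum(rate(cache_gets_total{cache="l1_cache"}[5m])) < 0.70
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "L1 cache hit rate below 70%"
```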

Extension: Multi‑Cacheable Annotation

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // e.g. {"l1", "l2"}
    String key();
}

An AOP interceptor processes caches in order L1 → L2 → DB, keeping business code untouched.
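The interceptor itself is straightforward. A framework‑free sketch using a JDK dynamic proxy in place of Spring AOP (the two maps stand in for Caffeine and Redis, and the single‑key form of the annotation is a simplification of the version above):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified single-key form of the article's MultiCacheable annotation
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface MultiCacheable {
    String key();
}

interface ItemService {
    @MultiCacheable(key = "item")
    String loadItem(long id);
}

class DbBackedItemService implements ItemService {
    int dbCalls = 0; // exposed for the sketch only
    public String loadItem(long id) { dbCalls++; return "item-" + id; }
}

class MultiCacheInterceptor implements InvocationHandler {
    private final Object target;
    private final Map<String, Object> l1 = new ConcurrentHashMap<>(); // stands in for Caffeine
    private final Map<String, Object> l2 = new ConcurrentHashMap<>(); // stands in for Redis

    MultiCacheInterceptor(Object target) { this.target = target; }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        MultiCacheable ann = method.getAnnotation(MultiCacheable.class);
        if (ann == null) return method.invoke(target, args);
        String key = ann.key() + ":" + args[0];
        Object v = l1.get(key);                   // L1
        if (v != null) return v;
        v = l2.get(key);                          // L2, back-filled into L1 on hit
        if (v != null) { l1.put(key, v); return v; }
        v = method.invoke(target, args);          // L3 (DB)
        if (v != null) { l1.put(key, v); l2.put(key, v); }
        return v;
    }

    public static void main(String[] args) {
        DbBackedItemService db = new DbBackedItemService();
        ItemService svc = (ItemService) Proxy.newProxyInstance(
                ItemService.class.getClassLoader(),
                new Class<?>[]{ItemService.class},
                new MultiCacheInterceptor(db));
        svc.loadItem(1L); // falls through to the DB, then back-fills L1 and L2
        svc.loadItem(1L); // served from L1
        System.out.println("db calls: " + db.dbCalls);
    }
}
```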

Conclusion

The pyramid model partitions data by hotness, allowing L1, L2, and L3 layers to serve appropriate traffic.

Back‑pressure and random TTL protect against cache avalanche.

Warm‑up and comprehensive monitoring make the system observable.

When all three practices are applied, a ten‑fold API speedup is the baseline.
