Backend Development 9 min read

Boost API Speed 10× with a Three‑Level Cache Pyramid in Spring Boot 3

This article explains why adding Redis alone may still be slow, introduces a three‑level cache pyramid (Caffeine L1, Redis L2, DB L3) built with Spring Boot 3, and provides complete configuration, code, warm‑up, monitoring, and benchmark results that reduce response time from 28 ms to 2 ms while cutting CPU usage by 35%.

Architect's Guide

Mar 19, 2026

Boost API Speed 10× with a Three‑Level Cache Pyramid in Spring Boot 3

1. Why Redis alone can still be slow

Typical optimization steps are to cut database I/O with a cache, cut network I/O with a local cache, and cut serialization with zero‑copy. A remote Redis round‑trip costs 1‑2 ms, but under high concurrency the CPU context switch, serialization, and network jitter can amplify this to 5‑10 ms, whereas a local cache hit takes only tens of nanoseconds.

2. Three‑level cache pyramid

Using Spring Boot 3 we build a pyramid: L1 Caffeine (local) → L2 Redis (remote) → L3 Database . The solution includes back‑pressure, warm‑up, hot‑key handling, large‑key sharding, and requires no extra components – just copy‑paste and run.

3. Data‑heat distribution

In a single‑machine scenario with 10 k QPS, each 1 % increase in L1 hit rate reduces CPU usage by about 3 %.

4. Environment & dependencies (only three)

<!-- pom.xml -->
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>com.github.ben-manes.caffeine</groupId>
  <artifactId>caffeine</artifactId>
  <version>3.1.8</version>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Run the application directly with java -jar without additional components.

5. Configuration: enable Caffeine and Redis together

spring:
  cache:
    type: caffeine # default uses L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  redis:
    host: 127.0.0.1
    port: 6379
    timeout: 200ms
    lettuce:
      pool:
        max-active: 64

6. Core encapsulation – three‑level cache template

@Component
@Slf4j
public class CacheTemplate<K, V> {
    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /**
     * Pyramid lookup
     */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v); // back‑fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v); // write‑through
        }
        return v;
    }

    /**
     * Write‑through (L1 + L2)
     */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /**
     * Delete (L1 + L2)
     */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}

7. Business usage – one‑line cache call

@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {
    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

8. Observed results

Log output after startup:

L1 hit 0.83
L2 hit 0.15
DB  hit 0.02

Response time dropped from 28 ms to 2 ms , and CPU usage decreased by about 35 % .

9. High‑concurrency pitfalls (four common issues)

The article lists four typical problems that appear under high load, such as cache stampede, large keys, TTL expiration spikes, and insufficient capacity, and provides mitigation strategies.

10. Local warm‑up & back‑pressure

At application start, hot keys are pre‑loaded asynchronously to avoid cold‑cache penetration:

@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream().forEach(id ->
        cache.set(id, itemRepository.findById(id).orElse(null)));
}

Parallel streams control concurrency using the default ForkJoinPool.commonPool().

11. Benchmark results

Environment: Mac M2 8 GB, 4 concurrent threads, 60 s

Tool: wrk2 -R 5000 -d 60s -c 50 Benchmark image:

12. Monitoring & alerting

Caffeine provides built‑in statistics; Micrometer exports them to Prometheus:

MeterBinder caffeineMetrics = registry ->
    CaffeineMetrics.monitor(registry, local, "l1_cache");

Grafana panels to watch: l1_cache_hit_rate < 70% → alert l1_cache_eviction_count spikes → capacity issue Redis keyspace_hits / (hits+misses) < 50% → large key or cache penetration

13. Extension – multi‑cache annotation

Spring Cache natively supports a single cache; a custom MultiCacheable annotation enables layered caching:

@Target(METHOD)
@Retention(RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // {"l1", "l2"}
    String key();
}

An AOP interceptor processes caches in order L1 → L2 → DB, keeping business code untouched.

14. Conclusion

The pyramid model separates data by heat, placing hot data in L1, warm data in L2, and cold data in the DB.

Back‑pressure and random TTL prevent cache avalanche.

Warm‑up and comprehensive monitoring make the system observable.

When all three steps are applied, API latency can improve tenfold.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Performance cache Redis Spring Boot Caffeine

Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.