Boost API Speed 14× with a 3‑Level Cache Pyramid in Spring Boot

By combining a local Caffeine cache, a remote Redis layer, and a MySQL database into a three‑tier cache pyramid, this guide shows how to reduce API response time from 28 ms to 2 ms, cut CPU usage by 35 %, and achieve up to 14‑fold performance gains, complete with configuration, code, and monitoring tips.

Java Architect Handbook

1. Introduction

A typical optimization path runs: DB I/O → cache → network I/O → local cache → zero‑copy serialization. A remote Redis call adds 1–2 ms of latency; under high concurrency, CPU context switches, serialization, and network jitter can amplify this to 5–10 ms, while a local cache hit costs only tens of nanoseconds.

2. Pyramid Model & Data Hotness Distribution

L1 Caffeine – 50 ns latency, 10 MB capacity, target hit‑rate 80 % (in‑process, zero network).

L2 Redis – 1 ms latency, 100 GB capacity, target hit‑rate 15 % (clustered, horizontally scalable).

L3 MySQL – 10 ms+ latency, TB capacity, target hit‑rate 5 % (source of truth; the cache tiers above it are eventually consistent with it).

Empirical observation: at 10 k QPS, each 1 % increase in L1 hit‑rate reduces CPU usage by ~3 %.
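The effect of this hit‑rate distribution can be checked with simple arithmetic: expected latency is the hit‑rate‑weighted average of the three tier latencies. A minimal sketch, using the tier figures above:

```java
class PyramidLatency {
    // Expected latency = sum over tiers of (hit-rate × tier latency), in ms.
    // Tier latencies from the pyramid above: L1 50 ns, L2 1 ms, L3 10 ms.
    static double expectedLatencyMs(double l1Hit, double l2Hit, double l3Hit) {
        double l1Ms = 0.00005; // 50 ns expressed in ms
        double l2Ms = 1.0;
        double l3Ms = 10.0;
        return l1Hit * l1Ms + l2Hit * l2Ms + l3Hit * l3Ms;
    }

    public static void main(String[] args) {
        // Target distribution 80/15/5 → ≈0.65 ms on average, dominated almost
        // entirely by the 5 % of requests that fall through to MySQL.
        System.out.printf("%.2f ms%n", expectedLatencyMs(0.80, 0.15, 0.05));
    }
}
```

This is why each extra point of L1 hit‑rate matters so much: the L1 term is effectively free, so the average is governed by whatever leaks past it.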

3. Environment & Dependencies

<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

No extra components are required; the application can be started with java -jar.

4. Configuration: Enabling Caffeine and Redis Together

spring:
  cache:
    type: caffeine   # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  redis:             # note: lives under spring.data.redis in Spring Boot 3.x
    host: 127.0.0.1
    port: 6379
    timeout: 200ms
    lettuce:
      pool:
        max-active: 64

5. Core Wrapper: Three‑Level Cache Template

import java.time.Duration;
import java.util.function.Supplier;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import lombok.extern.slf4j.Slf4j;

@Component
@Slf4j
public class CacheTemplate<K, V> {

    private final Cache<K, V> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(60))
            .recordStats()
            .build();

    /** Requires a RedisTemplate bean configured with matching key/value serializers. */
    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    /** Pyramid lookup: L1 → L2 → DB, back-filling upper levels on the way out. */
    public V get(K key, Supplier<V> dbFallback) {
        // L1 local
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        // L2 Redis
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v); // back-fill L1
            log.debug("L2 hit {}", key);
            return v;
        }
        // L3 DB
        v = dbFallback.get();
        if (v != null) {
            set(key, v); // double write
        }
        return v;
    }

    /** Double write (L1 + L2). */
    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    /** Evict (L1 + L2). */
    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    /** Needs @EnableScheduling on a configuration class to actually run. */
    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}

6. Business Usage: One‑Line Cache Call

@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {
    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

After startup, logs typically show L1 hitRate≈0.83, L2 hitRate≈0.15, DB hitRate≈0.02. Response time drops from 28 ms to 2 ms and CPU usage falls by ~35 %.

7. Common Pitfalls Under High Concurrency

Cache Penetration: concurrent queries for missing keys overload the DB. Solution: cache null values with a short TTL (e.g., 5 s).
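
The null-value fix can be sketched in plain Java with a sentinel object (in the CacheTemplate above, Caffeine's expireAfterWrite would give the sentinel its short TTL; this illustration omits expiry for brevity):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class NullCachingDemo {
    // Sentinel distinguishing "key known to be absent" from "key not cached yet".
    private static final Object MISS = new Object();
    private final Map<Long, Object> cache = new ConcurrentHashMap<>();
    private int dbCalls = 0;

    Object get(Long key, Function<Long, Object> db) {
        Object v = cache.computeIfAbsent(key, k -> {
            dbCalls++;                          // only the first lookup reaches the DB
            Object fromDb = db.apply(k);
            return fromDb == null ? MISS : fromDb;  // cache the miss itself
        });
        return v == MISS ? null : v;
    }

    int dbCalls() { return dbCalls; }
}
```

Repeated lookups of an absent key now cost one map read instead of one DB query each.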

Hot Key: a single hot key can saturate one thread. Solution: the local L1 cache absorbs ~80 % of the traffic.

Large Key: values around 5 MB exhaust network bandwidth. Solution: split the value into hash shards or compress it.
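
The shard-splitting idea can be sketched as follows (the shard method and the "key:index" naming scheme are illustrative, not a Redis API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LargeValueSharding {
    // Split a large value into fixed-size chunks stored under "key:0", "key:1", ...
    // so each Redis entry stays small; the reader concatenates the shards back.
    static Map<String, String> shard(String key, String value, int chunkSize) {
        Map<String, String> shards = new LinkedHashMap<>();
        int index = 0;
        for (int i = 0; i < value.length(); i += chunkSize) {
            shards.put(key + ":" + index++,
                       value.substring(i, Math.min(value.length(), i + chunkSize)));
        }
        return shards;
    }

    static String reassemble(Map<String, String> shards) {
        return String.join("", shards.values());
    }
}
```

In Redis the shards would typically land in a hash (HSET key 0 …, HGETALL to read back), which also lets each shard be fetched independently.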

Cache Avalanche: mass expiration (e.g., everything expiring at the same 60 s mark) causes a thundering herd. Solution: apply a randomized TTL to both Caffeine and Redis entries.

private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // adds 0–299 s (~5 min) of jitter
    return Duration.ofSeconds(baseSec + delta);
}

8. Warm‑up & Back‑pressure

During application startup, asynchronously preload hot keys to avoid cold‑cache spikes:

@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    hotIds.parallelStream()
          .forEach(id -> itemRepository.findById(id)
                  .ifPresent(item -> cache.set(id, item))); // skip absent ids: Caffeine rejects null values
}

Without an explicit executor, parallelStream() runs on the shared ForkJoinPool.commonPool(), so warm‑up concurrency is capped at roughly the number of CPU cores.
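
If warm‑up needs explicit back‑pressure against the DB, a bounded pool plus a semaphore gives a hard in‑flight cap. A sketch (warmAll and the permit count are illustrative, not part of the article's code):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

class BoundedWarmup {
    // At most `permits` loads run at once; acquire() blocks the submitter,
    // which is the back-pressure: submission slows to the DB's pace.
    static void warmAll(List<Long> ids, int permits, LongConsumer loader) {
        ExecutorService pool = Executors.newFixedThreadPool(permits);
        Semaphore inFlight = new Semaphore(permits);
        try {
            for (long id : ids) {
                inFlight.acquire();
                pool.execute(() -> {
                    try {
                        loader.accept(id);
                    } finally {
                        inFlight.release();
                    }
                });
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A fixed pool alone already bounds concurrency; the semaphore additionally bounds the queue, so a 100 k-key hot list cannot pile up in memory while the DB lags.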

9. Benchmark Results

Environment: Mac M2 8 GB, 4 concurrent threads, 60 s test.

Tool: wrk2 -R 5000 -d 60s -c 50.

Average RT: DB 28 ms → Redis 5.1 ms → L1 1.9 ms (≈14× improvement).

P99 RT: DB 120 ms → Redis 18 ms → L1 4 ms (≈30×).

CPU usage: 65 % → 40 % → 25 % (↓ ~60 %).

Network outbound: 180 MB/s → 12 MB/s → 0.8 MB/s (↓ 99 %).

10. Monitoring & Alerts

Caffeine records statistics once recordStats() is enabled; bind them to Micrometer (and export to Prometheus) with CaffeineCacheMetrics:

MeterBinder caffeineMetrics = registry ->
    CaffeineCacheMetrics.monitor(registry, local, "l1_cache");

Alert when l1_cache_hit_rate < 70%.

Alert when l1_cache_eviction_count spikes (capacity issue).

Alert when Redis keyspace_hits / (hits+misses) < 50% (large key or penetration).

11. Extension: Multi‑Level Annotation

Spring's @Cacheable has no notion of tiered fallback. Define a custom @MultiCacheable annotation and trigger the L1 → L2 → DB lookup via AOP:

import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

@Target(METHOD)
@Retention(RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // e.g., {"l1", "l2"}
    String key();          // SpEL expression for the cache key
}

The interceptor checks L1, then L2, then falls back to the DB, keeping business code untouched.
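Stripped of the AOP plumbing, the interceptor's lookup order reduces to the following (tiers modeled as plain maps; TieredLookup and its names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class TieredLookup {
    // Check tiers from fastest to slowest; on a hit, back-fill every faster
    // tier; on a full miss, fall back to the DB and double-write all tiers.
    static Object lookup(List<Map<String, Object>> tiers, String key, Supplier<Object> db) {
        for (int i = 0; i < tiers.size(); i++) {
            Object v = tiers.get(i).get(key);
            if (v != null) {
                for (int j = 0; j < i; j++) {
                    tiers.get(j).put(key, v); // back-fill faster tiers
                }
                return v;
            }
        }
        Object fromDb = db.get();
        if (fromDb != null) {
            tiers.forEach(t -> t.put(key, fromDb)); // double write
        }
        return fromDb;
    }
}
```

The real interceptor would resolve the tiers from the annotation's cacheNames and evaluate key as SpEL, but the control flow is exactly this loop.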

12. Conclusion

The layered pyramid model separates data by hotness.

Back‑pressure and randomized TTLs prevent cache avalanches.

Warm‑up and observability ensure reliable operation.

When these practices are applied, API latency can improve tenfold or more, providing a solid performance foundation for high‑traffic services.
