Boost API Latency 10× with Spring Boot 3 and a Three‑Level Local Cache Pyramid

The article explains why adding Redis alone often leaves an API slow, introduces a three‑level cache pyramid (L1 Caffeine, L2 Redis, L3 MySQL) built with Spring Boot 3, and shows how this design cuts average request latency from 28 ms to about 2 ms, drops CPU usage from 65 % to 25 %, and yields up to a 14‑fold throughput improvement.


Even after introducing Redis, high‑concurrency workloads can still suffer because each remote cache hit adds 1–2 ms of network latency, plus CPU context switches and serialization overhead, which may grow to 5–10 ms; a local in‑process cache can respond in tens of nanoseconds.

The author builds a "three‑level pyramid" using Spring Boot 3: L1 is an in‑process Caffeine cache (≈50 ns latency, 10 MB capacity, 80 % hit‑rate target), L2 is a remote Redis cache (≈1 ms latency, 100 GB capacity, 15 % hit‑rate target), and L3 is the MySQL database (≥10 ms latency, TB capacity, 5 % hit‑rate target). In a single‑node test with 10 k QPS, each 1 % increase in L1 hit‑rate reduces CPU consumption by about 3 %.
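
Those hit‑rate targets imply a blended latency far below a Redis‑only setup. A back‑of‑envelope check, using only the rough per‑level figures above (not measurements):

```java
// Back-of-envelope blended latency for the pyramid, from the article's
// rough numbers: ~50 ns per L1 hit, ~1 ms per Redis hit, ~10 ms per DB hit.
public class BlendedLatency {
    static double blendedMillis(double l1Hit, double l2Hit, double l3Hit) {
        double l1Ms = 50e-6; // ~50 ns expressed in ms
        double l2Ms = 1.0;   // ~1 ms Redis round trip
        double l3Ms = 10.0;  // ~10 ms MySQL query
        return l1Hit * l1Ms + l2Hit * l2Ms + l3Hit * l3Ms;
    }

    public static void main(String[] args) {
        // 80 % L1, 15 % L2, 5 % L3 blends to roughly 0.65 ms per request
        System.out.println(blendedMillis(0.80, 0.15, 0.05));
    }
}
```

Even with a modest 80 % L1 hit rate, the expensive tiers dominate the average, which is why every extra point of L1 hit rate pays off so visibly.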

Only three Maven dependencies are required:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>com.github.ben-manes.caffeine</groupId>
  <artifactId>caffeine</artifactId>
  <version>3.1.8</version>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

The Spring configuration enables both caches simultaneously:

spring:
  cache:
    type: caffeine   # default to L1
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=60s
  data:
    redis:           # Spring Boot 3 moved Redis properties under spring.data
      host: 127.0.0.1
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 64

A reusable CacheTemplate<K,V> component implements the pyramid lookup logic: first try L1, then L2, and finally fall back to the database supplier. Successful L2 hits are written back to L1, and DB results are written to both caches (dual‑write). A scheduled task logs the L1 hit‑rate.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.util.function.Supplier;

@Component
@Slf4j
public class CacheTemplate<K, V> {
    private final Cache<K, V> local = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofSeconds(60))
        .recordStats() // required for the hit-rate logging below
        .build();

    @Autowired
    private RedisTemplate<K, V> redisTemplate;

    public V get(K key, Supplier<V> dbFallback) {
        V v = local.getIfPresent(key);
        if (v != null) {
            log.debug("L1 hit {}", key);
            return v;
        }
        v = redisTemplate.opsForValue().get(key);
        if (v != null) {
            local.put(key, v);
            log.debug("L2 hit {}", key);
            return v;
        }
        v = dbFallback.get();
        if (v != null) {
            set(key, v);
        }
        return v;
    }

    public void set(K key, V value) {
        local.put(key, value);
        redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(5));
    }

    public void evict(K key) {
        local.invalidate(key);
        redisTemplate.delete(key);
    }

    @Scheduled(fixedDelay = 30_000)
    public void printStats() {
        log.info("L1 hitRate={}", local.stats().hitRate());
    }
}
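
Two wiring details the template relies on are easy to miss: the `@Scheduled` stats logger only fires if scheduling is enabled, and injecting a generic RedisTemplate needs a bean whose value serializer can round‑trip arbitrary DTOs. A minimal configuration sketch (class and bean names here are my own, not from the article):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableScheduling // without this, the printStats() @Scheduled task never runs
public class CacheConfig {

    @Bean
    public RedisTemplate<Object, Object> redisTemplate(RedisConnectionFactory cf) {
        RedisTemplate<Object, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(cf);
        // String keys keep Redis readable; JSON values survive DTO round trips
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}
```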

Using the template in a REST controller requires only a single line per operation:

@RestController
@RequestMapping("/api/item")
@RequiredArgsConstructor
public class ItemController {
    private final CacheTemplate<Long, ItemDTO> cache;
    private final ItemRepository itemRepository;

    @GetMapping("/{id}")
    public ItemDTO getItem(@PathVariable Long id) {
        return cache.get(id, () -> itemRepository.findById(id).orElse(null));
    }

    @PostMapping
    public void create(@RequestBody ItemDTO dto) {
        ItemDTO saved = itemRepository.save(dto);
        cache.set(saved.getId(), saved);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        itemRepository.deleteById(id);
        cache.evict(id);
    }
}

After startup, the logs typically show L1 hit‑rate ≈ 0.83, L2 ≈ 0.15, DB ≈ 0.02. Under a wrk2 load (4 threads, 60 s, 5 000 RPS) the measured metrics are:

Average RT: 28 ms (DB only) → 1.9 ms (with the cache pyramid) → ≈14× improvement

P99 RT: 120 ms → 4 ms → 30× improvement

CPU usage: 65 % → 25 % (≈ 60 % reduction)

Network outbound: 180 MB/s → 0.8 MB/s (≈ 99 % reduction)

Four common high‑concurrency pitfalls are addressed:

Cache penetration: cache null values for missing keys for 5 s.

Hot key: let L1 absorb 80 % of traffic.

Large value: split into hash shards or compress.

Cache avalanche: apply a random TTL to both Caffeine and Redis entries.

private Duration randomTTL(long baseSec) {
    long delta = ThreadLocalRandom.current().nextLong(0, 300); // 0‑5 min
    return Duration.ofSeconds(baseSec + delta);
}
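
The 5‑second null‑value guard against penetration can be sketched with Caffeine alone. The `NULL_SENTINEL` marker and the method shape below are illustrative, not from the article:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.function.Supplier;

public class PenetrationGuard {
    // Caffeine rejects null values, so a sentinel stands in for "key absent"
    private static final Object NULL_SENTINEL = new Object();

    private final Cache<Long, Object> local = Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofSeconds(5)) // short TTL for misses
            .maximumSize(10_000)
            .build();

    @SuppressWarnings("unchecked")
    public <V> V get(Long key, Supplier<V> dbFallback) {
        Object v = local.get(key, k -> {
            V loaded = dbFallback.get();
            return loaded != null ? loaded : NULL_SENTINEL; // cache the miss too
        });
        return v == NULL_SENTINEL ? null : (V) v;
    }
}
```

With this in place, a burst of lookups for a nonexistent key hits the database once per 5‑second window instead of once per request.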

To avoid cold‑cache spikes, the application pre‑warms hot keys on startup and uses back‑pressure to limit concurrency:

@EventListener(ApplicationReadyEvent.class)
public void warm() {
    List<Long> hotIds = itemRepository.findHotIds(PageRequest.of(0, 200));
    // skip missing rows: Caffeine rejects null values, so only cache real hits
    hotIds.parallelStream().forEach(id ->
        itemRepository.findById(id).ifPresent(item -> cache.set(id, item)));
}
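
The back‑pressure the text mentions is not shown in the snippet above. One way to bound warm‑up concurrency is a Semaphore; this is a sketch, and both the class name and the permit count of 8 are my own choices:

```java
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.LongConsumer;

public class WarmupThrottle {
    // Permits cap how many warm-up loads run concurrently
    private final Semaphore permits = new Semaphore(8);

    public void warm(List<Long> hotIds, LongConsumer loadOne) {
        hotIds.parallelStream().forEach(id -> {
            permits.acquireUninterruptibly(); // block instead of flooding the DB
            try {
                loadOne.accept(id); // e.g. findById(id).ifPresent(v -> cache.set(id, v))
            } finally {
                permits.release();
            }
        });
    }
}
```

This keeps the startup burst bounded regardless of how many hot IDs the repository returns.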

Monitoring is integrated via Micrometer: Caffeine metrics are bound to a MeterBinder and exported to Prometheus. Example alerts:

l1_cache_hit_rate < 70 % → alarm

l1_cache_eviction_count spikes → capacity issue

Redis keyspace_hits / (hits + misses) < 50 % → large key or penetration
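
Binding the Caffeine cache to Micrometer is a one‑liner. This sketch assumes the L1 cache was built with recordStats(), as in the CacheTemplate above:

```java
import com.github.benmanes.caffeine.cache.Cache;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics;

public class CacheMetricsBinder {
    // Registers hit-rate, eviction, and size meters tagged cache="l1";
    // the Prometheus endpoint then exposes them for the alerts above.
    public static <K, V> Cache<K, V> bind(MeterRegistry registry, Cache<K, V> l1) {
        return CaffeineCacheMetrics.monitor(registry, l1, "l1");
    }
}
```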

Because Spring’s native @Cacheable targets a single cache layer and cannot express a tiered lookup, a custom @MultiCacheable annotation is introduced; an AOP interceptor then queries L1 → L2 → DB in order without polluting business code.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MultiCacheable {
    String[] cacheNames(); // e.g. {"l1", "l2"}
    String key();
}
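
A minimal interceptor for the annotation could look like the following sketch. It sidesteps SpEL parsing by assuming the cache key is the first method argument; a real implementation would evaluate the annotation's key() expression:

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MultiCacheAspect {
    private final CacheTemplate<Object, Object> cache;

    public MultiCacheAspect(CacheTemplate<Object, Object> cache) {
        this.cache = cache;
    }

    @Around("@annotation(multiCacheable)")
    public Object around(ProceedingJoinPoint pjp, MultiCacheable multiCacheable) {
        Object key = pjp.getArgs()[0]; // assumption: key is the first argument
        return cache.get(key, () -> {
            try {
                return pjp.proceed(); // DB fallback = the annotated method body
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        });
    }
}
```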

In conclusion, achieving a ten‑fold API speedup requires three steps: adopt the pyramid model to separate data by heat, add back‑pressure and random TTL to prevent cache avalanche, and implement warm‑up plus observability. When all three are applied, the performance gains become predictable and repeatable.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Java · performance optimization · Cache · Redis · Spring Boot · Caffeine
Written by java1234

Former senior programmer at a Fortune Global 500 company, dedicated to sharing Java expertise. Visit Feng's site: Java Knowledge Sharing, www.java1234.com
