Can Spring Boot Handle 500k QPS? A Proven Architecture Demonstrates the Answer

The article debunks the myth that Spring Boot cannot scale, showing that with WebFlux, Netty, Reactive Redis, GraalVM Native Image and horizontal scaling a Spring Boot service can reliably achieve 500,000 queries per second, backed by concrete benchmark data and a full demo implementation.

LuTiao Programming

Two long‑standing extreme views about Spring Boot exist: it is only for CRUD and cannot handle high concurrency, or it can magically reach millions of QPS with a few tweaks. Both are false. The default Spring Boot configuration cannot sustain 500 k QPS, but a properly designed architecture can.

Many modern systems—AI inference APIs, real‑time risk control, IoT event reporting, edge data aggregation, global SaaS API gateways—share a common demand: massive request volume, lightweight logic, and latency sensitivity. This workload matches the strengths of WebFlux combined with Redis.

Direct answer to “Can Spring Boot reach 500 k QPS?”: not with Tomcat + Spring MVC. Blocking I/O and JDBC impose a low ceiling, and the default thread‑per‑request model creates heavy thread and GC pressure under load. By switching to Spring WebFlux, Netty (non‑blocking I/O), Project Reactor, Redis/Kafka, GraalVM Native Image and horizontal scaling, Spring Boot can become the core of a high‑concurrency system.

Step 1: Replace Tomcat with WebFlux + Netty

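Tomcat's default servlet model dedicates one worker thread to each in‑flight request, so high concurrency means many threads and heavy context‑switch overhead. Netty's event‑driven, non‑blocking architecture serves many concurrent connections with a small, fixed set of event‑loop threads, which leaves more CPU time for request handling. Adding the WebFlux starter makes Reactor Netty the embedded server, provided spring-boot-starter-web is not also on the classpath (Spring Boot selects Spring MVC when both are present).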

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

Step 2: Eliminate All Blocking Points

In a Netty‑based system, a blocked event‑loop thread stalls every connection assigned to it, so even occasional blocking calls cut throughput sharply. The article lists typical blockers and their reactive replacements:

JDBC / JPA → R2DBC

File I/O → Reactive file APIs

RestTemplate → WebClient

Thread.sleep, synchronized → non‑blocking reactive patterns
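The cost of a single blocker can be illustrated without any framework. The sketch below is not Netty or Reactor code: it assumes a single‑threaded executor as a stand‑in for one event‑loop thread, and the class and method names are hypothetical. It records the order in which a slow request and a fast request complete when the slow one blocks the loop, and again when the wait is moved onto a timer.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class EventLoopDemo {

    // Simulates a blocking call such as Thread.sleep or a JDBC query.
    private static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the order in which the "slow" and "fast" requests complete. */
    static List<String> completionOrder(boolean blockTheLoop) {
        // Assumption: one thread stands in for a single event-loop thread.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        List<String> order = new CopyOnWriteArrayList<>();
        try {
            CompletableFuture<Void> slow;
            if (blockTheLoop) {
                // The loop thread itself waits 200 ms; everything queued behind it waits too.
                slow = CompletableFuture.runAsync(() -> {
                    blockFor(200);
                    order.add("slow");
                }, loop);
            } else {
                // The wait happens on a timer; the loop thread only runs the callback.
                slow = CompletableFuture.runAsync(
                        () -> order.add("slow"),
                        CompletableFuture.delayedExecutor(200, TimeUnit.MILLISECONDS, loop));
            }
            CompletableFuture<Void> fast = CompletableFuture.runAsync(() -> order.add("fast"), loop);
            CompletableFuture.allOf(slow, fast).join();
            return List.copyOf(order);
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking:     " + completionOrder(true));  // [slow, fast]
        System.out.println("non-blocking: " + completionOrder(false)); // [fast, slow]
    }
}
```

In the blocking case the fast request is held up for the full 200 ms even though it has no work to wait for. In a WebFlux application the same idea is usually expressed with Reactor operators such as Mono.delay, or by moving an unavoidable blocking call onto Schedulers.boundedElastic().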

Step 3: Let the Database Handle Only Asynchronous Writes

At 500 k QPS a database on the request path becomes the bottleneck, so Redis should serve as the first entry point for reads and writes. The database then receives asynchronous batch writes and is only eventually consistent with Redis.
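The async batch‑write idea can be sketched as a small write‑behind buffer. This is an illustration under assumptions, not the article's implementation: the class name, capacity and batch size are hypothetical, and the buffer lives in process memory, so queued events are lost if the process dies. In the architecture described here, Kafka or Redis would more plausibly hold that queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class WriteBehindBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;
    private final Consumer<List<T>> batchWriter; // e.g. one multi-row INSERT per batch

    WriteBehindBuffer(int capacity, int batchSize, Consumer<List<T>> batchWriter) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    /** Request path: never blocks; returns false when the buffer is full. */
    boolean offer(T event) {
        return queue.offer(event);
    }

    /** Background worker: writes at most one batch and returns its size. */
    int flushOnce() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        WriteBehindBuffer<String> buffer =
                new WriteBehindBuffer<>(1_000, 100, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 250; i++) {
            buffer.offer("event-" + i);
        }
        while (buffer.flushOnce() > 0) {
            // keep draining until the buffer is empty
        }
        System.out.println(batchSizes); // [100, 100, 50]
    }
}
```

The request path pays only for a queue insert, while the database sees three writes instead of 250. A caller still has to decide what to do when offer returns false, for example shedding load or falling back to a durable queue.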

Demo Project Structure

/usr/local/app/high-qps-demo
├── src/main/java
│   └── com/icoderoad
│       ├── Application.java
│       ├── config/RedisConfig.java
│       ├── api/CacheController.java
│       └── service/CacheService.java
└── src/main/resources/application.yml

Key Code Snippets

Redis Reactive Dependency

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>

Reactive Redis Configuration

package com.icoderoad.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.ReactiveRedisConnectionFactory;
import org.springframework.data.redis.core.ReactiveStringRedisTemplate;

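@Configuration
public class RedisConfig {
    // Named to match Spring Boot's auto-configured bean so that the auto-configuration
    // backs off and only one ReactiveStringRedisTemplate is available for injection.
    @Bean
    public ReactiveStringRedisTemplate reactiveStringRedisTemplate(ReactiveRedisConnectionFactory factory) {
        return new ReactiveStringRedisTemplate(factory);
    }
}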

Cache Service (non‑blocking)

package com.icoderoad.service;

import org.springframework.data.redis.core.ReactiveStringRedisTemplate;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;

@Service
public class CacheService {
    private final ReactiveStringRedisTemplate redisTemplate;

    public CacheService(ReactiveStringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // A missing key completes empty, so substitute a sentinel instead of an empty response.
    public Mono<String> getValue(String key) {
        return redisTemplate.opsForValue().get(key).defaultIfEmpty("EMPTY");
    }

    // Nothing is sent to Redis until the returned Mono is subscribed to.
    public Mono<Boolean> setValue(String key, String value) {
        return redisTemplate.opsForValue().set(key, value);
    }
}

WebFlux Controller (stateless)

package com.icoderoad.api;

import com.icoderoad.service.CacheService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class CacheController {
    private final CacheService cacheService;
    public CacheController(CacheService cacheService) {
        this.cacheService = cacheService;
    }
    @GetMapping("/cache/{key}")
    public Mono<String> get(@PathVariable String key) {
        return cacheService.getValue(key);
    }
}

application.yml (essential parameters)

server:
  port: 8080
spring:
  data:
    redis:   # Spring Boot 3.x prefix; on Spring Boot 2.x use spring.redis
      host: 127.0.0.1
      port: 6379
      timeout: 2s
logging:
  level:
    root: ERROR

Performance Results

A single non‑native instance reaches 80–120 k QPS with P99 latency under 3 ms. The 500 k target is reached by running GraalVM Native Image builds on 10 instances at about 50 k QPS each.

👉 10 instances × 50 k QPS ≈ 500 k QPS

Internal Communication

For inter‑service calls, the article recommends gRPC (which runs over HTTP/2) instead of REST, and HTTP/3/QUIC at the edge, citing lower latency, higher throughput and reduced CPU consumption.

Horizontal Scaling

The design does not depend on any single instance sustaining 500 k QPS; the load is split across stateless instances. Scaling patterns such as 5 × 100 k or 10 × 50 k instances are straightforward in a cloud‑native environment.
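The instance counts above are simple division, but it helps to make the rounding and any safety margin explicit. The helper below is a hypothetical sketch: the headroom parameter is an assumption added for illustration and does not come from the article's benchmark.

```java
final class CapacityPlan {

    /**
     * Instances needed to serve targetQps when each instance is only loaded
     * to (100 - headroomPercent)% of its measured per-instance QPS.
     */
    static long instancesNeeded(long targetQps, long perInstanceQps, int headroomPercent) {
        if (targetQps <= 0 || perInstanceQps <= 0 || headroomPercent < 0 || headroomPercent >= 100) {
            throw new IllegalArgumentException("invalid capacity inputs");
        }
        // Integer arithmetic scaled by 100 avoids floating-point rounding surprises.
        long usablePerInstance = perInstanceQps * (100 - headroomPercent);
        long scaledTarget = targetQps * 100;
        return (scaledTarget + usablePerInstance - 1) / usablePerInstance; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(instancesNeeded(500_000, 50_000, 0));   // 10
        System.out.println(instancesNeeded(500_000, 100_000, 0));  // 5
        System.out.println(instancesNeeded(500_000, 50_000, 30));  // 15
    }
}
```

With no headroom the result matches the 10 × 50 k and 5 × 100 k patterns. Reserving 30 % per instance for traffic spikes or a failed node raises the 50 k case to 15 instances.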

Native Image as a QPS Amplifier

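Spring Boot 3 + GraalVM is credited with millisecond‑level startup, >60 % memory reduction, and a 5–10× QPS boost. The throughput figure is workload‑dependent and should be benchmarked, since ahead‑of‑time compilation forgoes the JIT's runtime optimizations. System‑level tuning (e.g., net.core.somaxconn=65535, net.ipv4.ip_local_port_range="10000 65535", net.core.netdev_max_backlog=4096) is essential; otherwise the kernel drops or refuses connections before the service reaches its limit.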
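A small pre‑flight check can confirm that a Linux host actually carries these kernel settings before a load test. The sketch below is an assumption‑laden illustration: the class name is hypothetical, it reads the standard /proc/sys files, and it treats the values quoted above as targets rather than universal recommendations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

final class SysctlPreflight {
    // Targets taken from the text above; adjust to the actual environment.
    static final Map<String, Long> TARGETS = new LinkedHashMap<>();
    static {
        TARGETS.put("net/core/somaxconn", 65_535L);
        TARGETS.put("net/core/netdev_max_backlog", 4_096L);
    }

    /** True when a single-integer sysctl value is at least the target. */
    static boolean meetsTarget(String rawValue, long target) {
        return Long.parseLong(rawValue.trim()) >= target;
    }

    /** Number of ephemeral ports in a "low high" range such as "10000 65535". */
    static int ephemeralPorts(String rawRange) {
        String[] parts = rawRange.trim().split("\\s+");
        return Integer.parseInt(parts[1]) - Integer.parseInt(parts[0]) + 1;
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, Long> target : TARGETS.entrySet()) {
            Path file = Path.of("/proc/sys", target.getKey());
            if (!Files.isReadable(file)) {
                System.out.println(target.getKey() + ": not readable on this host");
                continue;
            }
            String raw = Files.readString(file);
            String verdict = meetsTarget(raw, target.getValue())
                    ? " (ok)"
                    : " (below " + target.getValue() + ")";
            System.out.println(target.getKey() + " = " + raw.trim() + verdict);
        }
        // The range quoted above leaves this many ports for outbound connections.
        System.out.println("ephemeral ports: " + ephemeralPorts("10000 65535"));
    }
}
```

The port range matters because each outbound connection to Redis or another service consumes an ephemeral port; the range in the text provides 55,536 of them per destination address and port.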

Real‑World Deployable Architecture

Edge
 └─ Cloudflare / Fastly
LB
 └─ NGINX / Envoy
App
 └─ 10 × Spring Boot WebFlux (Native)
Data
 ├─ Redis Cluster
 ├─ Kafka
 └─ Async DB

What Kills Performance

Tomcat

JDBC

RestTemplate

Heavy business logic

Global locks

Massive object creation
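As one illustration of the "global locks" item, a request counter guarded by synchronized forces every request thread through the same monitor. The sketch below swaps it for java.util.concurrent.atomic.LongAdder, which spreads contended updates across internal cells. The class name and the thread and iteration counts are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class RequestCounter {
    // LongAdder avoids a single contended lock or CAS target under heavy write load.
    private final LongAdder hits = new LongAdder();

    void record() {
        hits.increment();
    }

    long total() {
        return hits.sum();
    }

    /** Increments one counter from several threads and returns the final total. */
    static long countConcurrently(int threads, int perThread) {
        RequestCounter counter = new RequestCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.record();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 100_000)); // 800000
    }
}
```

No update is lost and no thread ever waits on a lock. The trade‑off is that sum() is not an atomic snapshot while writers are active, which is usually acceptable for metrics.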

Conclusion

Spring Boot can power high‑concurrency systems if you adopt WebFlux + Netty, place Redis at the front, keep the entire call chain non‑blocking, leverage GraalVM Native Image, and scale horizontally. The default Tomcat‑based stack, blocking I/O, and monolithic design remain the real bottlenecks.

