5 Advanced Spring Boot Patterns for Millisecond API Responses

This article presents five advanced Spring Boot 3.5.0 patterns (ETag with conditional requests, read‑write separation, virtual threads, HTTP client connection reuse, and asynchronous processing) for bringing API response times down to the millisecond range, with code examples, configuration steps, and the performance benefit of each.

Spring Full-Stack Practical Cases

In high‑performance Spring Boot services, achieving millisecond‑level API latency requires moving beyond basic optimizations. The article introduces five patterns that address hidden inefficiencies in production environments.

1. ETag and Conditional Requests

When an API returns the same data on repeated calls, the server still performs a full query, serialization, compression, and transmission. By adding an ETag based on a version or timestamp, the client can validate freshness without downloading the body, receiving a 304 Not Modified response if unchanged.

// ❌ Every request returns the full response even if data is unchanged
@GetMapping("/products/{id}")
public ResponseEntity<Product> getProduct(@PathVariable Long id) {
    Product product = productService.findById(id);
    return ResponseEntity.ok(product);
    // Client receives 8KB JSON each time
    // Server cannot detect that data has not changed
}

// ✅ Send the body only when the client's copy is stale
@GetMapping("/products/{id}")
public ResponseEntity<Product> getProduct(@PathVariable Long id, WebRequest request) {
    // Demo data; a real handler would load the product and its version from the service
    Product product = new Product(id, "Product - " + id, BigDecimal.valueOf(100), "1.0.0");
    String eTag = "\"" + product.version() + "\""; // ETag values must be quoted
    if (request.checkNotModified(eTag)) {
        return null; // checkNotModified has already set 304 Not Modified on the response
    }
    return ResponseEntity.ok().eTag(eTag).body(product);
}

public record Product(Long id, String name, BigDecimal price, String version) {}

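When the data is unchanged, the server skips serialization, compression, and body transmission and sends only the 304 status line and headers. It still has to determine the current version, so the saving is largest when the version is cheap to read. The approach also works with standard HTTP caches, CDNs, and reverse proxies.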
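When a resource has no version field, an ETag can instead be derived from the response content. The following is a minimal plain‑JDK sketch of that idea, not the article's implementation: the class `EtagSketch` and its methods are hypothetical names, and the header parsing is simplified (it splits on commas and ignores the rare case of a comma inside a tag).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper: derives an ETag from a response body and checks If-None-Match. */
final class EtagSketch {

    /** Returns a quoted ETag built from the first 8 bytes of the body's SHA-256 hash. */
    static String etagFor(String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body.getBytes(StandardCharsets.UTF_8));
            // 16 hex characters keep the header short; collisions remain very unlikely.
            return "\"" + HexFormat.of().formatHex(hash, 0, 8) + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JDK", e);
        }
    }

    /** True when the client's If-None-Match header lists the current ETag or "*". */
    static boolean isNotModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || ifNoneMatch.isBlank()) {
            return false;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2); // If-None-Match compares weakly, so drop the prefix
            }
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String etag = etagFor("{\"id\":1,\"name\":\"Product - 1\"}");
        System.out.println(isNotModified(etag, etag));        // client copy matches
        System.out.println(isNotModified("\"stale\"", etag)); // client copy is outdated
    }
}
```

A content hash requires the body to be loaded and serialized before the comparison, so it saves bandwidth but not server work; a version or timestamp column, as in the controller above, avoids that cost.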

2. Read‑Write Separation

Most Spring Boot applications use a single datasource for both reads and writes. Under high load, read traffic (often 80‑90% of queries) competes with write operations on the same server, creating a bottleneck. With AbstractRoutingDataSource, reads can be routed to a replica while writes go to the primary.

# ❌ Single datasource – all queries hit the primary DB
spring:
  datasource:
    url: jdbc:postgresql://primary-db:5432/mydb
    username: user
    password: secret
    # Reads and writes share the same server

Define a thread‑local context to hold the target datasource:

public class DataSourceContextHolder {
    private static final ThreadLocal<String> context = new ThreadLocal<>();
    public static void setDataSourceType(String type) { context.set(type); }
    public static String getDataSourceType() { return context.get(); }
    public static void clear() { context.remove(); }
}

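Configure the two target datasources and the routing datasource. The spring.datasource.primary and spring.datasource.replica prefixes replace the single spring.datasource block shown earlier; with the default HikariCP pool created by DataSourceBuilder, the URL property under each prefix is jdbc-url rather than url: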

@Configuration
public class DataSourceConfig {
    @Bean @ConfigurationProperties("spring.datasource.primary")
    public DataSource primaryDataSource() { return DataSourceBuilder.create().build(); }
    @Bean @ConfigurationProperties("spring.datasource.replica")
    public DataSource replicaDataSource() { return DataSourceBuilder.create().build(); }
    @Primary @Bean
    public DataSource routingDataSource() {
        Map<Object, Object> targets = new HashMap<>();
        targets.put("primary", primaryDataSource());
        targets.put("replica", replicaDataSource());
        AbstractRoutingDataSource routing = new AbstractRoutingDataSource() {
            @Override
            protected Object determineCurrentLookupKey() { return DataSourceContextHolder.getDataSourceType(); }
        };
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(primaryDataSource());
        return routing;
    }
}

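An AOP aspect sets the routing key based on @Transactional(readOnly = true). @Order(1) places it ahead of Spring's transaction advice, so the key is set before the transaction obtains a connection. Because the pointcut uses @annotation, it matches only methods that carry the annotation directly, not methods covered by a class‑level @Transactional: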

@Aspect
@Component
@Order(1)
public class DataSourceRoutingAspect {
    @Before("@annotation(transactional)")
    public void setDataSource(Transactional transactional) {
        if (transactional.readOnly()) {
            DataSourceContextHolder.setDataSourceType("replica");
        } else {
            DataSourceContextHolder.setDataSourceType("primary");
        }
    }
    @After("@annotation(org.springframework.transaction.annotation.Transactional)")
    public void clearDataSource() { DataSourceContextHolder.clear(); }
}

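Service methods only need the appropriate annotation; routing happens automatically, with no change to business logic. Replicas usually lag slightly behind the primary, so a read that must observe a write made moments earlier should run in a read‑write transaction.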
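The aspect above clears the key when an annotated method returns, which also wipes the key of any annotated caller further up the stack. One way to handle nesting is to restore the previous key instead of clearing it. The sketch below shows that idea in plain Java under stated assumptions: `RoutingScope` and its `with` method are hypothetical names, not part of Spring or of the article's code.

```java
import java.util.function.Supplier;

/** Hypothetical variant of DataSourceContextHolder that restores the previous key on exit. */
final class RoutingScope {
    private static final ThreadLocal<String> KEY = new ThreadLocal<>();

    /** The key a routing datasource would look up; null means "use the default target". */
    static String current() {
        return KEY.get();
    }

    /** Runs the work with the given key, then restores whatever key was active before. */
    static <T> T with(String key, Supplier<T> work) {
        String previous = KEY.get();
        KEY.set(key);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                KEY.remove(); // do not leak the key to the next task on a pooled thread
            } else {
                KEY.set(previous);
            }
        }
    }

    public static void main(String[] args) {
        String seen = with("primary", () -> {
            String inner = with("replica", RoutingScope::current);
            return inner + " then " + current();
        });
        System.out.println(seen);      // prints "replica then primary"
        System.out.println(current()); // prints "null"
    }
}
```

In an aspect, the same try/finally shape would sit inside an @Around advice wrapped around the join point.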

3. Virtual Threads

Traditional Spring Boot uses a fixed‑size Tomcat thread pool (e.g., max: 200). Under I/O‑bound workloads, threads spend most of their time waiting, which caps concurrency at the pool size. With Java 21 virtual threads, the JVM unmounts a thread from its carrier while it blocks on I/O, so the number of in‑flight requests is limited by memory and downstream capacity (database connections, for example) rather than by the thread pool.

# ❌ Default Tomcat thread pool – hard limit on concurrency
server:
  tomcat:
    threads:
      max: 200  # 201st request must wait

Activate virtual threads with a single property:

# ✅ Enable virtual threads – virtually unlimited I/O concurrency
spring:
  threads:
    virtual:
      enabled: true

For explicit asynchronous tasks, define a virtual‑thread executor:

@Bean
public Executor taskExecutor() {
    return Executors.newVirtualThreadPerTaskExecutor();
}

The benefit is memory: a virtual thread's stack lives on the heap and grows as needed, typically starting at a few kilobytes or less, whereas a platform thread reserves about 1 MB of stack by default.
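To see the effect outside Spring, the following sketch runs many blocking tasks on a virtual‑thread‑per‑task executor. It assumes Java 21 or later; `VirtualThreadSketch` is a hypothetical class, and `Thread.sleep` stands in for a blocking JDBC or HTTP call.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: many blocking tasks, one virtual thread each (requires Java 21+). */
final class VirtualThreadSketch {

    /** Submits taskCount blocking tasks; returns how many of them ran on a virtual thread. */
    static int runBlockingTasks(int taskCount, Duration blockTime) {
        // try-with-resources: close() waits for all submitted tasks to finish
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                results.add(executor.submit(() -> {
                    Thread.sleep(blockTime); // stands in for blocking I/O
                    return Thread.currentThread().isVirtual();
                }));
            }
            int ranOnVirtualThread = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        ranOnVirtualThread++;
                    }
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("task did not complete", e);
                }
            }
            return ranOnVirtualThread;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

With a fixed pool of 200 platform threads, 10,000 tasks of 100 ms each would need at least 50 rounds, or about 5 seconds; with one virtual thread per task they all block concurrently, so the total should be far closer to the duration of a single task.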

4. HTTP Client Connection Reuse

When a Spring Boot service calls downstream APIs without a tuned connection pool, requests frequently pay for a fresh TCP/TLS connection, which can add 100‑300 ms of handshake latency on slow or distant links. Configuring a connection‑pooled RestClient (or WebClient) reuses established sockets.

// ❌ Default RestTemplate with no tuned pool: calls often pay the handshake cost again
@Service
public class PaymentService {
    private final RestTemplate restTemplate = new RestTemplate();
    // TLS handshake: 100‑300 ms for each new connection
}

// ✅ Pooled connections shared by every call made through this RestClient
@Configuration
public class HttpClientConfig {
    @Bean
    public RestClient restClient() {
        HttpComponentsClientHttpRequestFactory factory = new HttpComponentsClientHttpRequestFactory(
            HttpClients.custom()
                .setConnectionManager(PoolingHttpClientConnectionManagerBuilder.create()
                    .setMaxConnTotal(100)
                    .setMaxConnPerRoute(20)
                    .build())
                .evictExpiredConnections()
                .build());
        return RestClient.builder()
            .requestFactory(factory)
            .baseUrl("https://192.168.1.23")
            .build();
    }
}

Dependency required:

<dependency>
  <groupId>org.apache.httpcomponents.client5</groupId>
  <artifactId>httpclient5</artifactId>
</dependency>

Result: TLS handshake occurs only once per connection, and setMaxConnPerRoute prevents a single slow downstream service from exhausting the pool.
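The same principle, build the client once and share it, applies to the JDK's built‑in java.net.http.HttpClient, which keeps and reuses connections internally. The sketch below uses that client instead of Apache HttpClient so that it runs without extra dependencies; `SharedHttpClient` and the `/payments/42` path are hypothetical, and the timeout values are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical holder: one HttpClient for the whole application so connections are shared. */
final class SharedHttpClient {
    // Built once; creating a client per call would discard its pooled connections.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2)) // fail fast if connecting stalls
            .build();

    static HttpClient client() {
        return CLIENT;
    }

    /** Builds a GET request with a per-request timeout; the URL is supplied by the caller. */
    static HttpRequest get(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = get("https://192.168.1.23/payments/42", Duration.ofSeconds(1));
        System.out.println(request.method() + " " + request.uri());
        // Sending is left out because it needs a reachable server:
        // client().send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Whichever client is used, the per‑request timeout matters as much as the pool: a pooled connection to a hung service is still a blocked caller.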

5. Asynchronous Response Handling

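Many APIs block the client while performing non‑essential background work such as sending emails or writing audit records. Annotating those methods with @Async moves the work to another thread, so the request thread returns immediately. The call must go through the Spring proxy, that is, come from a different bean; a method that calls its own @Async method runs it synchronously.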

// ❌ Synchronous handling – client waits ~450 ms
@PostMapping("/orders")
public OrderResponse createOrder(@RequestBody OrderRequest request) {
    Order order = orderService.save(request); // 50 ms
    emailService.sendConfirmation(order);   // 300 ms
    auditService.log(order);               // 100 ms
    return new OrderResponse(order);
}
// ✅ Asynchronous handling: client receives the response in ~50 ms
@Service
public class EmailService {
    @Async
    public CompletableFuture<Void> sendConfirmation(Order order) {
        // Email logic runs on an executor thread, independently of the HTTP response
        return CompletableFuture.completedFuture(null);
    }
}

@PostMapping("/orders")
public OrderResponse createOrder(@RequestBody OrderRequest request) {
    Order order = orderService.save(request); // 50 ms
    emailService.sendConfirmation(order);     // returns immediately
    auditService.log(order);                  // returns immediately if log() is also @Async
    return new OrderResponse(order);
}

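Enable async processing and configure a dedicated thread pool. If the application defines more than one executor bean, select this one explicitly with @Async("asyncExecutor"):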

@SpringBootApplication
@EnableAsync
public class Application {
    @Bean
    public Executor asyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(20);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-");
        executor.initialize();
        return executor;
    }
}

Benefit: response time drops from ~450 ms to ~50 ms while background tasks continue independently. The client no longer sees failures in those tasks, so they need their own error logging or retry.
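The fire‑and‑forget behavior can be reproduced with plain CompletableFuture, which makes the ordering easy to observe. This is a sketch under stated assumptions rather than what @Async does internally: `AsyncSketch` and `runInBackground` are hypothetical names, and a latch stands in for slow email work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: start side work on an executor and log failures instead of throwing. */
final class AsyncSketch {

    /** Starts the side task and returns at once; the returned future never fails. */
    static CompletableFuture<Void> runInBackground(Runnable sideTask, Executor executor) {
        return CompletableFuture.runAsync(sideTask, executor)
                .exceptionally(error -> {
                    // The response has already gone out, so the failure can only be recorded.
                    System.err.println("background task failed: " + error);
                    return null;
                });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch emailMayFinish = new CountDownLatch(1);
        CompletableFuture<Void> email = runInBackground(() -> {
            try {
                emailMayFinish.await(); // stands in for slow SMTP work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, executor);
        System.out.println("response sent, email done? " + email.isDone()); // false
        emailMayFinish.countDown();
        email.join();
        System.out.println("email done? " + email.isDone()); // true
        executor.shutdown();
    }
}
```

The `exceptionally` stage is the part worth copying: without it, an exception in a background task disappears unless something later inspects the future.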

Conclusion

By applying these five patterns (ETag/conditional requests, read‑write separation, virtual threads, HTTP connection pooling, and asynchronous processing), developers can remove hidden sources of latency and bring API response times in Spring Boot 3.5.0 applications into the millisecond range.
