15 SpringBoot Performance Tweaks to Handle Million-Scale Concurrency
This guide walks through exposing metrics, integrating Prometheus and Grafana, using async‑profiler flame graphs, tuning Tomcat/Undertow, optimizing JVM flags, applying SkyWalking tracing, and applying layer‑wise code, cache, and thread‑pool improvements so a SpringBoot service can reliably serve millions of concurrent requests.
Metric Exposure and Monitoring
Expose runtime state (cache hit rate, DB pool usage, latency distribution, CPU/memory, GC pauses) before tuning.
Prometheus Integration
1. Maven dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
2. application.properties / application.yml
management.endpoint.metrics.enabled=true
management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus,metrics
management.metrics.export.prometheus.enabled=true
management.endpoint.health.show-details=always
# optional: expose httptrace, threaddump, etc.
3. Start and access
After starting the app, visit http://<host>:<port>/actuator/prometheus. Configure Prometheus to scrape this endpoint, e.g. in prometheus.yml:
scrape_configs:
  - job_name: 'springboot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app-host:port']
4. Custom business metric example
@RestController
public class TestController {

    private final MeterRegistry registry;

    public TestController(MeterRegistry registry) {
        this.registry = registry;
    }

    @GetMapping("/test")
    public String test() {
        registry.counter("app_test_invocations", "from", "127.0.0.1", "method", "test").increment();
        return "ok";
    }
}
Prometheus will expose
app_test_invocations_total{from="127.0.0.1",method="test"} 5.0
after five calls.
Cache hit‑rate can be recorded by attaching listeners to CacheManager and incrementing cache_hit or cache_miss counters.
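The same idea can be sketched without any framework: wrap the lookup path with hit/miss accounting. The class below is illustrative only; in a real application the two AtomicLongs would be the Micrometer cache_hit / cache_miss counters from the MeterRegistry shown above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Wraps any lookup with hit/miss accounting so a hit rate can be exported.
class CountingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    V get(K key, Function<K, V> loader) {
        V v = store.get(key);
        if (v != null) {
            hits.incrementAndGet();      // cache_hit counter in Micrometer terms
            return v;
        }
        misses.incrementAndGet();        // cache_miss counter
        v = loader.apply(key);
        store.put(key, v);
        return v;
    }

    double hitRate() {
        long h = hits.get(), m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```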
Grafana Visualization and AlertManager
Configure Prometheus as a data source in Grafana and build dashboards showing JVM memory, GC pauses, thread count, HTTP request rate & latency (using Histogram / Summary), cache hit‑rate (hit/(hit+miss)), and DB connection pool usage (active vs max).
Typical AlertManager rules:
95th‑percentile (p95) latency exceeds a threshold
GC pause duration too long
Connection‑pool exhaustion
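As a sketch, the first of these rules could be written in a Prometheus rule file as follows. This assumes Micrometer's default Spring MVC timer (http_server_requests_seconds) with histogram buckets enabled; the threshold and durations are placeholders to tune per service.

```yaml
groups:
  - name: springboot-alerts
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le)) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 HTTP latency above 500ms for 5 minutes"
```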
Performance Profiling with Flame Graphs
async‑profiler download and usage
1. Download the async‑profiler release from GitHub and unpack, e.g. /opt/async-profiler.
2. Start the SpringBoot JVM with the agent:
java -agentpath:/opt/async-profiler/build/libasyncProfiler.so=start,svg,file=profile.svg -jar your-app.jar
3. Run a realistic workload (load test or real traffic), then stop sampling:
# attach mode without restart
./profiler.sh -d 30 -f profile.svg <pid>
4. Open the generated profile.svg in a browser. The horizontal axis shows sampled time proportion (wider = hotter); the vertical axis shows call‑stack depth. Drill down from the widest segment to locate hot methods and optimise them.
Flame‑graph optimisation examples
If a hot spot is a slow method, inspect its logic for unnecessary loops, I/O blocking, or opportunities for parallelism.
If serialization dominates, switch to a more efficient library or shrink the response object.
If GC consumes most time, analyse GC logs and tune heap configuration.
HTTP and Web‑Layer Optimisation
CDN and static‑resource acceleration
Host common static assets (JS, CSS, images) on a CDN to offload the backend and serve from geographically close nodes.
Use public CDNs for third‑party libraries; upload your own assets to a CDN or static file server.
Cache‑Control / Expires in Nginx
location ~* \.(ico|gif|jpg|jpeg|png|css|js)$ {
    add_header Cache-Control "max-age=31536000, immutable";
}
Version the resource URL or use file fingerprints to keep cache hits while allowing updates.
Reduce domain count
Combine static resources and API under a single domain to avoid extra DNS lookups.
Enable HTTP/2 multiplexing where both client and server support it.
Gzip and resource compression
gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_comp_level 6;
gzip_http_version 1.1;
gzip_types text/plain application/javascript text/css application/json;
Enable gzip for dynamic JSON responses and pre‑compress large static files (e.g., Brotli) during build.
Keep‑Alive configuration
http {
    keepalive_timeout 60s 60s;
    keepalive_requests 10000;

    server {
        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
(Note that location blocks must sit inside a server block.) SpringBoot/Tomcat enables Keep‑Alive by default; customise the Connector to adjust timeouts.
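On the application side, recent Spring Boot versions expose keep-alive tuning directly as properties; the names below assume the embedded Tomcat, and older Boot versions need a Connector customizer instead. Values are illustrative.

```properties
# Keep idle connections open for 60s, allow up to 10000 requests per connection
server.tomcat.keep-alive-timeout=60000
server.tomcat.max-keep-alive-requests=10000
server.tomcat.connection-timeout=30000
```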
SpringBoot Container Tuning
Custom embedded Tomcat
For high concurrency, adjust max threads, max connections, and timeout via a WebServerFactoryCustomizer:
@SpringBootApplication(proxyBeanMethods = false)
public class App implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        if (factory instanceof TomcatServletWebServerFactory) {
            TomcatServletWebServerFactory f = (TomcatServletWebServerFactory) factory;
            f.setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");
            f.addConnectorCustomizers(c -> {
                // Cast matches the protocol set above; casting to Http11NioProtocol
                // here would throw a ClassCastException.
                Http11Nio2Protocol p = (Http11Nio2Protocol) c.getProtocolHandler();
                p.setMaxConnections(200);
                p.setMaxThreads(200);
                p.setConnectionTimeout(30000);
            });
        }
    }
}
Using the NIO2 protocol can improve I/O performance under heavy load; verify with benchmarks. Adjust setMaxThreads and setMaxConnections according to hardware and business concurrency.
Replace Tomcat with Undertow
Exclude Tomcat and add Undertow in pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>
Undertow’s thread model is generally lighter; benchmark against Tomcat in your workload.
Tomcat‑specific settings do not apply and must be adjusted.
JVM parameter recap
-XX:+UseG1GC -Xms2048m -Xmx2048m -XX:+AlwaysPreTouch -XX:MaxMetaspaceSize=256m \
-XX:ReservedCodeCacheSize=240m -XX:MaxDirectMemorySize=512m \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps \
-XX:ErrorFile=/path/to/hs_err_pid%p.log \
-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=5,filesize=100m
Tune heap size to hardware; if GC pauses are problematic, combine async‑profiler and GC logs for deeper analysis.
Application Performance Monitoring and Distributed Tracing
SkyWalking integration
1. Download and deploy the SkyWalking agent and backend (e.g., Elasticsearch storage).
2. Start the SpringBoot app with the agent:
java -javaagent:/opt/skywalking-agent/skywalking-agent.jar \
-Dskywalking.agent.service_name=your-service-name \
-Dskywalking.collector.backend_service=collector-host:11800 \
-jar your-app.jar
3. In the SkyWalking UI view per‑request call‑chain graphs, segment timings, DB/HTTP call details, and JVM/GC metrics.
4. Combine SkyWalking alerts with Prometheus AlertManager to set thresholds.
Layer‑wise Optimisation Strategies
Controller layer
Trim DTOs and use pagination or streaming (Spring MVC StreamingResponseBody) to avoid huge result sets.
Choose a high‑performance JSON serializer (Jackson with Afterburner) and avoid serialising unnecessary fields.
Validate input and apply rate‑limiting early to reject malicious requests.
Make cache‑able GET endpoints idempotent and leverage HTTP cache headers.
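The pagination point above can be sketched framework-free: clamp client-supplied paging parameters so a single request can never drag an unbounded result set through serialization. Names are illustrative; in Spring you would typically use Pageable instead.

```java
import java.util.List;

// Defensive pagination: caps page size and floors page index before slicing.
final class Paging {
    private static final int MAX_SIZE = 100;   // upper bound per request

    static <T> List<T> page(List<T> items, int page, int size) {
        int safeSize = Math.min(Math.max(size, 1), MAX_SIZE);
        int safePage = Math.max(page, 0);
        int from = safePage * safeSize;
        if (from >= items.size()) {
            return List.of();                  // past the end: empty page
        }
        return items.subList(from, Math.min(from + safeSize, items.size()));
    }
}
```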
Service layer
Keep beans stateless (default singleton) to avoid request‑level state.
Split complex logic into small modules for easier monitoring.
Use CompletableFuture or async message queues for parallelisable sub‑tasks, while configuring thread pools and context propagation.
Local cache (Caffeine) for low‑latency reads; distributed cache (Redis) for shared data, with TTLs, bloom filters against cache penetration, pre‑warming, randomized expiry, and a mutex lock to prevent cache stampedes.
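The CompletableFuture point above can be sketched as follows. The facade fans independent sub-tasks out to a dedicated pool instead of the shared common ForkJoinPool, then joins the results; the method and service names are illustrative, not from the original article.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Runs independent sub-tasks in parallel and combines their results.
class OrderFacade {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    String loadOrderPage(long orderId) {
        // Both lookups start immediately and run concurrently on the pool
        CompletableFuture<String> order = CompletableFuture.supplyAsync(() -> "order-" + orderId, pool);
        CompletableFuture<String> user  = CompletableFuture.supplyAsync(() -> "user-of-" + orderId, pool);
        // thenCombine waits for both; only join() blocks the caller
        return order.thenCombine(user, (o, u) -> o + "/" + u).join();
    }

    void shutdown() { pool.shutdown(); }
}
```

Context propagation (MDC, security context) does not cross these pool boundaries automatically and must be wired explicitly.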
DAO layer
Prefer lazy loading (FetchType.LAZY) and explicit queries or DTO projections to avoid N+1 queries.
Batch insert/update for bulk writes.
Analyse slow‑query logs with EXPLAIN, ensure indexes are used, and avoid full table scans.
When sharding, understand middleware routing overhead and avoid naïve SQL tricks.
HikariCP is efficient but still monitor active connections and wait times; adjust max pool size based on concurrency.
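As an illustrative starting point, HikariCP can be tuned through standard Spring Boot properties; the values below are placeholders to be replaced by figures derived from your own load tests.

```properties
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
# fail fast instead of queueing callers for the default 30s
spring.datasource.hikari.connection-timeout=3000
spring.datasource.hikari.max-lifetime=1800000
# expose pool stats over JMX for the monitoring set up earlier
spring.datasource.hikari.register-mbeans=true
```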
Cache optimisation
Caffeine for local cache—monitor size, hit‑rate, load latency; avoid oversized caches that consume memory.
Redis—monitor Lettuce/Redisson connection pool, choose efficient serialization (JSON, Kryo, FST), and design TTL per business scenario.
Consider a two‑level cache (local + distributed) for read‑heavy workloads, handling consistency carefully.
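A minimal sketch of the two-level read path, with plain maps standing in for the real tiers: L1 would be Caffeine in practice, and the L2 map stands in for a shared cache such as Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Misses fall through L1 -> L2 -> loader; reads warm both tiers.
class TwoLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>();  // per-instance, fast
    private final Map<K, V> l2 = new ConcurrentHashMap<>();  // stands in for Redis

    V get(K key, Function<K, V> loader) {
        V v = l1.get(key);
        if (v != null) return v;
        v = l2.get(key);
        if (v == null) {
            v = loader.apply(key);   // e.g. a database read
            l2.put(key, v);          // populate the shared tier first
        }
        l1.put(key, v);              // then warm the local tier
        return v;
    }

    // Consistency: on writes, evict both tiers; in a clustered setup the
    // L1 eviction must also be broadcast to other nodes (e.g. Redis pub/sub).
    void evict(K key) { l1.remove(key); l2.remove(key); }
}
```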
Resource and Thread Management
Configure custom Executor pools with appropriate core/max sizes and queue lengths; monitor thread activity to prevent backlog.
Avoid excessive scheduled‑task threads competing for resources.
When suitable, adopt WebFlux/Reactor for non‑blocking processing, but weigh team expertise and actual use‑case.
Use async HTTP clients (e.g., WebClient) for outbound calls to avoid blocking threads.
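The executor-pool point above can be sketched with a plain ThreadPoolExecutor. Core/max sizes and the queue length are placeholders to size against your own workload; the interesting parts are the bounded queue, the named daemon threads (so pools show up legibly in thread dumps), and CallerRunsPolicy as back-pressure.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A bounded pool: when the queue fills, CallerRunsPolicy slows producers
// instead of dropping tasks or growing memory without limit.
class Pools {
    static ThreadPoolExecutor ioPool() {
        ThreadFactory named = r -> {
            Thread t = new Thread(r);
            t.setName("io-pool-" + t.getId());   // visible in thread dumps
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                8, 32,                           // core / max threads
                60, TimeUnit.SECONDS,            // idle keep-alive for extra threads
                new ArrayBlockingQueue<>(1000),  // bounded backlog
                named,
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```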
End‑to‑End Testing and Load‑Testing
Tools: wrk, JMeter, Locust.
Prepare monitoring, logging, and profiling before the test; deploy an environment mirroring production.
Design scripts that simulate realistic flows (login, queries, writes).
Analyse results with Prometheus/Grafana metrics and flame graphs; iterate, re‑test, and measure impact.
Run gradual ramp‑up, steady‑state, and prolonged stress tests to observe resource consumption and latency curves.
Conclusion
Metric collection → profiling → optimisation → verification forms a closed loop; keep the system observable.
Make incremental changes in non‑production environments first, then roll out small steps with monitoring and rollback plans.
Integrate simple health checks and performance baselines into CI/CD pipelines.
Periodically review key endpoint metrics to catch regressions after code or dependency upgrades.
Document optimisation experience and share with the team to build consensus and standards.
