15 SpringBoot Performance Tweaks to Handle Million-Scale Concurrency
This guide walks through exposing metrics, integrating Prometheus and Grafana, using async‑profiler flame graphs, tuning Tomcat/Undertow, optimizing JVM flags, applying SkyWalking tracing, and applying layer‑wise code, cache, and thread‑pool improvements so a SpringBoot service can reliably serve millions of concurrent requests.
Metric Exposure and Monitoring
Expose runtime state (cache hit rate, DB pool usage, latency distribution, CPU/memory, GC pauses) before tuning.
Prometheus Integration
1. Maven dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
2. application.properties / application.yml
management.endpoint.metrics.enabled=true
management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus,metrics
management.metrics.export.prometheus.enabled=true
management.endpoint.health.show-details=always
# optional: expose httptrace, threaddump, etc.
3. Start and access
After starting the app, visit http://<host>:<port>/actuator/prometheus. Configure Prometheus to scrape this endpoint, e.g. in prometheus.yml:
scrape_configs:
  - job_name: 'springboot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app-host:port']
4. Custom business metric example
@RestController
public class TestController {

    private final MeterRegistry registry;

    public TestController(MeterRegistry registry) {
        this.registry = registry;
    }

    @GetMapping("/test")
    public String test() {
        registry.counter("app_test_invocations", "from", "127.0.0.1", "method", "test").increment();
        return "ok";
    }
}
Prometheus will expose
app_test_invocations_total{from="127.0.0.1",method="test"} 5.0
after five calls.
Cache hit‑rate can be recorded by attaching listeners to CacheManager and incrementing cache_hit or cache_miss counters.
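The same idea can be sketched without any framework: wrap the lookup path with hit/miss accounting. The class below is illustrative only; in a real application the two AtomicLongs would be the Micrometer cache_hit / cache_miss counters from the MeterRegistry shown above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Wraps any lookup with hit/miss accounting so a hit rate can be exported.
class CountingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    V get(K key, Function<K, V> loader) {
        V v = store.get(key);
        if (v != null) {
            hits.incrementAndGet();      // cache_hit counter in Micrometer terms
            return v;
        }
        misses.incrementAndGet();        // cache_miss counter
        v = loader.apply(key);
        store.put(key, v);
        return v;
    }

    double hitRate() {
        long h = hits.get(), m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```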
Grafana Visualization and AlertManager
Configure Prometheus as a data source in Grafana and build dashboards showing JVM memory, GC pauses, thread count, HTTP request rate & latency (using Histogram / Summary), cache hit‑rate (hit/(hit+miss)), and DB connection pool usage (active vs max).
Typical AlertManager rules:
95th‑percentile (p95) latency exceeds a threshold
GC pause duration too long
Connection‑pool exhaustion
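As a sketch, the first of these rules could be written in a Prometheus rule file as follows. This assumes Micrometer's default Spring MVC timer (http_server_requests_seconds) with histogram buckets enabled; the threshold and durations are placeholders to tune per service.

```yaml
groups:
  - name: springboot-alerts
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le)) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 HTTP latency above 500ms for 5 minutes"
```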
Performance Profiling with Flame Graphs
async‑profiler download and usage
1. Download the async‑profiler release from GitHub and unpack, e.g. /opt/async-profiler.
2. Start the SpringBoot JVM with the agent:
java -agentpath:/opt/async-profiler/build/libasyncProfiler.so=start,svg,file=profile.svg -jar your-app.jar
3. Run a realistic workload (load test or real traffic), then stop sampling:
# attach mode without restart
./profiler.sh -d 30 -f profile.svg <pid>
4. Open the generated profile.svg in a browser. The horizontal axis shows sampled time proportion (wider = hotter); the vertical axis shows call‑stack depth. Drill down from the widest segment to locate hot methods and optimise them.
Flame‑graph optimisation examples
If a hot spot is a slow method, inspect its logic for unnecessary loops, I/O blocking, or opportunities for parallelism.
If serialization dominates, switch to a more efficient library or shrink the response object.
If GC consumes most time, analyse GC logs and tune heap configuration.
HTTP and Web‑Layer Optimisation
CDN and static‑resource acceleration
Host common static assets (JS, CSS, images) on a CDN to offload the backend and serve from geographically close nodes.
Use public CDNs for third‑party libraries; upload your own assets to a CDN or static file server.
Cache‑Control / Expires in Nginx
location ~* \.(ico|gif|jpg|jpeg|png|css|js)$ {
    add_header Cache-Control "max-age=31536000, immutable";
}
Version the resource URL or use file fingerprints to keep cache hits while allowing updates.
Reduce domain count
Combine static resources and API under a single domain to avoid extra DNS lookups.
Enable HTTP/2 multiplexing where both client and server support it.
Gzip and resource compression
gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_comp_level 6;
gzip_http_version 1.1;
gzip_types text/plain application/javascript text/css application/json;
Enable gzip for dynamic JSON responses and pre‑compress large static files (e.g., Brotli) during build.
Keep‑Alive configuration
http {
    keepalive_timeout 60s 60s;
    keepalive_requests 10000;

    server {
        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
(Note that location blocks must sit inside a server block.) SpringBoot/Tomcat enables Keep‑Alive by default; customise the Connector to adjust timeouts.
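On the application side, recent Spring Boot versions expose keep-alive tuning directly as properties; the names below assume the embedded Tomcat, and older Boot versions need a Connector customizer instead. Values are illustrative.

```properties
# Keep idle connections open for 60s, allow up to 10000 requests per connection
server.tomcat.keep-alive-timeout=60000
server.tomcat.max-keep-alive-requests=10000
server.tomcat.connection-timeout=30000
```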
SpringBoot Container Tuning
Custom embedded Tomcat
For high concurrency, adjust max threads, max connections, and timeout via a WebServerFactoryCustomizer:
@SpringBootApplication(proxyBeanMethods = false)
public class App implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        if (factory instanceof TomcatServletWebServerFactory) {
            TomcatServletWebServerFactory f = (TomcatServletWebServerFactory) factory;
            f.setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");
            f.addConnectorCustomizers(c -> {
                // Cast matches the protocol set above; casting to Http11NioProtocol
                // here would throw a ClassCastException.
                Http11Nio2Protocol p = (Http11Nio2Protocol) c.getProtocolHandler();
                p.setMaxConnections(200);
                p.setMaxThreads(200);
                p.setConnectionTimeout(30000);
            });
        }
    }
}
Using the NIO2 protocol can improve I/O performance under heavy load; verify with benchmarks. Adjust setMaxThreads and setMaxConnections according to hardware and business concurrency.
Replace Tomcat with Undertow
Exclude Tomcat and add Undertow in pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>
Undertow’s thread model is generally lighter; benchmark against Tomcat in your workload.
Tomcat‑specific settings do not apply and must be adjusted.
JVM parameter recap
-XX:+UseG1GC -Xms2048m -Xmx2048m -XX:+AlwaysPreTouch -XX:MaxMetaspaceSize=256m \
-XX:ReservedCodeCacheSize=240m -XX:MaxDirectMemorySize=512m \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps \
-XX:ErrorFile=/path/to/hs_err_pid%p.log \
-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=5,filesize=100m
Tune heap size to hardware; if GC pauses are problematic, combine async‑profiler and GC logs for deeper analysis.
Application Performance Monitoring and Distributed Tracing
SkyWalking integration
1. Download and deploy the SkyWalking agent and backend (e.g., Elasticsearch storage).
2. Start the SpringBoot app with the agent:
java -javaagent:/opt/skywalking-agent/skywalking-agent.jar \
-Dskywalking.agent.service_name=your-service-name \
-Dskywalking.collector.backend_service=collector-host:11800 \
-jar your-app.jar
3. In the SkyWalking UI view per‑request call‑chain graphs, segment timings, DB/HTTP call details, and JVM/GC metrics.
4. Combine SkyWalking alerts with Prometheus AlertManager to set thresholds.
Layer‑wise Optimisation Strategies
Controller layer
Trim DTOs and use pagination or streaming (Spring MVC StreamingResponseBody) to avoid huge result sets.
Choose a high‑performance JSON serializer (Jackson with Afterburner) and avoid serialising unnecessary fields.
Validate input and apply rate‑limiting early to reject malicious requests.
Make cache‑able GET endpoints idempotent and leverage HTTP cache headers.
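The pagination point above can be sketched framework-free: clamp client-supplied paging parameters so a single request can never drag an unbounded result set through serialization. Names are illustrative; in Spring you would typically use Pageable instead.

```java
import java.util.List;

// Defensive pagination: caps page size and floors page index before slicing.
final class Paging {
    private static final int MAX_SIZE = 100;   // upper bound per request

    static <T> List<T> page(List<T> items, int page, int size) {
        int safeSize = Math.min(Math.max(size, 1), MAX_SIZE);
        int safePage = Math.max(page, 0);
        int from = safePage * safeSize;
        if (from >= items.size()) {
            return List.of();                  // past the end: empty page
        }
        return items.subList(from, Math.min(from + safeSize, items.size()));
    }
}
```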
Service layer
Keep beans stateless (default singleton) to avoid request‑level state.
Split complex logic into small modules for easier monitoring.
Use CompletableFuture or async message queues for parallelisable sub‑tasks, while configuring thread pools and context propagation.
Local cache (Caffeine) for low‑latency reads; distributed cache (Redis) for shared data, with TTLs, bloom filters against cache penetration, pre‑warming, randomized expiry, and a mutex lock to prevent cache stampedes.
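The CompletableFuture point above can be sketched as follows. The facade fans independent sub-tasks out to a dedicated pool instead of the shared common ForkJoinPool, then joins the results; the method and service names are illustrative, not from the original article.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Runs independent sub-tasks in parallel and combines their results.
class OrderFacade {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    String loadOrderPage(long orderId) {
        // Both lookups start immediately and run concurrently on the pool
        CompletableFuture<String> order = CompletableFuture.supplyAsync(() -> "order-" + orderId, pool);
        CompletableFuture<String> user  = CompletableFuture.supplyAsync(() -> "user-of-" + orderId, pool);
        // thenCombine waits for both; only join() blocks the caller
        return order.thenCombine(user, (o, u) -> o + "/" + u).join();
    }

    void shutdown() { pool.shutdown(); }
}
```

Context propagation (MDC, security context) does not cross these pool boundaries automatically and must be wired explicitly.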
DAO layer
Prefer lazy loading (FetchType.LAZY) and explicit queries or DTO projections to avoid N+1 queries.
Batch insert/update for bulk writes.
Analyse slow‑query logs with EXPLAIN, ensure indexes are used, and avoid full table scans.
When sharding, understand middleware routing overhead and avoid naïve SQL tricks.
HikariCP is efficient but still monitor active connections and wait times; adjust max pool size based on concurrency.
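As an illustrative starting point, HikariCP can be tuned through standard Spring Boot properties; the values below are placeholders to be replaced by figures derived from your own load tests.

```properties
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
# fail fast instead of queueing callers for the default 30s
spring.datasource.hikari.connection-timeout=3000
spring.datasource.hikari.max-lifetime=1800000
# expose pool stats over JMX for the monitoring set up earlier
spring.datasource.hikari.register-mbeans=true
```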
Cache optimisation
Caffeine for local cache—monitor size, hit‑rate, load latency; avoid oversized caches that consume memory.
Redis—monitor Lettuce/Redisson connection pool, choose efficient serialization (JSON, Kryo, FST), and design TTL per business scenario.
Consider a two‑level cache (local + distributed) for read‑heavy workloads, handling consistency carefully.
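A minimal sketch of the two-level read path, with plain maps standing in for the real tiers: L1 would be Caffeine in practice, and the L2 map stands in for a shared cache such as Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Misses fall through L1 -> L2 -> loader; reads warm both tiers.
class TwoLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>();  // per-instance, fast
    private final Map<K, V> l2 = new ConcurrentHashMap<>();  // stands in for Redis

    V get(K key, Function<K, V> loader) {
        V v = l1.get(key);
        if (v != null) return v;
        v = l2.get(key);
        if (v == null) {
            v = loader.apply(key);   // e.g. a database read
            l2.put(key, v);          // populate the shared tier first
        }
        l1.put(key, v);              // then warm the local tier
        return v;
    }

    // Consistency: on writes, evict both tiers; in a clustered setup the
    // L1 eviction must also be broadcast to other nodes (e.g. Redis pub/sub).
    void evict(K key) { l1.remove(key); l2.remove(key); }
}
```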
Resource and Thread Management
Configure custom Executor pools with appropriate core/max sizes and queue lengths; monitor thread activity to prevent backlog.
Avoid excessive scheduled‑task threads competing for resources.
When suitable, adopt WebFlux/Reactor for non‑blocking processing, but weigh team expertise and actual use‑case.
Use async HTTP clients (e.g., WebClient) for outbound calls to avoid blocking threads.
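The executor-pool point above can be sketched with a plain ThreadPoolExecutor. Core/max sizes and the queue length are placeholders to size against your own workload; the interesting parts are the bounded queue, the named daemon threads (so pools show up legibly in thread dumps), and CallerRunsPolicy as back-pressure.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A bounded pool: when the queue fills, CallerRunsPolicy slows producers
// instead of dropping tasks or growing memory without limit.
class Pools {
    static ThreadPoolExecutor ioPool() {
        ThreadFactory named = r -> {
            Thread t = new Thread(r);
            t.setName("io-pool-" + t.getId());   // visible in thread dumps
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                8, 32,                           // core / max threads
                60, TimeUnit.SECONDS,            // idle keep-alive for extra threads
                new ArrayBlockingQueue<>(1000),  // bounded backlog
                named,
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```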
End‑to‑End Testing and Load‑Testing
Tools: wrk, JMeter, Locust.
Prepare monitoring, logging, and profiling before the test; deploy an environment mirroring production.
Design scripts that simulate realistic flows (login, queries, writes).
Analyse results with Prometheus/Grafana metrics and flame graphs; iterate, re‑test, and measure impact.
Run gradual ramp‑up, steady‑state, and prolonged stress tests to observe resource consumption and latency curves.
Conclusion
Metric collection → profiling → optimisation → verification forms a closed loop; keep the system observable.
Make incremental changes in non‑production environments first, then roll out small steps with monitoring and rollback plans.
Integrate simple health checks and performance baselines into CI/CD pipelines.
Periodically review key endpoint metrics to catch regressions after code or dependency upgrades.
Document optimisation experience and share with the team to build consensus and standards.
