Cutting a 5‑Second Java Service to <1s with Compression, Parallelism & Caching

This article details how a Java Spring Boot microservice's response time was reduced from 5‑6 seconds to under one second by applying gzip compression, parallel data fetching, short‑lived caching, MySQL index tuning, and JVM G1 garbage‑collector adjustments.

Java Backend Technology

Performance optimization often seems abstract until response times become intolerable; this article describes a real‑world case where a Java/Spring Boot microservice’s latency dropped from 5‑6 seconds to under 1 second.

Optimization background and goals

The target service sits near the top of a microservice call chain: it calls downstream services via Feign, aggregates their results, and is exposed to clients through Zuul and Nginx. Monitoring with SkyWalking and Prometheus focused on two key metrics: throughput (QPS/TPS) and average response time. The goal was to bring average latency below 1 s and to increase QPS.

Compression dramatically reduces transfer time

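Enabling gzip compression at the Nginx layer shrank a 10 MB payload to 368 KB, which cut download time accordingly. The Nginx gzip configuration used is shown below. Note that Nginx compresses only text/html plus the MIME types listed in gzip_types, so a JSON API would also need application/json in that list for its responses to be compressed.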

gzip on;
gzip_vary on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
gzip_disable "MSIE [1-6]\.";
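
The size reduction quoted above is plausible for repetitive JSON, and the effect can be reproduced locally. The following is a minimal sketch, not the service's actual payload: the class name GzipRatioDemo and the sample records are hypothetical, and the exact ratio depends entirely on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipRatioDemo {

    // Compresses a byte array in memory with gzip and returns the compressed bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Builds a hypothetical JSON array of similar records (illustrative data only).
    static byte[] samplePayload(int records) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < records; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"id\":").append(i)
                .append(",\"status\":\"ACTIVE\",\"region\":\"cn-north-1\"}");
        }
        return json.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = samplePayload(10_000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, packed.length);
    }
}
```

Payloads with many repeated keys and values compress well; already-compressed content such as images gains little, which is why gzip_types is normally restricted to text formats.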

To carry compression through the Feign calls between services, the project added the feign-okhttp dependency, enabled Spring Boot's server-side response compression, and switched Feign's HTTP client to OkHttp, which requests and decodes gzip responses transparently.

<dependency>
  <groupId>io.github.openfeign</groupId>
  <artifactId>feign-okhttp</artifactId>
</dependency>

server:
  port: 8888
  compression:
    enabled: true
    min-response-size: 1024
    mime-types: ["text/html","text/xml","application/xml","application/json","application/octet-stream"]
feign:
  httpclient:
    enabled: false
  okhttp:
    enabled: true

Parallel data fetching speeds up aggregation

The aggregation endpoint called dozens of downstream services sequentially. By analyzing dependencies, the calls were split into two independent groups (A and B) and executed concurrently using CountDownLatch and a custom ThreadPoolExecutor.

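// Shared pool: 100 core threads, up to 200, idle non-core threads kept for 1 hour.
// With a bounded queue and the default AbortPolicy, execute() throws
// RejectedExecutionException once the queue and the pool are both full.
final ThreadPoolExecutor executor = new ThreadPoolExecutor(100, 200, 1,
    TimeUnit.HOURS, new ArrayBlockingQueue<>(100));

CountDownLatch latch = new CountDownLatch(jobSize);
// submit job
executor.execute(() -> {
    try {
        // job code
    } finally {
        latch.countDown(); // always count down, even if the job throws
    }
});
...
// returns false if the timeout elapses before all jobs finish
latch.await(timeout, TimeUnit.MILLISECONDS);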
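
To make the pattern concrete, here is a self-contained sketch of two independent groups fetched concurrently and joined with a CountDownLatch. It is an illustration under assumptions, not the service's code: ParallelFetchDemo, fetchAll, and slowCall are hypothetical names, the 200 ms sleep stands in for a downstream call, and the pool is created per call only to keep the example short (a real service would share one pool).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ParallelFetchDemo {

    // Runs every job concurrently and returns whatever finished within the timeout.
    static Map<String, String> fetchAll(Map<String, Supplier<String>> jobs, long timeoutMs) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 8, 1, TimeUnit.MINUTES, new ArrayBlockingQueue<>(100));
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(jobs.size());
        try {
            jobs.forEach((name, job) -> executor.execute(() -> {
                try {
                    results.put(name, job.get());
                } finally {
                    latch.countDown(); // count down even if the job throws
                }
            }));
            // Wait for all jobs, but never longer than the timeout.
            latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow(); // interrupts jobs still running after the timeout
        }
        return new HashMap<>(results); // snapshot of the results collected so far
    }

    // Hypothetical stand-in for a slow downstream call.
    static String slowCall(String name) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + name;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> jobs = new LinkedHashMap<>();
        jobs.put("groupA", () -> slowCall("A"));
        jobs.put("groupB", () -> slowCall("B"));

        long start = System.nanoTime();
        Map<String, String> results = fetchAll(jobs, 1_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(results + " in " + elapsedMs + " ms");
    }
}
```

Because the two groups run side by side, total latency approaches that of the slowest group rather than the sum of both. The timeout on await bounds the wait, at the cost of possibly returning partial results.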

Cache categorization for further gains

Repeated requests inside loops were cached using Redis (Cache‑Aside pattern) and Guava’s LoadingCache with a 1‑second expiry, eliminating many unnecessary remote calls.

LoadingCache<String, String> lc = CacheBuilder.newBuilder()
    .expireAfterWrite(1, TimeUnit.SECONDS)
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) throws Exception {
            return slowMethod(key);
        }
    });
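
The Cache-Aside read path mentioned above can be sketched without Redis or Guava. In the example below a ConcurrentHashMap with a per-entry expiry stands in for the cache; CacheAsideDemo and its loader are hypothetical names, and this is a simplified illustration rather than the project's implementation (for instance, concurrent misses on the same key may each invoke the loader).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CacheAsideDemo {

    // A cached value plus the System.nanoTime() reading after which it is stale.
    private static final class Entry {
        final String value;
        final long expiresAtNanos;

        Entry(String value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final long ttlNanos;

    CacheAsideDemo(Function<String, String> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    // Cache-aside read: serve a fresh cached value, otherwise load it and store it.
    String get(String key) {
        long now = System.nanoTime();
        Entry cached = store.get(key);
        if (cached != null && now - cached.expiresAtNanos < 0) {
            return cached.value; // hit: entry has not expired yet
        }
        String loaded = loader.apply(key); // miss or stale: go to the slow source
        store.put(key, new Entry(loaded, now + ttlNanos));
        return loaded;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CacheAsideDemo cache = new CacheAsideDemo(key -> {
            loads.incrementAndGet(); // counts how often the slow source is hit
            return "value-for-" + key;
        }, 1_000);

        // Five lookups of the same key inside a loop, well within the 1-second TTL.
        for (int i = 0; i < 5; i++) {
            cache.get("user:42");
        }
        System.out.println("lookups=5, loads=" + loads.get());
    }
}
```

A very short expiry such as 1 second keeps staleness low while still collapsing the repeated lookups that occur inside a single request or loop.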

MySQL index optimization

Common index pitfalls were addressed: avoiding functions on indexed columns, matching data types to prevent implicit conversion, keeping character sets consistent between compared columns, respecting the left‑most prefix rule for composite indexes, using covering indexes, and overriding the optimizer's index choice when necessary.

JVM tuning

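GC logs were enabled, and the garbage collector was switched to G1 with parameters such as -XX:MaxGCPauseMillis and -XX:G1HeapRegionSize, resulting in smoother pause times. The flags listed below are the logging and diagnostic options only, in JDK 8 syntax (JDK 9 and later use unified logging via -Xlog:gc*); the G1 settings themselves are not listed.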

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/xxx.hprof -DlogPath=/opt/logs/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -Xloggc:/opt/logs/gc_%p.log -XX:ErrorFile=/opt/logs/hs_error_pid%p.log
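
After changing collector flags, it can be useful to confirm from inside the process which collector is actually active. The sketch below uses the standard java.lang.management API; the class name GcInfoDemo is hypothetical. When G1 is in use, the reported names start with "G1" (for example "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfoDemo {

    // Returns the names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each collector with its cumulative collection count and time so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The counts and times are cumulative since JVM start, so sampling them periodically gives a rough view of GC overhead even without parsing the GC log.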

Other optimizations

Code‑level clean‑ups included avoiding costly Map.clear() calls on large maps, replacing ConcurrentLinkedQueue.size(), which traverses the whole queue on every call, with cheaper alternatives, and handing front‑end performance bottlenecks over to the UI developers.
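
The ConcurrentLinkedQueue point deserves a brief illustration: its size() method is not constant-time, so calling it on a hot path is expensive for long queues. A minimal sketch of two common alternatives follows, assuming the caller either only needs an emptiness check or can tolerate an approximate count; QueueSizeDemo and its methods are hypothetical names, not code from the service.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.LongAdder;

class QueueSizeDemo {

    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    // Separate counter so the size can be read without traversing the queue.
    private final LongAdder count = new LongAdder();

    void add(String item) {
        queue.add(item);
        count.increment();
    }

    String poll() {
        String item = queue.poll();
        if (item != null) {
            count.decrement(); // only decrement when an element was actually removed
        }
        return item;
    }

    // Emptiness check that avoids the full traversal done by size().
    boolean hasWork() {
        return !queue.isEmpty();
    }

    // Approximate under concurrent updates, but cheap to read.
    long approximateSize() {
        return count.sum();
    }

    public static void main(String[] args) {
        QueueSizeDemo demo = new QueueSizeDemo();
        demo.add("a");
        demo.add("b");
        demo.add("c");
        demo.poll();
        System.out.println("hasWork=" + demo.hasWork() + ", size~" + demo.approximateSize());
    }
}
```

The counter and the queue are updated in separate steps, so the count can briefly disagree with the queue's true length under contention; that is usually acceptable for metrics or throttling decisions.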

Conclusion

By combining compression, parallelism, short‑lived caching, index tuning, and JVM G1 adjustments, the service’s response time fell from 5‑6 seconds to under 1 second.
