How a Targeted Java Refactor Delivered a 10× Performance Boost
By profiling a three-year-old order service and applying six data-driven optimizations (log reduction, object-allocation cuts, HashMap replacement, Java 21 virtual threads, JSON caching, and ZGC tuning), the team achieved a 9.5× throughput increase and a roughly ten-fold drop in P99 latency.
Baseline
The order‑processing service handled thousands of requests per minute with the following symptoms:
P99 latency ≈ 800 ms
High memory usage
Frequent Minor GC pauses
Occasional thread‑pool saturation during peak load
Bottleneck analysis
Using async-profiler flame graphs and Java Flight Recorder (JFR), the team identified:
≈ 40 % of CPU time spent on string concatenation and logging
Object allocation rate ≈ 2 GB/min
Unnecessary synchronized blocks on several hot paths
Each request deserialized a JSON configuration object
These data‑driven findings guided the subsequent refactor.
Optimization 1 – Remove eager logging
The original code concatenated the message eagerly, even when DEBUG was disabled:
logger.debug("Processing order: " + order.toString());
It was replaced with SLF4J parameterized logging, which defers the toString() call until the message is actually formatted, and non-essential log statements were removed:
logger.debug("Processing order: {}", order);
Result: hot-path CPU usage reduced by ~15 %.
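To see the difference between eager and deferred message construction in isolation, here is a minimal sketch. It uses java.util.logging from the JDK as a stand-in for SLF4J so that it runs without dependencies; the Order record and the call counter are hypothetical and exist only to make the effect observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingSketch {
    // Counts how often the (hypothetical) Order.toString() actually runs.
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

    /** Hypothetical order type; not taken from the original service. */
    record Order(long id) {
        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return "Order[id=" + id + "]";
        }
    }

    /** Returns {toString calls with eager concatenation, toString calls with a supplier}. */
    static int[] measure() {
        Logger log = Logger.getLogger("lazy-logging-sketch");
        log.setLevel(Level.INFO); // FINE (the debug-like level) is disabled
        Order order = new Order(42);

        TO_STRING_CALLS.set(0);
        // Eager: the argument string is built before fine() can check the level.
        log.fine("Processing order: " + order);
        int eager = TO_STRING_CALLS.get();

        TO_STRING_CALLS.set(0);
        // Deferred: the supplier is only invoked if FINE is enabled.
        log.fine(() -> "Processing order: " + order);
        int deferred = TO_STRING_CALLS.get();

        return new int[] {eager, deferred};
    }

    public static void main(String[] args) {
        int[] calls = measure();
        System.out.println("eager=" + calls[0] + ", deferred=" + calls[1]);
    }
}
```

With the level disabled, the eager variant still pays for toString() and the concatenation, while the deferred variant does not. SLF4J's {} placeholders achieve the same deferral by accepting the object itself and formatting it only when the level is enabled.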
Optimization 2 – Reduce object allocation
The original Stream pipeline created many intermediate objects:
List<Result> results = orders.stream()
        .map(this::transform)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
It was rewritten as a hand-written loop that fills a pre-sized list, together with a lightweight object pool for high-frequency objects:
List<Result> results = new ArrayList<>(orders.size());
for (Order order : orders) {
    Result r = transform(order);
    if (r != null) {
        results.add(r);
    }
}
JFR showed the allocation rate dropping from ~2 GB/min to ~380 MB/min and GC frequency falling by 70 %, with a noticeable improvement in P99 latency stability.
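The article mentions a lightweight object pool but does not show it. The following is a minimal sketch of what such a pool could look like, not the team's implementation: the SimplePool class, its method names, and the maxIdle limit are all assumptions. It is deliberately not thread-safe, so it would need to be confined to one thread or wrapped in synchronization before use on a shared hot path.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical, single-threaded object pool; names are illustrative only. */
class SimplePool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Hands out an idle instance if one exists, otherwise creates a new one. */
    T acquire() {
        T pooled = idle.pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    /** Returns an instance to the pool; extras beyond maxIdle are left to the GC. */
    void release(T instance) {
        if (idle.size() < maxIdle) {
            idle.addFirst(instance);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 8);
        StringBuilder sb = pool.acquire();
        sb.setLength(0);            // the caller is responsible for resetting state
        sb.append("order-").append(42);
        System.out.println(sb);
        pool.release(sb);           // the same instance is reused by the next acquire()
    }
}
```

Pooling tends to pay off only for objects that are expensive to create or large enough to matter; for small short-lived objects the JVM's allocator is usually cheaper than a pool, so a change like this is worth confirming with the profiler.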
Optimization 3 – Primitive map to avoid boxing
The original cache used Map<Integer, Long>, which boxes both the primitive keys and the values:
Map<Integer, Long> cache = new HashMap<>();
It was replaced with Eclipse Collections' primitive map:
MutableIntLongMap cache = new IntLongHashMap();
Result: cache-intensive request throughput increased by ~20 %.
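To illustrate why a primitive map avoids boxing, here is a small dependency-free sketch of an int-to-long hash map backed by plain arrays. It is not the Eclipse Collections implementation and should not replace it in production; the class name IntLongMapSketch and its methods are hypothetical. The point is only that keys and values live in int[] and long[] arrays, so no Integer or Long objects are allocated on put or get.

```java
/** Illustrative open-addressing int -> long map; not the Eclipse Collections code. */
class IntLongMapSketch {
    private int[] keys = new int[16];        // capacity is always a power of two
    private long[] values = new long[16];
    private boolean[] used = new boolean[16];
    private int size;

    void put(int key, long value) {
        if ((size + 1) * 2 > keys.length) {  // keep the load factor at or below 0.5
            grow();
        }
        int i = findSlot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    long getOrDefault(int key, long fallback) {
        int i = findSlot(key);
        return used[i] ? values[i] : fallback;
    }

    int size() {
        return size;
    }

    /** Linear probing: returns the slot holding key, or the first free slot. */
    private int findSlot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;            // multiplicative scramble of the key bits
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and re-inserts every existing entry. */
    private void grow() {
        int[] oldKeys = keys;
        long[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new long[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

A HashMap<Integer, Long> holding the same data needs an entry object per mapping plus boxed keys and values outside the small-value caches, which is the allocation and pointer-chasing overhead the primitive map removes.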
Optimization 4 – Java 21 virtual threads
The runtime was upgraded from Java 17 to Java 21, and the fixed thread pool
ExecutorService executor = Executors.newFixedThreadPool(200);
was swapped for a virtual-thread executor:
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
In I/O-bound workloads this removed the concurrency ceiling, yielding a 3× peak-throughput increase, eliminating thread-pool exhaustion, and removing the need for pool-size tuning.
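As a rough illustration of why the concurrency ceiling disappears, the sketch below submits many blocking tasks to a virtual-thread executor on Java 21. Thread.sleep stands in for a blocking I/O call, and the class and method names are hypothetical. Each task gets its own virtual thread, so the number of tasks in flight is not bounded by a pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadSketch {
    /** Runs the given number of simulated blocking calls; returns how many ran on a virtual thread. */
    static int runBlockingTasks(int tasks, long sleepMillis) {
        List<Future<Boolean>> futures = new ArrayList<>(tasks);
        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(sleepMillis); // stand-in for a blocking I/O call
                    return Thread.currentThread().isVirtual();
                }));
            }
        }
        int onVirtualThreads = 0;
        for (Future<Boolean> future : futures) {
            try {
                if (future.get()) {
                    onVirtualThreads++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return onVirtualThreads;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

With a fixed pool of 200 platform threads, the same 10,000 tasks would have to run in at least 50 waves of 100 ms each. Virtual threads help when tasks mostly wait; CPU-bound work is still limited by the number of cores.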
Optimization 5 – Cache JSON deserialization
Added a 60‑second TTL cache with double‑checked locking to avoid deserializing the configuration on every request:
private volatile CachedConfig cachedConfig;
private volatile long cacheTimestamp;

private CachedConfig getConfig() {
    long now = System.currentTimeMillis();
    if (now - cacheTimestamp > 60_000) {
        synchronized (this) {
            if (now - cacheTimestamp > 60_000) {
                cachedConfig = deserialize(fetchRaw());
                cacheTimestamp = now;
            }
        }
    }
    return cachedConfig;
}
Result: average per-request latency reduced by ~5 ms.
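One point worth checking when this cache runs alongside the virtual threads from Optimization 4: on JDK 21, a virtual thread that blocks inside a synchronized block pins its carrier thread, so a slow fetch under the monitor can stall other virtual threads. The sketch below is one possible variant, not the article's code. It assumes a hypothetical TtlCache class, guards the reload with a ReentrantLock, publishes the value and its load time together in one immutable record, and takes an injectable clock so the TTL can be tested.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical TTL cache; a variant of the double-checked pattern using a lock object. */
class TtlCache<T> {
    /** Value and load time travel together, so readers never see a mismatched pair. */
    private record Entry<V>(V value, long loadedAtNanos) {}

    private final Supplier<T> loader;
    private final long ttlNanos;
    private final LongSupplier clock;          // e.g. System::nanoTime (monotonic)
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Entry<T> entry;

    TtlCache(Supplier<T> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    T get() {
        Entry<T> current = entry;
        // Fast path: a fresh entry is returned without taking the lock.
        if (current != null && clock.getAsLong() - current.loadedAtNanos() < ttlNanos) {
            return current.value();
        }
        lock.lock();
        try {
            current = entry;                   // second check, now under the lock
            long now = clock.getAsLong();
            if (current == null || now - current.loadedAtNanos() >= ttlNanos) {
                current = new Entry<>(loader.get(), now);
                entry = current;
            }
            return current.value();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller could construct it as new TtlCache<>(() -> deserialize(fetchRaw()), TimeUnit.SECONDS.toNanos(60), System::nanoTime), reusing the deserialize and fetchRaw names from the snippet above. Callers that arrive during a reload still wait for it; serving the stale value during refresh would be a further design choice.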
Optimization 6 – GC tuning (G1 → ZGC)
After lowering allocation pressure, the GC was switched to ZGC with a fixed 4 GB heap:
-Xms4g
-Xmx4g
-XX:+UseZGC
ZGC targets sub-millisecond pauses; in this service the maximum GC pause fell from 340 ms to 8 ms, and latency spikes virtually disappeared.
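After changing GC flags it can be useful to confirm from inside the process which collector is actually active. The sketch below uses the standard java.lang.management API to list the registered collectors with their collection counts and accumulated collection times; the class name is hypothetical. Note that these counters are cumulative totals, not per-pause maxima, so figures like the 340 ms and 8 ms above would still come from JFR or GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfoSketch {
    /** One line per registered collector: name, collection count, accumulated time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", accumulatedTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // The bean names depend on the collector selected by the JVM flags.
        describeCollectors().forEach(System.out::println);
    }
}
```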
Final metrics
P50 latency: 120 ms → 18 ms
P99 latency: 800 ms → 75 ms
Throughput: 1,200 req/s → 11,400 req/s
Allocation rate: ~2 GB/min → ~380 MB/min
Max GC pause: 340 ms → 8 ms
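The headline factors follow directly from these figures: 11,400 / 1,200 = 9.5× throughput, 800 / 75 ≈ 10.7× for P99 latency, 120 / 18 ≈ 6.7× for P50 latency, and 340 / 8 = 42.5× for the maximum GC pause. A small sketch of that arithmetic, with hypothetical helper names:

```java
import java.util.Locale;

class SpeedupSketch {
    /** Improvement factor for a "higher is better" metric such as throughput. */
    static double gain(double before, double after) {
        return after / before;
    }

    /** Improvement factor for a "lower is better" metric such as latency. */
    static double reduction(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "throughput: %.1fx", gain(1_200, 11_400)));
        System.out.println(String.format(Locale.ROOT, "P99 latency: %.1fx", reduction(800, 75)));
        System.out.println(String.format(Locale.ROOT, "P50 latency: %.1fx", reduction(120, 18)));
        System.out.println(String.format(Locale.ROOT, "max GC pause: %.1fx", reduction(340, 8)));
    }
}
```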
Key observations
Performance work was driven by profiler data, not intuition.
Largest gains came from seemingly trivial places: logging, object allocation, caching, and GC strategy.
Upgrading to Java 21 and adopting virtual threads provided the single biggest benefit with minimal code changes.
LuTiao Programming
