Why Is My Java Service Stalling? Uncovering GC, Safepoint, and Log4j2 Bottlenecks

In a high‑concurrency Java service, occasional request timeouts were traced to long pauses between log entries, leading to an investigation that revealed frequent JVM stop‑the‑world events caused by GC, safepoint‑related biased‑lock revocations, and Log4j2 locking issues.

Java Interview Crash Guide

Requests to a high-concurrency Java service occasionally timed out. In the affected requests, the gap between the log entry written after the HTTP client call (A) and the one written after JSON parsing (B) was unusually large, ranging from 100 ms to 700 ms.

GC

Three possible causes were considered: an application-level lock, a JVM GC causing a stop-the-world (STW) pause, and system overload. The first and the last were ruled out.

Application lock – excluded because JSON parsing itself is lock‑free.

JVM GC – could trigger STW.

System overload – monitoring showed low load.

jstat showed infrequent full GCs and normal minor GC intervals. To capture every pause, not only GC pauses, -XX:+PrintGCApplicationStoppedTime was enabled so that all STW events were recorded in the GC log.

The GC log revealed frequent, long STW pauses, sometimes less than 1 ms apart, causing cumulative hangs of over 120 ms.
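The cumulative pause can be totalled directly from the GC log. The following is a minimal sketch, assuming the JDK 8 line format written by -XX:+PrintGCApplicationStoppedTime; the class name StoppedTimeSummary and the sample lines are hypothetical and are not taken from the incident.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.DoubleStream;

class StoppedTimeSummary {
    // Assumed JDK 8 format: "Total time for which application threads were stopped: N seconds, ..."
    private static final Pattern STOPPED = Pattern.compile(
            "application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    /** Extracts every stopped-time entry from the log lines, converted to milliseconds. */
    static DoubleStream pausesMillis(List<String> lines) {
        return lines.stream()
                .map(STOPPED::matcher)
                .filter(Matcher::find)
                .mapToDouble(m -> Double.parseDouble(m.group(1)) * 1000.0);
    }

    /** Sum of all STW pauses in the given lines, in milliseconds. */
    static double totalStoppedMillis(List<String> lines) {
        return pausesMillis(lines).sum();
    }

    /** Longest single STW pause in the given lines, in milliseconds (0 if none). */
    static double maxStoppedMillis(List<String> lines) {
        return pausesMillis(lines).max().orElse(0.0);
    }

    public static void main(String[] args) {
        // Illustrative lines only; real logs usually carry a timestamp prefix as well.
        List<String> sample = Arrays.asList(
                "Total time for which application threads were stopped: 0.0400000 seconds, Stopping threads took: 0.0000500 seconds",
                "Total time for which application threads were stopped: 0.0850000 seconds, Stopping threads took: 0.0000400 seconds",
                "some unrelated GC line");
        System.out.printf("total=%.1f ms, max=%.1f ms%n",
                totalStoppedMillis(sample), maxStoppedMillis(sample));
    }
}
```

Restricting the input to the lines that fall between log entries A and B of one slow request gives the STW time attributable to that request.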

Safepoint and Biased Lock

Safepoint Logs

An STW pause begins once all application threads have reached a safepoint. The safepoint log records when the JVM entered and left each safepoint and which VM operation requested it, which helps identify the cause.

Enabling safepoint logging with

-XX:+UnlockDiagnosticVMOptions -XX:+PrintSafepointStatistics -XX:+LogVMOutput -XX:LogFile=./safepoint.log

produced a log in which the "vmop" column gave RevokeBias as the reason, indicating a biased-lock revocation.

Biased Lock

Biased locking optimizes uncontended locks by biasing a lock toward the first thread that acquires it, so that thread can re-acquire it cheaply. The bias is revoked only when another thread tries to take the lock; revocation requires a safepoint and therefore becomes costly under high concurrency.

Disabling biased locking with -XX:-UseBiasedLocking reduced pause frequency by half, but some pauses remained.
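Before comparing pause counts, it is worth confirming that the flag actually took effect in the running JVM. The following is a minimal sketch that reads the flag through the JDK's HotSpotDiagnosticMXBean; the class name BiasedLockingCheck and the "unavailable" fallback are assumptions made for illustration. Newer JDKs no longer have this flag, so the lookup may fail there.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class BiasedLockingCheck {
    /**
     * Returns the current value of a HotSpot flag as a string,
     * or "unavailable" if this JVM does not expose or know the flag.
     */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "unavailable"; // not a HotSpot-compatible JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable"; // flag does not exist in this JDK version
        }
    }

    public static void main(String[] args) {
        System.out.println("UseBiasedLocking = " + flagValue("UseBiasedLocking"));
    }
}
```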

Log4j2

Investigation

The remaining suspects (HttpClient, Hystrix, and Log4j2) were isolated one at a time. The issue still reproduced after the third-party responses were replaced and Hystrix was removed, which pointed to Log4j2.

Using btrace to probe Log4j2 locks

Three locking points in Log4j2 were identified:

rollover() – locks during log file rotation.

encodeText() – synchronizes character-set conversion for large logs.

flush() – synchronizes to preserve log order.

Instrumenting these methods with btrace showed that encodeText() incurred the longest execution time during load tests.
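BTrace takes this measurement from outside the application. The same idea can be approximated in plain Java by timing how long a caller waits to enter a synchronized section. The following is a minimal sketch, not BTrace and not Log4j2 code: LockWaitProbe, guarded(), and the 100 ms sleep that stands in for a slow encode or write are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class LockWaitProbe {
    private final Object lock = new Object();
    private final AtomicLong maxWaitNanos = new AtomicLong();

    /** Runs body under the shared lock and records how long the caller waited to enter. */
    void guarded(Runnable body) {
        long start = System.nanoTime();
        synchronized (lock) {
            long waited = System.nanoTime() - start;
            maxWaitNanos.accumulateAndGet(waited, Math::max);
            body.run();
        }
    }

    long maxWaitMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxWaitNanos.get());
    }

    /** Demo: one slow lock holder delays a second caller; returns the longest wait seen. */
    static long demoMaxWaitMillis() {
        LockWaitProbe probe = new LockWaitProbe();
        CountDownLatch holderInside = new CountDownLatch(1);
        Thread holder = new Thread(() -> probe.guarded(() -> {
            holderInside.countDown(); // signal that the lock is now held
            sleepQuietly(100);        // stands in for slow work done under the lock
        }));
        holder.start();
        try {
            holderInside.await();     // make sure the holder owns the lock first
            probe.guarded(() -> { }); // this caller has to wait for the holder
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.maxWaitMillis();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("max lock wait (ms): " + demoMaxWaitMillis());
    }
}
```

The sketch separates time spent waiting for the lock from time spent inside it, which is the distinction that matters when many request threads log through the same synchronized path.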

Using JMC analysis

environment:
   - JFR=true
   - JMX_PORT=port
   - JMX_HOST=ip
   - JMX_LOGIN=user:pwd

JMC captured a 1063 ms pause in RandomAccessFile.write(), matching the thread ID observed in the STW logs, suggesting a native I/O bottleneck, possibly Docker‑related.

Solution

Reduce log volume; excessive logging amplifies pauses.

Switch to Log4j2 asynchronous logging (accepting possible loss on buffer overflow or restart).
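The trade-off behind asynchronous logging can be illustrated with a bounded queue: callers never block on I/O, but messages are dropped when the buffer is full. The following is a minimal sketch of that idea only. It is not how Log4j2 implements async loggers, and DroppingAsyncAppender and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

class DroppingAsyncAppender {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread writer;
    private volatile boolean running = true;

    DroppingAsyncAppender(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.writer = new Thread(this::drain, "async-log-writer");
        this.writer.setDaemon(true);
    }

    void start() {
        writer.start();
    }

    /** Never blocks the caller; returns false and counts a drop when the buffer is full. */
    boolean append(String message) {
        boolean accepted = queue.offer(message);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Background loop: the only thread that performs the slow write. */
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                String message = queue.poll(50, TimeUnit.MILLISECONDS);
                if (message != null) {
                    sink.accept(message);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting work and waits for the writer to flush what is already queued. */
    void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        DroppingAsyncAppender appender = new DroppingAsyncAppender(2, System.out::println);
        for (int i = 1; i <= 3; i++) {
            appender.append("message " + i); // the third is dropped: writer not started yet
        }
        appender.start();
        appender.close();
        System.out.println("dropped: " + appender.droppedCount());
    }
}
```

In Log4j2 itself, all-asynchronous loggers are typically enabled by setting the system property log4j2.contextSelector to org.apache.logging.log4j.core.async.AsyncLoggerContextSelector with the LMAX Disruptor library on the classpath. The property name and the required dependency vary by release, so check the manual for the version in use.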

Checklist Summary

Collect multiple failure cases to identify common patterns and avoid false leads.

Reproduce the issue in a controlled environment that mirrors production.

Compare recent changes and hypothesize causes.

Use elimination: vary one variable at a time to see its impact.

Apply the fix—often a single configuration or code change.

Support findings with quantitative data to convince stakeholders.
