How Discard Policy and Error Threshold Rescue Java Services During Log Overload
This article analyzes a severe drop in service availability caused by a Log4j2 asynchronous-logging bottleneck. It walks through the investigation, explains how setting log4j2.asyncQueueFullPolicy=Discard and log4j2.discardThreshold=ERROR mitigates the problem, presents load-test results, and offers practical recommendations for robust backend logging.
Overview
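Log4j2 asynchronous logging can become a bottleneck when a dependent RPC service fails under heavy traffic: the failed calls produce a flood of WARN and ERROR events with stack traces, the ring buffer fills, and application threads block while trying to enqueue more events. Setting log4j2.asyncQueueFullPolicy=Discard and log4j2.discardThreshold=ERROR makes Log4j2 drop events instead of blocking once the queue is full, which contains the incident quickly and limits its impact.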
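The two settings are ordinary Log4j2 properties. They are usually passed as JVM flags (-Dlog4j2.asyncQueueFullPolicy=Discard -Dlog4j2.discardThreshold=ERROR) or placed in log4j2.component.properties on the classpath. The sketch below shows the programmatic equivalent; the class and method names are hypothetical, and it assumes the call runs before Log4j2 initializes (before the first LogManager.getLogger call), because the queue-full policy is read when the async logging machinery starts.

```java
public class AsyncLoggingBootstrap {
    /**
     * Hypothetical helper: sets the queue-full policy via system properties.
     * Must run before Log4j2 initializes, otherwise the values are ignored.
     */
    public static void configureDiscardOnFullQueue() {
        // Drop events instead of blocking the caller when the ring buffer is full.
        System.setProperty("log4j2.asyncQueueFullPolicy", "Discard");
        // Events at ERROR or any less severe level may be dropped; FATAL still enqueues.
        System.setProperty("log4j2.discardThreshold", "ERROR");
    }

    public static void main(String[] args) {
        configureDiscardOnFullQueue();
        // ... obtain loggers and start the application as usual from here on.
        System.out.println(System.getProperty("log4j2.asyncQueueFullPolicy")
                + " / " + System.getProperty("log4j2.discardThreshold"));
    }
}
```

The JVM flag is the safer choice in production, since it cannot lose a race against a static logger field that triggers Log4j2 initialization early.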
Incident Recap
On 2023-12-15 at 14:25, the interface's availability dropped from 100% to 0.72% after a dependent system failed. Availability recovered briefly after the machines were scaled out, then fell again; the service recovered only after the client side was restarted.
Investigation Steps
First Check – GC
The Young GC count increased but no Full GC was triggered. Heap and non-heap usage were normal, and CPU usage rose slightly while staying within acceptable limits.
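Monitoring dashboards normally provide these numbers, but the same young-versus-old comparison can be sketched in-process with the standard GarbageCollectorMXBean API. This is a minimal sketch (class name and one-second sampling window are hypothetical). Collector names depend on the GC in use, for example G1 reports "G1 Young Generation" and "G1 Old Generation", and getCollectionCount() returns -1 when the count is undefined.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcSnapshot {
    /** Returns collector name -> cumulative collection count for this JVM. */
    public static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = collectionCounts();
        Thread.sleep(1000); // hypothetical sampling window; align with your monitoring interval
        Map<String, Long> after = collectionCounts();
        // A rising young-generation delta with a flat old-generation delta
        // matches the pattern seen in this incident.
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            System.out.printf("%s: +%d collections%n", e.getKey(), delta);
        }
    }
}
```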
Second Check – Disk I/O
Disk usage, busy time and write speed appeared normal, apart from a brief spike in usage.
Third Check – Thread Dump
The thread dump showed JSF client threads blocked in the enqueue method while logging WARN and ERROR events.
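```java
// enqueue source analysis
// location: org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor#enqueue 363
// This path is typically reached after a non-blocking tryEnqueue has failed (ring buffer
// full) and the queue-full policy routes the event to ENQUEUE, the default blocking behavior.
// publishEvent() then waits for a free slot; when synchronizeEnqueueWhenQueueFull() is true,
// all producer threads additionally serialize on queueFullEnqueueLock.
private void enqueue(final LogEvent logEvent, final AsyncLoggerConfig asyncLoggerConfig) {
    if (synchronizeEnqueueWhenQueueFull()) {
        synchronized (queueFullEnqueueLock) {
            disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
        }
    } else {
        disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
    }
}
```

The RingBuffer had grown to 1.61 GB and was filled mainly with WARN and ERROR events carrying large stack traces.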
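The dump itself is usually captured with jstack <pid>. As a minimal sketch of the same check done in-process, the hypothetical scanner below uses the standard ThreadMXBean API to list live threads whose stack contains a given class and method, together with their state. Threads stuck in the code above would typically show up as BLOCKED on the lock or parked while waiting for a free slot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class EnqueueBlockScanner {
    /** Returns "name [STATE]" for each live thread with a frame matching className#methodName. */
    public static List<String> threadsInMethod(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that could not be inspected
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    hits.add(info.getThreadName() + " [" + info.getThreadState() + "]");
                    break; // one match per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> stuck = threadsInMethod(
                "org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor", "enqueue");
        System.out.println(stuck.size() + " thread(s) in enqueue");
        stuck.forEach(System.out::println);
    }
}
```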
Configuration Insights
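log4j2.asyncQueueFullPolicy determines what happens when the queue is full: the default policy blocks the calling thread until space frees up, while Discard drops events. log4j2.discardThreshold sets the most severe level that may be dropped: under Discard, events at this level or any less severe level are discarded when the queue is full (default INFO; ERROR is recommended here).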
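The threshold rule can be sketched as follows. This is a simplified, hypothetical model, not Log4j2's own classes: the enum mirrors Log4j2's built-in intLevel values (smaller means more severe) and its isLessSpecificThan comparison, and it omits the case where the background thread itself logs, which Log4j2 handles synchronously.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscardThresholdSketch {
    // Mirrors Log4j2's built-in intLevel values: smaller = more severe.
    enum Level {
        FATAL(100), ERROR(200), WARN(300), INFO(400), DEBUG(500), TRACE(600);

        final int intLevel;

        Level(int intLevel) { this.intLevel = intLevel; }

        // Same comparison as Log4j2's Level#isLessSpecificThan.
        boolean isLessSpecificThan(Level other) { return this.intLevel >= other.intLevel; }
    }

    enum Route { DISCARD, ENQUEUE }

    /** Route chosen for an event when the ring buffer is already full under the Discard policy. */
    static Route routeWhenQueueFull(Level eventLevel, Level discardThreshold) {
        return eventLevel.isLessSpecificThan(discardThreshold) ? Route.DISCARD : Route.ENQUEUE;
    }

    /** All levels that would be dropped for a given threshold. */
    static List<Level> discardedLevels(Level threshold) {
        List<Level> out = new ArrayList<>();
        for (Level l : Level.values()) {
            if (routeWhenQueueFull(l, threshold) == Route.DISCARD) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Level t : new Level[] {Level.INFO, Level.WARN, Level.ERROR, Level.FATAL}) {
            System.out.println(t + " -> discards " + discardedLevels(t));
        }
    }
}
```

Under this model, a threshold of INFO still blocks on WARN and ERROR events, which were exactly the events filling the buffer, while ERROR and FATAL let nearly all of them be dropped.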
Performance Tests
Four load tests were run, each with a different log4j2.discardThreshold:
INFO – service availability recovered only after client restart.
WARN – availability returned to ~95 % in 3 min, 100 % in 8 min.
ERROR – full recovery within ~2 min without restart.
FATAL – similar to ERROR.
Asynchronous Logging Mechanism
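A simplified diagram shows how Log4j2 uses the LMAX Disruptor RingBuffer to decouple producers (the application threads that log) from the consumer (the background thread that runs the appenders). Because the buffer is bounded, a consumer that falls behind eventually fills it, and producers must then either block or discard, depending on the queue-full policy.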
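The block-versus-discard trade-off can be illustrated with a plain bounded queue. This is an analogy under stated assumptions, not the Disruptor itself: an ArrayBlockingQueue of capacity 2 stands in for the ring buffer, there is no consumer (a stalled appender), and the blocking publish is given a hypothetical 200 ms timeout so the demo terminates; Log4j2's default policy would wait indefinitely.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueSketch {
    /** Discard-style publish: never blocks, returns false when the queue is full. */
    static boolean tryPublish(BlockingQueue<String> queue, String event) {
        return queue.offer(event);
    }

    /** Blocking-style publish, capped by a timeout for the demo; returns milliseconds spent waiting. */
    static long blockingPublishMillis(BlockingQueue<String> queue, String event, long timeoutMs) {
        long start = System.nanoTime();
        try {
            queue.offer(event, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Capacity 2 stands in for the ring buffer; no consumer thread drains it.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        int dropped = 0;
        for (int i = 0; i < 5; i++) {
            if (!tryPublish(queue, "ERROR event " + i)) {
                dropped++;
            }
        }
        System.out.println("discard-style publish dropped " + dropped + " of 5 events");
        long waited = blockingPublishMillis(queue, "ERROR event 5", 200);
        System.out.println("blocking publish waited ~" + waited + " ms without enqueuing");
    }
}
```

The discarding producer returns immediately and loses events; the blocking producer keeps every event but ties up the calling thread, which in this incident was an RPC client thread.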
Recommendations
Use log4j2.asyncQueueFullPolicy=Discard together with log4j2.discardThreshold=ERROR in production.
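Avoid Appenders that interact with external middleware (e.g., KafkaAppender) unless necessary; if you must use one, set syncSend=false (available since Log4j2 2.8) so that sends do not block waiting for broker acknowledgment.
Set immediateFlush=false on file appenders to batch writes and reduce disk I/O; asynchronous loggers and appenders still flush automatically at the end of each batch of events.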
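Why batching helps can be shown by counting flushes. The sketch below is an analogy using plain java.io classes, not Log4j2's appender code: a hypothetical FlushCountingWriter stands in for the log file, and the loop either flushes after every event (like immediateFlush=true) or once at the end of the batch.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

public class FlushBatchingSketch {
    /** Hypothetical sink that ignores characters but counts flush() calls. */
    static final class FlushCountingWriter extends Writer {
        int flushes;

        @Override public void write(char[] cbuf, int off, int len) { /* discard */ }
        @Override public void flush() { flushes++; }
        @Override public void close() { }
    }

    /** Writes `events` lines and returns how many flushes reached the sink. */
    static int flushesFor(int events, boolean immediateFlush) {
        FlushCountingWriter sink = new FlushCountingWriter();
        // The sink holds no resources, so the BufferedWriter is not closed here.
        BufferedWriter out = new BufferedWriter(sink);
        try {
            for (int i = 0; i < events; i++) {
                out.write("event " + i);
                out.newLine();
                if (immediateFlush) {
                    out.flush(); // one flush per event
                }
            }
            if (!immediateFlush) {
                out.flush(); // single flush at the end of the batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.flushes;
    }

    public static void main(String[] args) {
        System.out.println("per-event flushes: " + flushesFor(100, true));
        System.out.println("batched flushes:   " + flushesFor(100, false));
    }
}
```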
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
