How Discard Policy and Error Threshold Rescue Java Services During Log Overload

This article analyzes a severe service-availability drop caused by a Log4j2 asynchronous logging bottleneck, explains how configuring log4j2.asyncQueueFullPolicy=Discard and log4j2.discardThreshold=ERROR mitigates it, walks through the investigation steps and performance tests, and gives practical recommendations for robust backend logging.

JD Cloud Developers

Dec 19, 2024

Overview

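Log4j2 asynchronous logging can become a bottleneck when a dependent RPC service fails under heavy traffic: the resulting flood of WARN and ERROR events fills the asynchronous queue, and business threads block while trying to log. Setting log4j2.asyncQueueFullPolicy=Discard and log4j2.discardThreshold=ERROR lets Log4j2 drop events at ERROR level and below when the queue is full, which contains the failure quickly and limits its impact.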

Incident Recap

On 2023-12-15 at 14:25, availability of the interface service dropped from 100 % to 0.72 % after a dependent system failed. Availability recovered briefly after machines were scaled out, then fell again, and the service recovered only after a client-side restart.

Investigation Steps

First Check – GC

Young GC count increased, but Full GC was not triggered. Heap and non-heap usage were normal. CPU usage rose slightly but stayed within acceptable limits.

Second Check – Disk I/O

Disk usage, busy time and write speed appeared normal, although a short spike in usage was observed.

Third Check – Thread Dump

JSF client threads were blocked in the enqueue method while processing WARN/ERROR logs.

// Source: org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor#enqueue (line 363)
private void enqueue(final LogEvent logEvent, final AsyncLoggerConfig asyncLoggerConfig) {
    if (synchronizeEnqueueWhenQueueFull()) {
        // Producers take turns on a shared lock, so blocked threads queue up here.
        synchronized (queueFullEnqueueLock) {
            // publishEvent waits until a RingBuffer slot becomes free.
            disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
        }
    } else {
        // Without the lock, each producer waits for a free slot on its own.
        disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig);
    }
}

The RingBuffer grew to 1.61 GB and was filled mainly with WARN and ERROR events containing large stack traces.
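The effect of a full queue on producer threads can be illustrated with a minimal sketch. It is not Log4j2 code: it assumes a plain java.util.concurrent.ArrayBlockingQueue as a stand-in for the RingBuffer, and the class name FullQueueDemo and method publishAll are hypothetical. No consumer drains the queue, which mimics a consumer that cannot keep up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the async logging queue; not part of Log4j2.
class FullQueueDemo {

    /**
     * Tries to enqueue `count` events and returns how many were accepted
     * before one of them timed out waiting for free space.
     */
    static int publishAll(BlockingQueue<String> queue, int count, long timeoutMillis) {
        int accepted = 0;
        try {
            for (int i = 0; i < count; i++) {
                // The calling (business) thread waits here while the queue is full.
                if (!queue.offer("event-" + i, timeoutMillis, TimeUnit.MILLISECONDS)) {
                    break;
                }
                accepted++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return accepted;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        int accepted = publishAll(queue, 5, 50);
        System.out.println("accepted=" + accepted + " queued=" + queue.size());
        // prints "accepted=2 queued=2"
    }
}
```

In this sketch the producer gives up after a timeout. In the incident, the producers waited without one, which is why request threads piled up in enqueue.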

Configuration Insights

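log4j2.asyncQueueFullPolicy determines what happens when the asynchronous queue is full. With the Default policy the logging thread waits until space is available; with Discard, events at or below a threshold level are dropped instead.

log4j2.discardThreshold sets the most severe level that the Discard policy may drop (default INFO, recommended ERROR). Events more severe than the threshold are still enqueued and may wait for space.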
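The routing decision can be sketched in a few lines. This is a simplified model under stated assumptions, not the Log4j2 implementation: the enums Level and Route and the method route are hypothetical names defined here, and the levels are ordered from most to least severe.

```java
// Simplified model of the Discard policy decision; names are illustrative only.
class DiscardPolicySketch {

    // Ordered from most severe to least severe.
    enum Level { FATAL, ERROR, WARN, INFO, DEBUG, TRACE }

    enum Route { ENQUEUE, DISCARD }

    /**
     * When the queue is full, events at the threshold level or less severe
     * are dropped; everything else is enqueued (and may have to wait).
     */
    static Route route(Level event, Level threshold, boolean queueFull) {
        if (queueFull && event.ordinal() >= threshold.ordinal()) {
            return Route.DISCARD;
        }
        return Route.ENQUEUE;
    }

    public static void main(String[] args) {
        // Default threshold INFO: a WARN event still waits for queue space.
        System.out.println(route(Level.WARN, Level.INFO, true));   // prints "ENQUEUE"
        // Threshold ERROR: WARN and ERROR events are dropped under pressure.
        System.out.println(route(Level.WARN, Level.ERROR, true));  // prints "DISCARD"
        System.out.println(route(Level.ERROR, Level.ERROR, true)); // prints "DISCARD"
        // FATAL is more severe than the threshold, so it is kept.
        System.out.println(route(Level.FATAL, Level.ERROR, true)); // prints "ENQUEUE"
    }
}
```

The first call shows why the default threshold did not help in this incident: the queue was full of WARN and ERROR events, which INFO never discards.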

Performance Tests

Four load tests were run, each with a different log4j2.discardThreshold value:

INFO – service availability recovered only after client restart.

WARN – availability returned to ~95 % in 3 min, 100 % in 8 min.

ERROR – full recovery within ~2 min without restart.

FATAL – similar to ERROR.

Asynchronous Logging Mechanism

Log4j2 uses the Disruptor RingBuffer to decouple producers (the application threads that log) from the consumer (the background thread that writes events to the appenders).
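A fixed-size ring buffer can be sketched as follows. This is a teaching model only: the class RingBufferSketch and its methods are hypothetical, and it uses synchronized methods for simplicity, whereas the real Disruptor is lock-free and considerably more sophisticated.

```java
// Minimal bounded ring buffer; illustrative only, not the Disruptor API.
class RingBufferSketch {
    private final String[] slots;
    private long head = 0; // sequence of the next event to consume
    private long tail = 0; // sequence of the next slot to publish into

    RingBufferSketch(int capacity) {
        slots = new String[capacity];
    }

    /** Producer side: returns false when every slot is still occupied. */
    synchronized boolean tryPublish(String event) {
        if (tail - head == slots.length) {
            return false; // full: the consumer has not caught up
        }
        slots[(int) (tail % slots.length)] = event;
        tail++;
        return true;
    }

    /** Consumer side: returns the oldest unconsumed event, or null when empty. */
    synchronized String poll() {
        if (head == tail) {
            return null;
        }
        int index = (int) (head % slots.length);
        String event = slots[index];
        slots[index] = null; // release the reference so the event can be collected
        head++;
        return event;
    }

    public static void main(String[] args) {
        RingBufferSketch buffer = new RingBufferSketch(2);
        System.out.println(buffer.tryPublish("e1")); // prints "true"
        System.out.println(buffer.tryPublish("e2")); // prints "true"
        System.out.println(buffer.tryPublish("e3")); // prints "false", buffer is full
        System.out.println(buffer.poll());           // prints "e1"
        System.out.println(buffer.tryPublish("e3")); // prints "true", slot 0 is reused
    }
}
```

The number of slots is fixed, but each slot holds a reference to an event of arbitrary size, so a buffer full of events with long stack traces can retain a large amount of heap.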

Recommendations

Use log4j2.asyncQueueFullPolicy=Discard together with log4j2.discardThreshold=ERROR in production.

Avoid Appenders that interact with external middleware (e.g., KafkaAppender) unless necessary; if one is used, set syncSend=false (available since Log4j2 2.8) so that sends do not block on the broker's acknowledgement.

Set immediateFlush=false to batch writes and reduce disk I/O.
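One way to apply the first recommendation is to set the two properties as JVM system properties. The sketch below assumes they are set programmatically at the very start of main, before any Log4j2 class is initialized; the class name AsyncLoggingProperties and method apply are hypothetical.

```java
// Sketch: set the Discard policy properties before Log4j2 initializes.
class AsyncLoggingProperties {

    /** Must run before the first Logger is created, or the values may be ignored. */
    static void apply() {
        System.setProperty("log4j2.asyncQueueFullPolicy", "Discard");
        System.setProperty("log4j2.discardThreshold", "ERROR");
    }

    public static void main(String[] args) {
        apply();
        System.out.println(System.getProperty("log4j2.asyncQueueFullPolicy")); // prints "Discard"
        System.out.println(System.getProperty("log4j2.discardThreshold"));     // prints "ERROR"
    }
}
```

The same values can be passed on the command line as -Dlog4j2.asyncQueueFullPolicy=Discard -Dlog4j2.discardThreshold=ERROR, or placed in a log4j2.component.properties file on the classpath, which avoids depending on initialization order.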
