How We Cut Log4j2 Disk and CPU Usage by 90% in a High‑Traffic Shopping Cart
Facing massive log volume in JD.com’s shopping cart service, we reduced disk consumption and CPU load and improved request latency by applying Log4j2 log-level filtering, asynchronous logging with AsyncLogger, and TTL-based thread-local context propagation. This article covers the metrics, configuration steps, and best-practice recommendations.
1. Reducing Resource Usage
The shopping‑cart gateway originally generated excessive INFO‑level logs, causing high disk pressure and CPU load. By introducing log‑level filtering—using INFO in development, WARN in production, and ERROR during stress tests—the daily disk usage dropped to around 10 GB, and CPU consumption was markedly reduced.
Key practices:
Log level hierarchy: ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF.
Dynamic level adjustment: code shown in a diagram in the original article switches the level between OFF and INFO at runtime, without redeploying.
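The two practices above can be illustrated with a small, library-free sketch. It is not the article's code and not Log4j2 code: `LevelGate` and its methods are hypothetical names, and the enum simply mirrors the hierarchy listed above. In Log4j2 itself, the same runtime switch is usually made with `Configurator.setRootLevel` or `Configurator.setLevel` from log4j-core.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a logger's level gate; not Log4j2 code. */
class LevelGate {
    // Declared in the same order as the hierarchy in the text,
    // so ordinal() grows with severity.
    enum Level { ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF }

    private final AtomicReference<Level> threshold;

    LevelGate(Level initial) {
        this.threshold = new AtomicReference<>(initial);
    }

    /** Swap the threshold at runtime, e.g. from an admin endpoint. */
    void setThreshold(Level level) {
        threshold.set(level);
    }

    /** An event passes only if it is at least as severe as the threshold. */
    boolean isEnabled(Level eventLevel) {
        Level current = threshold.get();
        // OFF suppresses everything, including FATAL.
        return current != Level.OFF && eventLevel.ordinal() >= current.ordinal();
    }
}
```

With the threshold at WARN, `isEnabled(Level.INFO)` is false, so INFO events are dropped before any formatting or I/O happens; calling `setThreshold(Level.INFO)` turns them back on without a restart.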
2. Asynchronous Logging
Switching from synchronous logging to Log4j2’s AsyncLogger (global or mixed mode) dramatically lowered the latency impact of logging. Synchronous INFO logging increased request time from 320 ms to 500 ms, while asynchronous INFO logging added negligible overhead.
AsyncLogger works on top of the LMAX Disruptor, using a lock‑free ring buffer to hand off log events between producer threads and a single consumer thread.
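The handoff pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not how AsyncLogger is implemented: a bounded `ArrayBlockingQueue` (which uses locks) stands in for the Disruptor's lock-free ring buffer, an in-memory list stands in for the disk appender, and `AsyncHandoffLogger` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical producer-to-single-consumer handoff; not the Disruptor. */
class AsyncHandoffLogger implements AutoCloseable {
    // Unique sentinel instance, compared by identity, that tells the consumer to stop.
    private static final String POISON = new String("__shutdown__");

    private final BlockingQueue<String> buffer;
    private final List<String> sink = new ArrayList<>(); // touched only by the consumer
    private final Thread consumer;

    AsyncHandoffLogger(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
        consumer = new Thread(this::drain, "log-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    /** Called on request threads: enqueue and return; blocks only when the buffer is full. */
    void log(String message) {
        try {
            buffer.put(message);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Single consumer loop: takes events in order and performs the slow write. */
    private void drain() {
        try {
            while (true) {
                String event = buffer.take();
                if (event == POISON) {
                    return;
                }
                sink.add(event); // stands in for the disk write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Flush: waits until the consumer has handled everything queued before the sentinel. */
    @Override
    public void close() {
        try {
            buffer.put(POISON);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Read after close(); Thread.join makes the consumer's writes visible. */
    List<String> written() {
        return sink;
    }
}
```

The request thread pays only for the enqueue, while the single consumer absorbs the I/O cost, which is why the latency impact of logging largely disappears from the request path.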
Configuration highlights:
Global async: set -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector at JVM start.
Mixed async: keep some loggers synchronous and make others asynchronous within the same configuration, so that audit logs and high-throughput paths can each use the mode that suits them.
3. Reducing CPU Usage
Printing location information (class, file, line) forces Log4j2 to walk the call stack for each log event, which is CPU-heavy: logging with location can be 30 to 100 times slower. Removing the location placeholders from PatternLayout and switching to async logging cut CPU usage to roughly a third under a 20 GB/h write load.
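To make the cost concrete, here is a minimal sketch of what resolving a caller's location involves. It is an illustration only: `LocationCost` is a hypothetical helper, and Log4j2's own stack-walking code differs in detail. The point is that the full stack must be materialised just to read one frame, and that this happens for every event when the pattern contains location placeholders such as %C, %F, %L, %M, or %l.

```java
/** Hypothetical helper showing the work behind a location placeholder. */
class LocationCost {
    /** Finds the caller's frame by materialising a stack trace, the expensive step. */
    static StackTraceElement callerLocation() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is callerLocation itself; stack[1] is whoever called it.
        return stack[1];
    }

    /** Stand-in for application code that emits a log line with location. */
    static String sample() {
        StackTraceElement where = callerLocation();
        return where.getMethodName() + ":" + where.getLineNumber();
    }
}
```

A pattern that omits these placeholders never triggers this lookup, so the saving applies to every log call rather than only to the ones that end up on disk.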
4. Log Traceability Across Threads
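Log4j2’s thread context (the MDC and NDC) is bound to a single thread, so values set on a request thread are not visible to thread-pool workers. To retain trace IDs across thread-pool tasks, the system adopts TransmittableThreadLocal (TTL) together with a modified thread pool. TTL captures the submitting thread’s context when the task is created, replays it on the worker thread before the task runs, and restores the worker’s previous context afterward.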
Implementation steps:
Replace standard Callable with ParallelCallableTask that captures and restores TTL context.
Configure Log4j2 MDC to store the traceId, enabling end‑to‑end request tracing.
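The capture, replay, and restore cycle behind these steps can be sketched without any library. This is not the article's ParallelCallableTask and not the TTL API: `TraceContext` and `wrap` are hypothetical names, and a plain `ThreadLocal` stands in for both TTL and the MDC.

```java
import java.util.concurrent.Callable;

/** Hypothetical capture/replay/restore wrapper; not the TTL library. */
class TraceContext {
    // Stand-in for the MDC entry that holds the traceId.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Call on the submitting thread: snapshots its traceId at wrap time. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = TRACE_ID.get();   // capture on the parent thread
        return () -> {
            String previous = TRACE_ID.get();     // back up the worker's own value
            TRACE_ID.set(captured);               // replay the parent's context
            try {
                return task.call();
            } finally {
                // Restore, so a pooled thread does not leak one request's
                // traceId into the next task it runs.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }
}
```

A request thread would set `TraceContext.TRACE_ID`, then submit `TraceContext.wrap(task)` to the pool instead of the bare task. The restore step in `finally` matters as much as the replay, because pooled threads are reused across requests.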
5. Test Metrics and Recommendations
Performance tests on a 4‑core, 8 GB RAM, 50 GB disk server (3,000 requests/min) showed:
CPU usage spikes when location info is enabled; disabling it cuts CPU by ~70%.
AsyncLogger reduces logging overhead to near‑zero compared with synchronous logging.
Proper log‑level selection and pattern layout tuning yield the best throughput.
Suggested actions:
Record key runtime parameters for quick issue diagnosis.
Use log‑level filtering to limit output volume.
Configure PatternLayout without location placeholders.
Combine Log4j2 with monitoring tools for proactive alerting.
Adopt async logging for high‑performance requirements and consider TTL‑based thread‑pool refactoring for reliable trace propagation.
dbaplus Community