10 Essential Logging Rules for Reliable Backend Systems
This article shares ten practical guidelines for writing high‑quality logs in Java backend services, covering format consistency, stack traces, log levels, complete parameters, data masking, asynchronous logging, traceability, dynamic log levels, structured storage, and intelligent monitoring to improve debugging and system reliability.
Introduction
During a Double‑11 promotion the author saw chaotic error logs and realized that poorly written logs are like a doctor’s incomplete medical records.
Rule 1: Consistent Format
Bad example:
log.info("start process");
log.error("error happen");
Neither message carries a timestamp or any context.
Configuring a unified pattern in logback.xml adds the timestamp, traceId, thread, level, logger, and message to every line:
<!-- logback.xml core pattern -->
<pattern>%d{yy-MM-dd HH:mm:ss.SSS}|%X{traceId:-NO_ID}|%thread|%-5level|%logger{36}|%msg%n</pattern>
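To make the layout concrete, the sketch below reproduces the same pipe-delimited line with the JDK alone, so the shape of a finished log line is visible without a running Logback setup. The class name LogLineSketch, the method format, and the sample values are illustrative assumptions and are not part of Logback.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogLineSketch {
    // Same timestamp layout as %d{yy-MM-dd HH:mm:ss.SSS} in the pattern.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yy-MM-dd HH:mm:ss.SSS");

    static String format(LocalDateTime time, String traceId, String thread,
                         String level, String logger, String message) {
        // Mirrors %X{traceId:-NO_ID}: use NO_ID when no traceId is present.
        String id = (traceId != null) ? traceId : "NO_ID";
        // Mirrors %-5level: left-justify the level and pad it to five characters.
        String paddedLevel = String.format("%-5s", level);
        return String.join("|", TS.format(time), id, thread, paddedLevel, logger, message);
    }
}
```

Calling it with a null traceId shows the NO_ID fallback, and a fixed-width level keeps the columns aligned when lines are scanned by eye or split on the pipe character.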
Rule 2: Include Stack Trace
Bad example: catching an exception and logging only a message loses the stack trace.
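try {
    processOrder();
} catch (Exception e) {
    log.error("Processing failed"); // the exception is not passed, so the stack trace is lost
}
Correct logging records the order ID and the exception stack:
log.error("Order processing failed, orderId={}", orderId, e); // e must be present, as the last argument
Rule 3: Reasonable Levels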
Bad example: using DEBUG for business errors or ERROR for slow responses.
FATAL : System about to crash (OOM, disk full); this level exists in Log4j, while SLF4J and Logback stop at ERROR
ERROR : Core business failure (payment failure, order creation error)
WARN : Recoverable exception (retry succeeded, degradation triggered)
INFO : Key process node (order status change)
DEBUG : Debug information (parameters, intermediate results)
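The level table above can be encoded once so that individual call sites do not each reinvent the decision. The following is a minimal sketch under assumed rules (a final failure is ERROR, success after a retry is WARN, a first-try success is INFO); the Severity enum and LevelChooser class are hypothetical names, not a logging-framework API.

```java
enum Severity { ERROR, WARN, INFO }

final class LevelChooser {
    /**
     * @param succeeded whether the operation finally succeeded
     * @param attempts  how many attempts were made (1 means no retry)
     */
    static Severity forOutcome(boolean succeeded, int attempts) {
        if (!succeeded) {
            return Severity.ERROR; // core business failure
        }
        if (attempts > 1) {
            return Severity.WARN;  // recovered, but only after retrying
        }
        return Severity.INFO;      // normal key process node
    }
}
```

A call site would then switch on the returned value to pick log.error, log.warn, or log.info, which keeps "retry succeeded" from being logged as ERROR in one service and DEBUG in another.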
Rule 4: Complete Parameters
Bad example:
log.info("User login failed");
This message names no user, IP, or reason.
log.warn("User login failed, username={}, clientIP={}, failReason={}", username, clientIP, "Too many incorrect password attempts");
This provides full context; the timestamp comes from the pattern configured in logback.xml.
Rule 5: Data Masking
Example of a utility class that masks mobile numbers before logging.
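public class LogMasker {
    // Keeps the first 3 and last 4 digits of an 11-digit mobile number.
    public static String maskMobile(String mobile) {
        if (mobile == null) {
            return null; // nothing to mask; avoids a NullPointerException
        }
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }
}
log.info("User registered, mobile={}", LogMasker.maskMobile("13812345678")); // logs 138****5678
Rule 6: Asynchronous Logging for Performance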
Synchronous logging in a flash‑sale caused thread blocking and I/O bottlenecks.
Frequent context switches
Disk I/O became the system bottleneck
Logging consumed 25% of total response time during peak load
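The idea behind moving logging off the business thread can be sketched with a bounded queue and a single worker thread. This is a simplified illustration using only the JDK, not Logback's actual AsyncAppender implementation; the class AsyncLogSketch and its methods are hypothetical, and the in-memory list stands in for the slow file write.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

final class AsyncLogSketch {
    // Unique marker object; compared by identity so no real message can match it.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue;
    private final List<String> sink = new CopyOnWriteArrayList<>();
    private final Thread worker;

    AsyncLogSketch(int queueSize) {
        queue = new ArrayBlockingQueue<>(queueSize);
        worker = new Thread(this::drain, "async-log-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called on the business thread; never blocks. Returns false if the queue is full. */
    boolean log(String message) {
        return queue.offer(message);
    }

    private void drain() {
        try {
            while (true) {
                String message = queue.take();
                if (message == STOP) {
                    return;
                }
                sink.add(message); // stand-in for the slow disk write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Flushes everything queued so far, then stops the worker. */
    void close() {
        try {
            queue.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> written() {
        return sink;
    }
}
```

The business thread pays only for an enqueue, while the worker absorbs the I/O cost. Note the design choice in log: this sketch drops a message when the queue is full, which is one possible policy; blocking the caller until space frees up is the other, and which one is acceptable depends on whether losing logs or slowing requests is worse for the service.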
Step 1: Async appender in logback.xml
<!-- AsyncAppender core config -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <discardingThreshold>0</discardingThreshold>
    <queueSize>4096</queueSize>
    <appender-ref ref="FILE"/>
</appender>
Step 2: Optimized logging code
// No pre-check needed, framework handles it
log.debug("Received MQ message: {}", msg.toSimpleString()); // automatically queued
// Avoid heavy computation before logging (still runs in business thread)
// Wrong: log.debug("Details: {}", computeExpensiveLog());
Step 3: Capacity formula
maxMemory ≈ queueLength × avgLogSize
recommendedQueue = peakTPS × toleratedDelay(seconds)
// Example: 10000 TPS × 0.5 s tolerance → 5000 queue size
Rule 7: Traceability
Inject a traceId via MDC and include it in the log pattern for end-to-end correlation.
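MDC values are bound to the current thread, so on pooled threads a traceId that is never removed can leak into the next request. The stdlib-only sketch below mimics that lifecycle with a ThreadLocal; TraceContext, current, and withNewTrace are hypothetical names standing in for the MDC calls, and the NO_ID fallback is borrowed from the pattern in Rule 1.

```java
import java.util.UUID;
import java.util.function.Supplier;

final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Returns the current thread's traceId, or NO_ID when none is set. */
    static String current() {
        String id = TRACE_ID.get();
        return id != null ? id : "NO_ID";
    }

    /** Runs the task with a fresh 8-character traceId and always clears it afterwards. */
    static <T> T withNewTrace(Supplier<T> task) {
        TRACE_ID.set(UUID.randomUUID().toString().substring(0, 8));
        try {
            return task.get();
        } finally {
            // Pooled threads are reused; without this, a stale id would tag the next request.
            TRACE_ID.remove();
        }
    }
}
```

With SLF4J the equivalent cleanup would be a call to MDC.remove("traceId") in a finally block of the interceptor. An 8-character prefix keeps lines short but raises the chance of collisions at high volume, so the full UUID may be the safer choice for busy systems.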
// Interceptor injects traceId
MDC.put("traceId", UUID.randomUUID().toString().substring(0, 8));
<!-- Log pattern contains traceId -->
<pattern>%d{HH:mm:ss} |%X{traceId}| %msg%n</pattern>
Rule 8: Dynamic Log Level
Expose an endpoint to change logger level at runtime without restarting the service.
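@GetMapping("/logLevel")
public String changeLogLevel(@RequestParam String loggerName, @RequestParam String level) {
    // Cast to Logback's ch.qos.logback.classic.Logger; the SLF4J Logger interface has no setLevel
    Logger logger = (Logger) LoggerFactory.getLogger(loggerName);
    // Level is ch.qos.logback.classic.Level; unrecognized names fall back to DEBUG
    logger.setLevel(Level.valueOf(level)); // takes effect immediately
    return "OK";
}
Rule 9: Structured Storage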
Store logs as JSON instead of concatenated strings for machine‑friendly querying.
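As a rough illustration of the difference from string concatenation, the sketch below renders flat event fields as a single JSON line using only the JDK. It is hand-rolled to stay dependency-free and handles only strings, numbers, booleans, and null; the class JsonLogLine is hypothetical, and a real service would more likely rely on a JSON encoder for its logging framework.

```java
import java.util.Map;

final class JsonLogLine {
    /** Renders the fields in iteration order as one JSON object on a single line. */
    static String render(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(entry.getKey())).append("\":");
            Object value = entry.getValue();
            if (value == null || value instanceof Number || value instanceof Boolean) {
                sb.append(value); // emitted unquoted (NaN and infinities are not handled)
            } else {
                sb.append('"').append(escape(String.valueOf(value))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    // Escapes the characters that JSON does not allow raw inside a string.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Passing a LinkedHashMap keeps the field order stable, which makes the lines easier to compare by eye. Because every value is escaped, a user-supplied string containing quotes or newlines cannot break the record apart, which is a common failure mode of concatenated log messages.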
{
  "event": "ORDER_CREATE",
  "orderId": 1001,
  "amount": 8999,
  "products": [{"name": "iPhone", "sku": "A123"}]
}
Rule 10: Intelligent Monitoring
Integrate the ELK stack with alert rules such as “more than 100 ERROR logs within 5 minutes → phone alert” and “WARN logs persisting for 1 hour → email notification”.
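The first rule is a sliding-window count, which can be sketched in a few lines. In practice this evaluation would live in the alerting layer on top of ELK rather than in application code; the class ErrorRateAlert below is a hypothetical illustration that assumes timestamps arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ErrorRateAlert {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    ErrorRateAlert(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    /** Records one ERROR event; returns true when the window holds more than threshold events. */
    boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Evict events that have fallen out of the window ending at nowMillis.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > threshold;
    }
}
```

For the rule quoted above, the instance would be created as new ErrorRateAlert(100, 5 * 60 * 1000L), and a true result would trigger the phone alert.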
Conclusion
Three developer maturity levels:
Bronze : System.out.println("error!")
Diamond : Standardized logging + ELK monitoring
King : Log‑driven code optimization, anomaly prediction, AI root‑cause analysis
Final question: Can your logs help a newcomer locate a problem within five minutes?
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.