Mastering Kubernetes Logging: Practical Tips for Levels, Formats, and Performance
This article provides a hands‑on guide to building a reliable Kubernetes logging system, covering log level selection, content standards, output formats, volume control, multiple targets, performance impact, library choices, storage options, and long‑term retention strategies.
The author shares years of experience building a logging system for Kubernetes, aiming to help readers avoid common pitfalls and establish a practical, standardized logging pipeline.
1. How to Choose Log Levels
Kubernetes applications should use six standard log levels, each indicating severity:
FATAL – critical errors requiring immediate human intervention.
ERROR – unexpected errors that may affect parts of the system but not core functionality.
WARN – potentially dangerous conditions worth attention.
INFO – detailed execution flow for each request.
DEBUG – verbose debugging information; should be disabled in production.
TRACE – the most granular trace data, often including payloads.
Practical advice: reserve FATAL for unrecoverable errors. Treat ERROR as alert‑worthy; WARN does not need to trigger alerts. Keep production logging at INFO or WARN, enable DEBUG only temporarily while troubleshooting, and make sure the logging library can change levels at runtime.
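The runtime level switch can be sketched with Python's standard logging module. This is only a minimal illustration: the logger name "orders" and the helper set_level are hypothetical, and how the new level reaches the process (admin endpoint, signal, config watcher) is left out. Note that Python's logging has no built‑in TRACE level and treats FATAL as an alias of CRITICAL.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical service logger name
logger.setLevel(logging.INFO)         # production default: INFO

def set_level(name: str) -> None:
    """Change the logger's level while the process is running.

    Logger.setLevel accepts a level name and raises ValueError
    for names it does not know.
    """
    logger.setLevel(name.upper())

# Temporarily enable DEBUG for troubleshooting, then restore INFO.
set_level("debug")
logger.debug("now visible while troubleshooting")
set_level("info")
```

Because the level is checked on every call, the change takes effect immediately and needs no restart.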
2. Log Content Standards
Every log entry should contain at least Time, Level, and Location. Additional fields depend on the module or business context, such as:
TraceID when using distributed tracing.
Business identifiers like order ID or user ID.
HTTP request details: URL, Method, Status, Latency, InFlow, OutFlow, ClientIP, UserAgent.
Module name when multiple components share the same log stream.
These conventions should be enforced by the operations platform to ensure uniformity.
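One way to apply these content rules in code is a formatter that always emits Time, Level, and Location and appends context fields only when they are present. The sketch below uses Python's standard logging module; the class name KeyValueFormatter and the field names trace_id, order_id, and module_name are assumptions made for illustration, not part of any platform's specification.

```python
import logging

class KeyValueFormatter(logging.Formatter):
    """Render Time, Level, Location, then optional context as key:value pairs."""

    # Hypothetical context field names; adapt to your own conventions.
    CONTEXT_FIELDS = ("trace_id", "order_id", "module_name")

    def format(self, record: logging.LogRecord) -> str:
        parts = [
            f"[{self.formatTime(record)}]",            # Time
            f"[{record.levelname}]",                   # Level
            f"[{record.filename}:{record.lineno}]",    # Location
            f"msg:{record.getMessage()!r}",
        ]
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                parts.append(f"{field}:{value}")
        return " ".join(parts)

handler = logging.StreamHandler()
handler.setFormatter(KeyValueFormatter())
log = logging.getLogger("kv-demo")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)

# Context fields are attached per call through the `extra` argument.
log.info("order created", extra={"trace_id": "abc123", "order_id": 42})
```

The field is named module_name rather than module because module is already a built‑in attribute of LogRecord and cannot be overwritten through extra.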
3. Log Representation
Key‑Value pairs are recommended for easy parsing, e.g.:
[2019-12-30 21:45:30.611992] [WARNING] [958] [block_writer.cpp:671] path:pangu:/localcluster/index/3/prom/7/1577711464522767696_0_1577711517 min_time:1577712000000000 max_time:1577715600000000 normal_count:27595 config:prom start_line:57315569 end_line:57343195 latency(ms):42 type:AddBlock
JSON is also acceptable and widely supported by log collectors:
{"addr":"tcp://0.0.0.0:10010","caller":"main.go:98","err":"listen tcp: address tcp://0.0.0.0:10010: too many colons in address","level":"error","msg":"Failed to listen","ts":"2019-03-08T10:02:47.469421Z"}
Avoid binary or protobuf formats for most scenarios.
4. Single‑Line Log Entries
Do not split a single logical log into multiple lines; multi‑line logs increase collection, parsing, and indexing costs.
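Exception tracebacks and messages with embedded newlines are a common source of multi‑line entries. A rough sketch of one way to keep them on a single line is to serialize each record as JSON, since JSON encoding escapes newlines inside strings. The class name JsonLineFormatter and the key names (borrowed from the JSON sample in section 3) are illustrative assumptions.

```python
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Serialize each record as one JSON object on one physical line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname.lower(),
            "caller": f"{record.filename}:{record.lineno}",
            "msg": record.getMessage(),
        }
        if record.exc_info:
            # formatException may return a multi-line traceback;
            # json.dumps escapes the newlines, so the entry stays on one line.
            entry["err"] = self.formatException(record.exc_info)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
log = logging.getLogger("json-demo")  # hypothetical logger name
log.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    # The whole traceback ends up inside the "err" field of a single line.
    log.exception("division failed")
```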
5. Controlling Log Output Volume
Excessive logs waste disk space and CPU. Recommendations:
Collect request/response logs for every entry point unless a special reason excludes them.
Print error logs; if they become noisy, apply sampling.
Minimize logs inside tight loops.
Limit ingress/Nginx access logs to ≤5 MB/s (≈500 B per line, ≤10 k lines/s) and application logs to ≤200 KB/s (≈2 KB per line, ≤100 lines/s).
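To illustrate how such a ceiling might be enforced inside the application, the sketch below caps the number of records per one‑second window with a logging.Filter. At roughly 2 KB per line, a cap of 100 lines/s corresponds to the 200 KB/s application budget above. The class name RateLimitFilter and its parameters are hypothetical, and dropping the overflow is only one possible policy; sampling every Nth record is another.

```python
import logging
import time

class RateLimitFilter(logging.Filter):
    """Pass at most `max_per_second` records per one-second window; drop the rest.

    Counters are not locked, so under heavy concurrency the cap is approximate.
    """

    def __init__(self, max_per_second: int, clock=time.monotonic):
        super().__init__()
        self.max_per_second = max_per_second
        self.clock = clock            # injectable so the window can be tested
        self.window_start = clock()
        self.count = 0
        self.dropped = 0              # total records suppressed so far

    def filter(self, record: logging.LogRecord) -> bool:
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new one-second window begins: reset the counter.
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.max_per_second:
            self.dropped += 1
            return False
        return True

handler = logging.StreamHandler()
handler.addFilter(RateLimitFilter(max_per_second=100))
logging.getLogger("app-demo").addHandler(handler)  # hypothetical logger name
```

Exposing the dropped counter as a metric makes it visible when the limit is actually suppressing logs.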
6. Multiple Log Output Targets
Separate different log types into distinct files to simplify collection and monitoring:
Access logs per domain.
Error logs with dedicated alerting.
External‑system call logs for audit.
Middleware logs usually managed by a unified platform.
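The separation above can be approximated in Python by giving each log type its own named logger and file handler. This is a sketch under assumptions: the helper build_loggers, the type names, and the file naming scheme are all hypothetical.

```python
import logging
import os

def build_loggers(log_dir: str) -> dict:
    """Create one logger per log type, each writing to its own file."""
    loggers = {}
    for kind in ("access", "error", "external_call"):  # hypothetical type names
        logger = logging.getLogger(f"app.{kind}")
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep each type out of any shared parent stream
        handler = logging.FileHandler(os.path.join(log_dir, f"{kind}.log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        loggers[kind] = logger
    return loggers
```

A caller would then write, for example, build_loggers("/var/log/app")["access"].info(...), and the collector can attach a different parsing or alerting rule to each file.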
7. Controlling Log Performance Overhead
Logging must not degrade business performance. Test the logging library so that its CPU consumption stays below 5 % of total usage, and ensure logging is asynchronous to avoid blocking the main workflow.
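As a sketch of asynchronous logging, Python's standard library provides QueueHandler and QueueListener: the application thread only places the record on an in‑memory queue, and a background thread performs the actual I/O. The helper start_async_logging and the logger name are hypothetical, and the unbounded queue is a simplification.

```python
import logging
import logging.handlers
import queue

def start_async_logging(target: logging.Handler):
    """Route records through a queue so callers do not wait on log I/O."""
    log_queue = queue.Queue(-1)  # unbounded; bound it if memory is a concern
    # The listener's background thread drains the queue into `target`.
    listener = logging.handlers.QueueListener(log_queue, target)
    listener.start()
    logger = logging.getLogger("async-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # The application thread only enqueues the record.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    return logger, listener
```

Call listener.stop() during shutdown; it processes the records still in the queue before the background thread exits.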
8. Choosing a Log Library
Popular, stable libraries per language include:
Java – Log4j, Logback.
Go – go‑kit.
Python – built‑in logging module (see the Logging Cookbook in the official documentation).
C++ – spdlog (high‑performance, cross‑platform).
9. Log Shape: File vs. Stdout
Containers typically write to stdout / stderr, where the container runtime (Docker, for example) captures the output. This works for simple system components but not for complex, multi‑layered services: mixing everything into stdout makes it hard to separate log types and can consume a full CPU core at 100 k logs/s.
10. Persistence and Storage Medium
Logs can be sent directly to a centralized system without persisting to disk, which reduces latency but is suitable only for very high‑volume scenarios. For most cases, write to local storage (a hostPath or emptyDir volume) so that there is a buffer during network failures and the files can be inspected directly when the log system is unavailable.
11. Ensuring Log Retention
Kubernetes dynamically creates and destroys nodes and containers, causing logs to disappear. To retain logs for DevOps, audit, or compliance, centralize collection so that logs are captured within seconds and stored independently of the lifecycle of the originating pod.
In summary, adopting a unified logging specification across teams ensures that downstream collection, analysis, monitoring, and visualization can operate smoothly and reliably.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
