How to Build a Scalable Kubernetes Log Collection System with S6 and Filebeat
This article explains the limitations of Docker's default JSON‑file logging, introduces S6‑based container log redirection, compares Kubernetes logging approaches at pod, node, and cluster levels, and presents a full‑stack architecture using Filebeat, Kafka, Elasticsearch, and Logstash for reliable, rotatable log collection.
Container Log Basics
Docker produces two kinds of logs: engine logs (handled by the host system) and container logs (the stdout and stderr of processes inside the container). By default, container logs are stored as JSON files under
/var/lib/docker/containers/<container_id>/<container_id>-json.log, which is unsuitable for production because the files can grow without limit.
Unlimited log file size can fill the disk.
The Docker daemon becomes a bottleneck when collecting large volumes of logs.
Using docker logs -f can block the daemon, making commands like docker ps unresponsive.
Docker offers configurable logging drivers, but they still rely on the daemon, so the performance bottleneck remains.
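Even when staying with the default json-file driver, its growth can at least be bounded in the daemon configuration. A minimal sketch of /etc/docker/daemon.json (the size and file-count values are illustrative):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
```

This caps each container's log at five 100 MB files, but collection still flows through the daemon, so the bottleneck described above remains.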
Redirecting Logs with S6
By using an S6‑based base image, the container’s stdout is redirected to a file on the host (e.g., /data/logs/.../app.log) instead of the Docker daemon, allowing native log rotation and eliminating the daemon bottleneck.
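As a sketch of what this looks like in practice, an s6-overlay style service script might redirect the application's output itself; the app name and paths here are placeholders, not from the original article:

```shell
#!/bin/sh
# /etc/services.d/app/run -- illustrative s6 service script.
# Append stdout/stderr to a host-mounted file instead of letting it
# flow to the Docker daemon, so logrotate can manage it natively.
mkdir -p /data/logs/mynamespace/myapp
exec myapp >> /data/logs/mynamespace/myapp/app.log 2>&1
```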
Kubernetes Logging Levels
Pod (application) level: Logs are written to stdout/stderr and can be viewed with kubectl logs.
Node level: Configure the container's log driver and use tools like logrotate to manage file size.
Cluster level: Three main approaches:
Node‑agent (DaemonSet) deployed on each node to collect container logs.
Sidecar container that streams logs to stdout (creates duplicate log files on the host).
Sidecar container that runs a log‑collection agent (e.g., Logstash or Filebeat) inside the pod, which consumes more resources and does not expose logs via kubectl logs.
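The node-agent approach can be sketched as a DaemonSet that mounts the host log directory into a Filebeat container; the names, namespace, and image tag below are illustrative, not prescribed by the article:

```yaml
# Illustrative node-agent DaemonSet: one Filebeat pod per node,
# mounting the host directory that the S6-based containers write into.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.17.0
        volumeMounts:
        - name: applogs
          mountPath: /data/logs
          readOnly: true
      volumes:
      - name: applogs
        hostPath:
          path: /data/logs
```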
Proposed Log Architecture
All application containers are built from an S6 base image, redirecting logs to host directories such as /data/logs/namespace/appname/podname/log/xxxx.log. A log‑agent DaemonSet runs on each node and includes:
Filebeat for log harvesting.
Logrotate for automatic log rotation.
Configuration to send harvested logs to Kafka.
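The harvesting and Kafka shipping described above can be expressed in a minimal filebeat.yml sketch; the broker addresses and topic name are assumptions:

```yaml
# Illustrative filebeat.yml for the log-agent: harvest the
# S6-redirected files and ship them to Kafka.
filebeat.inputs:
- type: log
  paths:
    - /data/logs/*/*/*/log/*.log
output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]
  topic: "app-logs"
```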
Kafka forwards logs to Elasticsearch, where Kibana provides search and visualization. Logstash creates indices and consumes Kafka messages.
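A minimal Logstash pipeline sketch for the consuming side, assuming the same hypothetical topic name and JSON-encoded events:

```
input {
  kafka {
    bootstrap_servers => "kafka-0:9092"
    topics => ["app-logs"]
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```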
Open Challenges
Dynamically updating Filebeat configuration when new applications are deployed.
Ensuring every log file is properly rotated.
Extending Filebeat with custom plugins for additional functionality.
Practical Implementation
Deploy a log‑agent as a DaemonSet that bundles Filebeat, Logrotate, and any custom components needed to address the challenges above.
For dynamic Filebeat configuration, watch the log directory with fsnotify and regenerate the config using templates.
Use the cron package to schedule periodic log rotation, e.g.:
/var/log/xxxx/xxxxx.log {
    su www-data www-data
    missingok
    notifempty
    size 1G
    copytruncate
}

For custom Filebeat development, refer to community guides and extend the source as required.
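In plain crontab terms, the rotation above could be scheduled periodically like this; the interval, state-file path, and config path are illustrative:

```
# Run logrotate every 10 minutes against the log-agent's rotation config.
*/10 * * * * /usr/sbin/logrotate -s /var/lib/logrotate/status /etc/logrotate.d/app-logs
```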
Relevant links:
https://docs.docker.com/v17.09/engine/admin/logging/overview/
http://skarnet.org/software/s6/
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
