Building a Scalable Container Log Collection System with S6 and Filebeat
This article explains how to design and implement a unified log collection architecture for Docker containers and Kubernetes clusters using S6‑based images, Filebeat, logrotate, Kafka, Logstash, and Elasticsearch, addressing common challenges such as log rotation, daemon bottlenecks, and dynamic configuration.
Preparation
About Container Logs
Docker logs are divided into engine logs and container logs. Engine logs are handled by the system logger and stored in OS‑specific locations.
This article focuses on container logs, which are the output of applications running inside containers. By default, the command docker logs shows the STDOUT and STDERR of a running container, and the logs are stored in json-file format at /var/lib/docker/containers/<container_id>/<container_id>-json.log. This approach is unsuitable for production for the following reasons:
Without size limits, container logs can fill the disk and destabilize the host (the json-file driver supports rotation through its max-size and max-file options, but no limit is set by default).
The Docker daemon collects container stdout, so a high log volume makes the daemon a bottleneck.
Following logs with docker logs -f can block the daemon, causing commands such as docker ps to become unresponsive.
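The rotation mentioned in the first point is controlled by the json-file driver's max-size and max-file options. As a minimal sketch, the following generates a daemon.json fragment that sets them; the function name and the 100m / 3 defaults are illustrative assumptions, not values recommended by this article.

```python
import json


def build_daemon_config(max_size: str = "100m", max_file: int = 3) -> dict:
    """Return a Docker daemon.json fragment that caps json-file log growth.

    max_size and max_file map to the json-file driver's "max-size" and
    "max-file" options; the defaults are illustrative assumptions.
    """
    return {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": max_size,
            # daemon.json expects log-opts values to be strings.
            "max-file": str(max_file),
        },
    }


if __name__ == "__main__":
    # Print the fragment; merging it into the daemon configuration is left to the operator.
    print(json.dumps(build_daemon_config(), indent=2))
```

This bounds disk usage only; it does not change how the logs are collected.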
Docker provides configurable logging drivers; however, they still rely on the daemon for collection, so the speed bottleneck remains.
log-driver    collection speed
syslog        14.9 MB/s
json-file     37.9 MB/s
To avoid daemon collection, one can redirect logs directly to files with automatic rotation using an S6‑based image.
S6‑log redirects the CMD's stdout to /…/default/current instead of sending it to the Docker daemon, eliminating the daemon bottleneck.
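To illustrate the idea, the sketch below imitates this behaviour in Python: it reads a stream line by line, appends each line to a file named current, and archives that file once it reaches a size limit. It is a simplified stand-in and not s6-log itself; the function name, the archive file naming, and the size limit are all assumptions.

```python
import os
import time


def pump_to_current(stream, log_dir, max_bytes=1_000_000):
    """Append lines from `stream` to <log_dir>/current, archiving it when it grows too large.

    Returns the list of archived file paths created during this call.
    """
    os.makedirs(log_dir, exist_ok=True)
    current = os.path.join(log_dir, "current")
    archived = []
    out = open(current, "ab")
    size = os.path.getsize(current)  # continue counting from any existing content
    try:
        for line in stream:
            data = line.encode("utf-8")
            out.write(data)
            out.flush()
            size += len(data)
            if size >= max_bytes:
                # Close and rename the full file, then start a fresh `current`.
                out.close()
                target = os.path.join(log_dir, f"archived-{time.time_ns()}.log")
                os.rename(current, target)
                archived.append(target)
                out = open(current, "ab")
                size = 0
    finally:
        out.close()
    return archived
```

Because the process writes the file directly, the data never passes through the Docker daemon, which is the point the paragraph above makes.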
About k8s Logs
Kubernetes log collection is divided into three levels:
Application (Pod) level
Node level
Cluster level
At the Pod level, logs are output to stdout/stderr and can be viewed with kubectl logs pod-name -n namespace.
At the Node level, logs are managed via container log‑drivers, often combined with logrotate for automatic rotation.
At the Cluster level, three approaches are common:
Node‑agent method: Deploy a DaemonSet that runs a log agent (e.g., Filebeat) on each node to collect logs locally. This method uses few resources and is non‑intrusive but only works for standard‑output logs.
Sidecar container as a log proxy: Each pod runs an additional container that forwards logs. One variant streams the logs to the sidecar's stdout, which creates duplicate log files on the host; the other runs a full log collector (e.g., Logstash, Fluentd) inside the pod, which consumes more resources and hides the logs from kubectl logs.
Application‑direct logging: The application itself pushes logs to a backend service.
Log Architecture
A unified log collection system can be built using the node‑agent approach. The overall architecture is:
All application containers are built from an S6‑based image; their logs are redirected to host files such as /data/logs/namespace/appname/podname/log/xxxx.log.
A log‑agent container on each node runs Filebeat, logrotate, etc., to collect these files.
Filebeat forwards the logs to Kafka.
Logstash consumes the messages from Kafka and writes them to Elasticsearch, creating the indices.
Elasticsearch stores the logs, and Kibana is used to query them.
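Because the host path in the first step encodes the namespace, application, and pod, an agent can recover that metadata from the file path alone. A minimal sketch, assuming the layout /data/logs/<namespace>/<appname>/<podname>/log/<file> from the example path; the function name and the returned field names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Root directory taken from the example path in the architecture list.
LOG_ROOT = PurePosixPath("/data/logs")


def parse_log_path(path: str) -> Optional[dict]:
    """Return namespace/app/pod metadata for a log file, or None if the path does not match."""
    try:
        parts = PurePosixPath(path).relative_to(LOG_ROOT).parts
    except ValueError:
        return None  # not under the log root
    # Expected shape: <namespace>/<appname>/<podname>/log/<file>
    if len(parts) != 5 or parts[3] != "log":
        return None
    namespace, app, pod, _, filename = parts
    return {"namespace": namespace, "app": app, "pod": pod, "file": filename}
```

Fields like these could be attached to each event before it is sent to Kafka, so that downstream consumers do not need to parse paths again.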
Key challenges to address:
Dynamically updating Filebeat configuration for newly deployed applications.
Ensuring every log file is properly rotated.
Extending Filebeat with custom features when needed.
Implementation
To solve these challenges, develop a log‑agent DaemonSet that runs on every node and includes Filebeat, logrotate, and custom components.
For dynamic Filebeat configuration, use the fsnotify library to watch the log directory for create/delete events and render configuration templates accordingly.
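The rendering step can be sketched as follows. This example is written in Python and scans the directory once instead of reacting to fsnotify events; a real agent would call the render function from its event handler. The directory layout follows the architecture section, while the template, the field names, and the function names are assumptions, and the log input type should be checked against the Filebeat version in use.

```python
import os
from string import Template

# One Filebeat input per application directory (hypothetical template).
INPUT_TEMPLATE = Template(
    "- type: log\n"
    "  paths:\n"
    "    - $root/$namespace/$app/*/log/*.log\n"
    "  fields:\n"
    "    namespace: $namespace\n"
    "    app: $app\n"
)


def discover_apps(root):
    """Return sorted (namespace, app) pairs found as <root>/<namespace>/<app>/ directories."""
    apps = []
    for namespace in sorted(os.listdir(root)):
        ns_dir = os.path.join(root, namespace)
        if not os.path.isdir(ns_dir):
            continue
        for app in sorted(os.listdir(ns_dir)):
            if os.path.isdir(os.path.join(ns_dir, app)):
                apps.append((namespace, app))
    return apps


def render_inputs(root):
    """Render a filebeat.inputs section covering every discovered application."""
    apps = discover_apps(root)
    if not apps:
        return "filebeat.inputs: []\n"
    body = "".join(
        INPUT_TEMPLATE.substitute(root=root, namespace=ns, app=app)
        for ns, app in apps
    )
    return "filebeat.inputs:\n" + body
```

Re-rendering on every create or delete event keeps the configuration in step with the applications currently deployed on the node; how the rendered file is handed to Filebeat is not specified in this article.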
For log rotation, use the cron library to schedule periodic rotation jobs, taking care of file ownership (e.g., non‑root users).
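<code>/var/log/xxxx/xxxxx.log {
    # rotate as the user and group that own the log file
    su www-data www-data
    # do not report an error if the file is missing
    missingok
    # skip rotation when the file is empty
    notifempty
    # rotate once the file grows beyond 1G
    size 1G
    # copy the file, then truncate it in place so the writer keeps its open handle
    copytruncate
}</code>
For extending Filebeat, refer to community guides and source code.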
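The ownership concern in the rotation step can be handled when each stanza is generated. The sketch below, in Python instead of a cron library, looks up the owner of a log file and renders a stanza of the same shape as the logrotate example with a matching su line. The function names are hypothetical, and the fallback to numeric ids when a name cannot be resolved is an assumption that should be verified against the logrotate version in use.

```python
import grp
import os
import pwd


def owner_names(path):
    """Return (user, group) names for the file, falling back to numeric ids (assumption)."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # uid has no passwd entry, e.g. inside a minimal container
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def render_stanza(path, size="1G"):
    """Render a logrotate stanza for `path` that rotates as the file's owner."""
    user, group = owner_names(path)
    return (
        f"{path} {{\n"
        f"    su {user} {group}\n"
        "    missingok\n"
        "    notifempty\n"
        f"    size {size}\n"
        "    copytruncate\n"
        "}\n"
    )
```

A scheduled job could write these stanzas to a configuration file and then invoke logrotate against it at each interval.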
Conclusion
This article provides a simple reference design for Kubernetes log collection; actual implementations should be tailored to specific organizational requirements.