How to Build a Scalable Container Log Architecture with S6 and Filebeat
This guide explains Docker container logging pitfalls, compares Kubernetes log collection levels, and presents a unified log‑aggregation architecture using S6‑based images, Filebeat, logrotate, Kafka, and Logstash, with practical steps for dynamic configuration and rotation.
Container Log Basics
Docker stores container stdout/stderr in JSON files located at /var/lib/docker/containers/<container_id>/<container_id>-json.log. The default json-file driver does not limit file size, so logs can grow without bound and exhaust disk space. The Docker daemon also handles these files itself; under high log volume it becomes a bottleneck, and commands such as docker logs -f can block it, leaving docker ps unresponsive.
Benchmark results show the syslog driver reaching about 14.9 MB/s and the json-file driver about 37.9 MB/s, so either driver caps how fast the daemon can collect logs. To bypass daemon collection entirely, this guide uses an S6‑based image in which s6-log redirects the stdout of the container’s CMD to a file on the host.
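To make the json-file format concrete, here is a minimal Python sketch that decodes one entry. Each line of a json-file log is a JSON object with the keys log, stream, and time. The function name parse_json_file_line and the sample entry are made up for illustration.

```python
import json


def parse_json_file_line(line):
    """Decode one json-file entry into (stream, time, message)."""
    entry = json.loads(line)
    # The "log" value keeps the trailing newline written by the application.
    return entry["stream"], entry["time"], entry["log"].rstrip("\n")


# Made-up sample entry in the json-file layout.
sample = (
    '{"log":"server started\\n","stream":"stdout",'
    '"time":"2024-01-01T00:00:00.000000000Z"}'
)

if __name__ == "__main__":
    print(parse_json_file_line(sample))
```

Every line the application prints is wrapped this way by the daemon, which is part of why heavy log traffic loads it.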
Kubernetes Log Levels
Log collection in Kubernetes can be organized into three levels:
Pod (application) level: Applications write to stdout/stderr; logs are accessed with kubectl logs.
Node level: Configure a container log driver (e.g., json-file) together with logrotate to rotate files when they exceed a size limit.
Cluster level, with three options:
Node‑side DaemonSet that runs a lightweight collector on every node, handling only stdout logs.
Sidecar container per pod that either streams logs to stdout (creating duplicate files) or runs a dedicated collector such as Logstash or Fluentd inside the pod (higher CPU/memory usage and logs are hidden from kubectl logs).
Application‑side push directly to a backend storage service.
Unified Log Architecture
The proposed architecture uses a node‑level DaemonSet (log‑agent) to collect logs from containers built on the S6 base image. The data flow is:
Application containers write logs to host directories, e.g. /data/logs/<namespace>/<appname>/<podname>/log/xxxx.log.
The log‑agent pod on each node runs filebeat and logrotate. filebeat tails the log files and forwards them to a Kafka topic.
Logstash consumes the logs from Kafka, processes them, and writes them to Elasticsearch indices, which Kibana then visualizes.
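Because the host path encodes the namespace, application, and pod, a collector can recover that metadata from the file name alone. The sketch below shows one way to do it in Python, assuming the /data/logs layout from the example path above. The names LOG_ROOT, LogSource, and parse_log_path are hypothetical, not part of Filebeat or the log‑agent.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Assumed root directory, taken from the example path in the text.
LOG_ROOT = PurePosixPath("/data/logs")


class LogSource(NamedTuple):
    namespace: str
    appname: str
    podname: str
    filename: str


def parse_log_path(path: str) -> Optional[LogSource]:
    """Split <root>/<namespace>/<appname>/<podname>/log/<file> into fields."""
    try:
        parts = PurePosixPath(path).relative_to(LOG_ROOT).parts
    except ValueError:
        return None  # path is not under the log root
    if len(parts) != 5 or parts[3] != "log":
        return None  # path does not follow the expected layout
    namespace, appname, podname, _, filename = parts
    return LogSource(namespace, appname, podname, filename)


if __name__ == "__main__":
    print(parse_log_path("/data/logs/prod/web/web-0/log/access.log"))
```

Paths outside the root or with a different depth return None, so unrelated files on the host are ignored rather than mislabeled.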
Key challenges addressed:
Dynamic update of filebeat configuration when new applications are deployed.
Ensuring every log file is rotated according to policy.
Extending filebeat with custom plugins for additional functionality.
Practical Implementation
To address these challenges, this guide recommends:
Use the fsnotify library (https://github.com/fsnotify/fsnotify) to watch the log directory for create/delete events and render new filebeat configuration files from templates.
Schedule log rotation with a cron job using the cron library (https://github.com/robfig/cron). The cron job runs logrotate with a configuration such as:
/var/log/xxxx/xxxxx.log {
    su www-data www-data
    missingok
    notifempty
    size 1G
    copytruncate
}
For custom filebeat development, refer to the official Filebeat repository: https://github.com/elastic/beats/tree/master/filebeat and the S6 project page: http://skarnet.org/software/s6/.
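The watch-and-render step can be sketched in Python as follows. This is a simplified stand-in, not the fsnotify-based implementation: it rescans the log root on each call instead of reacting to file system events, and it uses only the standard library. The template contents, the field names, and the functions find_log_dirs, config_path, render_input, and sync_configs are all assumptions made for illustration.

```python
import glob
import os
from string import Template

# Hypothetical template for one Filebeat input; field names are illustrative.
INPUT_TEMPLATE = Template(
    "- type: log\n"
    "  paths:\n"
    "    - $log_dir/*.log\n"
    "  fields:\n"
    "    namespace: $namespace\n"
    "    app: $appname\n"
    "    pod: $podname\n"
)


def find_log_dirs(root):
    """Return every <root>/<namespace>/<appname>/<podname>/log directory."""
    pattern = os.path.join(root, "*", "*", "*", "log")
    return {d for d in glob.glob(pattern) if os.path.isdir(d)}


def _fields(root, log_dir):
    namespace, appname, podname, _ = os.path.relpath(log_dir, root).split(os.sep)
    return {"namespace": namespace, "appname": appname, "podname": podname}


def config_path(conf_dir, root, log_dir):
    """Name the generated config file after the pod it belongs to."""
    f = _fields(root, log_dir)
    name = "{namespace}_{appname}_{podname}.yml".format(**f)
    return os.path.join(conf_dir, name)


def render_input(root, log_dir):
    """Fill the template for one log directory."""
    return INPUT_TEMPLATE.substitute(log_dir=log_dir, **_fields(root, log_dir))


def sync_configs(root, conf_dir, known):
    """Write configs for new log dirs, remove configs for deleted ones."""
    current = find_log_dirs(root)
    for log_dir in current - known:  # "create" events
        with open(config_path(conf_dir, root, log_dir), "w") as fh:
            fh.write(render_input(root, log_dir))
    for log_dir in known - current:  # "delete" events
        path = config_path(conf_dir, root, log_dir)
        if os.path.exists(path):
            os.remove(path)
    return current
```

Calling sync_configs periodically with the set it returned last time keeps the generated files in step with the pods on the node. With an event-based watcher, the two loops would run from the create and delete handlers instead. The generated files are meant for a directory that Filebeat loads input configurations from, for example through its filebeat.config.inputs setting.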
Summary
The solution provides a lightweight, scalable log collection pipeline for Kubernetes:
S6‑based containers redirect stdout to host files, eliminating daemon bottlenecks.
A node‑level DaemonSet runs filebeat + logrotate to collect and rotate logs.
Logs are streamed to Kafka, then to Elasticsearch via Logstash, and visualized in Kibana.
Reference links:
Kubernetes logging documentation: https://kubernetes.io/docs/concepts/cluster-administration/logging/
Understanding logrotate: https://support.rackspace.com/how-to/understanding-logrotate-utility/
dbaplus Community