Designing a Scalable Kubernetes Log Collection System Using S6 and Filebeat
This article explains the limitations of Docker‑based logging, compares logging drivers, and presents a Kubernetes‑wide log collection architecture that uses an S6‑based base image, Filebeat, logrotate, Kafka, and Elasticsearch to achieve reliable, scalable log aggregation.
Background and Problems with Docker Logging
Docker generates two kinds of logs: engine logs (handled by the host’s system logger) and container logs, which are written in JSON format to /var/lib/docker/containers/<container_id>/<container_id>-json.log. In production this approach has three major drawbacks: log files grow without limit, the Docker daemon becomes a bottleneck for high‑volume logging, and docker logs -f and other CLI commands can block.
Docker’s logging drivers differ in throughput, for example:
Log driver   Throughput
syslog       14.9 MB/s
json-file    37.9 MB/s
To avoid the daemon bottleneck, the article proposes bypassing Docker’s logging drivers altogether: a custom base image runs the s6-log utility, which writes each container’s stdout/stderr directly to files in a host directory.
Kubernetes Log Collection Levels
Kubernetes log collection can be organized at three levels:
Pod (application) level
Node level
Cluster level
Pod Level
Pods write logs to stdout/stderr, which can be accessed with kubectl logs pod-name -n namespace.
Node Level
Node‑level logging configures a Docker log-driver together with logrotate to automatically rotate large log files.
Cluster Level
Three common cluster‑wide approaches are described:
Node‑agent (DaemonSet): Deploy a log‑agent DaemonSet on every node. It uses few resources and requires no changes to applications, but all containers must log to stdout/stderr.
Sidecar container: Run a logging sidecar in each pod. Two variants exist:
Streaming sidecar: the sidecar reads the application’s log files and re‑emits them on its own stdout/stderr, so the same logs are stored twice on the node.
Log‑collector sidecar (e.g., Logstash or Fluent Bit): the sidecar ships logs directly to a backend, which costs extra CPU and memory in every pod and makes the logs invisible to kubectl logs.
Application‑direct logging: Applications push logs directly to a storage backend (e.g., Elasticsearch, Loki) without going through stdout.
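With application-direct logging, the application itself ships its records to the backend. For Elasticsearch this usually means the _bulk API. Its request body is newline-delimited JSON: an action line such as {"index":{"_index":"..."}} followed by the document, with every line ending in a newline. The sketch below only builds that body. The index name, record fields and function name are hypothetical, and the HTTP request itself is omitted.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// logRecord is a hypothetical structured log entry produced by an application.
type logRecord struct {
	Timestamp string `json:"@timestamp"`
	Level     string `json:"level"`
	Message   string `json:"message"`
}

// bulkBody builds an Elasticsearch _bulk request body (NDJSON): for each
// record, an "index" action line followed by the document itself.
func bulkBody(index string, records []logRecord) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode terminates every value with '\n'
	enc.SetEscapeHTML(false)
	action := map[string]map[string]string{"index": {"_index": index}}
	for _, r := range records {
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := bulkBody("app-logs", []logRecord{
		{Timestamp: "2021-03-01T08:00:00Z", Level: "info", Message: "order created"},
		{Timestamp: "2021-03-01T08:00:01Z", Level: "error", Message: "payment timeout"},
	})
	if err != nil {
		panic(err)
	}
	// The body would be POSTed to <es-host>/_bulk with Content-Type: application/x-ndjson.
	fmt.Print(string(body))
}
```

This batching and error handling is the logic every application would have to carry itself. That is part of why the article prefers a node-level agent.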
Proposed Unified Log Architecture
The recommended architecture combines the node‑agent approach with a custom log‑agent container built from an S6 base image. The flow is:
All application containers use the S6 base image; logs are redirected to host directories such as /data/logs/namespace/appname/podname/log/xxxx.log.
The log‑agent runs Filebeat and logrotate; Filebeat watches the log files and ships them to a Kafka topic.
Logstash consumes the Kafka topic, creates the Elasticsearch indices, and writes the logs into Elasticsearch, where they are visualized in Kibana.
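Because the S6 base image writes every log under a fixed host layout, the log-agent can recover Kubernetes metadata from the file path alone. It can then attach that metadata to each event or use it to choose a Kafka topic. The sketch assumes the layout /data/logs/<namespace>/<appname>/<podname>/log/<file>.log described in the list above. The type and function names and the per-namespace topic scheme are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// logRoot is the host directory the S6 base image writes under.
const logRoot = "/data/logs"

// logSource is the Kubernetes metadata recovered from a log file path.
type logSource struct {
	Namespace, App, Pod, File string
}

// parseLogPath splits <logRoot>/<namespace>/<app>/<pod>/log/<file> into its parts.
func parseLogPath(p string) (logSource, error) {
	rel, err := filepath.Rel(logRoot, filepath.Clean(p))
	if err != nil || rel == ".." || strings.HasPrefix(rel, "../") {
		return logSource{}, fmt.Errorf("%s is not under %s", p, logRoot)
	}
	parts := strings.Split(rel, "/")
	if len(parts) != 5 || parts[3] != "log" {
		return logSource{}, fmt.Errorf("%s does not match <ns>/<app>/<pod>/log/<file>", p)
	}
	return logSource{Namespace: parts[0], App: parts[1], Pod: parts[2], File: parts[4]}, nil
}

// kafkaTopic picks a per-namespace topic; the "logs-" prefix is an assumption.
func (s logSource) kafkaTopic() string { return "logs-" + s.Namespace }

func main() {
	s, err := parseLogPath("/data/logs/shop/checkout/checkout-5c7d9-xk2lp/log/app.log")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v topic=%s\n", s, s.kafkaTopic())
}
```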
Implementation Challenges
Automatically updating Filebeat configuration when new applications are deployed.
Ensuring each log file is rotated correctly.
Extending Filebeat with custom modules for additional functionality.
Practical Solutions
To address the challenges, the article suggests building a log‑agent DaemonSet that includes:
Use of github.com/fsnotify/fsnotify to watch log directories for create/delete events and regenerate Filebeat config via templating.
Use of github.com/robfig/cron to schedule periodic logrotate jobs. Example logrotate snippet:
/var/log/xxxx/xxxxx.log {
su www-data www-data
missingok
notifempty
size 1G
copytruncate
}
Conclusion
The article provides a practical blueprint for Kubernetes log collection, emphasizing a node‑agent architecture with S6‑based containers, Filebeat, Kafka, and Elasticsearch. Organizations can adapt the design to their specific requirements and extend it as needed.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
