
How to Build a Scalable Kubernetes Log Collection System with S6 and Filebeat

This article explains the limitations of Docker's default JSON‑file logging, introduces S6‑based container log redirection, compares Kubernetes logging approaches at pod, node, and cluster levels, and presents a full‑stack architecture using Filebeat, Kafka, Elasticsearch, and Logstash for reliable, rotatable log collection.

Programmer DD

Container Log Basics

Docker produces two kinds of logs: engine logs (handled by the host system) and container logs (the stdout and stderr of the processes inside a container). By default, container logs are stored as JSON files under

/var/lib/docker/containers/<container_id>/<container_id>-json.log

This default is unsuitable for production for several reasons:

Unlimited log file size can fill the disk.

The Docker daemon becomes a bottleneck when collecting large volumes of logs.

Using docker logs -f can block the daemon, making commands like docker ps unresponsive.

Docker offers configurable logging drivers, but they still rely on the daemon, so the performance bottleneck remains.
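For example, the built-in json-file driver can at least be capped via /etc/docker/daemon.json (the size values below are illustrative, not recommendations):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
```

This bounds disk usage, but every log line still flows through the daemon, which is the bottleneck the rest of the article works around.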

Redirecting Logs with S6

By using an S6‑based base image, the container’s stdout is redirected to a file on the host (e.g., /data/logs/.../app.log) instead of the Docker daemon, allowing native log rotation and eliminating the daemon bottleneck.
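A minimal sketch of such an s6 run script (the service name app, the LOGDIR layout, and the demo echo line are all hypothetical; a real image would exec the application binary at the end):

```shell
#!/bin/sh
# Hypothetical s6 "run" script: append all stdout/stderr of the supervised
# process to a host-mounted log file instead of routing it through the
# Docker daemon.
LOGDIR="${LOGDIR:-./app-logs}"   # e.g. /data/logs/<namespace>/<app>/<pod>/log
mkdir -p "$LOGDIR"
exec >>"$LOGDIR/app.log" 2>&1    # everything below lands in app.log
echo "service starting at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
# exec /usr/local/bin/app        # the real long-running process would go here
```

Because the file lives on the host, standard tools such as logrotate can rotate it natively.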

Kubernetes Logging Levels

Pod (application) level: Logs are written to stdout/stderr and can be viewed with kubectl logs.

Node level: Configure the container’s log driver and use tools like logrotate to manage file size.

Cluster level: Three main approaches:

Node‑agent (DaemonSet) deployed on each node to collect container logs.

Sidecar container that streams logs to stdout (creates duplicate log files on the host).

Sidecar container that runs a log‑collection agent (e.g., Logstash or Filebeat) inside the pod, which consumes more resources and does not expose logs via kubectl logs.
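The streaming-sidecar variant can be sketched as a pod spec in which a second container tails the shared log file back to its own stdout (image names and paths here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest            # hypothetical application image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-streamer              # sidecar: re-exposes the file via kubectl logs
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/app.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}
```

This is the pattern that produces the duplicate-on-host cost noted above: the same bytes exist in the shared volume and again in the sidecar’s own container log.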

Proposed Log Architecture

All application containers are built from an S6 base image, redirecting logs to host directories such as /data/logs/namespace/appname/podname/log/xxxx.log. A log‑agent DaemonSet runs on each node and includes:

Filebeat for log harvesting.

Logrotate for automatic log rotation.

Configuration to send harvested logs to Kafka.
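A minimal filebeat.yml along these lines (the glob pattern, broker addresses, and topic name are illustrative) might look like:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /data/logs/*/*/*/log/*.log     # namespace/app/pod/log layout
  fields:
    cluster: demo                    # illustrative metadata

output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]   # illustrative brokers
  topic: "container-logs"
  required_acks: 1
  compression: gzip
```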

Logstash consumes the messages from Kafka, creates the indices, and writes the logs into Elasticsearch, where Kibana provides search and visualization.
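The Kafka-to-Elasticsearch leg can be sketched as a Logstash pipeline (broker address, topic, and index pattern are illustrative):

```
input {
  kafka {
    bootstrap_servers => "kafka-0:9092"        # illustrative broker
    topics => ["container-logs"]
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "container-logs-%{+YYYY.MM.dd}"   # one index per day
  }
}
```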

Open Challenges

Dynamically updating Filebeat configuration when new applications are deployed.

Ensuring every log file is properly rotated.

Extending Filebeat with custom plugins for additional functionality.

Practical Implementation

Deploy a log‑agent as a DaemonSet that bundles Filebeat, Logrotate, and any custom components needed to address the challenges above.
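A sketch of such a DaemonSet (the image name is hypothetical; it is assumed to bundle Filebeat and logrotate as described above), mounting the host log directory into the agent:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: log-agent
        image: my-registry/log-agent:latest   # bundles Filebeat + logrotate
        volumeMounts:
        - name: host-logs
          mountPath: /data/logs
      volumes:
      - name: host-logs
        hostPath:
          path: /data/logs
```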

For dynamic Filebeat configuration, watch the log directory with fsnotify and regenerate the config using templates.
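As a rough sketch of the regeneration step (driven here by a one-shot directory scan rather than fsnotify events, over an illustrative ./data/logs layout), the agent could rebuild its Filebeat inputs file like this:

```shell
#!/bin/sh
# Rebuild a Filebeat inputs file from whatever log directories currently
# exist. In the real agent an fsnotify event would trigger this; here it is
# a single scan over a demo directory tree.
LOG_ROOT="${LOG_ROOT:-./data/logs}"
OUT="${OUT:-./filebeat-inputs.yml}"

# Demo data so the sketch has something to discover.
mkdir -p "$LOG_ROOT/demo-ns/demo-app/demo-pod/log"

{
  echo "filebeat.inputs:"
  for dir in "$LOG_ROOT"/*/*/*/log; do
    [ -d "$dir" ] || continue
    echo "- type: log"
    echo "  paths:"
    echo "    - $dir/*.log"
  done
} > "$OUT"
```

After regenerating the file, the agent would reload or restart Filebeat so the new inputs take effect.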

Use the cron package to schedule periodic log rotation, e.g.:

/var/log/xxxx/xxxxx.log {
  su www-data www-data
  # keep 7 rotated files; without a rotate directive logrotate defaults to
  # rotate 0 and simply deletes the log after rotation
  rotate 7
  missingok
  notifempty
  size 1G
  compress
  copytruncate
}
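The schedule can equally be expressed as a plain crontab entry that invokes logrotate with this configuration every 30 minutes (the state-file and config paths are illustrative):

```
*/30 * * * * /usr/sbin/logrotate -s /var/lib/logrotate/status /etc/logrotate.d/app-logs
```

Because the size directive only takes effect when logrotate runs, the cron interval effectively bounds how far past 1G a file can grow between checks.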

For custom Filebeat development, refer to community guides and extend the source as required.

Relevant links:

https://docs.docker.com/v17.09/engine/admin/logging/overview/

http://skarnet.org/software/s6/

Tags: Docker, Kubernetes, logging, Filebeat, logrotate, S6, log architecture
Written by Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"