
How to Build a Scalable Cloud‑Native Log Collection System with Filebeat and Custom Controllers

This article explains the challenges of container log collection in Kubernetes, evaluates log‑agent options, details the design of a custom Filebeat‑based controller architecture, shares performance tuning with Golang, and outlines monitoring and future extensions for a robust cloud‑native logging solution.

ITFLY8 Architecture Home

1. Background

The cloud‑native wave has arrived, prompting rapid technical change. NetEase launched the Lightboat microservice cloud platform, integrating microservices, ServiceMesh, container cloud, and DevOps, which is widely used internally and supports many external customers' cloud‑native migration.

Logs are often overlooked but are a crucial part of microservices and DevOps; without logs, troubleshooting is impossible, and unified log collection underpins many business data analysis, processing, and audit tasks. In cloud‑native container environments, log collection becomes more complex.

2. Pain Points of Container Log Collection

Traditional Host Mode

For services deployed on physical or virtual machines, log collection is straightforward: logs are written to the host, an agent is installed on each node, configured, and optionally managed via a configuration center.

Kubernetes Environment

In Kubernetes, many containers run on a single node, and their logs may go to stdout or to files on hostPath, emptyDir, or PV volumes. Pods are created and destroyed frequently, so configuring collection by hand for each service is infeasible. Log queries also need to filter by namespace, Pod, container, node, and even environment variables or labels. Traditional agents are not Kubernetes‑aware, so they can neither supply this metadata nor follow Pods as they come and go.

Kubernetes supports custom resources, defined through CustomResourceDefinitions (CRDs), and custom controllers. Users declare the desired state in a resource, and a controller works to make the actual state match it. For log collection, this means a controller can generate the required agent configuration from what users declare.
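As a rough illustration of that idea, the sketch below models a log-collection resource and the matching step a controller might perform. The type LogConfig, its fields, and the function DesiredTargets are hypothetical names chosen for this example; they are not Ripple's actual schema, and a real controller would use the Kubernetes client libraries rather than plain structs.

```go
package main

import "fmt"

// LogConfig is a hypothetical custom resource: what a service wants collected.
// The field names are illustrative only.
type LogConfig struct {
	Namespace string
	Selector  map[string]string // Pod labels that must all match
	Paths     []string          // log file globs inside the container
	Topic     string            // destination topic
}

// Pod carries only the fields this sketch needs.
type Pod struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// Matches reports whether the Pod is selected by the LogConfig.
func (c LogConfig) Matches(p Pod) bool {
	if c.Namespace != p.Namespace {
		return false
	}
	for k, want := range c.Selector {
		if got, ok := p.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// DesiredTargets maps each topic to the Pods that should be collected for it.
// This is the "desired state" a controller would turn into agent configuration.
func DesiredTargets(configs []LogConfig, pods []Pod) map[string][]string {
	out := make(map[string][]string)
	for _, c := range configs {
		for _, p := range pods {
			if c.Matches(p) {
				out[c.Topic] = append(out[c.Topic], p.Name)
			}
		}
	}
	return out
}

func main() {
	configs := []LogConfig{{
		Namespace: "shop",
		Selector:  map[string]string{"app": "orders"},
		Paths:     []string{"/data/logs/*.log"},
		Topic:     "orders-logs",
	}}
	pods := []Pod{
		{Name: "orders-7d9f", Namespace: "shop", Labels: map[string]string{"app": "orders"}},
		{Name: "cart-5c2a", Namespace: "shop", Labels: map[string]string{"app": "cart"}},
	}
	fmt.Println(DesiredTargets(configs, pods))
}
```

Because the resource only names a label selector, newly created Pods are picked up without anyone editing agent configuration by hand.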

3. Exploration and Architecture Design

Log Collection Agent Selection

Logstash runs on the JVM and consumes hundreds of megabytes to gigabytes of memory, so it was excluded.

Fluentd has many plugins but is written in Ruby and C, which does not match our Go‑centric stack.

Loki was new at the time of our evaluation, with limited functionality and sub‑optimal performance.

Filebeat, written in Go, is lightweight, fits our stack, and performed best in our tests, becoming the primary choice.

Agent Integration Method

Two deployment patterns are common in Kubernetes:

Sidecar: a Filebeat container runs in the same Pod as the business container, collecting only that container's logs. This isolates services but adds memory overhead for each Pod.

DaemonSet: a Filebeat container runs on each node, consuming less memory and being non‑intrusive. This approach was preferred.
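The memory trade-off between the two patterns is simple arithmetic, sketched below. The cluster size and the per-agent footprint are made-up inputs for illustration, not measurements from the article, and the sketch assumes the same footprint for both patterns, which understates what a node-level agent tailing many files may need.

```go
package main

import "fmt"

// AgentMemoryMiB returns the total agent memory for each deployment pattern.
// perAgentMiB is assumed equal for both patterns (a simplification).
func AgentMemoryMiB(nodes, podsPerNode, perAgentMiB int) (sidecar, daemonSet int) {
	sidecar = nodes * podsPerNode * perAgentMiB // one agent per Pod
	daemonSet = nodes * perAgentMiB             // one agent per node
	return sidecar, daemonSet
}

func main() {
	// Hypothetical cluster: 100 nodes, 30 Pods per node, 50 MiB per agent.
	s, d := AgentMemoryMiB(100, 30, 50)
	fmt.Printf("sidecar: %d MiB, DaemonSet: %d MiB\n", s, d)
}
```

Under these assumptions the sidecar total grows with the number of Pods, while the DaemonSet total grows only with the number of nodes.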

Overall Architecture

With Filebeat as the log‑collection agent and a custom log controller named Ripple, the architecture from the node’s perspective is:

The log platform creates CRD instances in the Kubernetes cluster; Ripple watches Pods and CRDs.

Ripple aggregates information and generates a Filebeat input configuration, including paths, multiline patterns, and metadata such as PodName and Hostname.

Filebeat reloads the configuration, collects logs, and forwards them to Kafka or Elasticsearch.
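The second step above can be sketched as a template that renders one Filebeat input from the aggregated data. This is an assumption about how such a generator might look, not Ripple's code: InputSpec, Render, and the field keys pod_name and hostname are hypothetical. The rendered options (type, paths, multiline.*, fields) are standard settings of Filebeat's log input, and the sketch assumes the file is written to a directory that Filebeat watches through filebeat.config.inputs with reload.enabled set.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// InputSpec holds what the controller has aggregated for one container.
// The struct and its field names are hypothetical.
type InputSpec struct {
	Paths            []string
	MultilinePattern string // optional; empty means single-line logs
	PodName          string
	Hostname         string
}

// inputTmpl renders a single entry of a Filebeat inputs file.
// %q produces a double-quoted string, which YAML reads back unchanged
// for ordinary paths and regular expressions.
const inputTmpl = `- type: log
  paths:
{{- range .Paths}}
    - {{printf "%q" .}}
{{- end}}
{{- if .MultilinePattern}}
  multiline.pattern: {{printf "%q" .MultilinePattern}}
  multiline.negate: true
  multiline.match: after
{{- end}}
  fields:
    pod_name: {{printf "%q" .PodName}}
    hostname: {{printf "%q" .Hostname}}
`

var tmpl = template.Must(template.New("input").Parse(inputTmpl))

// Render returns the YAML for one input.
func Render(spec InputSpec) (string, error) {
	var b strings.Builder
	if err := tmpl.Execute(&b, spec); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := Render(InputSpec{
		Paths:            []string{"/var/log/app/*.log"},
		MultilinePattern: `^\d{4}-\d{2}-\d{2}`,
		PodName:          "orders-7d9f",
		Hostname:         "node-1",
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

Writing one file per Pod keeps updates small: when a Pod goes away, the controller only has to delete that Pod's file and let Filebeat pick up the change.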

Ripple reacts to Pod lifecycle events and updates the Filebeat configuration automatically. It supports the common log locations (stdout, hostPath, emptyDir, PV), enriches events with metadata from environment variables, labels, and annotations, and includes log cleanup that is designed to avoid data loss.
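Supporting several log locations largely comes down to translating a path inside the container into a path the node-level agent can read. The sketch below shows one way to do that for stdout, hostPath, and emptyDir. LogSource and HostGlob are hypothetical names, the directories assume the kubelet's default layout (/var/log/pods and /var/lib/kubelet), and PV-backed volumes are left out because their host location depends on the volume plugin.

```go
package main

import (
	"fmt"
	"path"
)

// VolumeKind says where a container writes its logs.
type VolumeKind int

const (
	Stdout VolumeKind = iota
	HostPath
	EmptyDir
)

// LogSource describes one log location of one container (hypothetical type).
type LogSource struct {
	Kind       VolumeKind
	Namespace  string
	PodName    string
	PodUID     string
	Container  string
	VolumeName string // EmptyDir only
	HostDir    string // HostPath only: the directory on the node
	SubPath    string // files relative to the volume, e.g. "app/*.log"
}

// HostGlob returns the glob on the node that the agent should tail.
// Paths assume the kubelet's default directories.
func HostGlob(s LogSource) (string, error) {
	switch s.Kind {
	case Stdout:
		dir := s.Namespace + "_" + s.PodName + "_" + s.PodUID
		return path.Join("/var/log/pods", dir, s.Container, "*.log"), nil
	case HostPath:
		return path.Join(s.HostDir, s.SubPath), nil
	case EmptyDir:
		return path.Join("/var/lib/kubelet/pods", s.PodUID,
			"volumes", "kubernetes.io~empty-dir", s.VolumeName, s.SubPath), nil
	default:
		return "", fmt.Errorf("unsupported volume kind %d", s.Kind)
	}
}

func main() {
	g, err := HostGlob(LogSource{
		Kind:       EmptyDir,
		PodUID:     "0a1b2c3d",
		VolumeName: "logs",
		SubPath:    "*.log",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(g)
}
```

If the agent itself runs in a container, these node directories also have to be mounted into it for the globs to resolve.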

4. Practice Based on Filebeat

Feature Extensions

Filebeat can be extended via custom output or processor plugins. Three approaches are available:

Fork Filebeat and develop directly in the source tree.

Copy Filebeat’s main.go, import custom plugins, and recompile.

Use Go’s plugin mechanism to compile plugins as .so libraries (less stable, not recommended).

We have added a gRPC output and support for multiple Kafka clusters.
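The second approach relies on a common Go pattern: each plugin package registers itself under a name in an init function, and the main program pulls it in with an import. The sketch below shows that pattern in isolation. It deliberately does not use Filebeat's real plugin API; Output, Factory, Register, New, and the grpcOutput stand-in are all hypothetical, and the stand-in only counts events instead of sending them.

```go
package main

import "fmt"

// Output is a simplified stand-in for an agent output plugin.
type Output interface {
	Publish(events []string) error
}

// Factory builds an Output from its configuration.
type Factory func(cfg map[string]string) (Output, error)

var registry = map[string]Factory{}

// Register makes an output available under a name. Plugins call it from init.
func Register(name string, f Factory) {
	if _, dup := registry[name]; dup {
		panic("output already registered: " + name)
	}
	registry[name] = f
}

// New looks up a registered output by the name given in the configuration.
func New(name string, cfg map[string]string) (Output, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown output %q", name)
	}
	return f(cfg)
}

// grpcOutput is a placeholder: it records how many events it was given.
type grpcOutput struct {
	endpoint string
	sent     int
}

func (o *grpcOutput) Publish(events []string) error {
	o.sent += len(events)
	return nil
}

// In a real build this init would live in the plugin's own package and be
// linked in by importing that package from the copied main.go.
func init() {
	Register("grpc", func(cfg map[string]string) (Output, error) {
		ep := cfg["endpoint"]
		if ep == "" {
			return nil, fmt.Errorf("grpc output: endpoint is required")
		}
		return &grpcOutput{endpoint: ep}, nil
	})
}

func main() {
	out, err := New("grpc", map[string]string{"endpoint": "collector.example:9000"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := out.Publish([]string{"line 1", "line 2"}); err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	fmt.Println("published 2 events")
}
```

The appeal of this approach over a fork is that the upstream source tree stays untouched, so upgrading the agent mostly means bumping a dependency version and recompiling.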

Multi‑Dimensional Monitoring

Integrate with Lightboat monitoring for disk I/O, network traffic, memory, CPU, and Pod event alerts.

Implement end‑to‑end latency monitoring for the log platform.

Collect Filebeat’s own logs so that the start and end of collection for each log file can be traced without logging in to nodes over SSH.

Develop a custom Filebeat exporter for Prometheus to expose metrics.

These monitoring enhancements greatly simplify troubleshooting and reduce operational costs.
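The core of a Prometheus exporter for Filebeat can be quite small. Filebeat can expose its internal counters as nested JSON over a local HTTP endpoint when http.enabled is set, and an exporter mainly flattens that JSON into Prometheus' text format. The sketch below shows only that flattening step under stated assumptions: Exposition and its helpers are hypothetical names, the sample document imitates the general shape of the stats JSON rather than reproducing real output, and fetching the JSON and serving /metrics are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sanitize replaces characters that are not valid in a metric name.
func sanitize(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			return r
		}
		return '_'
	}, s)
}

// flatten walks nested JSON objects and records every numeric leaf.
// Strings, booleans, arrays, and nulls are skipped.
func flatten(prefix string, v interface{}, out map[string]float64) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			flatten(prefix+"_"+sanitize(k), child, out)
		}
	case float64:
		out[prefix] = t
	}
}

// Exposition converts a stats JSON document into Prometheus text lines,
// sorted by metric name so the output is stable.
func Exposition(raw []byte) (string, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	metrics := map[string]float64{}
	flatten("filebeat", doc, metrics)

	names := make([]string, 0, len(metrics))
	for n := range metrics {
		names = append(names, n)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, n := range names {
		b.WriteString(n + " " + strconv.FormatFloat(metrics[n], 'f', -1, 64) + "\n")
	}
	return b.String(), nil
}

func main() {
	// Illustrative sample only; real field names and values will differ.
	sample := []byte(`{"filebeat":{"harvester":{"running":3}},"libbeat":{"output":{"events":{"acked":120}}}}`)
	text, err := Exposition(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(text)
}
```

A production exporter would more likely use the Prometheus Go client library, which adds metric types, help text, and labels that this flattening step does not provide.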

5. Golang Performance Optimization and Tuning

Go is a core language for cloud‑native projects. Our tuning work relied on Go benchmarks (go test -bench), pprof, and the execution tracer (go tool trace), and focused on using sync.Pool to reuse objects, which reduces allocations and GC pressure. In our benchmarks, memory usage dropped from over 5 GB to about 160 MB, and the number of GC cycles fell from 170 to 5 over the same time window.
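The sync.Pool technique can be shown with a minimal sketch that compares a per-call buffer with a pooled one. The functions WriteFresh and WritePooled, the line format, and the sink type are invented for this example and are unrelated to Filebeat's internals. The allocation counts it prints depend on the Go version and are not the figures quoted above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
	"testing"
)

// bufPool hands out reusable buffers so the hot path does not allocate a
// new backing array for every log line.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encode writes "<pod> <line>\n" into buf.
func encode(buf *bytes.Buffer, pod, line string) {
	buf.WriteString(pod)
	buf.WriteByte(' ')
	buf.WriteString(line)
	buf.WriteByte('\n')
}

// WriteFresh allocates a new buffer on every call.
func WriteFresh(w io.Writer, pod, line string) error {
	buf := new(bytes.Buffer)
	encode(buf, pod, line)
	_, err := w.Write(buf.Bytes())
	return err
}

// WritePooled borrows a buffer from the pool and returns it afterwards.
func WritePooled(w io.Writer, pod, line string) error {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold the previous line
	encode(buf, pod, line)
	_, err := w.Write(buf.Bytes())
	bufPool.Put(buf) // safe because an io.Writer must not retain the slice
	return err
}

// sink remembers the last write so both variants can be compared.
type sink struct{ last string }

func (s *sink) Write(p []byte) (int, error) {
	s.last = string(p)
	return len(p), nil
}

func main() {
	var a, b sink
	_ = WriteFresh(&a, "orders-7d9f", "request served")
	_ = WritePooled(&b, "orders-7d9f", "request served")
	fmt.Println("same output:", a.last == b.last)

	fresh := testing.AllocsPerRun(1000, func() {
		_ = WriteFresh(io.Discard, "orders-7d9f", "request served")
	})
	pooled := testing.AllocsPerRun(1000, func() {
		_ = WritePooled(io.Discard, "orders-7d9f", "request served")
	})
	fmt.Printf("allocs per call: fresh=%.0f pooled=%.0f\n", fresh, pooled)
}
```

Two details matter in practice: always reset a pooled object before reuse, and never return an object to the pool while something else still holds a reference to its memory.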

6. Summary and Outlook

In the cloud‑native era, logs are the foundation of observability and the starting point for downstream big‑data analysis. Although many open‑source projects exist, no single universal log‑collection agent dominates. Our custom Ripple controller is designed to be extensible and will support additional agents in the future, aiming for a richer, more robust cloud‑native logging system.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
