Solving Cloud‑Native Log Collection: NetEase Lightboat’s Architecture & Insights
This article explains how NetEase’s Lightboat micro‑service platform tackles log collection in cloud‑native Kubernetes environments. The team designed a custom controller, selected Filebeat as the agent, deployed it as a DaemonSet, extended its functionality, and applied Golang performance‑tuning techniques to make logging efficient and observable.
Background
The wave of cloud‑native technologies is reshaping the industry, and NetEase has launched the Lightboat micro‑service platform, which integrates micro‑services, Service Mesh, container cloud, and DevOps components. Logging, often overlooked, is essential for troubleshooting, data analysis, and auditing in micro‑service and DevOps pipelines, but containerized environments introduce new complexities.
Container Log Collection Pain Points
Traditional Host Mode
On physical or virtual machines, logs are written to the host's local disk, and collection agents can be installed manually or with automation tools. Agent configuration can be managed centrally through a configuration center.
Kubernetes Environment
In Kubernetes, many containers run on each node, and each may store logs differently (stdout, hostPath, emptyDir, or a PersistentVolume). Pods are frequently created, destroyed, or migrated, so a static agent configuration quickly goes stale. Log queries also need to filter by Namespace, Pod, Container, and Node, and by metadata such as labels and annotations. Traditional log collection tools are not Kubernetes‑aware.
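To make the filtering requirement concrete, here is a minimal Go sketch of metadata‑aware matching over enriched log records. The LogRecord and Query types are hypothetical, not part of Lightboat; in practice this kind of filtering is usually pushed down to the log store (for example Elasticsearch) rather than done in application code.

```go
package main

import "fmt"

// LogRecord is a hypothetical shape for one collected log line enriched
// with Kubernetes metadata; field names are illustrative only.
type LogRecord struct {
	Namespace string
	Pod       string
	Container string
	Node      string
	Labels    map[string]string
	Message   string
}

// Query holds optional filters; empty strings and a nil map match anything.
type Query struct {
	Namespace string
	Pod       string
	Container string
	Node      string
	Labels    map[string]string // each key must exist on the record with this value
}

// Matches reports whether r satisfies every non-empty filter in q.
func (q Query) Matches(r LogRecord) bool {
	if q.Namespace != "" && q.Namespace != r.Namespace {
		return false
	}
	if q.Pod != "" && q.Pod != r.Pod {
		return false
	}
	if q.Container != "" && q.Container != r.Container {
		return false
	}
	if q.Node != "" && q.Node != r.Node {
		return false
	}
	for k, v := range q.Labels {
		if got, ok := r.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Filter returns the records that match q, preserving their order.
func Filter(records []LogRecord, q Query) []LogRecord {
	var out []LogRecord
	for _, r := range records {
		if q.Matches(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	records := []LogRecord{
		{Namespace: "shop", Pod: "cart-7d9f", Container: "app", Node: "node-1",
			Labels: map[string]string{"app": "cart"}, Message: "checkout ok"},
		{Namespace: "shop", Pod: "pay-5c2a", Container: "app", Node: "node-2",
			Labels: map[string]string{"app": "pay"}, Message: "payment timeout"},
	}
	for _, r := range Filter(records, Query{Namespace: "shop", Labels: map[string]string{"app": "pay"}}) {
		fmt.Printf("%s/%s: %s\n", r.Namespace, r.Pod, r.Message)
	}
}
```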
Exploration and Architecture Design
Log‑Collection Agent Selection
Several agents were evaluated: Logstash (discarded because of its high memory usage), Fluentd (written in Ruby and C, not an ideal fit for the team), Loki (relatively new, with limited features), and Filebeat (a lightweight Go implementation that aligns with Lightboat’s technology stack). Filebeat was chosen as the primary agent.
Agent Integration Methods
Two deployment models are common: a sidecar container in each pod (isolated, but memory‑heavy because every pod runs its own agent) and a DaemonSet that runs one Filebeat instance per node (low memory overhead and non‑intrusive to application pods). The DaemonSet approach was chosen.
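Because a DaemonSet agent reads logs from the node's filesystem (mounted into the Filebeat pod via hostPath), the controller has to map each container to a host path. A minimal sketch, assuming the kubelet's default layouts: container stdout/stderr under /var/log/pods/{namespace}_{pod}_{uid}/{container}/ and emptyDir volumes under /var/lib/kubelet/pods/{uid}/volumes/kubernetes.io~empty-dir/{volume}. Clusters with a customized kubelet root directory or container runtime may use different paths; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// podLogRoot is where the kubelet writes container stdout/stderr on most
// CRI-based nodes (an assumption about the node's configuration).
const podLogRoot = "/var/log/pods"

// kubeletRoot is the kubelet's default root directory (also an assumption).
const kubeletRoot = "/var/lib/kubelet"

// StdoutGlob returns a glob matching every rotated stdout log file of one
// container. path (not filepath) is used because these are Linux node paths.
func StdoutGlob(namespace, pod, uid, container string) string {
	dir := fmt.Sprintf("%s_%s_%s", namespace, pod, uid)
	return path.Join(podLogRoot, dir, container, "*.log")
}

// EmptyDirPath returns the node directory backing an emptyDir volume.
func EmptyDirPath(uid, volume string) string {
	return path.Join(kubeletRoot, "pods", uid, "volumes/kubernetes.io~empty-dir", volume)
}

func main() {
	fmt.Println(StdoutGlob("shop", "cart-7d9f", "1234", "app"))
	fmt.Println(EmptyDirPath("1234", "data"))
}
```

Note that emptyDir files still need an application‑specific file pattern (for example *.log) appended before they can be used as Filebeat paths.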
Overall Architecture
The architecture centers on a custom log‑collection controller, Ripple, which watches Kubernetes Pods and CRD instances, generates Filebeat input configurations enriched with metadata (PodName, Hostname, labels, etc.), and reloads Filebeat. Filebeat then ships the logs to Kafka or Elasticsearch. Ripple also cleans up logs and keeps the configuration current as Pods are created, destroyed, or migrated.
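The configuration‑generation step can be sketched with Go's text/template. This is not Ripple's code, which the article does not show; PodInput and the field names under fields are hypothetical. The Filebeat options used (type, paths, fields, fields_under_root, and the filebeat.config.inputs reload settings in the comment) are standard Filebeat settings.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// PodInput is a hypothetical view of what a controller knows about one
// container's logs after watching Pods and CRD instances.
type PodInput struct {
	Namespace string
	PodName   string
	NodeName  string
	Paths     []string
	Labels    map[string]string
}

// inputTmpl renders one Filebeat input. "type: log" keeps the example short;
// newer Filebeat versions recommend the filestream input instead.
const inputTmpl = `- type: log
  paths:
{{- range .Paths}}
    - {{.}}
{{- end}}
  fields:
    namespace: {{.Namespace}}
    pod_name: {{.PodName}}
    node_name: {{.NodeName}}
    labels:
{{- range $k, $v := .Labels}}
      {{$k}}: {{printf "%q" $v}}
{{- end}}
  fields_under_root: true
`

// text/template visits map keys in sorted order, so output is deterministic.
var tmpl = template.Must(template.New("input").Parse(inputTmpl))

// RenderInput returns the YAML for one pod's Filebeat input.
func RenderInput(in PodInput) (string, error) {
	var sb strings.Builder
	if err := tmpl.Execute(&sb, in); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	yml, err := RenderInput(PodInput{
		Namespace: "shop", PodName: "cart-7d9f", NodeName: "node-1",
		Paths:  []string{"/var/log/pods/shop_cart-7d9f_1234/app/*.log"},
		Labels: map[string]string{"app": "cart"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// A controller would write this into a directory such as inputs.d/ that
	// filebeat.yml watches, e.g.:
	//   filebeat.config.inputs:
	//     enabled: true
	//     path: ${path.config}/inputs.d/*.yml
	//     reload.enabled: true
	//     reload.period: 10s
	fmt.Print(yml)
}
```

Writing one file per Pod lets the controller delete a single file when that Pod goes away, and Filebeat's reload picks up the change without a restart.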
Filebeat‑Based Practice
Feature Extensions
Filebeat’s built‑in outputs (Elasticsearch, Kafka, Logstash) were insufficient, so the team developed custom outputs and processors. Three extension methods were considered: fork the Filebeat source and add plugins in‑tree; copy and modify Filebeat's main.go so that a separate build registers the custom plugins; or load plugins through Go’s plugin system (the least stable option). Custom gRPC and multi‑Kafka outputs were added.
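The article does not show its output code, and libbeat's output‑plugin API differs between Filebeat versions, so the sketch below deliberately avoids it. It models only the routing idea behind a multi‑Kafka output, assuming each event is sent to a Kafka cluster chosen by namespace with a fallback default; Event, Publisher, MultiKafka, and memPublisher are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified stand-in for a libbeat event; only routing fields are modeled.
type Event struct {
	Namespace string
	Message   string
}

// Publisher is a hypothetical sink, e.g. a producer for one Kafka cluster.
type Publisher interface {
	Publish(topic string, e Event) error
}

// MultiKafka routes each event to a cluster selected by namespace, falling
// back to Default. This is not libbeat's output interface.
type MultiKafka struct {
	Routes   map[string]Publisher // namespace -> cluster
	Default  Publisher
	TopicFor func(Event) string
}

// Publish sends e to the cluster chosen for its namespace.
func (m *MultiKafka) Publish(e Event) error {
	p, ok := m.Routes[e.Namespace]
	if !ok {
		p = m.Default
	}
	if p == nil {
		return errors.New("no cluster configured for namespace " + e.Namespace)
	}
	return p.Publish(m.TopicFor(e), e)
}

// memPublisher records "topic:message" strings in memory in place of a real producer.
type memPublisher struct {
	name string
	got  []string
}

func (p *memPublisher) Publish(topic string, e Event) error {
	p.got = append(p.got, topic+":"+e.Message)
	return nil
}

func main() {
	a := &memPublisher{name: "kafka-a"}
	b := &memPublisher{name: "kafka-b"}
	out := &MultiKafka{
		Routes:   map[string]Publisher{"payments": b},
		Default:  a,
		TopicFor: func(e Event) string { return "logs-" + e.Namespace },
	}
	_ = out.Publish(Event{Namespace: "shop", Message: "cart ok"})
	_ = out.Publish(Event{Namespace: "payments", Message: "charge ok"})
	fmt.Println(a.name, a.got)
	fmt.Println(b.name, b.got)
}
```

Keeping the routing decision behind a small interface like this makes it testable without a running Kafka cluster, whichever plugin mechanism ends up wrapping it.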
Three‑Dimensional Monitoring
Monitoring was built along several dimensions: integration with Lightboat’s monitoring platform for disk I/O, network, memory, CPU, and Pod events; end‑to‑end latency tracking across the log pipeline; collection of Filebeat’s own logs to trace the collection status of each file; and a Filebeat exporter that exposes metrics to Prometheus.
Golang Performance Optimization and Tuning
Go's performance tools (benchmarks via go test -bench, go tool pprof, and go tool trace) were applied to the controller’s template rendering. Using sync.Pool to reuse temporary objects cut memory allocation from more than 5 GB to about 160 MB and sharply reduced the number of GC cycles, showing how much Go‑level optimizations can matter.
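The article does not publish the optimized rendering code. The following is a minimal sketch of the sync.Pool technique it names, assuming the hot path renders text/template output into a bytes.Buffer; the input type, template, and size cap are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

// maxPooledCap (64 KiB, an arbitrary choice) keeps one unusually large
// render from pinning a big buffer in the pool indefinitely.
const maxPooledCap = 65536

// bufPool hands out reusable buffers; Get calls New only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

type input struct {
	Path, Pod string
}

var tmpl = template.Must(template.New("input").Parse(
	"- type: log\n  paths:\n    - {{.Path}}\n  fields:\n    pod_name: {{.Pod}}\n"))

// Render executes the template into a pooled buffer. buf.String copies the
// bytes, so the buffer can be reused safely after Render returns.
func Render(in input) (string, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer func() {
		if buf.Cap() > maxPooledCap {
			return // drop oversized buffers instead of pooling them
		}
		bufPool.Put(buf)
	}()
	if err := tmpl.Execute(buf, in); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := Render(input{Path: "/var/log/pods/shop_cart-7d9f_1234/app/*.log", Pod: "cart-7d9f"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

To check whether pooling helps in a given workload, compare a benchmark of Render with and without the pool using go test -bench=. -benchmem, which reports B/op and allocs/op.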
Summary and Outlook
In the cloud‑native era, logs are the foundation of observability and the starting point for downstream big‑data analysis. While many open‑source log agents exist, no single solution dominates. Lightboat’s Ripple controller abstracts log collection, allowing future support for additional agents and further strengthening a robust, extensible cloud‑native logging system.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITFLY8 Architecture Home
