Solving Cloud‑Native Log Collection: NetEase Lightboat’s Architecture & Insights

This article explains how NetEase’s Lightboat micro‑service platform tackles the challenges of log collection in cloud‑native Kubernetes environments by designing a custom controller, selecting Filebeat as the agent, integrating it via DaemonSet, extending its functionality, and applying Golang performance‑tuning techniques to achieve efficient, observable logging.

ITFLY8 Architecture Home

Background

The wave of cloud‑native technologies is reshaping the industry, and NetEase has launched the Lightboat micro‑service platform, which integrates micro‑services, Service Mesh, container cloud, and DevOps components. Logging, often overlooked, is essential for troubleshooting, data analysis, and auditing in micro‑service and DevOps pipelines, but containerized environments introduce new complexities.

Container Log Collection Pain Points

Traditional Host Mode

On physical or virtual machines, logs are written to the host, and agents can be installed manually or via automation to collect them. Configuration can be centralized through a config center.

Kubernetes Environment

In Kubernetes, many containers run on each node, and each may write its logs to a different kind of storage (stdout, hostPath, emptyDir, or a PV). Pods are frequently created, destroyed, or migrated, so a statically configured agent cannot keep up. Log queries need to filter by Namespace, Pod, Container, and Node, and by metadata such as labels and annotations. Traditional log collection tools are not Kubernetes‑aware.

Exploration and Architecture Design

Log‑Collection Agent Selection

Several agents were evaluated. Logstash was discarded because of its high memory usage; Fluentd, which is implemented in Ruby and C, was not a perfect fit; Loki was new and limited in features. Filebeat, a lightweight agent written in Go, aligned with Lightboat’s stack and was chosen as the primary agent.

Agent Integration Methods

Two deployment models are common: a sidecar container per pod (isolated but memory‑heavy) and a DaemonSet that runs one Filebeat per node (low memory, non‑intrusive). The DaemonSet approach was preferred.

Overall Architecture

The architecture consists of a custom Log Collection controller (Ripple) that watches Kubernetes Pods and CRD instances, generates Filebeat input configurations with metadata (PodName, Hostname, labels, etc.), and reloads Filebeat. Filebeat then ships logs to Kafka or Elasticsearch. Ripple also handles log cleanup and ensures continuous configuration updates as Pods change.
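The configuration-generation step can be sketched in Go with `text/template`. This is a minimal illustration, not Ripple's actual code: the `PodLogTarget` type, the `RenderInput` function, and the field names under `fields` are hypothetical, while `type`, `paths`, and `fields` are standard options of Filebeat's log input.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// PodLogTarget is a hypothetical description of one container's log files,
// as a controller might derive it from a Pod and a log-collection CRD.
type PodLogTarget struct {
	Namespace string
	PodName   string
	Container string
	NodeName  string
	Paths     []string
	Labels    map[string]string
}

// inputTmpl renders a single Filebeat log input. The metadata field names
// are illustrative assumptions, not taken from the article.
var inputTmpl = template.Must(template.New("input").Parse(`- type: log
  paths:
{{- range .Paths}}
    - {{.}}
{{- end}}
  fields:
    namespace: {{.Namespace}}
    pod_name: {{.PodName}}
    container: {{.Container}}
    node_name: {{.NodeName}}
{{- range $k, $v := .Labels}}
    label_{{$k}}: {{$v}}
{{- end}}
`))

// RenderInput turns one target into a YAML fragment for Filebeat.
func RenderInput(t PodLogTarget) (string, error) {
	var buf bytes.Buffer
	if err := inputTmpl.Execute(&buf, t); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := RenderInput(PodLogTarget{
		Namespace: "demo",
		PodName:   "web-0",
		Container: "app",
		NodeName:  "node-1",
		Paths:     []string{"/var/log/app/*.log"},
		Labels:    map[string]string{"app": "web"},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

A real controller would presumably write one such fragment per Pod into a directory that Filebeat watches, and rewrite or delete it as Pods come and go. Label values are emitted unquoted here, so a production template would need proper YAML escaping.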

Filebeat‑Based Practice

Feature Extensions

Filebeat’s built‑in outputs, such as Elasticsearch, Kafka, and Logstash, did not cover every requirement, so custom outputs and processors were developed. Three extension methods were described: fork the Filebeat source and add plugins, copy and modify main.go, or use Go’s plugin system (the last of which is less stable). Custom gRPC and multi‑Kafka outputs were added.
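The article does not say how the multi‑Kafka output decides where an event goes. One plausible reading is a router that picks a destination cluster from a field on each event, sketched below in Go. Everything here is hypothetical: `Event`, `Sink`, `Router`, and `memorySink` are stand‑ins invented for illustration and are not Filebeat's output plugin API.

```go
package main

import "fmt"

// Event is a simplified stand-in for a collected log event.
// It is not Filebeat's event type.
type Event struct {
	Fields  map[string]string
	Message string
}

// Sink is a hypothetical destination, for example one Kafka cluster.
type Sink interface {
	Publish(batch []Event) error
}

// memorySink records events in memory so the sketch runs without a broker.
type memorySink struct{ got []Event }

func (m *memorySink) Publish(batch []Event) error {
	m.got = append(m.got, batch...)
	return nil
}

// Router sends each event to the sink named by the event's Key field,
// falling back to the sink named Fallback when there is no match.
type Router struct {
	Key      string
	Sinks    map[string]Sink
	Fallback string
}

// Publish groups the batch by destination, then publishes each group once.
func (r *Router) Publish(batch []Event) error {
	groups := make(map[string][]Event)
	for _, ev := range batch {
		name := ev.Fields[r.Key]
		if _, ok := r.Sinks[name]; !ok {
			name = r.Fallback
		}
		if _, ok := r.Sinks[name]; !ok {
			return fmt.Errorf("no sink for %s=%q", r.Key, ev.Fields[r.Key])
		}
		groups[name] = append(groups[name], ev)
	}
	for name, evs := range groups {
		if err := r.Sinks[name].Publish(evs); err != nil {
			return fmt.Errorf("sink %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	audit, general := &memorySink{}, &memorySink{}
	r := &Router{
		Key:      "cluster",
		Sinks:    map[string]Sink{"audit": audit, "general": general},
		Fallback: "general",
	}
	err := r.Publish([]Event{
		{Fields: map[string]string{"cluster": "audit"}, Message: "login ok"},
		{Fields: map[string]string{"cluster": "unknown"}, Message: "GET /"},
		{Message: "no fields at all"},
	})
	fmt.Println(err, len(audit.got), len(general.got))
}
```

Grouping before publishing keeps one call per destination per batch, which matters when each call is a network round trip. A real output would also need per‑destination retry and acknowledgement handling, which this sketch omits.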

Three‑Dimensional Monitoring

Comprehensive monitoring was built: integration with Lightboat’s monitoring platform for disk I/O, network, memory, CPU, and pod events; end‑to‑end log pipeline latency tracking; collection of Filebeat’s own logs to trace file collection status; and a Filebeat exporter exposing metrics to Prometheus.
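To make the exporter idea concrete, here is a minimal Go sketch that serves two values in the Prometheus text exposition format using only the standard library. The metric names, the `recordPublished` and `setOpenFiles` helpers, and the counters themselves are assumptions made for illustration; the article does not describe which metrics the real exporter exposes or how it reads them from Filebeat.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"sync/atomic"
)

// Hypothetical agent statistics; a real exporter would read these
// from the agent rather than keep its own counters.
var (
	eventsPublished uint64
	openFiles       int64
)

func recordPublished(n uint64) { atomic.AddUint64(&eventsPublished, n) }
func setOpenFiles(n int64)     { atomic.StoreInt64(&openFiles, n) }

// renderMetrics formats the current values in the Prometheus text format:
// a HELP line, a TYPE line, then one sample line per metric.
func renderMetrics() string {
	var b strings.Builder
	b.WriteString("# HELP logagent_events_published_total Events acknowledged by the output.\n")
	b.WriteString("# TYPE logagent_events_published_total counter\n")
	fmt.Fprintf(&b, "logagent_events_published_total %d\n", atomic.LoadUint64(&eventsPublished))
	b.WriteString("# HELP logagent_open_files Log files currently being read.\n")
	b.WriteString("# TYPE logagent_open_files gauge\n")
	fmt.Fprintf(&b, "logagent_open_files %d\n", atomic.LoadInt64(&openFiles))
	return b.String()
}

// metricsHandler can be registered with http.HandleFunc("/metrics", metricsHandler)
// and served with http.ListenAndServe on a port of your choice.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics())
}

func main() {
	recordPublished(42)
	setOpenFiles(3)
	fmt.Print(renderMetrics())
}
```

In practice the official Prometheus Go client library handles registration, labels, and formatting; the hand‑written version above only shows what ends up on the wire.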

Golang Performance Optimization and Tuning

Go's performance tooling, including benchmarks (go test -bench), pprof, and the execution tracer (go tool trace), was applied to the controller’s template rendering. Using sync.Pool to reuse temporary objects reduced memory allocation from more than 5 GB to about 160 MB and cut GC cycles dramatically, demonstrating the impact of Go‑level optimizations.
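The sync.Pool technique can be illustrated with a small Go sketch that reuses `bytes.Buffer` values across template renders. The template, the `logTarget` type, and both render functions are hypothetical and far simpler than the controller's real rendering path, so the savings reported in the article should not be expected from this toy example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

// logTarget and lineTmpl are illustrative stand-ins for the controller's
// real template data and template.
type logTarget struct {
	Pod  string
	Path string
}

var lineTmpl = template.Must(template.New("line").Parse("pod={{.Pod}} path={{.Path}}\n"))

// bufPool hands out reusable buffers so each render does not have to
// grow a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// renderFresh allocates a new buffer on every call.
func renderFresh(t logTarget) (string, error) {
	var buf bytes.Buffer
	if err := lineTmpl.Execute(&buf, t); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// renderPooled borrows a buffer from the pool, resets it to drop any
// previous contents, and returns it to the pool when done.
func renderPooled(t logTarget) (string, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	if err := lineTmpl.Execute(buf, t); err != nil {
		return "", err
	}
	// String copies the bytes, so the result stays valid after Put.
	return buf.String(), nil
}

func main() {
	for _, t := range []logTarget{{"web-0", "/var/log/a.log"}, {"db-0", "/var/log/b.log"}} {
		s, err := renderPooled(t)
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Print(s)
	}
}
```

To compare the two variants, wrap each in a `func BenchmarkX(b *testing.B)` loop and run `go test -bench . -benchmem`, which reports bytes and allocations per operation. The benefit grows with the size of the rendered output, since the pooled buffer keeps its capacity between calls.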

Summary and Outlook

In the cloud‑native era, logs are the foundation of observability and the starting point for downstream big‑data analysis. While many open‑source log agents exist, no single solution dominates. Lightboat’s Ripple controller abstracts log collection, allowing future support for additional agents and further strengthening a robust, extensible cloud‑native logging system.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: performance optimization, cloud-native, Golang, Kubernetes, log collection, Filebeat
Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.