Evolution of Log Collection at ZuanZuan: From Bare Metal to Cloud‑Native Era
This article traces ZuanZuan's log‑collection journey from the bare‑metal era through containerization to a cloud‑native solution, detailing the challenges along the way, the in‑house work (a customized log‑pilot + flume stack, the self‑developed ByteCompass, and fb‑advisor), and the performance gains achieved with each transition.
1 Bare‑Metal Era
In the bare‑metal era, ZuanZuan's log collection relied on a custom scribe + flume stack developed by the big‑data team. Adding a new service required an operations ticket, automatic deployment of scribe and flume, configuration rendering, and forwarding logs to Kafka or HDFS.
1. Operations submits a ticket to collect logs for service A.
2. The ticket is approved.
3. Automation deploys scribe + flume on the host.
4. Configuration files are rendered to point to the service's log directory.
5. Scribe reads the logs and sends them to flume, which forwards them to downstream systems.
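The scribe-to-flume handoff in the last step can be sketched as a flume agent config. This is an illustrative fragment, not ZuanZuan's actual configuration: the agent name, port, broker addresses, and topic are assumptions.

```properties
# Sketch: a flume agent receiving from scribe and forwarding to Kafka.
a1.sources = scribe-src
a1.channels = mem-ch
a1.sinks = kafka-sink

# Scribe-compatible source; the local scribe daemon forwards here.
a1.sources.scribe-src.type = org.apache.flume.source.scribe.ScribeSource
a1.sources.scribe-src.port = 1463
a1.sources.scribe-src.channels = mem-ch

a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000

# Kafka sink forwarding to the downstream cluster (addresses/topic assumed).
a1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka-sink.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092
a1.sinks.kafka-sink.kafka.topic = service-a-logs
a1.sinks.kafka-sink.channel = mem-ch
```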
The deployment nodes were stable, so the workflow ran smoothly.
2 Container Era
Starting in 2018, ZuanZuan moved to containers while keeping the existing release system and log‑processing approach unchanged.
Service deployment and login remain transparent to business users.
Log processing follows the same pattern as the bare‑metal era.
The article focuses on the log‑collection system during this transition.
2.1 log‑pilot + flume Custom Development
Using log‑pilot’s container auto‑discovery, a flume configuration is generated to collect container logs, store them on the host, and update the scribe configuration.
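Upstream log-pilot discovers what to collect from `aliyun.logs.*` declarations on each container. A hedged docker-compose sketch of that convention (service name and paths are illustrative assumptions):

```yaml
# log-pilot picks up containers that declare aliyun.logs.* environment
# variables; the image name and log paths below are assumptions.
services:
  service-a:
    image: registry.example.com/service-a:latest
    environment:
      # collect the container's stdout under the category "service-a-stdout"
      - aliyun.logs.service-a-stdout=stdout
      # collect file logs from a path inside the container
      - aliyun.logs.service-a-app=/data/logs/service-a/*.log
```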
As container adoption grew, log volume exploded, causing disk pressure, shortened retention, and high iowait.
2.2 ByteCompass
ByteCompass, a self‑developed agent managed by systemd, replaced log‑pilot + flume. It watches Docker events, extracts log metadata, renders a new scribe.conf, and restarts scribe.
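The watch/render/restart loop can be sketched as follows. This is an illustrative approximation, not ByteCompass's actual code: the scribe.conf layout, log-directory convention, and systemd unit name are assumptions.

```python
"""Sketch of a ByteCompass-style agent: watch Docker events,
rebuild scribe.conf, restart scribe. Illustrative only."""
import subprocess


def render_scribe_conf(categories):
    """Render a minimal scribe.conf-style config: one <store> block per
    log category mapping to its host-side directory (format simplified)."""
    lines = ["port=1463"]
    for category, path in sorted(categories.items()):
        lines += [
            "<store>",
            f"category={category}",
            "type=file",
            f"file_path={path}",
            "</store>",
        ]
    return "\n".join(lines) + "\n"


def reload_scribe(conf_text, conf_path="/etc/scribe/scribe.conf"):
    """Write the new config and restart scribe (unit name is an assumption)."""
    with open(conf_path, "w") as f:
        f.write(conf_text)
    subprocess.run(["systemctl", "restart", "scribe"], check=True)


if __name__ == "__main__":
    # Event-loop sketch: needs the docker SDK and a local daemon.
    import docker  # pip install docker

    client = docker.from_env()
    categories = {}
    for event in client.events(decode=True):
        if event.get("Type") != "container":
            continue
        # On start/die, update the category map and re-render the config
        # (the /data/logs/<name> convention is an assumption).
        attrs = event.get("Actor", {}).get("Attributes", {})
        name = attrs.get("name", "unknown")
        if event.get("Action") == "start":
            categories[name] = f"/data/logs/{name}"
            reload_scribe(render_scribe_conf(categories))
        elif event.get("Action") == "die":
            categories.pop(name, None)
            reload_scribe(render_scribe_conf(categories))
```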
ByteCompass reduced average iowait from 10% to 1% and increased local log retention from 3 days to over 7 days.
3 Cloud‑Native Era
After solving disk bottlenecks, ZuanZuan accelerated containerization and began designing a cloud‑native log‑collection system.
The new requirement is full‑volume log centralization and searchable storage, independent of pod lifecycle.
3.1 fb‑advisor Custom Solution
Because logs are mounted via hostPath, a custom Filebeat helper called fb-advisor watches the kube‑apiserver for pod events, extracts hostPath locations, and writes input files into Filebeat's config.d directory, which Filebeat reloads automatically.
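A minimal sketch of that flow, assuming the real fb-advisor works differently in detail: the config.d path, input-file layout, and field names below are assumptions.

```python
"""fb-advisor-style sketch: watch pod events and emit Filebeat input
files into a reloadable config directory. Illustrative only."""
import os


def render_filebeat_input(pod_name, namespace, host_paths):
    """Render a Filebeat log-input snippet for one pod's hostPath log dirs
    (snippet shape and field names are assumptions)."""
    lines = ["- type: log", "  paths:"]
    for p in host_paths:
        lines.append(f"    - {p}/*.log")
    lines += [
        "  fields:",
        f"    pod_name: {pod_name}",
        f"    namespace: {namespace}",
    ]
    return "\n".join(lines) + "\n"


def write_input_file(conf_dir, pod_name, content):
    """Drop the snippet into the directory Filebeat reloads from."""
    os.makedirs(conf_dir, exist_ok=True)
    with open(os.path.join(conf_dir, f"{pod_name}.yml"), "w") as f:
        f.write(content)


if __name__ == "__main__":
    # Watch-loop sketch: needs the kubernetes client and a kubeconfig.
    from kubernetes import client, config, watch  # pip install kubernetes

    config.load_kube_config()
    v1 = client.CoreV1Api()
    for event in watch.Watch().stream(v1.list_pod_for_all_namespaces):
        pod = event["object"]
        host_paths = [
            v.host_path.path
            for v in (pod.spec.volumes or [])
            if v.host_path is not None
        ]
        if host_paths and event["type"] in ("ADDED", "MODIFIED"):
            snippet = render_filebeat_input(
                pod.metadata.name, pod.metadata.namespace, host_paths
            )
            # /etc/filebeat/config.d is an assumed path; Filebeat picks it
            # up when filebeat.config.inputs reload.enabled is true.
            write_input_file("/etc/filebeat/config.d", pod.metadata.name, snippet)
```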
Collected logs are sent to Kafka and processed by a custom consumer, replacing the scribe + flume “golden combo”.
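The downstream consumer could look roughly like this. The topic name, broker address, and JSON envelope with a `category` field are all assumptions, not the actual ZuanZuan consumer.

```python
"""Sketch of a custom Kafka log consumer: parse each record and route
it to a destination index. Illustrative only."""
import json


def route_record(raw_value):
    """Parse one Kafka record value and pick a destination index.
    Assumes a JSON envelope with a 'category' field (an assumption)."""
    record = json.loads(raw_value)
    index = f"logs-{record.get('category', 'unknown')}"
    return index, record


if __name__ == "__main__":
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "service-logs",                      # topic name is an assumption
        bootstrap_servers="kafka-1:9092",    # broker address is an assumption
        value_deserializer=lambda v: v.decode("utf-8"),
    )
    for msg in consumer:
        index, record = route_record(msg.value)
        # Hand off to searchable storage (e.g. a bulk writer) here.
        print(index, record)
```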
3.2 Generic HostPath Solution
The generic solution uses Filebeat’s add_kubernetes_metadata processor to attach pod metadata to log entries.
```yaml
processors:
  - add_kubernetes_metadata:
      in_cluster: false
      host: 10.140.24.108
      kube_config: /pathto/kubeconfig
      namespace: default
      default_indexers.enabled: false
      default_matchers.enabled: false
      sync_period: 60m
      indexers:
        - pod_uid:
      matchers:
        - logs_path:
            logs_path: '/var/lib/kubelet/pods/'
            resource_type: 'pod'
```

This processor connects to the kube‑apiserver, extracts the pod UID from the hostPath, and enriches each log line with the corresponding metadata.
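The `logs_path` matcher keys its metadata lookup on the pod UID segment of the kubelet path. A minimal sketch of that extraction (helper name is hypothetical; the real matcher is implemented inside Beats):

```python
import re

# Kubelet lays out pod volumes under /var/lib/kubelet/pods/<pod-uid>/...;
# the matcher uses the UID segment to look up pod metadata.
LOGS_PATH = "/var/lib/kubelet/pods/"


def extract_pod_uid(log_file_path):
    """Return the pod UID segment of a kubelet pod path, or None."""
    if not log_file_path.startswith(LOGS_PATH):
        return None
    m = re.match(r"([0-9a-f-]{36})/", log_file_path[len(LOGS_PATH):])
    return m.group(1) if m else None
```

For example, a file under `/var/lib/kubelet/pods/<uid>/volumes/.../app.log` yields that `<uid>`, while a path outside the kubelet tree yields `None`.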
3.3 Comparison
The custom ZuanZuan solution offers higher configurability, while the generic approach automatically attaches all pod labels.
From a reliability perspective, the ZuanZuan method does not depend on pod‑lifecycle‑bound directories such as those under /var/lib/kubelet/pods/, reducing the risk of log loss after a pod is deleted or rescheduled.
4 Summary
Cloud‑native log collection has many viable options; the best choice depends on specific operational characteristics. ZuanZuan progressed from scribe + flume, through log‑pilot + flume, ByteCompass, to the current Filebeat + fb‑advisor stack, fully embracing cloud‑native practices.
The article provides a high‑level workflow overview; detailed internal implementations are omitted.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.