Log Collection Architecture Using Filebeat, Logstash, and Kafka
This article describes a lightweight, resource‑efficient log collection solution that combines Filebeat agents, optional Logstash aggregation, and Kafka transport, detailing configuration choices, meta‑persistence, back‑pressure mechanisms, monitoring setup, and deployment architecture for reliable at‑least‑once delivery.
Log collection is a common requirement for gathering service logs and event data for offline ETL or real‑time analysis. Open‑source tools such as Flume, Logstash, and Filebeat are available, and the 360 commercial team primarily uses Filebeat for agent‑side collection and Logstash as an optional aggregation layer.
Technology selection: Because the ad‑delivery engine is sensitive to resource usage, Filebeat was chosen over JVM‑based tools for its lower footprint. Logstash is used as the log‑aggregation layer when needed, benefiting from seamless integration with Filebeat as both are Elastic products.
Both Filebeat and Logstash run the stable 7.14 release, which supports Kafka SASL‑SCRAM authentication. The current pipeline uses Filebeat’s log input and Kafka output, and the modular design allows other inputs and outputs to be added easily.
Filebeat fundamentals: Filebeat is built on the libbeat library, which provides a publisher to connect inputs, a processor chain for event enrichment, and a retry/ack mechanism that guarantees log collection integrity.
Key configuration items include:
Processors (e.g., add_fields, add_labels, rename, drop_event)
Inputs (definition of log sources)
Modules (pre‑built parsers and Kibana dashboards for popular services)
HTTP endpoint for metrics
Meta‑persistence mechanism: After Filebeat starts, it creates data/registry/filebeat/log.json, which records each file’s identifier, source path, offset, timestamps, and inode/device information. This file lets the system know which logs are completed, in progress, or need re‑collection, and supports at‑least‑once semantics.
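The registry's role can be illustrated with a short sketch. The entry layout below (a key per file plus a value holding source and offset) is a simplified stand‑in for the real log.json format, which interleaves operation records; only the "later entries win" replay logic is the point here:

```python
import json

def replay_registry(lines):
    """Replay a simplified Filebeat-style registry log: keep the
    latest state entry seen for each file key. (Entry layout here
    is illustrative; the real log.json interleaves op records.)"""
    state = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "k" in rec and "v" in rec:
            state[rec["k"]] = rec["v"]  # later entries win
    return state

# Example: two updates for the same file; the newer offset wins,
# so on restart collection resumes from byte 1024.
log = [
    '{"k":"filebeat::logs::native::100-200","v":{"source":"/root/test.log","offset":512}}',
    '{"k":"filebeat::logs::native::100-200","v":{"source":"/root/test.log","offset":1024}}',
]
print(replay_registry(log)["filebeat::logs::native::100-200"]["offset"])  # 1024
```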
Back‑pressure mechanism: When outputs (e.g., Kafka) fail, retries fill the internal queue; once the queue is full, inputs stop collecting new logs, preventing data loss and ensuring safe local storage until downstream services recover.
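A toy model of this behavior, using a bounded in‑memory queue (the names and sizes are illustrative, not Filebeat internals): the input blocks whenever the queue is full, so unread data simply stays in the log file on disk until the output drains the queue.

```python
import queue
import threading
import time

# Bounded queue between input (harvester) and output (publisher).
events = queue.Queue(maxsize=4)
collected = []

def harvester(lines):
    for line in lines:
        events.put(line)          # blocks while the queue is full
        collected.append(line)

def publisher():
    while True:
        events.get()
        time.sleep(0.01)          # simulate a slow / recovering output
        events.task_done()

threading.Thread(target=publisher, daemon=True).start()
harvester([f"log-{i}" for i in range(20)])
events.join()                     # wait until the output has drained
print(len(collected))             # 20: everything is eventually delivered
```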
Design goals:
Unified operations monitoring to reduce OPEX
Support both real‑time and offline log consumption
Pipeline back‑pressure and at‑least‑once guarantees
Cross‑IDC disaster recovery with dynamic agent reconfiguration
Design approach:
One‑click agent deployment script and an agent manager for metric reporting and dynamic configuration
Kafka as the unified log transport layer
Logstash as an aggregation layer to offload Kafka connections when many agents are present
Metrics exposed via HTTP, scraped by Prometheus, visualized in Grafana
Integration into the Ultron platform for project‑level management
Architecture layers:
Agent layer – Filebeat with an accompanying manager
Aggregation layer – optional Logstash
DataBus layer – Kafka clusters in each IDC for cross‑IDC resilience
Heterogeneous transport layer – custom Hamal2 framework (based on Flink) for real‑time ingestion into HDFS/Hive, Spark, Flink, etc.
Filebeat → Kafka (direct) configuration example:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/test*.log
  fields:
    topic_name: test
- type: log
  enabled: true
  paths:
    - /root/test2*.log
  fields:
    topic_name: test2

output.kafka:
  version: 2.0.0
  hosts: ["xxx:9092", "xxx:9092", "xxx:9092"]
  topic: '%{[fields.topic_name]}'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: lz4
  max_message_bytes: 10000000
  sasl.mechanism: SCRAM-SHA-256
  username: xxx
  password: xxx
  worker: 1
  refresh_frequency: 5m
  codec.format:
    string: '%{[message]}'

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

http.enabled: true
http.host: your host
http.port: 5066

This configuration defines two log inputs, assigns a Kafka topic per input via the fields block, and authenticates to Kafka with SCRAM‑SHA‑256. The codec.format option strips Filebeat’s JSON envelope and emits only the raw message line.
Filebeat → Logstash → Kafka configuration example (Filebeat side):
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/test*.log
  fields:
    topic_name: test
    kafka_cluster: cluster1
- type: log
  enabled: true
  paths:
    - /root/test2*.log
  fields:
    topic_name: test2
    kafka_cluster: cluster2

output.logstash:
  hosts: ["logstash.k8s.domain:5044"]

http.enabled: true
http.host: your host
http.port: 5066

On the Logstash side, persistent queues are enabled in logstash.yml (queue.type: persisted), which provides on‑disk buffering and back‑pressure. The pipeline below parses logs with a Grok pattern and routes each event to the appropriate Kafka cluster based on its kafka_cluster field.
queue.type: persisted
queue.max_bytes: 8gb
queue.max_events: 3000000
path.queue: /path/to/queuefile

input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  stdout { codec => rubydebug { metadata => true } }
  if [fields][kafka_cluster] == "xxx" {
    kafka {
      codec => plain { format => '{ message:"%{message}"}' }
      bootstrap_servers => "xxx:9092,xxx:9092,xxx:9092"
      topic_id => "%{[fields][topic_name]}"
      compression_type => "lz4"
    }
  }
}

With persistent storage and back‑pressure on both sides, Filebeat and Logstash together ensure at‑least‑once delivery even under failure conditions.
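One practical consequence of at‑least‑once delivery is that consumers can see duplicate events after a retry. A common consumer‑side remedy is idempotent processing: derive a stable key per event and skip keys already seen. The (source, offset) key scheme below is an assumption chosen for illustration, not something Filebeat provides:

```python
def dedupe(events, seen=None):
    """Drop redelivered events by a stable (source, offset) key.
    The key scheme is an illustrative assumption, not a Filebeat API."""
    seen = set() if seen is None else seen
    out = []
    for ev in events:
        key = (ev["source"], ev["offset"])
        if key in seen:
            continue              # duplicate from a retried batch
        seen.add(key)
        out.append(ev["message"])
    return out

batch = [
    {"source": "/root/test.log", "offset": 0, "message": "a"},
    {"source": "/root/test.log", "offset": 7, "message": "b"},
    {"source": "/root/test.log", "offset": 7, "message": "b"},  # redelivery
]
print(dedupe(batch))  # ['a', 'b']
```

Keeping the seen‑set external (e.g., in a shared store) lets the same guard work across consumer restarts.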
Monitoring: The free Filebeat distribution exposes metrics only through its HTTP endpoint, so they must be scraped, parsed, and pushed to a Pushgateway for Prometheus‑Grafana visualization. Built‑in Logstash monitoring is likewise limited to the commercial edition.
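A minimal sketch of the scrape‑and‑convert step, assuming the HTTP endpoint returns nested JSON counters (the trimmed sample payload and the resulting metric names are illustrative):

```python
import json

def flatten(prefix, obj, out):
    """Flatten nested JSON counters from Filebeat's HTTP endpoint
    into Prometheus-style metric lines (names are illustrative)."""
    for key, val in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(val, dict):
            flatten(name, val, out)
        elif isinstance(val, (int, float)):
            out.append(f"filebeat_{name} {val}")
    return out

# A trimmed sample of the kind of JSON the stats endpoint returns.
sample = json.loads('{"libbeat":{"output":{"events":{"acked":42,"failed":0}}}}')
for line in flatten("", sample, []):
    print(line)
# filebeat_libbeat_output_events_acked 42
# filebeat_libbeat_output_events_failed 0
```

The resulting text lines are in the Prometheus exposition format and can be POSTed to a Pushgateway job for scraping.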
Conclusion: Filebeat agents are lightweight, highly extensible, and resource‑efficient; Logstash adds a scalable aggregation layer with persistent queues and back‑pressure; together they achieve reliable, at‑least‑once log collection across multiple data centers.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.