
Mastering Filebeat: How to Collect and Ship Container Logs to Kafka

This article introduces Filebeat as a lightweight log shipper, explains its core components and processing flow, and provides step‑by‑step configuration examples for gathering container logs and forwarding them to Kafka or Elasticsearch in cloud‑native environments.

Efficient Ops
We recently adopted Filebeat as our container log collector for cloud-native log collection, intending to extend it further; this article outlines Filebeat's basic usage and principles as an introductory guide.

1. Introduction

There are many open‑source log collectors, and we chose Filebeat for several reasons:

It meets our functional needs: collect disk log files and send them to a Kafka cluster; supports multiline collection and custom fields.

Its performance is clearly better than that of JVM-based collectors such as Logstash or Flume.

Filebeat is built on the Go stack, which matches our existing technical expertise for further development.

Deployment is simple with no third‑party dependencies.

2. What Filebeat Can Do

In short, Filebeat is a data mover that can also perform lightweight processing to add business value.

Filebeat accepts data from various inputs; the most common is the log input, which reads log files.

It can process collected data, e.g., multiline merging, adding custom fields, JSON encoding.
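As a sketch of these processing options, a multiline pattern plus a custom field might look like this on a log input. The path, pattern, and field values here are illustrative, not from the article; check your Filebeat version's reference for exact option names.

```yaml
- type: log
  paths:
    - /var/log/app/*.log                    # illustrative path
  # Merge continuation lines (e.g. stack traces) into the event they belong to:
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'   # a new event starts with a date
  multiline.negate: true
  multiline.match: after
  fields:
    env: staging                            # custom field added to every event
```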

Processed data is sent to an output; we mainly use Elasticsearch and Kafka.

Filebeat implements an ACK feedback mechanism, allowing it to resume from the last position after a restart.
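The resume position lives in an on-disk registry. A minimal sketch of relocating it follows; the path is illustrative, and the option name varies by version (`filebeat.registry.path` in 7.x, `filebeat.registry_file` in earlier releases):

```yaml
# filebeat.yml — offsets persisted here let Filebeat resume after a restart
filebeat.registry.path: /var/lib/filebeat/registry
```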

If sending to an output fails, a retry mechanism ensures at‑least‑once delivery semantics.

When the output is blocked, the upstream input throttles collection to match the downstream state.
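This backpressure is bounded by the internal memory queue, whose sizing can be tuned. A sketch with illustrative rather than recommended values (defaults differ across versions):

```yaml
queue.mem:
  events: 4096           # max events buffered; a full queue blocks inputs
  flush.min_events: 512  # batch size handed to the output
  flush.timeout: 1s      # flush a partial batch after this delay
```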

(Diagram omitted: input → processors → internal queue → output, with ACK feedback flowing back to the input.)

3. The “Core” Behind Filebeat

Filebeat is one member of the Beats family; the core library that powers all Beats is libbeat.

libbeat provides a publisher component that inputs connect to.

Collected data passes through processors for filtering, field addition, multiline merging, etc.

The input component pushes data to the publisher's internal queue.

libbeat implements the various outputs and sends processed data downstream.

It also encapsulates retry logic.

ACK feedback is propagated back to the input.

Thus, most of the work is handled by libbeat.

Inputs only need to:

Collect data from a source and hand it to libbeat.

Persist ACK feedback received from libbeat.

4. Simple Filebeat Usage Example

Filebeat’s usage is straightforward: write the appropriate input and output configurations. Below is an example that collects disk log files and forwards them to a Kafka cluster.

1. Configure the inputs.d directory in filebeat.yml so each collection path can be placed in a separate file.

```yaml
filebeat.config.inputs:
  enabled: true
  path: inputs.d/*.yml
```

2. Create test1.yml under inputs.d:

```yaml
- type: log
  enabled: true
  paths:
    - /home/lw/test/filebeat/*.log
  fields:
    log_topic: lw_filebeat_t_2
```

This configuration collects all files matching /home/lw/test/filebeat/*.log and adds a custom field log_topic: lw_filebeat_t_2.

3. Configure the Kafka output in filebeat.yml:

```yaml
output.kafka:
  hosts: ["xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092"]
  version: '0.9.0.1'
  topic: '%{[fields.log_topic]}'
  partition.round_robin:
    reachable_only: true
  compression: none
  required_acks: 1
  max_message_bytes: 1000000
  codec.format:
    string: '%{[host.name]}-%{[message]}'
```

Key points:

hosts defines the Kafka broker list.

The topic uses the custom field defined in the input configuration.

codec.format prefixes each log line with the host name.

Start Filebeat with ./filebeat run in the same directory as filebeat.yml and inputs.d.

5. How the Log Input Collects Files

When the log input starts, processors are created from the configuration to handle data enrichment.

An Acker persists the collection progress reported by libbeat.

A call to Pipeline.queue.Producer creates a producer that pushes processed file content into libbeat's internal queue.

File collection details:

The input polls configured paths, checking for new, expired, deleted, or moved files.

For each file, a Harvester reads the file line by line.

File content is packaged and sent to the internal queue via the producer.

File identity is tracked using device ID + inode; progress is persisted for resume.

If a file is truncated, the Harvester restarts from the beginning.

Deletion or renaming behavior depends on the CloseRemoved and CloseRenamed settings.

6. How Logs Are Sent

After the input writes logs to the internal queue,

libbeat

consumes them.

It creates a consumer that batches events into Batch objects.

Each batch has an ACK channel for feedback.

Batches are placed into a workQueue channel.

For the Kafka output, an outputs.Group spawns multiple Kafka clients and goroutines.

Each goroutine reads batches from workQueue and sends them via the Kafka client.

Successful sends write to the ACK channel, which propagates back to the input.

Retry mechanism (Kafka example):

Failed messages are read from ch <-chan *sarama.ProducerError.

Errors such as ErrInvalidMessage, ErrMessageSizeTooLarge, and ErrInvalidMessageSize are not retried.

Other failed events are re-queued into the workQueue for another send attempt.

A maximum retry count can be configured, but the current implementation may retry indefinitely; when the retry count is exhausted, events marked as Guaranteed are still retried rather than dropped.

If the retry queue grows large, it temporarily blocks normal sending to prioritize retries.
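In configuration terms, the knobs involved look roughly like this. The values are illustrative; note that Filebeat's documentation states it ignores max_retries and retries until events are published, consistent with the indefinite-retry behavior described above:

```yaml
output.kafka:
  hosts: ["xxx.xxx.xxx.xxx:9092"]  # illustrative broker
  max_retries: 3        # accepted in config, but Filebeat retries indefinitely
  bulk_max_size: 2048   # max events per Kafka request
```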

7. Closing Remarks

This article does not dive into source‑code details; some implementation nuances were omitted for clarity. Future posts will analyze Filebeat at the code level.

Source: Original article on 360 Cloud Computing

Tags: Elasticsearch, Go, Kafka, log collection, Filebeat, Container Logging
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career.
