
Master Fluentd: From Docker Logs to Structured Data Pipelines

This article explains how Fluentd works in distributed and containerized environments, walks through its core concepts of input, filter, and output plugins, and provides a step-by-step Docker demo that collects, filters, parses, and stores container application logs for further analysis.

Efficient Ops

If your application runs in a distributed architecture, you’ll likely use a centralized logging system, and Fluentd is a widely adopted tool for collecting logs, including in Kubernetes clusters.

This article explains how Fluentd works and how to adjust its configuration to meet your needs.

Basic Concepts

A typical shell pipeline like

<code>tail -f myapp.log | grep "what I want" > example.log</code>

illustrates what Fluentd excels at: tailing logs or receiving data, filtering and transforming it, then sending it to a backend store.

Input

<code>tail -f myapp.log</code>

Fluentd uses input plugins to continuously read files; <code>tail</code> is one such plugin, with many others available.

Filter

<code>| grep "what I want"</code>

The output of <code>tail -f</code> is filtered to keep only lines containing the desired string; Fluentd does this with a filter plugin.

Output

<code>> example.log</code>

The filtered output is saved to <code>example.log</code> via an output plugin. Fluentd offers many output plugins beyond file writing.

The basic Fluentd workflow reads logs, processes them, and forwards them for further analysis.

Demo

This demo uses Fluentd to read Docker application logs.

Setup

The demo configuration is hosted at https://github.com/r1ckr/fluentd-simplified. After cloning, the directory structure looks like:

<code>fluentd/
    ├── etc/
    │   └── fluentd.conf
    ├── log/
    │   └── kong.log
    └── output/</code>

The <code>output/</code> directory is where Fluentd writes log files; <code>log/kong.log</code> contains Docker-format JSON logs from a local Kong container.

<code>{
  "log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n",
  "stream": "stderr",
  "time": "2019-07-31T22:19:52.3754634Z"
}</code>

Each line is a JSON object, the default Docker log format. The goal is to extract only access logs.
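To make the parsing step concrete, here is a minimal Python sketch (illustration only; Fluentd's <code>json</code> parser does this internally) that recovers the fields from one such line:

```python
import json

# One line from log/kong.log, in Docker's default JSON log format
raw = ('{"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\\n",'
       ' "stream": "stderr", "time": "2019-07-31T22:19:52.3754634Z"}')

record = json.loads(raw)

# The parser turns the line into a structured record:
print(record["stream"])        # stderr
print(record["log"].rstrip())  # the original application log line
```

Every field of the JSON object becomes a queryable key in the event record, which is what makes the later filter and parse stages possible.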

Run Fluentd

<code>$ chmod 777 output/
$ docker run -ti --rm \
  -v $(pwd)/etc:/fluentd/etc \
  -v $(pwd)/log:/var/log \
  -v $(pwd)/output:/output \
  fluent/fluentd:v1.11-debian-1 -c /fluentd/etc/fluentd.conf -v</code>

Mount <code>etc/</code> to <code>/fluentd/etc/</code> to override the default configuration.

Mount <code>log/</code> to <code>/var/log/</code>, exposing <code>kong.log</code> inside the container.

Mount <code>output/</code> to <code>/output</code> to view Fluentd's written files.

After starting the container you’ll see a log line indicating the worker is running.

Fluentd Configuration

Input

<code>&lt;source&gt;
  @type tail
  path "/var/log/*.log"
  tag "ninja.*"
  read_from_head true
  &lt;parse&gt;
    @type "json"
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
    time_type string
  &lt;/parse&gt;
&lt;/source&gt;</code>

<code>@type tail</code>: input type similar to <code>tail -f</code>.

<code>path "/var/log/*.log"</code>: watches all <code>.log</code> files, generating tags like <code>var.log.kong.log</code>.

<code>tag "ninja.*"</code>: prefixes each tag with <code>ninja.</code>.

<code>read_from_head true</code>: reads the whole file, not just new lines.

<code>parse</code>: parses each line as JSON because Docker logs are JSON objects.

Output

<code># Output
&lt;match **&gt;
  @type file
  path /output/example.log
  &lt;buffer&gt;
    timekey 1d
    timekey_use_utc true
    timekey_wait 1m
  &lt;/buffer&gt;
&lt;/match&gt;</code>

<code>&lt;match **&gt;</code>: matches all tags (only one in this demo).

<code>path /output/example.log</code>: path prefix for the files Fluentd writes; buffered chunks are flushed there with time-based suffixes.
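The <code>timekey 1d</code> buffer setting groups events into daily chunks: conceptually, each event's timestamp is floored to a chunk boundary, and <code>timekey_wait 1m</code> delays the flush one minute past the end of each period. A rough Python illustration of the bucketing (not Fluentd internals):

```python
TIMEKEY = 86400  # timekey 1d, in seconds

def timekey_bucket(epoch_seconds: int) -> int:
    """Floor a timestamp to its chunk boundary, as timekey bucketing does."""
    return epoch_seconds - (epoch_seconds % TIMEKEY)

# Events from the same UTC day share a chunk; the next day starts a new one.
print(timekey_bucket(1589130270))  # 2020-05-10 17:04:30 UTC -> 1589068800 (start of 2020-05-10)
print(timekey_bucket(1589155200))  # 2020-05-11 00:00:00 UTC -> 1589155200 (start of 2020-05-11)
```

With <code>timekey_use_utc true</code>, the boundaries fall on UTC midnights rather than local ones.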

The configuration creates a simple input/output pipeline.

Now we examine a sample log file generated by Fluentd:

<code>2020-05-10T17:04:30+00:00 ninja.var.log.kong.log {"log":"172.17.0.1 - - [10/May/2020:17:04:30 +0000] \"GET / HTTP/1.1\" 404 48 \"-\" \"curl/7.59.0\"\n","stream":"stdout"}</code>

Each line follows the format <code>&lt;time&gt; &lt;tag&gt; &lt;content&gt;</code>.

Note: Tags are the string "ninja" followed by the directory path and filename, separated by dots.
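The tag construction can be sketched in a few lines of Python (illustrative only, not Fluentd code): <code>in_tail</code> replaces the <code>*</code> in the tag with the file's path, slashes converted to dots:

```python
def expand_tail_tag(tag_pattern: str, file_path: str) -> str:
    """Approximate how in_tail expands '*' using the watched file's path."""
    path_part = file_path.lstrip("/").replace("/", ".")
    return tag_pattern.replace("*", path_part)

print(expand_tail_tag("ninja.*", "/var/log/kong.log"))  # ninja.var.log.kong.log
```

This is why the filter sections below match on the prefix <code>ninja.var.log.kong</code>.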

Filter

To keep only access logs, add a grep filter:

<code>&lt;filter ninja.var.log.kong**&gt;
  @type grep
  &lt;regexp&gt;
    key log
    pattern /HTTP/
  &lt;/regexp&gt;
&lt;/filter&gt;</code>

This filter selects logs whose <code>log</code> field contains "HTTP".

Parse Access Logs

After filtering, parse the Nginx access logs:

<code>&lt;filter ninja.var.log.kong**&gt;
  @type parser
  key_name log
  &lt;parse&gt;
    @type nginx
  &lt;/parse&gt;
&lt;/filter&gt;</code>

The parser extracts fields such as <code>remote</code>, <code>host</code>, <code>user</code>, <code>method</code>, <code>path</code>, <code>code</code>, <code>size</code>, <code>referer</code>, <code>agent</code>, and <code>http_x_forwarded_for</code>, enabling powerful downstream queries.
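To see what such a parse yields on the sample access log above, here is a simplified Python approximation of the nginx access-log pattern (the real <code>@type nginx</code> regexp is more forgiving; this sketch handles the common case):

```python
import re

# Simplified approximation of the pattern behind Fluentd's nginx parser
NGINX_RE = re.compile(
    r'^(?P<remote>\S+) (?P<host>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = '172.17.0.1 - - [10/May/2020:17:04:30 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.59.0"'
fields = NGINX_RE.match(line).groupdict()
print(fields["method"], fields["path"], fields["code"])  # GET / 404
```

After this stage each event is a fully structured record, ready to be shipped to a store like Elasticsearch and queried by field.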

Summary

We have shown how to run Fluentd in Docker, configure basic input, filter, and output plugins, and use them to collect, filter, and parse logs from a Kubernetes‑style Docker application, preparing the data for further analysis or storage.

Docker, Kubernetes, configuration, logging, log aggregation, Fluentd
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
