Master Fluentd: From Docker Logs to Structured Data Pipelines
This article explains how Fluentd works in distributed and containerized environments, walks through its core concepts of input, filter, and output plugins, and provides a step‑by‑step Docker demo that collects, filters, parses, and stores containerized application logs for further analysis.
If your application runs in a distributed architecture, you’ll likely use a centralized logging system, and Fluentd is a widely adopted tool for collecting logs, including in Kubernetes clusters.
This article explains how Fluentd works and how to adjust its configuration to meet your needs.
Basic Concepts
A typical shell pipeline like <code>tail -f myapp.log | grep "what I want" > example.log</code> illustrates what Fluentd excels at: tailing logs or receiving data, filtering and transforming it, then sending it to a backend store.
Input
<code>tail -f myapp.log</code> — Fluentd uses input plugins to continuously read files; <code>tail</code> is one such plugin, with many others available.
Filter
<code>| grep "what I want"</code> — the <code>tail -f</code> output is filtered to keep only lines containing the desired string; Fluentd does this with a filter plugin.
Output
<code>> example.log</code> — the filtered output is saved to <code>example.log</code> via an output plugin. Fluentd offers many output plugins beyond file writing.
The basic Fluentd workflow reads logs, processes them, and forwards them for further analysis.
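As a sketch, the shell pipeline above maps onto a minimal Fluentd configuration like the following (the file paths and the "what I want" pattern are illustrative; the <code>none</code> parser stores each raw line under the <code>message</code> key):

```
# read: tail -f myapp.log
<source>
  @type tail
  path /var/log/myapp.log
  tag myapp
  <parse>
    @type none
  </parse>
</source>

# filter: grep "what I want"
<filter myapp>
  @type grep
  <regexp>
    key message
    pattern /what I want/
  </regexp>
</filter>

# write: > example.log
<match myapp>
  @type file
  path /output/example
</match>
```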
Demo
This demo uses Fluentd to read Docker application logs.
Setup
The demo configuration is hosted at
https://github.com/r1ckr/fluentd-simplified. After cloning, the directory structure looks like:
<code>fluentd/
├── etc/
│ └── fluentd.conf
├── log/
│ └── kong.log
└── output/</code>

The <code>output/</code> directory is where Fluentd writes log files; <code>log/kong.log</code> contains Docker‑format JSON logs from a local Kong container:
<code>{
"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n",
"stream": "stderr",
"time": "2019-07-31T22:19:52.3754634Z"
}</code>

Each line is a JSON object, the default Docker log format. The goal is to extract only the access logs.
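A single Docker log line can be inspected straight from the shell. This is a crude sketch using sed to pull out the "log" field (Fluentd's json parser does this robustly; the sed expression here assumes the value contains no escaped quotes):

```shell
# one line from kong.log, as written by Docker's json-file log driver
line='{"log":"2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n","stream":"stderr","time":"2019-07-31T22:19:52.3754634Z"}'

# crude extraction of the "log" field; fine for eyeballing, not for production
msg=$(printf '%s' "$line" | sed -E 's/.*"log":"([^"]*)".*/\1/')
printf '%s\n' "$msg"
```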
Run Fluentd
<code>$ chmod 777 output/
$ docker run -ti --rm \
-v $(pwd)/etc:/fluentd/etc \
-v $(pwd)/log:/var/log \
-v $(pwd)/output:/output \
fluent/fluentd:v1.11-debian-1 -c /fluentd/etc/fluentd.conf -v</code>

Mount <code>etc/</code> to <code>/fluentd/etc/</code> to override the default configuration.
Mount <code>log/</code> to <code>/var/log/</code>, exposing <code>kong.log</code> inside the container.
Mount <code>output/</code> to <code>/output</code> to view the files Fluentd writes.
After starting the container you’ll see a log line indicating the worker is running.
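For reference, the line to look for resembles the following (exact timestamps and formatting vary by Fluentd version):

```
2020-05-10 17:00:00 +0000 [info]: #0 fluentd worker is now running worker=0
```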
Fluentd Configuration
Input
<code><source>
@type tail
path "/var/log/*.log"
tag "ninja.*"
read_from_head true
<parse>
@type "json"
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
time_type string
</parse>
</source></code>

@type tail : an input type similar to <code>tail -f</code>.
path "/var/log/*.log" : watches all <code>.log</code> files, generating tags like <code>var.log.kong.log</code>.
tag "ninja.*" : prefixes each tag with <code>ninja.</code>.
read_from_head true : reads the whole file, not just newly appended lines.
parse : parses each line as JSON, because Docker logs are JSON objects.
Output
<code># Output
<match **>
@type file
path /output/example.log
<buffer>
timekey 1d
timekey_use_utc true
timekey_wait 1m
</buffer>
</match></code>

<code>&lt;match **&gt;</code> : matches all tags (there is only one in this demo).
path /output/example.log : the path prefix under which buffered log files are stored.
The configuration creates a simple input/output pipeline.
Now we examine a sample log file generated by Fluentd:
<code>2020-05-10T17:04:30+00:00 ninja.var.log.kong.log {"log":"172.17.0.1 - - [10/May/2020:17:04:30 +0000] \"GET / HTTP/1.1\" 404 48 \"-\" \"curl/7.59.0\"\n","stream":"stdout"}</code>

Each line follows the format
<time> <tag> <content>.
Note: Tags are the string "ninja" followed by the directory path and filename, separated by dots.
Filter
To keep only access logs, add a grep filter:
<code><filter ninja.var.log.kong**>
@type grep
<regexp>
key log
pattern /HTTP/
</regexp>
</filter></code>

This filter keeps only records whose <code>log</code> field contains "HTTP".
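The grep filter also supports an exclude block; here is a sketch that drops matching lines instead of keeping them (the pattern is illustrative, e.g. to suppress Kong's signal-handling notices):

```
<filter ninja.var.log.kong**>
  @type grep
  <exclude>
    key log
    pattern /signal/
  </exclude>
</filter>
```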
Parse Access Logs
After filtering, parse the Nginx access logs:
<code><filter ninja.var.log.kong**>
@type parser
key_name log
<parse>
@type nginx
</parse>
</filter></code>

The parser extracts fields such as <code>remote</code>, <code>host</code>, <code>user</code>, <code>method</code>, <code>path</code>, <code>code</code>, <code>size</code>, <code>referer</code>, <code>agent</code>, and <code>http_x_forwarded_for</code>, enabling powerful downstream queries.
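Applied to the sample access-log line shown earlier, the parsed record would look roughly like this (an illustration, not captured output; <code>http_x_forwarded_for</code> is absent because the sample line has no such field):

```
{
  "remote": "172.17.0.1",
  "host": "-",
  "user": "-",
  "method": "GET",
  "path": "/",
  "code": "404",
  "size": "48",
  "referer": "-",
  "agent": "curl/7.59.0"
}
```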
Summary
We have shown how to run Fluentd in Docker, configure basic input, filter, and output plugins, and use them to collect, filter, and parse logs from a containerized application, preparing the data for further analysis or storage.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.