Master Fluentd: From Docker Logs to Structured Data Pipelines
This article explains how Fluentd works in distributed and containerized environments, walks through its core concepts of input, filter, and output plugins, and provides a step‑by‑step Docker demo that collects, filters, parses, and stores Docker‑format application logs for further analysis.
If your application runs in a distributed architecture, you’ll likely use a centralized logging system, and Fluentd is a widely adopted tool for collecting logs, including in Kubernetes clusters.
This article explains how Fluentd works and how to adjust its configuration to meet your needs.
Basic Concepts
Typical shell commands like tail -f myapp.log | grep "what I want" > example.log illustrate what Fluentd excels at: tailing logs or receiving data, filtering and transforming it, then sending it to a backend store.
Input
tail -f myapp.log
Fluentd uses input plugins to continuously read files; tail is one such plugin, with many others available.
Filter
| grep "what I want"
The output of tail -f is filtered to keep only lines containing the desired string; in Fluentd, a filter plugin does this.
Output
> example.log
The filtered output is saved to example.log via an output plugin. Fluentd offers many output plugins beyond file writing.
The basic Fluentd workflow reads logs, processes them, and forwards them for further analysis.
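The same three stages can be sketched in a few lines of Python. This is only an illustration of the input, filter, and output idea, not how Fluentd is implemented; the function names and the in-memory sample data are hypothetical.

```python
import io

def read_lines(stream):
    """Input stage: yield each line without its trailing newline."""
    for line in stream:
        yield line.rstrip("\n")

def keep_matching(lines, needle):
    """Filter stage: pass through only lines containing `needle`."""
    return (line for line in lines if needle in line)

def write_lines(lines, sink):
    """Output stage: write each surviving line to `sink` and return the count."""
    count = 0
    for line in lines:
        sink.write(line + "\n")
        count += 1
    return count

# In-memory stand-ins for myapp.log and example.log (hypothetical content).
source = io.StringIO("boot ok\nfound what I want here\nshutdown\n")
sink = io.StringIO()
written = write_lines(keep_matching(read_lines(source), "what I want"), sink)
```

Each stage only knows about the one before it, which is the property that lets Fluentd swap plugins independently.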
Demo
This demo uses Fluentd to read Docker application logs.
Setup
The demo configuration is hosted at https://github.com/r1ckr/fluentd-simplified. After cloning, the directory structure looks like:
fluentd/
├── etc/
│ └── fluentd.conf
├── log/
│ └── kong.log
└── output/
The output/ directory is where Fluentd writes log files; log/kong.log contains Docker‑format JSON logs from a local Kong container.
{
"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n",
"stream": "stderr",
"time": "2019-07-31T22:19:52.3754634Z"
}
Each line of the file is one JSON object in the default Docker log format (pretty-printed here for readability). The goal is to extract only the access logs.
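To make the record layout concrete, here is a minimal Python sketch that decodes one such line. It assumes the single-line form that Docker writes to disk; the variable names are illustrative.

```python
import json

# One line in Docker's JSON log format, using the values from the sample record.
raw = r'{"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n", "stream": "stderr", "time": "2019-07-31T22:19:52.3754634Z"}'

record = json.loads(raw)

# The application's own message lives in the "log" field and keeps its newline.
message = record["log"].rstrip("\n")
stream = record["stream"]
```

The application message is nested inside the "log" field, which is why the later filter and parser steps both target that key.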
Run Fluentd
$ chmod 777 output/
$ docker run -ti --rm \
-v $(pwd)/etc:/fluentd/etc \
-v $(pwd)/log:/var/log \
-v $(pwd)/output:/output \
fluent/fluentd:v1.11-debian-1 -c /fluentd/etc/fluentd-simplified-finished.conf -v
Mount etc/ to /fluentd/etc/ to override the default configuration.
Mount log/ to /var/log/, exposing kong.log inside the container.
Mount output/ to /output to view Fluentd’s written files.
After starting the container you’ll see a log line indicating the worker is running.
Fluentd Configuration
Input
<source>
@type tail
path "/var/log/*.log"
tag "ninja.*"
read_from_head true
<parse>
@type "json"
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
time_type string
</parse>
</source>
@type tail : input type similar to tail -f.
path "/var/log/*.log" : watches all .log files, generating tags like var.log.kong.log.
tag "ninja.*" : prefixes each tag with "ninja.", so the final tag is ninja.var.log.kong.log.
read_from_head true : reads the whole file, not just new lines.
parse : parses each line as JSON because Docker logs are JSON objects.
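How the path and tag settings combine can be sketched in Python. This mimics the observable result (the tag seen in this demo), not the plugin's internal code; expand_tag is a hypothetical helper.

```python
from fnmatch import fnmatch

def expand_tag(tag_template, path):
    """Replace '*' in the tag with the file path, using dots instead of slashes."""
    suffix = path.strip("/").replace("/", ".")
    return tag_template.replace("*", suffix)

watched = "/var/log/kong.log"

# The glob from the path setting decides which files are tailed.
matches_glob = fnmatch(watched, "/var/log/*.log")

# The tag setting decides how events from that file are labeled.
tag = expand_tag("ninja.*", watched)
```

Here tag evaluates to ninja.var.log.kong.log, the value that later match and filter patterns are written against.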
Output
# Output
<match **>
@type file
path /output/example.log
<buffer>
timekey 1d
timekey_use_utc true
timekey_wait 1m
</buffer>
</match>
<match **> : matches all tags (only one in this demo).
path /output/example.log : directory where buffered logs are stored.
The configuration creates a simple input/output pipeline.
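The <buffer> settings group events into one chunk per UTC day (timekey 1d with timekey_use_utc true) and hold each chunk for one extra minute before flushing (timekey_wait 1m). The arithmetic can be sketched in Python as follows. This models the documented behavior under those settings and is not Fluentd's code; chunk_window is a hypothetical helper.

```python
from datetime import datetime, timezone

def chunk_window(event_epoch, timekey=86400, timekey_wait=60):
    """Return (chunk_start, earliest_flush) in epoch seconds, using UTC boundaries."""
    chunk_start = event_epoch - (event_epoch % timekey)
    earliest_flush = chunk_start + timekey + timekey_wait
    return chunk_start, earliest_flush

# An event logged at 2020-05-10 17:04:30 UTC.
event = datetime(2020, 5, 10, 17, 4, 30, tzinfo=timezone.utc)
start, flush = chunk_window(int(event.timestamp()))

start_utc = datetime.fromtimestamp(start, tz=timezone.utc)
flush_utc = datetime.fromtimestamp(flush, tz=timezone.utc)
```

With a one-day timekey, the event lands in the chunk that starts at midnight UTC on 10 May and becomes eligible for flushing one minute after midnight on 11 May.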
Now we examine a sample log file generated by Fluentd:
2020-05-10T17:04:30+00:00 ninja.var.log.kong.log {"log":"172.17.0.1 - - [10/May/2020:17:04:30 +0000] \"GET / HTTP/1.1\" 404 48 \"-\" \"curl/7.59.0\"\n","stream":"stdout"}
Each line follows the format <time> <tag> <content>.
Note: Tags are the string "ninja" followed by the directory path and filename, separated by dots.
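A line in the <time> <tag> <content> layout can be split back into its parts with a short Python sketch. It assumes the three parts are separated by whitespace and that the content is a JSON object, as in the sample line; the variable names are illustrative.

```python
import json

line = r'2020-05-10T17:04:30+00:00 ninja.var.log.kong.log {"log":"172.17.0.1 - - [10/May/2020:17:04:30 +0000] \"GET / HTTP/1.1\" 404 48 \"-\" \"curl/7.59.0\"\n","stream":"stdout"}'

# Split on the first two runs of whitespace only; the JSON content contains spaces.
timestamp, tag, content = line.split(None, 2)
record = json.loads(content)

# The first tag segment is the "ninja" prefix; the rest comes from the file path.
prefix = tag.split(".")[0]
```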
Filter
To keep only access logs, add a grep filter:
<filter ninja.var.log.kong**>
@type grep
<regexp>
key log
pattern /HTTP/
</regexp>
</filter>
This filter selects logs whose log field contains "HTTP".
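The effect of the grep filter can be mirrored in Python. This sketch reproduces the selection rule only (keep a record when its log field matches the pattern); grep_filter is a hypothetical helper and the two records are taken from the samples in this article.

```python
import re

HTTP_PATTERN = re.compile(r"HTTP")

def grep_filter(records, key, pattern):
    """Keep records whose `key` field exists and matches `pattern`."""
    return [r for r in records if key in r and pattern.search(str(r[key]))]

records = [
    {"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n",
     "stream": "stderr"},
    {"log": '172.17.0.1 - - [10/May/2020:17:04:30 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.59.0"\n',
     "stream": "stdout"},
]

access_only = grep_filter(records, "log", HTTP_PATTERN)
```

Only the second record survives, because the worker start-up notice does not contain "HTTP".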
Parse Access Logs
After filtering, parse the Nginx access logs:
<filter ninja.var.log.kong**>
@type parser
key_name log
<parse>
@type nginx
</parse>
</filter>
The parser extracts fields such as remote, host, user, method, path, code, size, referer, agent, and http_x_forwarded_for, enabling powerful downstream queries.
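To show what this extraction looks like, here is a Python sketch that pulls the same named fields out of the sample access log line. The regular expression is an approximation written for this example, modeled on the field names listed above; it is not the plugin's own code and may differ from it on edge cases.

```python
import re

# Approximate pattern for an Nginx access log line (an assumption for illustration).
NGINX_ACCESS = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?:\s+(?P<http_x_forwarded_for>[^ ]+))?)?$'
)

log_value = '172.17.0.1 - - [10/May/2020:17:04:30 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.59.0"\n'

match = NGINX_ACCESS.match(log_value.rstrip("\n"))
fields = match.groupdict() if match else {}
```

In this sketch every extracted value is a string, and http_x_forwarded_for is None because the sample line does not carry that header.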
Summary
We have shown how to run Fluentd in Docker, configure basic input, filter, and output plugins, and use them to collect, filter, and parse logs from a Docker application, preparing the data for further analysis or storage.
