
Master Fluentd: From Docker Logs to Structured Data Pipelines

This article explains how Fluentd works in distributed and containerized environments, walks through its core concepts of input, filter, and output plugins, and provides a step-by-step Docker demo that collects, filters, parses, and stores container application logs for further analysis.

Efficient Ops

If your application runs in a distributed architecture, you’ll likely use a centralized logging system, and Fluentd is a widely adopted tool for collecting logs, including in Kubernetes clusters.

This article explains how Fluentd works and how to adjust its configuration to meet your needs.

Basic Concepts

A typical shell pipeline like

<code>tail -f myapp.log | grep "what I want" > example.log</code>

illustrates what Fluentd excels at: tailing logs or receiving data, filtering and transforming it, then sending it to a backend store.

Input

<code>tail -f myapp.log</code>

Fluentd uses input plugins to continuously read files; <code>tail</code> is one such plugin, with many others available.

Filter

<code>| grep "what I want"</code>

The output of <code>tail -f</code> is filtered to keep only lines containing the desired string; Fluentd does this with a filter plugin.

Output

<code>> example.log</code>

The filtered output is saved to <code>example.log</code> via an output plugin. Fluentd offers many output plugins beyond file writing.

The basic Fluentd workflow reads logs, processes them, and forwards them for further analysis.

Demo

This demo uses Fluentd to read Docker application logs.

Setup

The demo configuration is hosted at https://github.com/r1ckr/fluentd-simplified. After cloning, the directory structure looks like:

<code>fluentd/
    ├── etc/
    │   └── fluentd.conf
    ├── log/
    │   └── kong.log
    └── output/</code>

The <code>output/</code> directory is where Fluentd writes log files; <code>log/kong.log</code> contains Docker-format JSON logs from a local Kong container.

<code>{
  "log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\n",
  "stream": "stderr",
  "time": "2019-07-31T22:19:52.3754634Z"
}</code>

Each line is a JSON object, the default Docker log format. The goal is to extract only access logs.
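To make the parsing step concrete, here is a minimal Python sketch (illustration only; Fluentd's <code>json</code> parser does this internally) that recovers the fields from one such line:

```python
import json

# One line from log/kong.log, in Docker's default JSON log format
raw = ('{"log": "2019/07/31 22:19:52 [notice] 1#0: start worker process 32\\n",'
       ' "stream": "stderr", "time": "2019-07-31T22:19:52.3754634Z"}')

record = json.loads(raw)

# The parser turns the line into a structured record:
print(record["stream"])        # stderr
print(record["log"].rstrip())  # the original application log line
```

Every field of the JSON object becomes a queryable key in the event record, which is what makes the later filter and parse stages possible.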

Run Fluentd

<code>$ chmod 777 output/
$ docker run -ti --rm \
  -v $(pwd)/etc:/fluentd/etc \
  -v $(pwd)/log:/var/log \
  -v $(pwd)/output:/output \
  fluent/fluentd:v1.11-debian-1 -c /fluentd/etc/fluentd.conf -v</code>

Mount <code>etc/</code> to <code>/fluentd/etc/</code> to override the default configuration.

Mount <code>log/</code> to <code>/var/log/</code>, exposing <code>kong.log</code> inside the container.

Mount <code>output/</code> to <code>/output</code> to view Fluentd's written files.

After starting the container you’ll see a log line indicating the worker is running.

Fluentd Configuration

Input

<code>&lt;source&gt;
  @type tail
  path "/var/log/*.log"
  tag "ninja.*"
  read_from_head true
  &lt;parse&gt;
    @type "json"
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
    time_type string
  &lt;/parse&gt;
&lt;/source&gt;</code>

<code>@type tail</code>: input type similar to <code>tail -f</code>.

<code>path "/var/log/*.log"</code>: watches all <code>.log</code> files, generating tags like <code>var.log.kong.log</code>.

<code>tag "ninja.*"</code>: prefixes each tag with <code>ninja.</code>.

<code>read_from_head true</code>: reads the whole file, not just new lines.

<code>parse</code>: parses each line as JSON because Docker logs are JSON objects.

Output

<code># Output
&lt;match **&gt;
  @type file
  path /output/example.log
  &lt;buffer&gt;
    timekey 1d
    timekey_use_utc true
    timekey_wait 1m
  &lt;/buffer&gt;
&lt;/match&gt;</code>

<code>&lt;match **&gt;</code>: matches all tags (only one in this demo).

<code>path /output/example.log</code>: path prefix for the files Fluentd writes; buffered chunks are flushed there with time-based suffixes.
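The <code>timekey 1d</code> buffer setting groups events into daily chunks: conceptually, each event's timestamp is floored to a chunk boundary, and <code>timekey_wait 1m</code> delays the flush one minute past the end of each period. A rough Python illustration of the bucketing (not Fluentd internals):

```python
TIMEKEY = 86400  # timekey 1d, in seconds

def timekey_bucket(epoch_seconds: int) -> int:
    """Floor a timestamp to its chunk boundary, as timekey bucketing does."""
    return epoch_seconds - (epoch_seconds % TIMEKEY)

# Events from the same UTC day share a chunk; the next day starts a new one.
print(timekey_bucket(1589130270))  # 2020-05-10 17:04:30 UTC -> 1589068800 (start of 2020-05-10)
print(timekey_bucket(1589155200))  # 2020-05-11 00:00:00 UTC -> 1589155200 (start of 2020-05-11)
```

With <code>timekey_use_utc true</code>, the boundaries fall on UTC midnights rather than local ones.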

The configuration creates a simple input/output pipeline.

Now we examine a sample log file generated by Fluentd:

<code>2020-05-10T17:04:30+00:00 ninja.var.log.kong.log {"log":"172.17.0.1 - - [10/May/2020:17:04:30 +0000] \"GET / HTTP/1.1\" 404 48 \"-\" \"curl/7.59.0\"\n","stream":"stdout"}</code>

Each line follows the format <code>&lt;time&gt; &lt;tag&gt; &lt;content&gt;</code>.

Note: Tags are the string "ninja" followed by the directory path and filename, separated by dots.
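The tag construction can be sketched in a few lines of Python (illustrative only, not Fluentd code): <code>in_tail</code> replaces the <code>*</code> in the tag with the file's path, slashes converted to dots:

```python
def expand_tail_tag(tag_pattern: str, file_path: str) -> str:
    """Approximate how in_tail expands '*' using the watched file's path."""
    path_part = file_path.lstrip("/").replace("/", ".")
    return tag_pattern.replace("*", path_part)

print(expand_tail_tag("ninja.*", "/var/log/kong.log"))  # ninja.var.log.kong.log
```

This is why the filter sections below match on the prefix <code>ninja.var.log.kong</code>.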

Filter

To keep only access logs, add a grep filter:

<code>&lt;filter ninja.var.log.kong**&gt;
  @type grep
  &lt;regexp&gt;
    key log
    pattern /HTTP/
  &lt;/regexp&gt;
&lt;/filter&gt;</code>

This filter selects logs whose <code>log</code> field contains "HTTP".

Parse Access Logs

After filtering, parse the Nginx access logs:

<code>&lt;filter ninja.var.log.kong**&gt;
  @type parser
  key_name log
  &lt;parse&gt;
    @type nginx
  &lt;/parse&gt;
&lt;/filter&gt;</code>

The parser extracts fields such as <code>remote</code>, <code>host</code>, <code>user</code>, <code>method</code>, <code>path</code>, <code>code</code>, <code>size</code>, <code>referer</code>, <code>agent</code>, and <code>http_x_forwarded_for</code>, enabling powerful downstream queries.
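To see what such a parse yields on the sample access log above, here is a simplified Python approximation of the nginx access-log pattern (the real <code>@type nginx</code> regexp is more forgiving; this sketch handles the common case):

```python
import re

# Simplified approximation of the pattern behind Fluentd's nginx parser
NGINX_RE = re.compile(
    r'^(?P<remote>\S+) (?P<host>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = '172.17.0.1 - - [10/May/2020:17:04:30 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.59.0"'
fields = NGINX_RE.match(line).groupdict()
print(fields["method"], fields["path"], fields["code"])  # GET / 404
```

After this stage each event is a fully structured record, ready to be shipped to a store like Elasticsearch and queried by field.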

Summary

We have shown how to run Fluentd in Docker, configure basic input, filter, and output plugins, and use them to collect, filter, and parse logs from a Kubernetes‑style Docker application, preparing the data for further analysis or storage.

Docker, Kubernetes, configuration, logging, log aggregation, Fluentd
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
