
Comprehensive Guide to Deploying Filebeat and Graylog for Centralized Log Collection

This article explains how to use Filebeat and Graylog together for centralized log collection. It covers Filebeat’s role, configuration files, and input modules; Graylog’s architecture and pipeline rules; and step‑by‑step deployment with Docker and docker‑compose, with practical commands and examples for operational environments.

Architecture Digest

When an organization runs many services across test and production environments, centralized log collection becomes essential. The article compares a lightweight Nginx‑based approach with a dedicated ELK stack, and introduces Graylog as a simpler, extensible alternative that uses Elasticsearch for log storage and MongoDB for configuration.

Filebeat is presented as a lightweight log shipper that monitors specified directories or files: it spawns prospectors (renamed "inputs" in Filebeat 6.x and later) to locate log files, harvesters to read new entries from each file, and a spooler to batch events before sending them to a destination such as Graylog.

Key parts of the Filebeat configuration are shown, including the main filebeat.yml file that defines input paths, module loading, index settings, and output to Logstash or Graylog.

# Configure input sources
filebeat.config.inputs:
  enabled: true
  path: ${path.config}/inputs.d/*.yml

# Load modules
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

# Output to Graylog's Beats input (Filebeat speaks the Logstash/Beats protocol)
output.logstash:
  hosts: ["11.22.33.44:5500"]

processors:
  - add_host_metadata: ~
  - rename:
      fields:
        - from: "log"
          to: "message"
  - add_fields:
      target: ""
      fields:
        token: "0uxxxxaM-1111-2222-3333-VQZJxxxxxwgX"
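Before starting the shipper, it is worth validating this file. Filebeat ships with built‑in checks; the paths below assume the default deb/rpm layout:

```shell
filebeat test config -c /etc/filebeat/filebeat.yml   # syntax-check filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml   # verify the configured output is reachable
```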

An example of an inputs.d YAML file demonstrates how to collect logs from specific paths, filter by keywords, tag data, and handle multiline stack traces.

# A "log" type input collecting from specific file paths
- type: log
  enabled: true
  paths:
    - /var/log/supervisor/app_escape_worker-stderr.log
    - /var/log/supervisor/app_escape_prod-stderr.log
  symlinks: true
  include_lines: ["WARNING", "ERROR"]
  tags: ["app", "escape", "test"]
  multiline.pattern: '^\[?[0-9]...{3}'
  multiline.negate: true
  multiline.match: after
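The multiline settings deserve a closer look: with `negate: true` and `match: after`, any line that does NOT match the pattern is appended to the preceding matching line, so a stack trace travels as one event. A toy illustration of that grouping logic (not Filebeat itself, and using a simplified "starts with a digit" pattern) in awk:

```shell
# Simulate multiline grouping with negate: true, match: after:
# lines not starting with a digit (no timestamp) are appended to the
# previous event; each combined event is printed on one line.
printf '2020-08-01 ERROR boom\n  at foo()\n  at bar()\n2020-08-02 WARNING next\n' |
awk '
  /^[0-9]/ { if (ev != "") print ev; ev = $0; next }  # a new event starts
           { ev = ev " | " $0 }                       # continuation line
  END      { if (ev != "") print ev }                 # flush the last event
'
```

The four input lines collapse into two events: the ERROR line with its two stack frames, and the WARNING line on its own.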

Graylog’s architecture consists of three core components—Elasticsearch for persisting and searching log data, MongoDB for storing Graylog configuration, and the Graylog server itself providing a web UI and APIs. Both single‑node and clustered deployments are illustrated.

Graylog processes logs through Inputs, Extractors, Streams, and optional Pipelines. A sample pipeline rule that discards messages with a level greater than 6 is provided.

rule "discard debug messages"
when
  to_long($message.level) > 6
then
  drop_message();
end
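Rules can enrich messages as well as drop them. A hedged sketch in the same rule language (the field names and values here are illustrative, not from the article):

```
rule "tag app messages"
when
  has_field("tags") && contains(to_string($message.tags), "app")
then
  set_field("env", "app");
end
```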

Deployment instructions for Filebeat cover installation via DEB/RPM packages, Docker container execution, and the necessary command‑line options to connect to Graylog.

# Ubuntu (deb)
$ curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.8.1-amd64.deb
$ sudo dpkg -i filebeat-7.8.1-amd64.deb
$ sudo systemctl enable filebeat
$ sudo service filebeat start

# Docker run
docker run -d --name=filebeat --user=root \
  --volume="./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro" \
  --volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" \
  --volume="/var/run/docker.sock:/var/run/docker.sock:ro" \
  docker.elastic.co/beats/filebeat:7.8.1 -e --strict.perms=false \
  -E 'output.elasticsearch.hosts=["elasticsearch:9200"]'

Graylog is deployed with Docker‑Compose. The article shows how to generate a 16‑character password secret and a SHA‑256 root password, then provides a complete docker‑compose.yml that defines MongoDB, Elasticsearch, and Graylog services with appropriate ports and environment variables.
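The two secrets referenced in the compose file's environment section can be generated with standard tools; a sketch, with `admin123` as a placeholder admin password (the Graylog docs also suggest `pwgen` for the secret, and longer secrets are fine):

```shell
# Random 16-character password secret (Graylog requires at least 16 chars)
SECRET=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16)
echo "GRAYLOG_PASSWORD_SECRET=$SECRET"

# SHA-256 hash of the web UI root (admin) password
ROOT_SHA2=$(printf '%s' 'admin123' | sha256sum | cut -d' ' -f1)
echo "GRAYLOG_ROOT_PASSWORD_SHA2=$ROOT_SHA2"
```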

version: "3"
services:
  mongo:
    restart: on-failure
    container_name: graylog_mongo
    image: "mongo:3"
    volumes:
      - "./mongodb:/data/db"
    networks:
      - graylog_network

  elasticsearch:
    restart: on-failure
    container_name: graylog_es
    image: "elasticsearch:6.8.5"
    volumes:
      - "./es_data:/usr/share/elasticsearch/data"
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx5120m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          memory: 12g
    networks:
      - graylog_network

  graylog:
    restart: on-failure
    container_name: graylog_web
    image: "graylog/graylog:3.3"
    ports:
      - 9000:9000   # Web UI
      - 5044:5044   # Filebeat input
      - 12201:12201   # GELF TCP
      - 12201:12201/udp   # GELF UDP
      - 1514:1514   # Syslog TCP
      - 1514:1514/udp   # Syslog UDP
    volumes:
      - "./graylog_journal:/usr/share/graylog/data/journal"
    environment:
      - GRAYLOG_PASSWORD_SECRET=zscMb65...FxR9ag
      - GRAYLOG_ROOT_PASSWORD_SHA2=77e29e0f...557515f
      - GRAYLOG_HTTP_EXTERNAL_URI=http://11.22.33.44:9000/
      - GRAYLOG_TIMEZONE=Asia/Shanghai
      - GRAYLOG_ROOT_TIMEZONE=Asia/Shanghai
    networks:
      - graylog_network
    depends_on:
      - mongo
      - elasticsearch

networks:
  graylog_network:
    driver: bridge
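With the compose file saved, the stack can be brought up and checked roughly like this (assuming docker-compose is installed and the bind‑mount directories are writable by the containers):

```shell
mkdir -p mongodb es_data graylog_journal   # host directories for the bind mounts
docker-compose up -d                       # start mongo, elasticsearch, graylog
docker-compose ps                          # confirm all three services are Up
docker-compose logs -f graylog             # watch Graylog start; Ctrl-C to stop following
```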

The Sidecar component is described as a lightweight log collector that can run on Linux or Windows, fetches its configuration from Graylog via REST API, and supports Beats, CEF, GELF, JSON, and NetFlow outputs. Using the GELF driver, Docker containers can forward logs directly to Graylog.

# Docker run with GELF driver
docker run --rm=true \
  --log-driver=gelf \
  --log-opt gelf-address=udp://11.22.33.44:12201 \
  --log-opt tag=myapp \
  myapp:0.0.1
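To check that the GELF input is receiving data, a minimal GELF 1.1 message can be crafted by hand; `version`, `host`, and `short_message` are the required fields. The address matches the compose file above, and `nc` (netcat) is an assumed, commonly available tool:

```shell
# Minimal GELF 1.1 payload; level 5 is syslog "notice"
PAYLOAD='{"version":"1.1","host":"example.org","short_message":"hello graylog","level":5}'
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload is valid JSON"
# Send it to the GELF UDP input (replace the address as needed):
# printf '%s' "$PAYLOAD" | nc -w1 -u 11.22.33.44 12201
```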

Finally, the article briefly showcases the Graylog web UI, highlighting its search, stream, and dashboard capabilities, and provides links to additional resources and community groups.

Tags: Monitoring, Docker, Operations, Elasticsearch, Log Management, Filebeat, Graylog
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
