Centralized Log Collection with Filebeat and Graylog
This article explains how to use Filebeat together with Graylog to collect, ship, store, and analyze logs from multiple environments, covering tool introductions, configuration files, Docker deployment, Spring Boot integration, and practical search syntax for effective log monitoring.
When a company runs many services across test and production environments, centralized log collection becomes essential. The article compares using Nginx for log exposure versus a dedicated log collection service like ELK, and recommends Graylog as a simpler, extensible alternative that stores logs in Elasticsearch and caches configuration in MongoDB.
Filebeat Overview
Filebeat is a lightweight log shipper that monitors specified log directories or files, reads new entries, and forwards them to Elasticsearch, Logstash, or Graylog. When enabled, Filebeat starts one or more prospectors (renamed "inputs" in recent versions) to detect log files, spawns a harvester for each file, and sends harvested events to a spooler before finally delivering them to the configured Graylog address.
Because Filebeat is lighter than Logstash, it is recommended for environments with limited resources or simpler log collection needs.
Filebeat Configuration
The main configuration file is typically located at /etc/filebeat/filebeat.yml. Below is a sample configuration that enables input files from the inputs.d directory, loads modules, configures the Elasticsearch index template, and points the Logstash output at the Graylog address.
# Configure input sources
# We have configured all *.yml files under inputs.d
filebeat.config.inputs:
  enabled: true
  path: ${path.config}/inputs.d/*.yml
  # If logs are JSON, enable this
  #json.keys_under_root: true

# Load Filebeat modules
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1

# Output to Logstash (Graylog)
output.logstash:
  hosts: ["11.22.33.44:5500"]

#output.file:
#  enabled: true

processors:
  - add_host_metadata: ~
  - rename:
      fields:
        - from: "log"
          to: "message"
  - add_fields:
      target: ""
      fields:
        # Add a token to prevent unauthenticated data submission
        token: "0uxxxxaM-1111-2222-3333-VQZJxxxxxwgX"

An example inputs.d file for collecting logs from specific services is shown below.
# Log type
- type: log
  enabled: true
  # Paths to log files
  paths:
    - /var/log/supervisor/app_escape_worker-stderr.log
    - /var/log/supervisor/app_escape_prod-stderr.log
  symlinks: true
  # Include only lines containing these keywords
  include_lines: ["WARNING", "ERROR"]
  # Tag the data
  tags: ["app", "escape", "test"]
  # Multiline handling
  multiline.pattern: '^\[?[0-9]...{3}'
  multiline.negate: true
  multiline.match: after

# Additional log types can be added similarly
- type: log
  enabled: true
  ...

Filebeat also provides built-in modules for common services such as iptables, PostgreSQL, and Nginx, each with its own configuration snippet.
# iptables module
- module: iptables
  log:
    enabled: true
    var.paths: ["/var/log/iptables.log"]
    var.input: "file"

# PostgreSQL module
- module: postgresql
  log:
    enabled: true
    var.paths: ["/path/to/log/postgres/*.log*"]

# Nginx module
- module: nginx
  access:
    enabled: true
    var.paths: ["/path/to/log/nginx/access.log*"]
  error:
    enabled: true
    var.paths: ["/path/to/log/nginx/error.log*"]

Graylog Service Overview
Graylog is an open‑source log aggregation, analysis, and alerting platform. It consists of three core components: Elasticsearch for storing and searching log data, MongoDB for Graylog configuration, and the Graylog server itself for the web UI and processing.
Deployments can range from a single‑node setup to a clustered architecture for high scalability. Images in the original article illustrate both minimal and optimized cluster deployments.
Graylog Core Concepts
Input – the source of log data; each input can have Extractors to transform fields.
Stream – groups logs based on criteria; each stream can write to its own Elasticsearch index set.
Extractor – configured under System → Input to parse and convert fields.
Index Set – defines shard and replica settings, retention policies, and performance parameters.
Pipeline – allows custom processing scripts; an example rule discarding messages with level > 6 is provided.
Sidecar – a lightweight collector daemon (supports NXLog, Filebeat, Winlogbeat) that pulls configuration from Graylog via REST API.
rule "discard debug messages"
when
  to_long($message.level) > 6
then
  drop_message();
end

Logs stored in Graylog can be searched directly, or forwarded to other services via Graylog outputs.
Installation and Deployment
Filebeat can be installed via Debian/Ubuntu packages, Docker, or source compilation. Example commands for Ubuntu:
# Ubuntu (deb)
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.8.1-amd64.deb
sudo dpkg -i filebeat-7.8.1-amd64.deb
sudo systemctl enable filebeat
sudo service filebeat start

Docker deployment example:
docker run -d --name=filebeat --user=root \
  --volume="./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro" \
  --volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" \
  --volume="/var/run/docker.sock:/var/run/docker.sock:ro" \
  docker.elastic.co/beats/filebeat:7.8.1 filebeat -e -strict.perms=false \
  -E output.elasticsearch.hosts=["elasticsearch:9200"]

Graylog can be deployed with Docker Compose. After generating a password_secret of at least 16 characters and a SHA-256 hash of the admin password, the following docker-compose.yml defines MongoDB, Elasticsearch, and Graylog services, exposing ports for the web UI (9000) and various inputs (5044, 12201, 1514, etc.).
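The two secrets Graylog requires can be generated with any tool; as one option, a small Python sketch using only the standard library ("changeme" is a placeholder password, not a recommendation):

```python
import hashlib
import secrets

# GRAYLOG_PASSWORD_SECRET: a random string of at least 16 characters,
# used by Graylog to encrypt/sign internal data
password_secret = secrets.token_urlsafe(48)

# GRAYLOG_ROOT_PASSWORD_SHA2: hex SHA-256 digest of the admin password
admin_password = "changeme"  # placeholder; replace with your real password
root_password_sha2 = hashlib.sha256(admin_password.encode("utf-8")).hexdigest()

print("GRAYLOG_PASSWORD_SECRET=" + password_secret)
print("GRAYLOG_ROOT_PASSWORD_SHA2=" + root_password_sha2)
```

Paste the two printed values into the environment section of the graylog service below.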
version: "3"
services:
  mongo:
    restart: on-failure
    container_name: graylog_mongo
    image: "mongo:3"
    volumes:
      - "./mongodb:/data/db"
    networks:
      - graylog_network
  elasticsearch:
    restart: on-failure
    container_name: graylog_es
    image: "elasticsearch:6.8.5"
    volumes:
      - "./es_data:/usr/share/elasticsearch/data"
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx5120m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          memory: 12g
    networks:
      - graylog_network
  graylog:
    restart: on-failure
    container_name: graylog_web
    image: "graylog/graylog:3.3"
    ports:
      - 9000:9000       # Web UI
      - 5044:5044       # Filebeat input
      - 12201:12201     # GELF TCP
      - 12201:12201/udp # GELF UDP
      - 1514:1514       # Syslog TCP
      - 1514:1514/udp   # Syslog UDP
    volumes:
      - "./graylog_journal:/usr/share/graylog/data/journal"
    environment:
      - GRAYLOG_PASSWORD_SECRET=zscMb65...FxR9ag
      - GRAYLOG_ROOT_PASSWORD_SHA2=77e29e0f...557515f
      - GRAYLOG_HTTP_EXTERNAL_URI=http://11.22.33.44:9000/
      - GRAYLOG_TIMEZONE=Asia/Shanghai
      - GRAYLOG_ROOT_TIMEZONE=Asia/Shanghai
    networks:
      - graylog_network
    depends_on:
      - mongo
      - elasticsearch
networks:
  graylog_network:
    driver: bridge

GELF (Graylog Extended Log Format) inputs accept structured events and support compression and chunking. Docker containers can send logs directly to Graylog by specifying the gelf log driver.
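To make the GELF wire format concrete, here is a hand-rolled sketch (not a production client) that builds a GELF 1.1 JSON payload, gzip-compresses it, and sends it to a GELF UDP input; the host and port are assumptions matching the deployment above, and messages larger than one datagram would additionally need GELF chunking, which this sketch omits:

```python
import gzip
import json
import socket
import time


def send_gelf(host, port, short_message, **extra):
    """Send a single gzip-compressed GELF 1.1 message over UDP."""
    payload = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": 6,  # syslog severity: informational
    }
    # GELF additional fields must be prefixed with an underscore
    payload.update({"_" + k: v for k, v in extra.items()})
    data = gzip.compress(json.dumps(payload).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))


# Hypothetical usage against the compose deployment above:
# send_gelf("11.22.33.44", 12201, "user login", app_name="austin")
```

Graylog detects the gzip magic bytes on the UDP input and decompresses automatically, which is why no framing beyond the datagram itself is needed here.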
# Docker run with GELF driver
docker run --rm=true \
  --log-driver=gelf \
  --log-opt gelf-address=udp://11.22.33.44:12201 \
  --log-opt tag=myapp \
  myapp:0.0.1

# Docker Compose example for a Redis service
version: "3"
services:
  redis:
    restart: always
    image: redis
    container_name: "redis"
    logging:
      driver: gelf
      options:
        gelf-address: udp://11.22.33.44:12201
        tag: "redis"
  ...

Graylog Web UI Features
The article includes screenshots of the Graylog UI, demonstrating search, stream management, dashboard creation, and alert configuration.
Spring Boot Integration
To forward Spring Boot logs to Graylog, add the logback‑gelf dependency (version 3.0.0) and create a logback.xml configuration file. The configuration defines a GELF UDP appender pointing to the Graylog host and port, sets chunk size, compression, and includes fields such as application name.
<appender name="GELF" class="de.siegmar.logbackgelf.GelfUdpAppender">
  <!-- Graylog address -->
  <graylogHost>ip</graylogHost>
  <!-- UDP input port -->
  <graylogPort>12201</graylogPort>
  <!-- Chunk size, compression, etc. -->
  <maxChunkSize>508</maxChunkSize>
  <useCompression>true</useCompression>
  <encoder class="de.siegmar.logbackgelf.GelfEncoder">
    <includeRawMessage>false</includeRawMessage>
    <includeMarker>true</includeMarker>
    <includeMdcData>true</includeMdcData>
    <includeCallerData>false</includeCallerData>
    <includeRootCauseData>false</includeRootCauseData>
    <includeLevelName>true</includeLevelName>
    <shortPatternLayout class="ch.qos.logback.classic.PatternLayout">
      <pattern>%m%nopex</pattern>
    </shortPatternLayout>
    <fullPatternLayout class="ch.qos.logback.classic.PatternLayout">
      <pattern>%d - [%thread] %-5level %logger{35} - %msg%n</pattern>
    </fullPatternLayout>
    <staticField>app_name:austin</staticField>
  </encoder>
</appender>

Replace the placeholder ip with the actual Graylog server address, restart the application, and the logs will appear in Graylog's Search view.
Log Search Syntax
Graylog supports simple fuzzy queries (e.g., orderid), exact phrase queries (e.g., "orderid: 11"), field-specific queries (message:http), multi-field queries, and Boolean combinations such as message:http AND level_name:ERROR OR source:192.168.0.4.
Final Note
The author encourages readers to like, share, and follow the article, and promotes a paid knowledge community offering advanced projects and tutorials on Spring, micro‑services, big‑data sharding, DDD, and more.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn