
How to Build a Real‑Time Nginx Log Analytics Pipeline with ELK, Kafka, and Filebeat

This guide walks through an end-to-end log collection and analysis setup for Nginx built on ELK (Elasticsearch, Logstash, Kibana), Filebeat, and Kafka. It covers the role of each service, the architecture, Linux system preparation, the configuration of each component, and visualisation in Kibana.

MaGe Linux Operations

Service Overview

ELK is a suite of three open-source tools: Elasticsearch stores and indexes log data, Logstash processes it, and Kibana visualises it. Filebeat is a lightweight log shipper that can forward logs to Logstash, to Kafka, or directly to Elasticsearch. Nginx is a high-performance HTTP server and reverse proxy. Kafka is a distributed streaming platform, used here as a reliable buffer between log shipping and log processing.

Architecture

Filebeat collects Nginx access logs and pushes them to a Kafka topic. Logstash consumes the topic, parses the logs with a grok pattern, enriches them with geoip and useragent filters, and writes the structured events to Elasticsearch. Kibana reads from Elasticsearch to provide dashboards and visualisations.
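
The hand-offs described above can be pictured with a small in-memory sketch. It is not how any of the components are implemented: `ship`, `consume`, and the `deque` standing in for the Kafka topic are hypothetical names chosen for illustration, and the sample log line is made up. What it tries to show is that the shipper wraps each raw line in a JSON envelope, so the consumer has to decode JSON before it can parse the line itself.

```python
import json
from collections import deque

# In-memory stand-in for the Kafka topic (assumption: a plain FIFO queue).
topic = deque()

def ship(raw_line, log_topic="nginxlog"):
    """Filebeat stand-in: wrap the raw line in a JSON envelope and enqueue it."""
    envelope = {"message": raw_line, "fields": {"log_topic": log_topic}}
    topic.append(json.dumps(envelope))

def consume():
    """Logstash stand-in: dequeue one record and decode the JSON envelope."""
    event = json.loads(topic.popleft())
    # Real parsing (grok) and enrichment (geoip, useragent) would happen here;
    # this sketch only pulls the first space-separated token as the client IP.
    event["client"] = event["message"].split(" ", 1)[0]
    return event

ship('203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0" "-"')
event = consume()
print(event["client"], event["fields"]["log_topic"])
```

Because the queue sits between the two stand-ins, the shipper can keep appending while the consumer is stopped, which is the buffering role Kafka plays in this architecture.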

System Initialization

# View hardware information
 dmidecode | grep "Product Name"
# View CPU model and count
 grep name /proc/cpuinfo
 grep "physical id" /proc/cpuinfo
# View memory size
 grep MemTotal /proc/meminfo

Because these are production servers, the IP addresses shown in this guide have been masked.

Basic Linux Setup

# Stop firewalld
 systemctl stop firewalld
# Disable SELinux (permissive immediately, disabled after the next reboot)
 setenforce 0
 sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Add a regular user
 useradd elsearch
 echo "******" | passwd --stdin elsearch
# Check the configured yum repositories (CentOS-7 base, updates, extras)
 cat /etc/yum.repos.d/CentOS-Base.repo
# Increase file descriptor limit
 echo "*        -    nofile 65535" >> /etc/security/limits.conf
# Kernel parameter tuning (example)
 cp /etc/sysctl.conf /etc/sysctl.conf.bak
 cat >> /etc/sysctl.conf <<EOF
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.ipv4.tcp_syncookies = 1
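# tcp_tw_reuse and tcp_tw_recycle only take effect when tcp_timestamps = 1 (it is set to 0 above).
# tcp_tw_recycle can drop connections from clients behind NAT and was removed in Linux 4.12.
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1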
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.ip_local_port_range = 1024 65535
EOF
 /sbin/sysctl -p
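
After applying the settings it can help to confirm that the running kernel actually picked them up. The following is a minimal sketch, assuming a Linux host where each sysctl key maps to a file under /proc/sys (dots become slashes). `parse_sysctl`, `proc_path`, and `drift` are hypothetical helper names, not part of any tool used in this guide.

```python
from pathlib import Path

def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Store values as token lists so "4096 87380" compares equal to "4096\t87380".
        settings[key.strip()] = value.split()
    return settings

def proc_path(key):
    """Map a sysctl key such as net.core.somaxconn to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")

def drift(settings):
    """Return {key: (wanted, running)} for keys whose running value differs."""
    result = {}
    for key, wanted in settings.items():
        path = proc_path(key)
        if not path.exists():  # the key is unknown to this kernel
            result[key] = (wanted, None)
            continue
        running = path.read_text().split()
        if running != wanted:
            result[key] = (wanted, running)
    return result

sample = """
net.core.somaxconn = 32768
net.ipv4.tcp_rmem = 4096 87380 16777216
"""
wanted = parse_sysctl(sample)
print(wanted)
```

On the target host, `drift(parse_sysctl(Path("/etc/sysctl.conf").read_text()))` would list any key that is missing or still holds a different value.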

Deploy Nginx Log Collection

# Define log format in /etc/nginx/nginx.conf
log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
# Apply the format to the virtual host
cat /etc/nginx/conf.d/vhost/api.mingongge.com.cn.conf
server {
    listen 80;
    server_name newtest-msp-api.mingongge.com.cn;
    access_log /var/log/nginx/api.mingongge.com.cn.log main;
}
# Test the configuration, then reload Nginx
nginx -t && nginx -s reload
# Truncate old log file
:> /var/log/nginx/api.mingongge.com.cn.log
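
To make the field layout of the `main` format concrete, here is a hedged sketch that splits one access-log line into the variables named in the `log_format` directive. The regular expression is an illustration written for this guide, not something Nginx or Logstash ships, and the sample line is invented. It assumes quoted fields contain no literal double quote, which matches Nginx's default escaping of `"` as `\x22`.

```python
import re

# One named group per variable in the 'main' log_format, in the same order.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] '
          '"GET /api/v1/ping HTTP/1.1" 200 612 "-" "curl/8.0" "-"')
fields = parse_line(sample)
print(fields["remote_addr"], fields["status"], fields["request"])
```

Running a few real lines through a parser like this before wiring up the rest of the pipeline is a quick way to confirm that the virtual host is really writing the expected format.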

Kafka Topic Creation

# Create topic "nginxlog"
./kafka-topics.sh --create --topic nginxlog --replication-factor 1 --partitions 1 --zookeeper localhost:2181
# Optional: let the broker create topics on first use by setting this in config/server.properties
auto.create.topics.enable=true
# Verify data flow after Filebeat starts
./kafka-console-consumer.sh --bootstrap-server 192.168.0.53:9091 --from-beginning --topic nginxlog
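
The topic above is created with a single partition. Within one consumer group, Kafka gives each partition to at most one consumer, so the partition count caps how many consumers can read in parallel. The sketch below illustrates that with a deliberately simplified round-robin assignment. `assign` and the consumer names are hypothetical, and Kafka's real assignors are more involved.

```python
def assign(partitions, consumers):
    """Deal partition numbers out to consumers in turn (simplified assignor)."""
    plan = {consumer: [] for consumer in consumers}
    for partition in range(partitions):
        owner = consumers[partition % len(consumers)]
        plan[owner].append(partition)
    return plan

# With one partition, the second consumer in the group has nothing to read.
print(assign(1, ["logstash-a", "logstash-b"]))
# With four partitions, the work is split evenly.
print(assign(4, ["logstash-a", "logstash-b"]))
```

If more than one Logstash instance is expected to share the load, creating the topic with a larger `--partitions` value is the setting to revisit.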

Filebeat Installation & Configuration

# Install Filebeat 6.3.2
cd /opt && wget http://download.mingongge.cn/download/software/filebeat-6.3.2-x86_64.rpm
yum localinstall filebeat-6.3.2-x86_64.rpm -y
# Edit /etc/filebeat/filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/api.mingongge.com.cn.log
  fields:
    log_topic: nginxlog
  # No json.* options are set: the access log defined above is plain text, not JSON
output.kafka:
  enabled: true
  hosts: ["192.168.0.53:9091"]
  topic: "%{[fields][log_topic]}"
  partition.round_robin:
    reachable_only: false
  compression: gzip
  max_message_bytes: 1000000
  required_acks: 1
# Start and enable at boot
systemctl start filebeat
systemctl enable filebeat
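
The Kafka output takes its topic name from a field reference, so the key inside the reference has to match the key defined under `fields` exactly. The following sketch mimics that lookup against a nested dictionary. `resolve` is a hypothetical helper written for illustration and is not Filebeat code. It raises `KeyError` for an unknown key, while how Filebeat itself reports an unresolved reference is outside the scope of this sketch.

```python
import re

# Matches references of the form %{[a][b]...}.
FIELD_REF = re.compile(r"%\{((?:\[[^\]]+\])+)\}")

def resolve(template, event):
    """Expand %{[a][b]} references against a nested dict; KeyError if a key is missing."""
    def lookup(match):
        value = event
        for key in re.findall(r"\[([^\]]+)\]", match.group(1)):
            value = value[key]
        return str(value)
    return FIELD_REF.sub(lookup, template)

event = {"message": "sample line", "fields": {"log_topic": "nginxlog"}}
print(resolve("%{[fields][log_topic]}", event))  # prints "nginxlog"

try:
    resolve("%{[fields][log_topics]}", event)  # note the stray trailing "s"
except KeyError as missing:
    print("unresolved key:", missing)
```

A one-character difference between the two names is enough for events to miss the intended topic, so it is worth comparing them side by side when editing filebeat.yml.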

Logstash Configuration (nginx.conf)

input {
  kafka {
    type => "nginxlog"
    topics => ["nginxlog"]
    bootstrap_servers => "192.168.0.53:9091"
    group_id => "nginxlog"
    auto_offset_reset => "latest"
    codec => "json"
  }
}
filter {
  if [type] == "nginxlog" {
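    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
      remove_field => ["message"]
    }
    date { match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"] }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/usr/local/logstash/config/GeoLite2-City.mmdb"
      # Populate [geoip][coordinates] as [longitude, latitude] so the convert below has data
      add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
      add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
    }
    mutate { convert => { "[geoip][coordinates]" => "float" } }
    useragent { source => "agent" target => "userAgent" }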
  }
}
output {
  if [type] == "nginxlog" {
    elasticsearch { hosts => ["http://192.168.0.48:9200"] index => "logstash-nginxlog-%{+YYYY.MM.dd}" }
    stdout { codec => rubydebug }
  }
}
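
Two details of this configuration are easy to overlook: the date filter turns the Nginx timestamp into the event time, and the `%{+YYYY.MM.dd}` part of the index name is derived from that event time in UTC. The sketch below reproduces both steps with the Python standard library. `parse_time_local` and `daily_index` are hypothetical names, and the code is an approximation for illustration, not Logstash's implementation.

```python
from datetime import datetime, timezone

def parse_time_local(value):
    """Parse an Nginx $time_local value such as '10/Oct/2023:13:55:36 +0800'."""
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")

def daily_index(event_time, prefix="logstash-nginxlog-"):
    """Build the daily index name from the event time expressed in UTC."""
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")

# 03:55 on 11 October at +0800 is still 10 October in UTC.
event_time = parse_time_local("11/Oct/2023:03:55:36 +0800")
print(daily_index(event_time))  # prints "logstash-nginxlog-2023.10.10"
```

This is why requests logged shortly after local midnight can appear in the previous day's index on servers east of UTC.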

Kibana Setup

# Add Gaode map tile source in kibana.yml
tilemap.url: 'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=7&x={x}&y={y}&z={z}'
# Create index pattern "logstash-nginxlog-*" and enable time filter on @timestamp
# Build dashboards: IP Top‑5, PV, Global map, Real‑time traffic, OS distribution, Login count, Region distribution, etc.
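
As a cross-check for the first two dashboard panels, the same figures can be computed offline from a handful of parsed events. This is a minimal sketch, assuming events carry the `clientip` and `response` fields produced by the COMBINEDAPACHELOG pattern. The sample events are invented, and counting only 2xx and 3xx responses as page views is one possible definition, not something prescribed by this guide.

```python
from collections import Counter

def top_clients(events, n=5):
    """Return the n most frequent client IPs with their request counts."""
    return Counter(event["clientip"] for event in events).most_common(n)

def page_views(events):
    """Count requests with a 2xx or 3xx status (assumed PV definition)."""
    return sum(1 for event in events if 200 <= int(event["response"]) < 400)

events = [
    {"clientip": "203.0.113.7", "response": "200"},
    {"clientip": "203.0.113.7", "response": "200"},
    {"clientip": "203.0.113.7", "response": "404"},
    {"clientip": "198.51.100.4", "response": "301"},
    {"clientip": "198.51.100.4", "response": "200"},
    {"clientip": "192.0.2.9", "response": "500"},
]
print(top_clients(events, 2))
print(page_views(events))
```

If the numbers from a small sample like this disagree with what Kibana shows for the same time range, the index pattern or the time filter is usually the first thing to check.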

The complete pipeline enables real‑time collection, parsing, enrichment, storage, and visualisation of Nginx access logs, providing operational insight such as top IPs, page views, geographic distribution, and traffic trends.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
