How to Build a Real‑Time Nginx Log Analytics Pipeline with ELK, Kafka, and Filebeat
This guide walks through setting up an end‑to‑end log collection and analysis solution for Nginx using ELK (Elasticsearch, Logstash, Kibana), Filebeat, and Kafka, covering service introduction, architecture design, Linux system preparation, configuration of each component, and visualisation in Kibana.
Service Overview
ELK is a suite of three open‑source tools—Elasticsearch, Logstash, and Kibana—used for storing, processing, and visualising log data. Filebeat is a lightweight log‑shipper that forwards logs to Logstash or directly to Elasticsearch. Nginx is a high‑performance HTTP server and reverse proxy. Kafka is a distributed streaming platform for reliable log transport.
Architecture
Filebeat collects Nginx access logs and pushes them to a Kafka topic. Logstash consumes the topic, parses the logs with a grok pattern, enriches them with geoip and useragent filters, and writes the structured events to Elasticsearch. Kibana reads from Elasticsearch to provide dashboards and visualisations.
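As a rough illustration of how the stages hand data to one another, the Python sketch below replaces the Kafka topic with an in-memory queue. The ship and consume helpers and the sample log line are hypothetical stand-ins for Filebeat and Logstash, not real client code, and a real broker keeps messages after they are read instead of discarding them.

```python
import json
from collections import deque

# In-memory stand-in for the Kafka topic (assumption: a real setup uses a broker).
topic = deque()

def ship(raw_line, topic_name="nginxlog"):
    """Filebeat's role: wrap the raw line in a JSON envelope and publish it."""
    event = {"message": raw_line, "fields": {"log_topic": topic_name}}
    topic.append(json.dumps(event))

def consume():
    """Logstash's role: decode each queued message and tag it for filtering."""
    events = []
    while topic:
        event = json.loads(topic.popleft())
        event["type"] = "nginxlog"
        events.append(event)
    return events

ship('203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/8.0" "-"')
events = consume()
print(events[0]["type"], events[0]["fields"]["log_topic"])
```

The producer and the consumer share only the queue, so either side can be stopped or restarted without changing the other, which is the reason for placing Kafka between Filebeat and Logstash.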
System Initialization
# View hardware information
dmidecode | grep "Product Name"
# View CPU model and count
grep name /proc/cpuinfo
grep "physical id" /proc/cpuinfo
# View memory size
grep MemTotal /proc/meminfo
For production servers, IP addresses have been masked.
Basic Linux Setup
# Stop firewalld
systemctl stop firewalld
# Disable SELinux
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Add a regular user
useradd elsearch
echo "******" | passwd --stdin elsearch
# Configure yum repositories (CentOS‑7 base, updates, extras)
cat /etc/yum.repos.d/CentOS-Base.repo
# Increase file descriptor limit
echo "* - nofile 65535" >> /etc/security/limits.conf
# Kernel parameter tuning (example)
cp /etc/sysctl.conf /etc/sysctl.conf.bak
cat >> /etc/sysctl.conf <<EOF
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
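# Caution: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; omit it on newer kernels
net.ipv4.tcp_tw_recycle = 1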
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.ip_local_port_range = 1024 65535
EOF
/sbin/sysctl -p
Deploy Nginx Log Collection
# Define log format in /etc/nginx/nginx.conf
log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
# Apply the format to the virtual host
cat /etc/nginx/conf.d/vhost/api.mingongge.com.cn.conf
server {
    listen 80;
    server_name newtest-msp-api.mingongge.com.cn;
    access_log /var/log/nginx/api.mingongge.com.cn.log main;
}
# Reload Nginx
nginx -s reload
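To make the field layout of the main format concrete, here is a minimal Python sketch that parses one line written in that format. The regular expression, the parse_line helper, and the sample line are illustrative assumptions and not part of the original setup; in the pipeline itself, Logstash does this parsing with grok.

```python
import re

# Mirrors the "main" log_format: the combined format plus a trailing
# quoted $http_x_forwarded_for field.
MAIN_FORMAT = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = MAIN_FORMAT.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Numeric fields are converted so they can be aggregated later.
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields

sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] '
          '"GET /api/v1/ping HTTP/1.1" 200 612 "-" "curl/8.0" "-"')
print(parse_line(sample))
```

The group names here follow the Nginx variable names. The Logstash configuration later in this guide refers to the same data by the names clientip, timestamp, and agent.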
# Truncate old log file
:> /var/log/nginx/api.mingongge.com.cn.log
Kafka Topic Creation
# Create topic "nginxlog"
./kafka-topics.sh --create --topic nginxlog --replication-factor 1 --partitions 1 --zookeeper localhost:2181
# Optional: let the broker create topics automatically (set in the broker's server.properties)
auto.create.topics.enable=true
# Verify data flow after Filebeat starts
./kafka-console-consumer.sh --bootstrap-server 192.168.0.53:9091 --from-beginning --topic nginxlog
Filebeat Installation & Configuration
# Install Filebeat 6.3.2
cd /opt && wget http://download.mingongge.cn/download/software/filebeat-6.3.2-x86_64.rpm
yum localinstall filebeat-6.3.2-x86_64.rpm -y
# Edit /etc/filebeat/filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/api.mingongge.com.cn.log
  fields:
    log_topic: nginxlog
  # The json.* options apply only when each log line is a JSON object
  json.keys_under_root: true
  json.overwrite_keys: true
output.kafka:
  enabled: true
  hosts: ["192.168.0.53:9091"]
  # Must reference the key defined under fields above (log_topic)
  topic: "%{[fields][log_topic]}"
  partition.round_robin:
    reachable_only: false
  compression: gzip
  max_message_bytes: 1000000
  required_acks: 1
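The topic setting is a format string that is resolved for each event against the custom key declared under fields. The Python sketch below models that lookup. The resolve_topic helper and the sample event are hypothetical and do not reflect Filebeat's actual implementation; they only show why the key names have to agree.

```python
import re

def resolve_topic(template, event):
    """Expand %{[a][b]} references against a nested event dict."""
    def lookup(match):
        value = event
        for key in re.findall(r"\[([^\]]+)\]", match.group(1)):
            value = value[key]  # raises KeyError if the path does not exist
        return str(value)
    return re.sub(r"%\{((?:\[[^\]]+\])+)\}", lookup, template)

# Assumed event shape: the raw line plus the custom fields from filebeat.yml.
event = {"message": "raw log line", "fields": {"log_topic": "nginxlog"}}
print(resolve_topic("%{[fields][log_topic]}", event))
```

In this model, a reference to a key that does not exist under fields has nothing to resolve and raises KeyError, so the name in the format string must match the declared key exactly.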
# Start and enable at boot
systemctl start filebeat
systemctl enable filebeat
Logstash Configuration (nginx.conf)
input {
  kafka {
    type => "nginxlog"
    topics => ["nginxlog"]
    # bootstrap_servers takes a comma-separated string of host:port pairs
    bootstrap_servers => "192.168.0.53:9091"
    group_id => "nginxlog"
    auto_offset_reset => "latest"
    codec => "json"
  }
}
filter {
  if [type] == "nginxlog" {
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } remove_field => ["message"] }
    date { match => ["timestamp", "dd/MMM/YYYY:HH:mm:ss Z"] }
    geoip { source => "clientip" target => "geoip" database => "/usr/local/logstash/config/GeoLite2-City.mmdb" }
    mutate { convert => { "[geoip][coordinates]" => "float" } }
    useragent { source => "agent" target => "userAgent" }
  }
}
output {
  if [type] == "nginxlog" {
    elasticsearch { hosts => ["http://192.168.0.48:9200"] index => "logstash-nginxlog-%{+YYYY.MM.dd}" }
    stdout { codec => rubydebug }
  }
}
Kibana Setup
# Add Gaode map tile source in kibana.yml
tilemap.url: 'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=7&x={x}&y={y}&z={z}'
# Create index pattern "logstash-nginxlog-*" and enable time filter on @timestamp
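The index pattern matches the daily indices that the Logstash output writes. As a minimal sketch of how one access-log timestamp maps to an index name, the Python below converts an Nginx $time_local value to UTC and formats the date suffix. The daily_index helper and the sample timestamp are assumptions made for illustration.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch

def daily_index(time_local, prefix="logstash-nginxlog-"):
    """Build the daily index name for an Nginx $time_local value."""
    # %z parses the numeric UTC offset, for example +0800.
    local = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")
    # Logstash's %{+YYYY.MM.dd} formats @timestamp, which is kept in UTC.
    return prefix + local.astimezone(timezone.utc).strftime("%Y.%m.%d")

name = daily_index("11/Oct/2023:03:15:00 +0800")
print(name, fnmatch(name, "logstash-nginxlog-*"))
```

A request logged at 03:15 on 11 October in a +0800 zone falls on 10 October in UTC, so it is stored in logstash-nginxlog-2023.10.10. This is worth keeping in mind when a daily index appears to be missing its first few hours of local traffic.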
# Build dashboards: IP Top‑5, PV, Global map, Real‑time traffic, OS distribution, Login count, Region distribution, etc.
The complete pipeline enables real‑time collection, parsing, enrichment, storage, and visualisation of Nginx access logs, providing operational insight such as top IPs, page views, geographic distribution, and traffic trends.
MaGe Linux Operations
