
Build a Billion-Scale ELK Logging Platform with Filebeat, Kafka, Elasticsearch

This guide walks through the architecture and step‑by‑step deployment of an ELK logging system designed to scale to billions of log entries: Filebeat agents, Kafka buffering, Logstash processing, Elasticsearch indexing, and Kibana visualization. It includes the configuration files and the component versions used.

MaGe Linux Operations

May 14, 2021

Overall Architecture

The system consists of five main modules:

Filebeat: a lightweight data shipper, the successor of Logstash‑forwarder, and the preferred ELK agent.

Kafka: a message queue that buffers data, decouples producers from consumers, and absorbs traffic peaks at high throughput.

Logstash: a data collection and processing engine that ingests, filters, enriches, and formats logs before forwarding them.

Elasticsearch: a distributed search engine used for full‑text, structured, and analytical queries.

Kibana: a web interface for searching and visualizing the data stored in Elasticsearch.

Practical Example

We use a typical Nginx access log in JSON format as the source data.

{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
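To see how downstream tools will interpret this record, the sample line can be decoded with a few lines of Python. This is only an illustrative sketch: the helper name `parse_access_log` is hypothetical and not part of any ELK component. Note that `status` arrives as a string and `upstreamtime` is `"-"` when no upstream server was involved, so such fields may need converting before they can be aggregated numerically.

```python
import json
from datetime import datetime

SAMPLE = (
    '{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11",'
    '"clientip":"192.168.56.11","size":26,"responsetime":0.000,'
    '"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11",'
    '"url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-",'
    '"referer":"-","status":"200"}'
)


def parse_access_log(line):
    """Decode one JSON access-log line and normalise a few field types."""
    event = json.loads(line)
    # fromisoformat accepts the "+08:00" offset (Python 3.7+).
    event["@timestamp"] = datetime.fromisoformat(event["@timestamp"])
    # The status code is logged as a string; convert it for numeric queries.
    event["status"] = int(event["status"])
    # "-" marks a request that never reached an upstream server.
    if event["upstreamtime"] == "-":
        event["upstreamtime"] = None
    else:
        event["upstreamtime"] = float(event["upstreamtime"])
    return event


event = parse_access_log(SAMPLE)
print(event["status"], event["url"], event["@timestamp"].isoformat())
```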

Filebeat

Why use Filebeat instead of the original Logstash?

Filebeat consumes far fewer resources because it is written in Go and runs as a native binary rather than on the JVM.

Installation steps:

$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz

$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz

$ mv filebeat-6.2.4-darwin-x86_64 filebeat

Edit filebeat.yml to collect Nginx logs and ship them to Kafka:

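filebeat.prospectors:
- type: log                       # "input_type" is the deprecated pre-6.0 name
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true      # place decoded JSON fields at the top level of the event
  json.add_error_key: true        # record an error field when a line is not valid JSON
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: "nginx"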

$ ./filebeat -e -c filebeat.yml
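The effect of the `json.*` options is easier to see on a concrete event. The sketch below is a rough Python model of the documented behaviour, not Filebeat's actual code: `decode_line` is a hypothetical helper, and real Filebeat events also carry metadata fields that are omitted here.

```python
import json


def decode_line(line, keys_under_root=True, add_error_key=True):
    """Rough model of how the json.* options shape a Filebeat event."""
    try:
        fields = json.loads(line)
        if not isinstance(fields, dict):
            raise ValueError("top-level JSON value is not an object")
        event = {}
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        # On a decoding failure the raw line is kept as the message.
        event = {"message": line}
        fields = {}
        if add_error_key:
            fields["error"] = {"message": str(exc), "type": "json"}
    if keys_under_root:
        # Decoded keys are merged into the top level of the event.
        event.update(fields)
    else:
        # Otherwise they are nested under a "json" key.
        event["json"] = fields
    return event


print(decode_line('{"status":"200","url":"/nginxweb/index.html"}'))
print(decode_line('{"status":"200"}', keys_under_root=False))
print(decode_line("not json"))
```

With `keys_under_root: true`, fields such as `status` and `url` can be queried directly in Elasticsearch instead of as `json.status` and `json.url`.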

Kafka

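Deploy a 3‑node Kafka cluster. An odd node count (2N+1) is recommended because the ZooKeeper ensemble that Kafka depends on needs a majority of its nodes to stay available, so three nodes tolerate the loss of one.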

$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz

$ tar -zxvf kafka_2.11-1.0.0.tgz

$ mv kafka_2.11-1.0.0 kafka

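Configure ZooKeeper (bundled with the Kafka distribution) on all three nodes in config/zookeeper.properties. Each node also needs a myid file in dataDir containing its own number from the server.N lines (1, 2, or 3):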

tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties
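In replicated mode, a ZooKeeper node identifies itself through the `myid` file in `dataDir`, whose content must match the number in its `server.N` entry. The sketch below derives that value from the configuration above and shows how many node failures the ensemble tolerates. The function names are hypothetical, and actually writing the file on each host is left to the operator.

```python
ZK_PROPERTIES = """\
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
"""


def parse_servers(properties_text):
    """Map each ensemble host to its numeric id from the server.N lines."""
    servers = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line.startswith("server."):
            continue
        key, value = line.split("=", 1)
        server_id = int(key.split(".", 1)[1])
        host = value.split(":", 1)[0]
        servers[host] = server_id
    return servers


def myid_for(host, properties_text=ZK_PROPERTIES):
    """Return the text that belongs in <dataDir>/myid on the given host."""
    return str(parse_servers(properties_text)[host])


def tolerated_failures(ensemble_size):
    """A majority quorum of 2N+1 nodes survives the loss of N nodes."""
    return (ensemble_size - 1) // 2


servers = parse_servers(ZK_PROPERTIES)
for host in sorted(servers):
    print(f"{host}: write '{myid_for(host)}' to /opt/zookeeper/myid")
print("tolerated failures:", tolerated_failures(len(servers)))
```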

Kafka broker configuration (example for node 1):

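# broker.id must be unique on every node (1, 2, 3).
broker.id=1
# listeners replaces the deprecated port and host.name settings.
listeners=PLAINTEXT://192.168.0.1:9092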
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181

$ ./bin/kafka-server-start.sh -daemon ./config/server.properties
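`num.partitions=3` sets the default partition count for topics created without an explicit count, and partitions are the unit of parallelism for consumers: within one consumer group, each partition is read by at most one consumer thread. The sketch below is a simplified model of range-style assignment for a single topic (not Kafka's actual code), showing how three partitions spread across the two threads set by `consumer_threads => 2` in the Logstash section. The name `range_assign` and the thread labels are made up for illustration.

```python
def range_assign(num_partitions, consumers):
    """Give each consumer a contiguous block of partitions, extras first."""
    ordered = sorted(consumers)
    base, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for index, consumer in enumerate(ordered):
        # The first `extra` consumers each take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Three partitions shared by two Logstash consumer threads.
print(range_assign(3, ["logstash-0", "logstash-1"]))

# More threads than partitions: the surplus thread receives nothing.
print(range_assign(3, ["t0", "t1", "t2", "t3"]))
```

Because threads beyond the partition count stay idle, raising consumer throughput usually means adding partitions before adding consumer threads.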

Logstash

Download and extract Logstash 6.2.4:

$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz

$ tar -zxvf logstash-6.2.4.tar.gz

$ mv logstash-6.2.4 logstash

Create nginx.conf to consume from Kafka and output to Elasticsearch:

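input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
    # Filebeat publishes each event to Kafka as JSON, so decode it here.
    codec => "json"
  }
}
output {
  elasticsearch {
    # This output talks HTTP (default port 9200) and has no "port" option.
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}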

$ ./bin/logstash -f nginx.conf
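The `%{+YYYY.MM.dd}` suffix in the index name is filled in from each event's `@timestamp`, which Logstash keeps in UTC, so one index is created per UTC day rather than per local day. A rough Python equivalent of that naming rule, using the hypothetical helper `daily_index`, might look like this:

```python
from datetime import datetime, timezone


def daily_index(timestamp, prefix="nginx"):
    """Build the daily index name for an ISO 8601 timestamp, using its UTC date."""
    moment = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    return f"{prefix}-{moment:%Y.%m.%d}"


# The sample log line from earlier in this guide.
print(daily_index("2017-12-27T16:38:17+08:00"))

# 02:00 on Dec 28 in UTC+8 is still Dec 27 in UTC, so it lands in the same index.
print(daily_index("2017-12-28T02:00:00+08:00"))
```

This matters when deleting old indices or querying by day: events logged shortly after local midnight in a UTC+8 time zone still belong to the previous day's index.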

Elasticsearch

Download, extract and configure Elasticsearch 6.2.4:

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz

$ tar -zxvf elasticsearch-6.2.4.tar.gz

$ mv elasticsearch-6.2.4 elasticsearch

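Configuration (config/elasticsearch.yml) example for the first node. The values below describe a single‑node setup; for the three‑node cluster that the Logstash output targets, list all three hosts in discovery.zen.ping.unicast.hosts and set discovery.zen.minimum_master_nodes to 2: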

cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1"]
discovery.zen.minimum_master_nodes: 1

$ ./bin/elasticsearch -d
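With Zen discovery in Elasticsearch 6.x, the usual guidance is to set `discovery.zen.minimum_master_nodes` to a majority of the master‑eligible nodes, that is (N / 2) + 1 with integer division, so that a network partition cannot elect two masters. The arithmetic as a small sketch (the function name is illustrative only):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the majority quorum for a given number of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(f"{nodes} master-eligible node(s) -> minimum_master_nodes: "
          f"{minimum_master_nodes(nodes)}")
```

A value of 1 is therefore only appropriate for a single‑node cluster like the one configured above.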

Kibana

Download, extract and configure Kibana 6.2.4:

$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz

$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz

$ mv kibana-6.2.4-darwin-x86_64 kibana

Modify config/kibana.yml:

server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"

$ nohup ./bin/kibana &

In Kibana, create an index pattern (e.g., nginx-*) under Management → Index Patterns to visualize the logs.

Conclusion

By following the commands above, you can deploy a complete ELK pipeline that handles log collection, buffering, processing, indexing, and visualization. Because both Kafka and Elasticsearch scale horizontally, the same design can grow to process billions of log entries per day in near real time. The configurations shown are minimal and do not enable authentication or encryption, so harden them before using the stack in production.
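A back‑of‑the‑envelope estimate helps translate "billions per day" into numbers for capacity planning. The sketch below assumes exactly one billion events per day and an average event size of 300 bytes. Both figures are illustrative assumptions, not measurements from this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day):
    """Average events per second if traffic were spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def raw_volume_gb(events_per_day, avg_event_bytes):
    """Raw daily log volume in gigabytes (10^9 bytes), before any overhead."""
    return events_per_day * avg_event_bytes / 1e9


EVENTS_PER_DAY = 1_000_000_000   # assumption: one billion events per day
AVG_EVENT_BYTES = 300            # assumption: average size of one JSON log line

rate = average_rate(EVENTS_PER_DAY)
volume = raw_volume_gb(EVENTS_PER_DAY, AVG_EVENT_BYTES)
print(f"{rate:,.0f} events/s on average")
print(f"{volume:,.0f} GB/day before replication and indexing overhead")
```

Real traffic is rarely uniform, so Kafka's buffering matters most when peak rates run well above this average.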
