Build a Billion-Scale ELK Logging Platform with Filebeat, Kafka, Elasticsearch

This guide walks through the complete design and step‑by‑step deployment of a billion‑scale ELK logging platform, covering architecture, component roles, version selection, configuration files, and command‑line installation for Filebeat, Kafka, Logstash, Elasticsearch, and Kibana.

Efficient Ops

Jul 26, 2020

Overall Architecture

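The logging platform consists of four main modules: Filebeat for lightweight log collection, Kafka as a buffering queue, Logstash for data processing, and Elasticsearch as the distributed search engine. Kibana sits on top of Elasticsearch for visualization, so data flows Filebeat → Kafka → Logstash → Elasticsearch → Kibana. The component versions used in this guide are: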

Filebeat: 6.2.4
Kafka: 1.0.0 (Scala 2.11 build, kafka_2.11-1.0.0)
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4

Filebeat

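Filebeat is a lightweight log shipper written in Go that replaces the older Logstash-forwarder. It runs on every application server and is typically rolled out with a configuration management tool such as Salt.

Download and extract (the URL below is the macOS build; on Linux servers use the linux-x86_64 tarball instead):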

$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz
$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
$ mv filebeat-6.2.4-darwin-x86_64 filebeat
$ cd filebeat

Configuration (collect Nginx JSON logs and send to Kafka):

$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: "nginx"
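
Before starting the agent, it can help to let Filebeat parse the configuration and report errors. The following is a minimal sketch, assuming the `test config` subcommand is available in your 6.x build; `FILEBEAT_HOME` and `start_filebeat` are hypothetical names introduced here for illustration, not part of Filebeat.

```shell
# Directory containing the filebeat binary (hypothetical variable).
FILEBEAT_HOME="${FILEBEAT_HOME:-.}"

# Validate the config first; start Filebeat only if validation succeeds.
start_filebeat() {
  local config="${1:-filebeat.yml}"
  # Parse the configuration file and report syntax or option errors.
  "$FILEBEAT_HOME/filebeat" test config -c "$config" || return 1
  # Run in the foreground and log to stderr (-e), as in the guide.
  "$FILEBEAT_HOME/filebeat" -e -c "$config"
}
```

Running `start_filebeat` from the filebeat directory uses filebeat.yml by default; pass another path as the first argument to check a different file.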

Start Filebeat:

$ ./filebeat -e -c filebeat.yml

Kafka

Deploy a 3-node Kafka cluster. Kafka 1.0.0 depends on Zookeeper, whose quorum should have an odd number of nodes (2N+1, which tolerates N failures), so install and configure Zookeeper first.

Download and extract:

$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
$ tar -zxvf kafka_2.11-1.0.0.tgz
$ mv kafka_2.11-1.0.0 kafka
$ cd kafka

Zookeeper configuration (3 nodes):

$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

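On each node, create a myid file in dataDir (/opt/zookeeper/myid) containing that node's id (1, 2, or 3, matching the server.N entries above), then start each Zookeeper instance:

$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties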
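
The myid step can be sketched as a small helper. This is an illustration only: `write_myid` is a hypothetical function name, and the default directory assumes dataDir=/opt/zookeeper as configured above.

```shell
# Write the Zookeeper myid file for this node.
# $1: node id (must match the N in the server.N line for this host)
# $2: data directory (must match dataDir in zookeeper.properties)
write_myid() {
  local node_id="$1"
  local data_dir="${2:-/opt/zookeeper}"
  mkdir -p "$data_dir"
  # The file contains nothing but the id.
  echo "$node_id" > "$data_dir/myid"
}
```

For example, run `write_myid 1` on 192.168.0.1, `write_myid 2` on 192.168.0.2, and `write_myid 3` on 192.168.0.3 before starting Zookeeper.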

Kafka broker configuration (example for broker 1):

$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181

Start each Kafka broker:

$ ./bin/kafka-server-start.sh -daemon ./config/server.properties

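Once Filebeat has published its first event, the nginx topic should exist (brokers auto-create topics by default via auto.create.topics.enable). List the topics to verify: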

$ bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx
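
Listing the topic only proves that it exists. To check partition layout and confirm that events are actually arriving, the stock Kafka CLI tools can be wrapped as below. This is a sketch: `KAFKA_HOME`, `describe_topic`, and `peek_topic` are hypothetical names, and the default addresses are the ones used in this guide.

```shell
# Root of the extracted Kafka directory (hypothetical variable).
KAFKA_HOME="${KAFKA_HOME:-.}"

# Show partitions, leaders, and replicas for a topic.
describe_topic() {
  local topic="$1"
  local zk="${2:-localhost:2181}"
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe --zookeeper "$zk" --topic "$topic"
}

# Print the first few messages of a topic, then exit.
peek_topic() {
  local topic="$1"
  local brokers="${2:-192.168.0.1:9092}"
  local count="${3:-5}"
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" --bootstrap-server "$brokers" \
    --topic "$topic" --from-beginning --max-messages "$count"
}
```

`describe_topic nginx` should report three partitions if num.partitions=3 took effect, and `peek_topic nginx` should print JSON documents produced by Filebeat.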

Logstash

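A Logstash pipeline has three stages: input, filter, and output. The configuration below has no filter stage because Filebeat already ships structured JSON; if you add grok patterns later, test them in a Grok debugger first.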

Download and extract:

$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash
$ cd logstash

Configuration (read from Kafka and write to Elasticsearch):

$ vim nginx.conf
input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    codec => "json"    # Filebeat publishes each event to Kafka as a JSON document
    group_id => "logstash"
    consumer_threads => 2
  }
}
output {
  elasticsearch {
    # The elasticsearch output talks HTTP (port 9200); it has no separate port option.
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
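
The index setting creates one index per day. Logstash formats %{+YYYY.MM.dd} from the event's @timestamp, which is in UTC, so the daily cutover happens at UTC midnight rather than local midnight. The sketch below reproduces that naming for a given epoch timestamp; `index_for_epoch` is a hypothetical helper, and the `-d "@..."` syntax assumes GNU date.

```shell
# Print the daily index name an event with the given epoch time lands in.
# $1: event time as seconds since the Unix epoch
# $2: index prefix (defaults to the one used in nginx.conf)
index_for_epoch() {
  local epoch="$1"
  local prefix="${2:-nginx}"
  # -u forces UTC, matching how Logstash resolves the date pattern.
  date -u -d "@$epoch" "+${prefix}-%Y.%m.%d"
}
```

For example, `index_for_epoch "$(date +%s)"` prints the index that events stamped with the current time would be written to.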

Start Logstash:

$ ./bin/logstash -f nginx.conf

Elasticsearch

Download, extract, and configure the cluster:

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch
$ cd elasticsearch
$ vim config/elasticsearch.yml
cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
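# List every master-eligible node; set node.name and network.host per node.
discovery.zen.ping.unicast.hosts: ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
# Quorum for three master-eligible nodes: (3 / 2) + 1 = 2.
discovery.zen.minimum_master_nodes: 2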

Start Elasticsearch in the background:

$ ./bin/elasticsearch -d

Verify by opening http://192.168.0.1:9200/ and checking the JSON response.
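
Beyond the root endpoint, the cluster health and node listing APIs show whether all nodes have joined. The following is a minimal sketch using curl; `es_url`, `es_health`, and `es_nodes` are hypothetical helper names, and `ES_PORT` is an assumed override for the default HTTP port 9200.

```shell
# Build an Elasticsearch HTTP URL from a host and an API path.
es_url() {
  printf 'http://%s:%s/%s\n' "$1" "${ES_PORT:-9200}" "$2"
}

# Cluster status (green / yellow / red) and node counts.
es_health() {
  curl -s "$(es_url "$1" '_cluster/health?pretty')"
}

# One line per node that has joined the cluster.
es_nodes() {
  curl -s "$(es_url "$1" '_cat/nodes?v')"
}
```

`es_health 192.168.0.1` should report the expected number of nodes once every instance is up.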

Kibana

Download, extract, and configure (the URL below is the macOS build; on Linux servers use the linux-x86_64 tarball instead):

$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz
$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana
$ cd kibana
$ vim config/kibana.yml
server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"

Start Kibana:

$ nohup ./bin/kibana &

In Kibana, create an index pattern (e.g., nginx-*) under Management → Index Patterns to visualize logs.

Key Operational Tips

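Separate master and data nodes: once the cluster has more than three data nodes, run dedicated master-eligible nodes so that cluster coordination does not compete with indexing and search load.

Keep the JVM heap on each data node at or below 31 GB. Above roughly 32 GB the JVM can no longer use compressed object pointers, so the extra memory is used less efficiently.

Set discovery.zen.minimum_master_nodes to (master_eligible_nodes / 2) + 1, rounding the division down, to avoid split-brain. For three master-eligible nodes this is 2.

Never expose Elasticsearch directly to the public internet; install X-Pack and enable its security features (authentication, TLS).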
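
The two numeric tips above reduce to simple integer arithmetic. The sketch below is illustrative only: `min_master_nodes` and `heap_gb` are hypothetical helper names, and capping the heap at half of physical RAM is a commonly cited guideline that this article does not state, so treat it as an assumption.

```shell
# Quorum of master-eligible nodes: (n / 2) + 1 with integer division.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# Suggested heap in GB: half of RAM (assumption), never more than 31.
heap_gb() {
  local half=$(( $1 / 2 ))
  if [ "$half" -gt 31 ]; then
    echo 31
  else
    echo "$half"
  fi
}
```

For a machine with 64 GB of RAM, `heap_gb 64` gives 31, which would be set as -Xms31g and -Xmx31g in config/jvm.options.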

By following these steps, you get a complete ELK pipeline that ingests, processes, indexes, and visualizes logs in near real time. Reaching billions of log entries per day is then a matter of scaling the Kafka and Elasticsearch clusters horizontally.
