Build a Billion-Scale ELK Logging Platform with Filebeat, Kafka, Elasticsearch
This guide walks through the complete design and step‑by‑step deployment of a billion‑scale ELK logging platform, covering architecture, component roles, version selection, configuration files, and command‑line installation for Filebeat, Kafka, Logstash, Elasticsearch, and Kibana.
Overall Architecture
The logging platform consists of four main modules: Filebeat for lightweight log collection, Kafka as a buffering queue, Logstash for data processing, and Elasticsearch as a distributed search engine. Kibana provides visualization.
Component versions used in this guide:
Filebeat: 6.2.4
Kafka: 2.11-1.0.0 (Kafka 1.0.0 built for Scala 2.11)
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4
Filebeat
Filebeat replaces the older Logstash‑forwarder with a Go‑based lightweight agent. It is deployed on each application server, often via Salt.
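Download and extract (the darwin archive shown here is the macOS build; on Linux servers use the linux-x86_64 archive instead):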
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz
$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
$ mv filebeat-6.2.4-darwin-x86_64 filebeat
$ cd filebeat
Configuration (collect Nginx JSON logs and send to Kafka):
$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: "nginx"
Start Filebeat:
$ ./filebeat -e -c filebeat.yml
Kafka
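Deploy a 3-node Kafka cluster. An odd node count (2N+1) is recommended so that the ZooKeeper ensemble running on the same hosts keeps a majority quorum when up to N nodes fail. Install and configure ZooKeeper first.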
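The 2N+1 recommendation follows from majority voting. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and not part of ZooKeeper or Kafka.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare an odd ensemble with the next even size.
for size in (3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

Under this model a fourth node raises the quorum size without raising the number of failures the ensemble survives, which is why odd sizes are preferred.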
Download and extract:
$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
$ tar -zxvf kafka_2.11-1.0.0.tgz
$ mv kafka_2.11-1.0.0 kafka
$ cd kafka
ZooKeeper configuration (3 nodes):
$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
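server.3=192.168.0.3:2888:3888
On each node, create a myid file in dataDir (/opt/zookeeper) containing that node's ID (1, 2, or 3), then start each ZooKeeper instance:
$ echo 1 > /opt/zookeeper/myid   # use 2 and 3 on the other two nodes
$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties
Kafka broker configuration (example for broker 1):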
$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
log.dirs=/opt/kafka_logs
num.partitions=3
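zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
Start each Kafka broker:
$ ./bin/kafka-server-start.sh -daemon ./config/server.properties
Verify topic creation (with Kafka's default auto.create.topics.enable=true, the nginx topic appears once Filebeat has published its first event):
$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx
Logstash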
Logstash provides INPUT, FILTER, and OUTPUT stages. Use a Grok debugger for filter patterns.
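To make the three stages concrete, here is a minimal Python sketch of an input → filter → output chain. It is a conceptual model only, not how Logstash is implemented; the function names and the in-memory sink are assumptions made for illustration, and the failure tag simply mimics the one Logstash's json filter applies.

```python
import json
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def input_stage(raw_lines: Iterable[str]) -> Iterator[Event]:
    """INPUT: wrap each raw message in an event, as a queue consumer would."""
    for line in raw_lines:
        yield {"message": line}


def filter_stage(events: Iterable[Event]) -> Iterator[Event]:
    """FILTER: parse the JSON payload and tag events that cannot be parsed."""
    for event in events:
        try:
            parsed = json.loads(str(event["message"]))
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            event.update(parsed)  # promote parsed fields to the top level
        else:
            event["tags"] = ["_jsonparsefailure"]
        yield event


def output_stage(events: Iterable[Event], sink: List[Event]) -> int:
    """OUTPUT: deliver events to a destination (a plain list stands in here)."""
    count = 0
    for event in events:
        sink.append(event)
        count += 1
    return count


def run_pipeline(raw_lines: Iterable[str], sink: List[Event]) -> int:
    """Chain the three stages lazily, one event at a time."""
    return output_stage(filter_stage(input_stage(raw_lines)), sink)
```

Each stage only sees the events handed to it by the previous one, which is the property that lets the real pipeline swap Kafka, grok filters, and Elasticsearch in and out independently.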
Download and extract:
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash
$ cd logstash
Configuration (read from Kafka and write to Elasticsearch):
$ vim nginx.conf
input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
    # Filebeat publishes events to Kafka as JSON, so decode them on input
    codec => "json"
  }
}
output {
  elasticsearch {
    # The elasticsearch output talks HTTP (port 9200); it has no "port" option,
    # and 9300 is the node-to-node transport port
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
Start Logstash:
$ ./bin/logstash -f nginx.conf
Elasticsearch
Download, extract, and configure the cluster:
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch
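$ cd elasticsearch
$ vim config/elasticsearch.yml
cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1","192.168.0.2","192.168.0.3"]
discovery.zen.minimum_master_nodes: 2
This example is for node 1 of the three-node cluster that Logstash writes to; repeat it on the other nodes with their own node.name and network.host.
Start Elasticsearch in the background:
$ ./bin/elasticsearch -d
Verify by opening http://192.168.0.1:9200/ and checking the JSON response.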
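The same check can be scripted. The sketch below queries the _cluster/health endpoint and inspects its status and number_of_nodes fields; the helper names and the "green or yellow is usable" rule are assumptions made for illustration, not an official health policy.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/_cluster/health"
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_cluster_usable(health: dict, expected_nodes: int) -> bool:
    """Assumed rule: status is green or yellow and every expected node has joined."""
    status_ok = health.get("status") in ("green", "yellow")
    nodes_ok = health.get("number_of_nodes", 0) >= expected_nodes
    return status_ok and nodes_ok


# Example call against the address used in this guide (requires a running node):
# health = fetch_cluster_health("http://192.168.0.1:9200")
# print(is_cluster_usable(health, expected_nodes=3))
```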
Kibana
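Download, extract, and configure (the darwin archive shown here is the macOS build; on Linux servers use the linux-x86_64 archive instead):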
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz
$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana
$ cd kibana
$ vim config/kibana.yml
server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"
Start Kibana:
$ nohup ./bin/kibana &
In Kibana, create an index pattern (e.g., nginx-*) under Management → Index Patterns to visualize logs.
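The nginx-* pattern works because the Logstash index setting nginx-%{+YYYY.MM.dd} produces one index per day, named from the event timestamp in UTC. The sketch below reproduces that naming and the wildcard match in Python; the function names are hypothetical, and fnmatch only approximates how Kibana resolves wildcards.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def daily_index_name(prefix: str, timestamp: datetime) -> str:
    """Build a daily index name such as nginx-2018.05.01 from an aware timestamp."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = timestamp.astimezone(timezone.utc)  # the date part is taken in UTC
    return f"{prefix}-{utc:%Y.%m.%d}"


def matches_index_pattern(index_name: str, pattern: str) -> bool:
    """Case-sensitive wildcard match, e.g. against 'nginx-*'."""
    return fnmatchcase(index_name, pattern)


# An event logged at 07:30 on 2 May in UTC+8 still belongs to the 1 May index.
local = datetime(2018, 5, 2, 7, 30, tzinfo=timezone(timedelta(hours=8)))
name = daily_index_name("nginx", local)
print(name, matches_index_pattern(name, "nginx-*"))
```

The UTC conversion is worth keeping in mind when an index for "today" looks incomplete around midnight local time.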
Key Operational Tips
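Separate master and data roles: once the cluster has more than three data nodes, run dedicated master nodes instead of combining both roles on the same node.
Keep each data node's JVM heap at about 31 GB or less (always below 32 GB) so the JVM can keep using compressed object pointers.
Set discovery.zen.minimum_master_nodes to (master_eligible_nodes / 2) + 1, rounded down, to avoid split-brain.
Never expose Elasticsearch directly to the public internet; install X-Pack to add authentication and access control.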
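Two of these tips reduce to simple arithmetic, sketched below. The helper names are hypothetical, and the "half of physical RAM" starting point is an assumption taken from common Elasticsearch sizing guidance rather than from this guide; only the 31 GB ceiling comes from the tip above.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: (n / 2) + 1 using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def recommended_heap_gb(ram_gb: float, ceiling_gb: float = 31.0) -> float:
    """Assumed rule of thumb: half of RAM for the heap, capped at the ceiling."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(ram_gb / 2, ceiling_gb)


# Three master-eligible nodes on 64 GB hosts.
print(minimum_master_nodes(3), recommended_heap_gb(64))
```

The remaining memory is left to the operating system's file cache, which Lucene relies on heavily.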
By following these steps, you can deploy a complete ELK stack capable of ingesting, processing, indexing, and visualizing billions of log entries per day, with horizontal scaling of Kafka and Elasticsearch clusters for high‑throughput real‑time analysis.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
