How to Build a Billion-Scale ELK Log Platform with Filebeat, Kafka, and Elasticsearch
Learn step‑by‑step how to design and deploy a billion‑scale log collection and analysis platform using the ELK stack—Filebeat, Kafka, Logstash, Elasticsearch, and Kibana—covering architecture, configuration, installation, and best practices for high‑availability and performance.
Overall Architecture
The platform consists of four modules: Filebeat, Kafka, Logstash, and Elasticsearch, each providing specific functions.
Filebeat: lightweight data collector that replaces Logstash-forwarder.
Kafka: message queue for buffering and decoupling, which absorbs traffic spikes and lets producers and consumers scale independently.
Logstash: data processing engine that ingests, filters, enriches, and formats logs before storage.
Elasticsearch: distributed search engine for full-text, structured, and analytical queries.
Versions used:
Filebeat: 6.2.4
Kafka: 2.11-1
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4
Specific Implementation (Nginx JSON logs)
An example Nginx log entry in JSON format:
{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
Filebeat
Filebeat is used instead of Logstash‑forwarder because it consumes fewer resources; it runs as a Go‑based lightweight agent deployed on each application server, often installed via Salt.
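To make the log format concrete, the sketch below parses the sample Nginx entry shown earlier using only Python's standard library. The helper name `parse_nginx_line` and the decision to map "-" placeholders to `None` are illustrative assumptions, not something Filebeat does; Filebeat's own JSON decoding is controlled by its `json.*` options.

```python
import json

# The sample Nginx entry from this article, as a single log line.
SAMPLE = (
    '{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11",'
    '"clientip":"192.168.56.11","size":26,"responsetime":0.000,'
    '"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11",'
    '"url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-",'
    '"referer":"-","status":"200"}'
)


def parse_nginx_line(line):
    """Decode one JSON log line (hypothetical helper, for illustration only)."""
    event = json.loads(line)
    # Nginx writes "-" for values it does not have; map those to None.
    event = {key: (None if value == "-" else value) for key, value in event.items()}
    # "status" is logged as a string in the sample; convert it for numeric queries.
    event["status"] = int(event["status"])
    return event


event = parse_nginx_line(SAMPLE)
print(event["url"], event["status"], event["upstreamtime"])
```

Note that `status` arrives as a string in the sample, so numeric range queries on it need either a conversion like this one or an explicit index mapping.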
Download
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz
Extract
$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
$ mv filebeat-6.2.4-darwin-x86_64 filebeat
$ cd filebeat
Configuration
$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: 'nginx'
Start Filebeat:
$ ./filebeat -e -c filebeat.yml
Kafka
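Deploy a three-node Kafka cluster together with a three-node Zookeeper ensemble. Zookeeper ensembles follow the 2N+1 rule: with 2N+1 nodes, a majority quorum survives the loss of N nodes.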
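The 2N+1 rule can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming a strict-majority quorum; the function name `tolerated_failures` is hypothetical.

```python
def tolerated_failures(ensemble_size):
    """Number of nodes that can fail while a strict majority stays up."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for size in (3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

A four-node ensemble tolerates no more failures than a three-node one, which is why odd sizes are the usual choice.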
Download
$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
Extract
$ tar -zxvf kafka_2.11-1.0.0.tgz
$ mv kafka_2.11-1.0.0 kafka
$ cd kafka
Zookeeper configuration
$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
Create /opt/zookeeper/myid on each node containing that node's id (1, 2, or 3), then start each Zookeeper node:
$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties
Kafka broker configuration
$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
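delete.topic.enable=true
Give each broker its own broker.id and host.name, then start each broker:
$ ./bin/kafka-server-start.sh -daemon ./config/server.properties
Verify that the nginx topic exists (with default broker settings it is auto-created when Filebeat first publishes to it):
$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx
Monitor with Kafka-Manager (open-source tool from Yahoo).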
Logstash
Logstash processes events in three stages: INPUT, FILTER, and OUTPUT. When you need to parse unstructured logs, use the Grok Debugger to test grok patterns.
Download
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
Extract
$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash
$ cd logstash
Configuration (nginx.conf)
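input {
  kafka {
    type => "kafka"
    # Kafka brokers listen on 9092; 2181 is the Zookeeper port.
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
    # Filebeat publishes JSON events, so decode them on input.
    codec => "json"
  }
}
output {
  elasticsearch {
    # The Elasticsearch output talks HTTP (9200), not the transport protocol (9300).
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
Start Logstash:
$ ./bin/logstash -f nginx.conf
Elasticsearch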
Download, extract, and configure the cluster.
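The Logstash output above writes to a daily index named nginx-%{+YYYY.MM.dd}. Logstash formats that date from the event's @timestamp in UTC, so an entry logged shortly after midnight in a +08:00 zone lands in the previous day's index. The sketch below reproduces the naming in Python to illustrate this; `daily_index` is a hypothetical helper, not a Logstash API.

```python
from datetime import datetime, timezone


def daily_index(prefix, timestamp):
    """Build a daily index name from an ISO 8601 timestamp with a UTC offset."""
    moment = datetime.fromisoformat(timestamp)
    # Convert to UTC before formatting, mirroring Logstash's date formatting.
    return f"{prefix}-{moment.astimezone(timezone.utc):%Y.%m.%d}"


print(daily_index("nginx", "2017-12-27T16:38:17+08:00"))
print(daily_index("nginx", "2017-12-27T02:00:00+08:00"))
```

This matters when deleting or querying indices by day: the index boundaries follow UTC days, not local days.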
Download
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
Extract
$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch
$ cd elasticsearch
Configuration (elasticsearch.yml)
cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1","192.168.0.2","192.168.0.3"]
discovery.zen.minimum_master_nodes: 2
On the other nodes, change node.name and network.host accordingly.
Start in background:
$ ./bin/elasticsearch -d
Verify by opening http://192.168.0.1:9200/ and checking the JSON response.
Key operational notes:
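Separate master and data nodes; keep the JVM heap on data nodes at or below 31 GB so the JVM can keep using compressed object pointers.
Set discovery.zen.minimum_master_nodes to (master-eligible nodes / 2) + 1, rounding the division down, to avoid split-brain.
Do not expose Elasticsearch to the public internet; enable X-Pack for security.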
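The first two notes reduce to simple arithmetic, sketched below. Both function names are hypothetical, and the half-of-RAM starting point in `heap_size_gb` is a commonly cited rule of thumb that is assumed here, not something the notes above state; only the 31 GB cap comes from this article.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: (master-eligible nodes / 2) + 1, with integer division."""
    return master_eligible_nodes // 2 + 1


def heap_size_gb(ram_gb, cap_gb=31):
    """Suggested heap: half of RAM (assumed rule of thumb), capped at cap_gb."""
    return min(ram_gb // 2, cap_gb)


print(minimum_master_nodes(3))  # three master-eligible nodes
print(heap_size_gb(32))         # 32 GB machine
print(heap_size_gb(128))        # large machine, heap is capped
```

With three master-eligible nodes the quorum is 2, so a single isolated node can never elect itself master.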
Kibana
Download, extract, configure, and launch Kibana for visualization.
Download
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz
Extract
$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana
$ cd kibana
Configuration (kibana.yml)
server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"
Start Kibana:
$ nohup ./bin/kibana &
Create index patterns in Management → Index Patterns using the nginx-* pattern.
Conclusion
By following the commands above you can deploy a complete ELK pipeline that handles log collection, filtering, indexing, and visualization, and by horizontally scaling Kafka and Elasticsearch you can achieve daily processing of billions of log entries in real time.
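To put "billions of log entries per day" in perspective, the sketch below converts a daily volume into a per-second rate. The peak factor of 3 is a purely illustrative assumption for sizing headroom, not a figure from this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day):
    """Average ingest rate implied by a daily event volume."""
    return events_per_day / SECONDS_PER_DAY


def peak_events_per_second(events_per_day, peak_factor):
    """Rate to size for if traffic peaks at peak_factor times the average."""
    return average_events_per_second(events_per_day) * peak_factor


daily = 1_000_000_000
print(round(average_events_per_second(daily)))
print(round(peak_events_per_second(daily, peak_factor=3)))  # assumed factor
```

One billion events per day averages out to roughly 11,600 events per second, which is the sustained rate that Kafka, Logstash, and Elasticsearch each have to absorb.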
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
