How to Build a TB‑Scale Centralized Log System with ELK Stack and Filebeat
This guide walks you through deploying a production‑grade ELK Stack with Filebeat for enterprise‑level log centralization, covering environment prerequisites, Docker/Kubernetes setups, configuration of Elasticsearch, Kibana, Filebeat, index lifecycle management, monitoring, alerting, performance tuning, backup, and troubleshooting for TB‑scale daily logs.
Applicable Scenarios and Prerequisites
Enterprise log centralization, security audit, fault diagnosis, performance analysis.
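Prerequisites: 3+ servers, Docker/Kubernetes or bare-metal, 100 GB+ storage as a starting point (size production storage from daily volume, retention period, and replica count), network connectivity between all nodes.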
Supports TB‑scale daily logs, millions of log lines per second, 30‑day hot storage and cold‑storage archiving.
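As a rough sizing sketch (not part of the original guide), raw storage can be estimated as daily volume × retention days × (1 + replicas), plus some headroom. The function name and the 20% default headroom are illustrative assumptions, and the estimate ignores compression and index overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough storage estimate in GB.
# daily_gb * retention_days * (1 + replicas), plus headroom_pct percent.
# All inputs are integers; headroom defaults to 20 (an assumed figure).
estimate_storage_gb() {
  local daily_gb=$1 retention_days=$2 replicas=$3 headroom_pct=${4:-20}
  local base=$(( daily_gb * retention_days * (1 + replicas) ))
  echo $(( base + base * headroom_pct / 100 ))
}

# Example: 1000 GB/day, 30 days of hot retention, 1 replica, 20% headroom
estimate_storage_gb 1000 30 1   # prints 72000
```

At 1 TB per day the 30-day hot tier alone lands in the tens of terabytes, which is why the 100 GB figure only suits an evaluation setup.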
Environment and Version Matrix
Elasticsearch: recommended 7.17+ / 8.x, minimum 7.10, cluster deployment, 100 GB+ SSD.
Kibana: recommended 7.17+ / 8.x, minimum 7.10, single‑node or HA, 10 GB+ storage.
Filebeat: recommended 7.17+ / 8.x, minimum 7.10, DaemonSet/agent, 5 GB+ buffer.
Logstash (optional): recommended 7.17+ / 8.x, minimum 7.10, cluster deployment, 20 GB+ buffer.
OS: RHEL 8.x / Ubuntu 20.04 (minimum RHEL 7.x), Linux kernel 4.18+.
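Java: Elasticsearch and Logstash ship with a bundled OpenJDK, so a separate install is needed only if you override it (OpenJDK 11+ for 7.x, OpenJDK 17+ for 8.x).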
Quick Checklist
Deploy Elasticsearch cluster (3 nodes).
Deploy Kibana UI.
Configure Filebeat to collect logs.
Define index templates and lifecycle policies.
Configure log parsing and field extraction (Grok).
Set up alerting rules and notifications.
Build Kibana dashboards and visualizations.
Configure backup and disaster recovery.
Step 1: Deploy Elasticsearch Cluster
Docker Compose (recommended quick start):
mkdir -p ~/elk-stack && cd ~/elk-stack
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  # Elasticsearch cluster (3 nodes)
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    networks:
      - elk
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata02:/usr/share/elasticsearch/data
    networks:
      - elk
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata03:/usr/share/elasticsearch/data
    networks:
      - elk
  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.5.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://es01:9200
    ports:
      - "5601:5601"
    depends_on:
      - es01
    networks:
      - elk
volumes:
  esdata01:
  esdata02:
  esdata03:
networks:
  elk:
    driver: bridge
EOF
docker-compose up -d
sleep 30
curl "http://localhost:9200/_cluster/health?pretty"
Expected output (cluster health):
{
"cluster_name": "elk-cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 1,
"active_shards": 2,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
Step 2: Deploy Kibana
Access Kibana at http://localhost:5601 and create an index pattern (called a data view in Kibana 8.x, under Stack Management > Data Views), e.g., filebeat-*, with @timestamp as the time field.
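Before opening Kibana, it can help to confirm that the cluster from Step 1 is actually up instead of relying on a fixed sleep. The sketch below polls the cluster health endpoint; the function names, the 5-second interval, and the default of 30 attempts are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Extract the "status" value from a _cluster/health JSON response on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll until the cluster reports green or yellow, or give up after N tries.
# Arguments (both optional, both assumptions): base URL and number of tries.
wait_for_cluster() {
  local es_url=${1:-http://localhost:9200} tries=${2:-30} status i
  for i in $(seq "$tries"); do
    status=$(curl -s "$es_url/_cluster/health" | health_status)
    case "$status" in
      green|yellow) echo "$status"; return 0 ;;
    esac
    sleep 5
  done
  return 1
}

# Usage (needs a running cluster):
# wait_for_cluster http://localhost:9200 30 && echo "cluster is ready"
```

The health API also accepts wait_for_status and timeout query parameters, which can replace the loop when a single blocking call is enough.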
Step 3: Deploy Filebeat
RHEL/CentOS installation:
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.0-x86_64.rpm
sudo rpm -vi filebeat-8.5.0-x86_64.rpm
Ubuntu/Debian installation:
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.0-amd64.deb
sudo dpkg -i filebeat-8.5.0-amd64.deb
Configure /etc/filebeat/filebeat.yml (sample excerpt):
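filebeat.inputs:
  # Nginx logs: one event per line, so no multiline settings here
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
    fields:
      hostname: web01
      service: nginx
      environment: production
    harvester_buffer_size: 16384
    close_inactive: 5m
  # Application logs: lines that do not start with "[" are appended to the
  # previous event, which keeps stack traces together
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log
    multiline.pattern: '^\['
    multiline.negate: true
    multiline.match: after
    fields:
      hostname: web01
      service: app          # illustrative label; adjust to your application
      environment: production
    harvester_buffer_size: 16384
    close_inactive: 5m
processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~
# Filebeat has no grok processor. Apply the Grok pattern in an Elasticsearch
# ingest pipeline or in Logstash instead (see Step 5). Pattern with optional timings:
# %{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} (?:%{NUMBER:response_time:float}|-) (?:%{NUMBER:upstream_time:float}|-)
output.elasticsearch:
  hosts: ["elasticsearch:9200"]   # replace with an address reachable from this host
  # pipeline: "nginx-access"      # hypothetical pipeline name; create it before enabling
# ILM options live under setup.ilm, not under the output. rollover_alias was
# removed in Filebeat 8.0; with ILM enabled, 8.x writes to its default
# filebeat-<version> data stream, which still matches filebeat-*.
setup.ilm.enabled: true
setup.ilm.policy_name: "filebeat-policy"
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
Start and enable Filebeat: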
sudo systemctl start filebeat
sudo systemctl enable filebeat
Step 4: Index Templates and ILM Policies
Create an index template for Filebeat indices:
curl -X PUT "http://localhost:9200/_index_template/filebeat" -H 'Content-Type: application/json' -d '{
  "index_patterns": ["filebeat-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "filebeat-policy"
    },
    "mappings": {
      "properties": {
        "timestamp": {"type": "date", "format": "dd/MMM/yyyy:HH:mm:ss Z||strict_date_optional_time||epoch_millis"},
        "service": {"type": "keyword"},
        "level": {"type": "keyword"},
        "message": {"type": "text"}
      }
    }
  }
}'
Create the ILM policy (hot, warm, cold, delete phases):
curl -X PUT "http://localhost:9200/_ilm/policy/filebeat-policy" -H 'Content-Type: application/json' -d '{
  "policy": {
    "phases": {
      "hot": {"min_age": "0d", "actions": {"rollover": {"max_primary_shard_size": "50GB", "max_age": "1d"}}},
      "warm": {"min_age": "3d", "actions": {"set_priority": {"priority": 50}, "forcemerge": {"max_num_segments": 1}}},
      "cold": {"min_age": "30d", "actions": {"searchable_snapshot": {"snapshot_repository": "backup"}}},
      "delete": {"min_age": "90d", "actions": {"delete": {}}}
    }
  }
}'
The cold phase's searchable_snapshot action needs an existing snapshot repository (here backup, registered in Step 8) and an Enterprise license; without either, drop the cold phase or replace the action.
Step 5: Log Parsing and Field Extraction
Example Grok pattern for Nginx access logs:
%{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} %{NUMBER:response_time:float}
Filebeat itself has no grok processor, so apply this pattern in an Elasticsearch ingest pipeline (grok processor) or a Logstash grok filter. Filebeat's own dissect processor covers simpler fixed-delimiter formats.
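One way to apply a Grok pattern like the one above on the Elasticsearch side is an ingest pipeline. The sketch below only prints the pipeline body; the pipeline name nginx-access, the function name, and the localhost:9200 address in the usage comment are assumptions, not something the original guide defines.

```shell
#!/usr/bin/env bash
# Print an ingest pipeline body that applies the Nginx access-log Grok pattern
# to the "message" field. Backslashes and quotes are escaped for JSON.
nginx_pipeline_body() {
  cat <<'JSON'
{
  "description": "Parse Nginx access log lines",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} %{NUMBER:response_time:float}"
        ]
      }
    }
  ]
}
JSON
}

# Usage (needs a reachable cluster; "nginx-access" is a hypothetical name):
# nginx_pipeline_body | curl -s -X PUT "http://localhost:9200/_ingest/pipeline/nginx-access" \
#   -H 'Content-Type: application/json' --data-binary @-
```

Once such a pipeline exists, Filebeat can route events through it with the pipeline option under output.elasticsearch, and the pipeline's _simulate endpoint is a convenient way to try sample lines before going live.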
Step 6: Alerting Rules
In Kibana, create an alert rule based on an Elasticsearch query, e.g., more than 10 Nginx 5xx responses in the last 5 minutes. Because response_code is numeric, match the 5xx class with a range rather than the literal string "5xx":
GET filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"response_code": {"gte": 500, "lt": 600}}},
        {"range": {"@timestamp": {"gte": "now-5m"}}}
      ]
    }
  }
}
Configure notification channels (connectors) such as Slack, email, or PagerDuty.
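For a quick check outside Kibana, the same condition can be expressed with the _count API. This is a sketch under stated assumptions: response_code is indexed as a number, the cluster answers on localhost:9200, and the helper names are made up for illustration. The threshold of 10 comes from the example above.

```shell
#!/usr/bin/env bash
# Query body: events from the last 5 minutes with a 5xx status code.
five_xx_query() {
  cat <<'JSON'
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"response_code": {"gte": 500, "lt": 600}}},
        {"range": {"@timestamp": {"gte": "now-5m"}}}
      ]
    }
  }
}
JSON
}

# Extract the integer "count" from a _count response on stdin.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Exit status 0 when the count is strictly greater than the threshold.
over_threshold() {
  [ "$1" -gt "$2" ]
}

# Usage (needs a reachable cluster):
# n=$(five_xx_query | curl -s "http://localhost:9200/filebeat-*/_count" \
#       -H 'Content-Type: application/json' --data-binary @- | parse_count)
# over_threshold "$n" 10 && echo "5xx count $n exceeds threshold"
```

A script like this suits a cron-driven smoke test; Kibana rules remain the better fit when you need connectors, throttling, and alert history.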
Step 7: Build Kibana Dashboards
Create visualizations for request rate (QPS), response time percentiles (P50/P95/P99), error rate, status‑code distribution, and geo‑location of requests, then add them to a dashboard and save.
Step 8: Backup and Disaster Recovery
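Create a snapshot repository (in a multi-node cluster, the path must be a shared filesystem mounted at the same location on every node):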
# Create storage directory
sudo mkdir -p /mnt/elasticsearch-backup
sudo chown elasticsearch:elasticsearch /mnt/elasticsearch-backup
# Add to elasticsearch.yml
path.repo: /mnt/elasticsearch-backup
# Restart Elasticsearch
sudo systemctl restart elasticsearch
# Register repository
curl -X PUT "http://localhost:9200/_snapshot/backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mnt/elasticsearch-backup"}}'
Register a daily snapshot policy (keep 30 days):
curl -X PUT "http://localhost:9200/_slm/policy/daily-backup" -H 'Content-Type: application/json' -d '{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-{now/d}>",
  "repository": "backup",
  "config": {"indices": ["filebeat-*"]},
  "retention": {"expire_after": "30d", "min_count": 7, "max_count": 30}
}'
Monitoring and Alerting
Key monitoring metrics (can be scraped by Prometheus):
elasticsearch.cluster.health.status
elasticsearch.indices.docs.count
elasticsearch.indices.store.size
elasticsearch.nodes.process.cpu.percent
elasticsearch.jvm.memory.heap.percent
filebeat.registry.state.entries
Prometheus scrape config example for the Elasticsearch exporter (elasticsearch_exporter listens on port 9114 by default; 9100 is node_exporter):
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['localhost:9114']  # elasticsearch_exporter
Performance Optimization
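Elasticsearch: set the heap to about half of the node's RAM and below roughly 31 GB (e.g., ES_JAVA_OPTS=-Xms8g -Xmx8g on a 16 GB node), start with a primary shard count equal to the data node count while aiming for 10-50 GB per shard, enable ILM, and raise refresh_interval (e.g., 30s) for heavy writes.
Filebeat: raise output.elasticsearch.bulk_max_size (e.g., 2048), adjust scan_frequency (e.g., 10s), and configure multiline so that stack traces arrive as single events.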
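As one example of the refresh_interval tuning mentioned above, the index settings API can relax the refresh rate on existing indices. The 30s value, the helper name, and the localhost:9200 address are illustrative assumptions; new indices would normally pick the setting up from the index template instead.

```shell
#!/usr/bin/env bash
# Build an index settings body for a given refresh interval (e.g. 30s).
refresh_settings_body() {
  printf '{"index": {"refresh_interval": "%s"}}\n' "$1"
}

# Usage (needs a reachable cluster):
# refresh_settings_body 30s | curl -s -X PUT "http://localhost:9200/filebeat-*/_settings" \
#   -H 'Content-Type: application/json' --data-binary @-
```

A longer interval trades search freshness for indexing throughput, so it is worth checking that dashboards and alerts tolerate the extra delay.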
Best Practices
Redundant backups: retain at least 30 days of snapshots and periodically copy to off‑site storage (S3/NAS).
Use ILM to automatically move hot, warm, and cold data, controlling storage costs.
Tiered alerting: critical errors trigger immediate alerts; warnings are aggregated into hourly reports.
Log sanitization: mask passwords, keys, and other sensitive fields via processors.
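Enable Elasticsearch security (xpack.security.enabled: true, with TLS) to restrict access to sensitive indices; the quick-start Compose file in Step 1 disables it and is not suitable for production as written.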
Regularly clean up old indices through ILM deletion phase.
Monitor Filebeat collection latency to ensure timely log ingestion.
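To confirm that ILM is really moving indices through the phases these practices rely on, the ILM explain API reports the current phase of each managed index. The sketch below summarizes that response; the helper name and the localhost:9200 address are assumptions.

```shell
#!/usr/bin/env bash
# Count managed indices per ILM phase from an _ilm/explain response on stdin.
# Only the exact "phase" key is matched, not keys such as "phase_time_millis".
count_phases() {
  grep -o '"phase"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | sed 's/.*"\([a-z]*\)"$/\1/' \
    | sort | uniq -c
}

# Usage (needs a reachable cluster):
# curl -s "http://localhost:9200/filebeat-*/_ilm/explain" | count_phases
```

If every index stays in hot well past the warm min_age, the explain output for a single index usually shows which step is blocked.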
Common Issues and Troubleshooting
Filebeat cannot connect to Elasticsearch: check output.elasticsearch.hosts, network connectivity, and firewall rules.
Index not created: verify Filebeat is running and output configuration is correct.
Elasticsearch OOM: increase JVM heap or delete/close unused indices.
Slow queries: reduce index size, optimize mappings, or move older data to the cold phase via ILM.
Grok parsing failures: test patterns in Kibana Grok Debugger and adjust to match log format.
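The first two issues above can be narrowed down with Filebeat's built-in test subcommands and a look at the index list. This is a minimal sketch; the function name is hypothetical and the default URL assumes the quick-start cluster on localhost.

```shell
#!/usr/bin/env bash
# Run basic checks in order and stop at the first failure.
# Optional argument: Elasticsearch base URL (defaults to the quick-start address).
diagnose_filebeat() {
  local es_url=${1:-http://localhost:9200}
  sudo filebeat test config || return 1   # validates the Filebeat configuration
  sudo filebeat test output || return 2   # checks connectivity to the configured output
  curl -s "$es_url/_cat/indices/filebeat-*?v" || return 3   # lists matching indices
}

# Usage:
# diagnose_filebeat http://localhost:9200
```

The return code tells you which stage failed: 1 for configuration, 2 for the output connection, 3 for the index listing.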
Conclusion: ELK Stack with Filebeat provides an enterprise‑grade, TB‑scale log management solution. By following the deployment checklist, configuring ILM, setting up alerts, and applying performance tweaks, you can achieve reliable log collection, fast search, visual analytics, and robust data protection.
MaGe Linux Operations
