
How to Build a TB‑Scale Centralized Log System with ELK Stack and Filebeat

This guide walks you through deploying a production‑grade ELK Stack with Filebeat for enterprise‑level log centralization, covering environment prerequisites, Docker/Kubernetes setups, configuration of Elasticsearch, Kibana, Filebeat, index lifecycle management, monitoring, alerting, performance tuning, backup, and troubleshooting for TB‑scale daily logs.

MaGe Linux Operations

Applicable Scenarios and Prerequisites

Enterprise log centralization, security audit, fault diagnosis, performance analysis.

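Prerequisites: 3+ servers, Docker/Kubernetes or bare‑metal, network connectivity between all nodes, and 100 GB+ storage as an evaluation minimum. For production, size storage from daily volume × retention days × (1 + replica count).

Designed for TB‑scale daily logs (up to millions of log lines per second on suitably sized hardware), with about 30 days of hot/warm retention followed by cold‑storage archiving.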

Environment and Version Matrix

Elasticsearch: recommended 7.17+ / 8.x, minimum 7.10, cluster deployment, 100 GB+ SSD.

Kibana: recommended 7.17+ / 8.x, minimum 7.10, single‑node or HA, 10 GB+ storage.

Filebeat: recommended 7.17+ / 8.x, minimum 7.10, DaemonSet/agent, 5 GB+ buffer.

Logstash (optional): recommended 7.17+ / 8.x, minimum 7.10, cluster deployment, 20 GB+ buffer.

OS: RHEL 8.x / Ubuntu 20.04 (minimum RHEL 7.x), Linux kernel 4.18+.

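Java: bundled with Elasticsearch and Logstash 7.x/8.x, so a separate install is normally not needed. If you override the bundled JDK, use OpenJDK 11+ (minimum 8+) for 7.x; Elasticsearch 8.x requires 17+.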

Quick Checklist

Deploy Elasticsearch cluster (3 nodes).

Deploy Kibana UI.

Configure Filebeat to collect logs.

Define index templates and lifecycle policies.

Configure log parsing and field extraction (Grok).

Set up alerting rules and notifications.

Build Kibana dashboards and visualizations.

Configure backup and disaster recovery.

Step 1: Deploy Elasticsearch Cluster

Docker Compose (recommended quick start):

mkdir -p ~/elk-stack && cd ~/elk-stack
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  # Elasticsearch cluster (3 nodes)
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    networks:
      - elk

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata02:/usr/share/elasticsearch/data
    networks:
      - elk

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=elk-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - esdata03:/usr/share/elasticsearch/data
    networks:
      - elk

  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.5.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://es01:9200
    ports:
      - "5601:5601"
    depends_on:
      - es01
    networks:
      - elk

volumes:
  esdata01:
  esdata02:
  esdata03:

networks:
  elk:
    driver: bridge
EOF

docker-compose up -d
sleep 30
curl "http://localhost:9200/_cluster/health?pretty"

Expected output (cluster health):

{
  "cluster_name": "elk-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 1,
  "active_shards": 2,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}
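
A fixed sleep 30 can be too short on slower hosts, and Elasticsearch in Docker generally refuses to start unless the host's vm.max_map_count is at least 262144. The following is a minimal sketch of a preflight check and a readiness wait, assuming the es01 port mapping from the Compose file above; the function names and the ES_URL variable are illustrative and not part of any Elastic tooling.

```shell
#!/bin/sh
# Sketch only: helper names and ES_URL are assumptions for this guide.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeeds when the given vm.max_map_count value meets the documented minimum.
map_count_ok() {
  [ "$1" -ge 262144 ]
}

# Reads a _cluster/health JSON response on stdin and prints its status value.
health_status() {
  tr -d ' \n' | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Polls cluster health every 5 seconds until it is green or the timeout
# (in seconds, default 120) runs out.
wait_for_green() {
  timeout="${1:-120}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(curl -s "$ES_URL/_cluster/health" | health_status)
    [ "$status" = "green" ] && return 0
    sleep 5
    elapsed=$((elapsed + 5))
  done
  return 1
}

# Example usage (run manually):
#   map_count_ok "$(sysctl -n vm.max_map_count)" || sudo sysctl -w vm.max_map_count=262144
#   docker-compose up -d && wait_for_green 180 && echo "cluster is green"
```

Polling the health endpoint replaces guesswork about startup time and gives a clear failure (non‑zero exit) when the cluster never forms.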

Step 2: Deploy Kibana

Access Kibana at http://localhost:5601. Under Stack Management, create a data view (called an index pattern in 7.x) that matches filebeat-* and select @timestamp as the time field.

Step 3: Deploy Filebeat

RHEL/CentOS installation:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.0-x86_64.rpm
sudo rpm -vi filebeat-8.5.0-x86_64.rpm

Ubuntu/Debian installation:

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.0-amd64.deb
sudo dpkg -i filebeat-8.5.0-amd64.deb

Configure /etc/filebeat/filebeat.yml (sample excerpt):

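filebeat.inputs:
# Nginx logs are one event per line, so no multiline settings here.
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
    - /var/log/nginx/error.log
  fields:
    hostname: web01
    service: nginx
    environment: production
  harvester_buffer_size: 16384
  close_inactive: 5m

# Application logs: lines that do not start with '[' (such as stack traces)
# are appended to the previous event.
- type: log
  enabled: true
  paths:
    - /var/log/app/*.log
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after
  fields:
    hostname: web01
    service: app   # placeholder; use your application's name
    environment: production
  close_inactive: 5m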
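processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~
  # Filebeat has no grok processor. Parse the Nginx line in an Elasticsearch
  # ingest pipeline or a Logstash grok filter instead, using this pattern
  # (response_time and upstream_time may be "-"):
  # %{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} (?:%{NUMBER:response_time:float}|-) (?:%{NUMBER:upstream_time:float}|-)

output.elasticsearch:
  # Must resolve from the Filebeat host; use localhost:9200 for the Docker Compose quick start.
  hosts: ["elasticsearch:9200"]
  # pipeline: <ingest-pipeline-name>   # uncomment once the Grok ingest pipeline exists

# ILM options live under setup.ilm, not under output.elasticsearch.
# A custom output.elasticsearch.index would also require setup.template.name
# and setup.template.pattern, so it is omitted here.
setup.ilm.enabled: true
setup.ilm.policy_name: "filebeat-policy"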

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644

Start and enable Filebeat:

sudo systemctl start filebeat
sudo systemctl enable filebeat

Step 4: Index Templates and ILM Policies

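Create an index template for Filebeat indices. The composable _index_template API expects settings and mappings under a "template" key, and the timestamp format below also accepts the Nginx HTTPDATE form produced by the Grok pattern:

curl -X PUT "http://localhost:9200/_index_template/filebeat" -H 'Content-Type: application/json' -d '{
  "index_patterns": ["filebeat-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "filebeat-policy"
    },
    "mappings": {
      "properties": {
        "timestamp": {"type": "date", "format": "dd/MMM/yyyy:HH:mm:ss Z||strict_date_optional_time||epoch_millis"},
        "service": {"type": "keyword"},
        "level": {"type": "keyword"},
        "message": {"type": "text"}
      }
    }
  }
}'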

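Create the ILM policy (hot, warm, cold, delete phases). The phases must be nested under a "policy" object. The searchable_snapshot action needs a registered snapshot repository (here the backup repository from Step 8) and a license tier that includes searchable snapshots; drop the cold phase if either is missing:

curl -X PUT "http://localhost:9200/_ilm/policy/filebeat-policy" -H 'Content-Type: application/json' -d '{
  "policy": {
    "phases": {
      "hot": {"min_age": "0d", "actions": {"rollover": {"max_primary_shard_size": "50GB", "max_age": "1d"}}},
      "warm": {"min_age": "3d", "actions": {"set_priority": {"priority": 50}, "forcemerge": {"max_num_segments": 1}}},
      "cold": {"min_age": "30d", "actions": {"searchable_snapshot": {"snapshot_repository": "backup"}}},
      "delete": {"min_age": "90d", "actions": {"delete": {}}}
    }
  }
}'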
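
To confirm that indices actually move through the phases, the ILM explain API reports the current phase per index. The sketch below is one way to condense that response into "index phase" lines; the helper names and ES_URL are assumptions, and the parsing relies on the compact (non‑pretty) JSON that filter_path returns.

```shell
#!/bin/sh
# Sketch only: helper names and ES_URL are assumptions for this guide.
ES_URL="${ES_URL:-http://localhost:9200}"

# Converts a compact, filter_path-reduced _ilm/explain response on stdin,
# e.g. {"indices":{"idx":{"phase":"hot"}}}, into "idx hot" lines.
summarize_phases() {
  grep -o '"[^"]*":{"phase":"[a-z]*"}' |
    sed 's/"\([^"]*\)":{"phase":"\([a-z]*\)"}/\1 \2/'
}

# Asks Elasticsearch which phase each managed filebeat-* index is in.
ilm_phases() {
  curl -s "$ES_URL/filebeat-*/_ilm/explain?filter_path=indices.*.phase" |
    summarize_phases
}

# Example usage (run manually):
#   ilm_phases
```

Indices that are not managed by any policy have no phase field and therefore do not appear in the summary.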

Step 5: Log Parsing and Field Extraction

Example Grok pattern for Nginx access logs:

%{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} %{NUMBER:response_time:float}

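Filebeat itself has no grok processor, so apply this pattern in an Elasticsearch ingest pipeline (grok processor) or a Logstash grok filter, and point Filebeat at the ingest pipeline with the output.elasticsearch.pipeline setting.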
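
As a hedged illustration of the ingest‑pipeline route, the sketch below registers a pipeline carrying the pattern above and dry‑runs it against a sample line with the _simulate API. The pipeline name nginx-access, the helper names, and ES_URL are assumptions chosen for this example; inside JSON the literal brackets need a doubled backslash and the quotes need escaping.

```shell
#!/bin/sh
# Sketch only: pipeline name, helper names and ES_URL are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
PIPELINE_ID="nginx-access"

# Prints the pipeline definition: a single grok processor on "message".
pipeline_body() {
  cat <<'EOF'
{
  "description": "Parse Nginx access log lines",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client_ip} %{DATA:ident} %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:user_agent} %{NUMBER:response_time:float}"]
      }
    }
  ]
}
EOF
}

# Escapes backslashes and double quotes so a log line fits in a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Creates or updates the ingest pipeline.
create_pipeline() {
  pipeline_body | curl -s -X PUT "$ES_URL/_ingest/pipeline/$PIPELINE_ID" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Runs one raw log line (first argument) through the pipeline without indexing it.
simulate_line() {
  printf '{"docs":[{"_source":{"message":"%s"}}]}' "$(json_escape "$1")" |
    curl -s -X POST "$ES_URL/_ingest/pipeline/$PIPELINE_ID/_simulate" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Example usage (run manually):
#   create_pipeline
#   simulate_line "$(head -n 1 /var/log/nginx/access.log)"
```

Simulating before wiring the pipeline into Filebeat surfaces pattern mismatches early, since a failed grok match is reported in the _simulate response instead of silently dropping fields.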

Step 6: Alerting Rules

In Kibana, create an alert rule based on an Elasticsearch query, e.g., more than 10 Nginx 5xx responses in the last 5 minutes. Because response_code is extracted as an integer, match it with a numeric range rather than the string "5xx":

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"response_code": {"gte": 500, "lte": 599}}},
        {"range": {"@timestamp": {"gte": "now-5m"}}}
      ]
    }
  }
}

Configure notification channels such as Slack, Email, PagerDuty, etc.
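
Before building the rule in Kibana, it can help to check from the command line what such a rule would see. The sketch below counts 5xx responses from the last 5 minutes with the _count API and compares the result with the threshold of 10 from the example; the helper names, ES_URL, and THRESHOLD are assumptions, and this is not how Kibana evaluates rules internally.

```shell
#!/bin/sh
# Sketch only: helper names, ES_URL and THRESHOLD are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
THRESHOLD="${THRESHOLD:-10}"

# Prints the query: 5xx responses within the last 5 minutes.
query_body() {
  cat <<'EOF'
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"response_code": {"gte": 500, "lte": 599}}},
        {"range": {"@timestamp": {"gte": "now-5m"}}}
      ]
    }
  }
}
EOF
}

# Reads a _count response on stdin and prints the integer count.
parse_count() {
  sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Queries Elasticsearch and prints the number of matching documents.
count_5xx() {
  query_body | curl -s -X POST "$ES_URL/filebeat-*/_count" \
    -H 'Content-Type: application/json' --data-binary @- | parse_count
}

# Succeeds when the given count is strictly above the threshold.
over_threshold() {
  [ "$1" -gt "$THRESHOLD" ]
}

# Example usage (run manually):
#   n=$(count_5xx); over_threshold "$n" && echo "ALERT: $n 5xx responses in 5m"
```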

Step 7: Build Kibana Dashboards

Create visualizations for request rate (QPS), response time percentiles (P50/P95/P99), error rate, status‑code distribution, and geo‑location of requests, then add them to a dashboard and save.

Step 8: Backup and Disaster Recovery

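Create a snapshot repository. In a multi‑node cluster, an fs repository must live on a shared filesystem (for example NFS) mounted at the same path on every master and data node. The commands below apply to package or bare‑metal installs; for the Docker Compose setup, set path.repo as an environment variable on each node and mount a shared volume instead:

# Create storage directory (on every node)
sudo mkdir -p /mnt/elasticsearch-backup
sudo chown elasticsearch:elasticsearch /mnt/elasticsearch-backup
# Add this line to elasticsearch.yml on every node:
#   path.repo: /mnt/elasticsearch-backup
# Restart Elasticsearch
sudo systemctl restart elasticsearch
# Register repository
curl -X PUT "http://localhost:9200/_snapshot/backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mnt/elasticsearch-backup"}}'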

Register a daily snapshot policy (runs at 02:00, keeps 30 days). SLM schedules use Elasticsearch cron syntax, which starts with a seconds field:

curl -X PUT "http://localhost:9200/_slm/policy/daily-backup" -H 'Content-Type: application/json' -d '{
  "schedule": "0 0 2 * * ?",
  "repository": "backup",
  "name": "<daily-{now/d}>",
  "indices": ["filebeat-*"],
  "retention": {"expire_after": "30d", "min_count": 7, "max_count": 30}
}'
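
A backup is only useful once a restore has been tried. The following sketch triggers the policy immediately, lists the snapshots in the repository, and restores one under a restored- prefix so it does not collide with live indices. The helper names and ES_URL are assumptions; the repository and policy names are the ones used in this step.

```shell
#!/bin/sh
# Sketch only: helper names and ES_URL are assumptions for this guide.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="backup"
POLICY="daily-backup"

# Runs the SLM policy now instead of waiting for the schedule.
run_policy_now() {
  curl -s -X POST "$ES_URL/_slm/policy/$POLICY/_execute"
}

# Lists snapshots in the repository in tabular form.
list_snapshots() {
  curl -s "$ES_URL/_cat/snapshots/$REPO?v&h=id,status,end_time,indices"
}

# Builds the restore endpoint for a snapshot name (first argument).
restore_url() {
  printf '%s/_snapshot/%s/%s/_restore' "$ES_URL" "$REPO" "$1"
}

# Restores filebeat-* indices from the named snapshot as restored-<index>.
# The $1 inside the single-quoted JSON is Elasticsearch's regex group
# reference, not a shell variable.
restore_snapshot() {
  curl -s -X POST "$(restore_url "$1")" \
    -H 'Content-Type: application/json' \
    -d '{"indices": "filebeat-*", "rename_pattern": "(.+)", "rename_replacement": "restored-$1"}'
}

# Example usage (run manually):
#   run_policy_now
#   list_snapshots
#   restore_snapshot "<snapshot id from the list>"
```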

Monitoring and Alerting

Key metrics to watch (logical names; the exact metric names exposed to Prometheus depend on the exporter you use):

elasticsearch.cluster.health.status
elasticsearch.indices.docs.count
elasticsearch.indices.store.size
elasticsearch.nodes.process.cpu.percent
elasticsearch.jvm.memory.heap.percent
filebeat.registry.state.entries

Prometheus scrape config example for the Elasticsearch exporter:

scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['localhost:9114']  # elasticsearch_exporter default port
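
Where no exporter is running yet, the same heap signal can be read directly from the _cat/nodes API. The sketch below prints the nodes whose JVM heap usage exceeds a limit; the helper names, ES_URL, and the 85 percent default are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: helper names, ES_URL and the default limit are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Reads "name heap.percent cpu" lines on stdin and prints the names of
# nodes whose heap usage is strictly above the limit (first argument).
nodes_over_heap() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Fetches per-node heap usage and reports nodes above the limit.
check_heap() {
  curl -s "$ES_URL/_cat/nodes?h=name,heap.percent,cpu" |
    nodes_over_heap "${1:-85}"
}

# Example usage (run manually):
#   check_heap 85
```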

Performance Optimization

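Elasticsearch: increase heap (e.g., ES_JAVA_OPTS=-Xms8g -Xmx8g, keeping it at or below half of the node's RAM and under about 31 GB so compressed object pointers stay enabled), set the primary shard count equal to the data node count, enable ILM, and raise refresh_interval above its 1s default (e.g., 30s) for heavy writes.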
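
As a small sketch of the refresh_interval change, the helper below applies a new interval to existing filebeat-* indices through the update index settings API. The helper names and ES_URL are assumptions, and 30s is only an example value; new indices would pick the setting up from the index template instead.

```shell
#!/bin/sh
# Sketch only: helper names and ES_URL are assumptions for this guide.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints the settings body for a given refresh interval, e.g. 30s.
settings_body() {
  printf '{"index":{"refresh_interval":"%s"}}' "$1"
}

# Applies the interval to all existing filebeat-* indices.
set_refresh_interval() {
  settings_body "$1" | curl -s -X PUT "$ES_URL/filebeat-*/_settings" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Example usage (run manually):
#   set_refresh_interval 30s
```

A longer interval trades search freshness for indexing throughput: documents become searchable later, but fewer small segments are created.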

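Filebeat: raise output.elasticsearch.bulk_max_size (e.g., 2048) together with queue.mem.events, add output workers via output.elasticsearch.worker, adjust scan_frequency (default 10s), and enable multiline handling for stack traces.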

Best Practices

Redundant backups: retain at least 30 days of snapshots and periodically copy to off‑site storage (S3/NAS).

Use ILM to automatically move hot, warm, and cold data, controlling storage costs.

Tiered alerting: critical errors trigger immediate alerts; warnings are aggregated into hourly reports.

Log sanitization: mask passwords, keys, and other sensitive fields via processors.

Enable Elasticsearch security (xpack.security.enabled=true; the quick‑start Compose file disables it) to restrict access to sensitive indices.

Regularly clean up old indices through ILM deletion phase.

Monitor Filebeat collection latency to ensure timely log ingestion.

Common Issues and Troubleshooting

Filebeat cannot connect to Elasticsearch: check output.elasticsearch.hosts, network connectivity, and firewall rules.

Index not created: verify Filebeat is running and output configuration is correct.

Elasticsearch OOM: increase JVM heap or delete/close unused indices.

Slow queries: reduce index size, optimize mappings, or move older data to the cold phase via ILM.

Grok parsing failures: test patterns in Kibana Grok Debugger and adjust to match log format.
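
The checks above can be gathered into a few command‑line helpers. This is a sketch under stated assumptions: the helper names, ES_URL, and FILEBEAT_CONFIG are illustrative, while filebeat test config and filebeat test output are the Filebeat subcommands for validating the configuration and the connection to the configured output.

```shell
#!/bin/sh
# Sketch only: helper names, ES_URL and FILEBEAT_CONFIG are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
FILEBEAT_CONFIG="${FILEBEAT_CONFIG:-/etc/filebeat/filebeat.yml}"

# Reads "health index" lines on stdin and prints indices that are not green.
unhealthy_indices() {
  awk '$1 != "green" { print $2 " (" $1 ")" }'
}

# Lists yellow/red filebeat-* indices, then asks why a shard is unassigned.
# The allocation explain call returns an error when nothing is unassigned.
diagnose_cluster() {
  curl -s "$ES_URL/_cat/indices/filebeat-*?h=health,index" | unhealthy_indices
  curl -s "$ES_URL/_cluster/allocation/explain?pretty"
}

# Validates the Filebeat configuration and its connection to the output.
check_filebeat() {
  sudo filebeat test config -c "$FILEBEAT_CONFIG" &&
    sudo filebeat test output -c "$FILEBEAT_CONFIG"
}

# Example usage (run manually):
#   check_filebeat
#   diagnose_cluster
```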

Conclusion: ELK Stack with Filebeat provides an enterprise‑grade, TB‑scale log management solution. By following the deployment checklist, configuring ILM, setting up alerts, and applying performance tweaks, you can achieve reliable log collection, fast search, visual analytics, and robust data protection.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
