Mastering EFK: The Complete Guide to Building a Scalable Log Management System

This comprehensive guide explains the EFK (Elasticsearch, Fluentd, Kibana) log management stack, covering its components, architecture, deployment steps, log collection strategies, index optimization, monitoring, security hardening, troubleshooting and best‑practice recommendations for building a reliable, scalable logging solution in modern cloud‑native environments.

MaGe Linux Operations

EFK Log Management Solution Overview

EFK (Elasticsearch, Fluentd, Kibana) provides a complete pipeline for collecting, storing, analyzing, and visualizing logs in micro‑service and cloud‑native environments.

EFK Architecture Components

Elasticsearch : distributed search and analytics engine for storing and retrieving log data.

Full‑text search built on Lucene

Horizontal scaling and high availability

Powerful aggregation queries

Fluentd : open‑source data collector that gathers, filters and forwards logs.

Unified log collection layer

Supports many input sources and outputs

Rich plugin ecosystem

Kibana : visualization platform for querying, analyzing and monitoring logs.

Intuitive web UI

Rich chart types and dashboards

Alerting capabilities

Technical Advantages

Unified log management : centralized collection and management of distributed logs.

Real‑time analysis : near‑real‑time search and analytics.

Visualization : intuitive charts and dashboards.

High availability : cluster deployment with failover.

Scalability : flexible expansion according to business needs.

System Architecture Design

Overall Architecture

Application Service → Fluentd Agent → Kafka/Redis → Fluentd Aggregator → Elasticsearch → Kibana

Layered Design

Data source layer : application logs, system logs, container logs, network device logs.

Collection layer : Fluentd agents on each node collect and pre‑filter logs.

Buffer layer : Kafka or Redis provides buffering and spike‑shaping.

Aggregation layer : Fluentd aggregator performs data cleaning, formatting and routing.

Storage layer : Elasticsearch cluster with index management and replication.

Presentation layer : Kibana visual interface with custom dashboards and alerts.

High‑Availability Design

Elasticsearch: multi‑node cluster, master‑data separation, replica configuration and automatic failover.

Fluentd: multiple instances, load‑balancing and retry mechanisms.

Kibana: multiple instances behind a load balancer with session persistence.

Environment Preparation and Deployment

System Requirements

Hardware : 8+ CPU cores, 16 GB+ RAM, 500 GB+ SSD, 1 Gbps network.

Software : CentOS 7/8 or Ubuntu 18.04/20.04, OpenJDK 11+, Docker 19+, optional Kubernetes 1.18+.

Elasticsearch Deployment

Single‑node :

# Download and extract
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.15.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-7.15.0-linux-x86_64.tar.gz
cd elasticsearch-7.15.0/

# Append the settings to config/elasticsearch.yml
cat >> config/elasticsearch.yml <<'EOF'
cluster.name: efk-cluster
node.name: es-node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
EOF

# Start the service in the foreground as a non-root user (Elasticsearch refuses to run as root)
./bin/elasticsearch

Cluster : a cluster deployment uses dedicated master and data nodes. The configuration snippets are omitted here for brevity, as are the performance-tuning JVM options and system parameters (vm.max_map_count, fs.file-max, etc.).
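
As a rough illustration of the system-parameter side, the sketch below checks vm.max_map_count against 262144, the minimum that Elasticsearch's bootstrap check expects. The function names are hypothetical, and the procfs path assumes a Linux host.

```python
from pathlib import Path

# Minimum vm.max_map_count that Elasticsearch's bootstrap check expects.
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the kernel setting meets the required minimum."""
    return value >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current value from procfs (Linux only)."""
    return int(Path(path).read_text().strip())


def remediation_hint(value: int) -> str:
    """Describe what to do for a given current value."""
    if max_map_count_ok(value):
        return f"vm.max_map_count={value} is sufficient"
    return (f"vm.max_map_count={value} is too low; "
            f"run: sysctl -w vm.max_map_count={REQUIRED_MAX_MAP_COUNT}")
```

Calling remediation_hint(read_max_map_count()) on an Elasticsearch host reports whether the setting needs raising. A value set with sysctl -w does not survive a reboot unless it is also written to /etc/sysctl.conf.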

Fluentd Deployment

Install Fluentd through the official script or as a gem, add the required plugins, and write a fluent.conf that defines a tail source, a record_transformer filter and an Elasticsearch output. Fluentd can also run as a container through Docker Compose.
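
The original configuration is not reproduced in this summary. As a minimal sketch of what such a fluent.conf might look like, the Python helper below renders one from a template. The paths, tag, host and index prefix are illustrative assumptions, and the elasticsearch output type assumes the fluent-plugin-elasticsearch plugin is installed.

```python
from string import Template

# Minimal fluent.conf: tail source -> record_transformer filter -> Elasticsearch output.
# Every concrete value is supplied by the caller; none comes from the original article.
FLUENT_CONF = Template("""\
<source>
  @type tail
  path $log_path
  pos_file $pos_file
  tag $tag
  <parse>
    @type json
  </parse>
</source>

<filter $tag>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match $tag>
  @type elasticsearch
  host $es_host
  port $es_port
  logstash_format true
  logstash_prefix $index_prefix
</match>
""")


def render_fluent_conf(log_path: str, pos_file: str, tag: str,
                       es_host: str, es_port: int, index_prefix: str) -> str:
    """Fill the template with site-specific values and return the config text."""
    return FLUENT_CONF.substitute(
        log_path=log_path, pos_file=pos_file, tag=tag,
        es_host=es_host, es_port=es_port, index_prefix=index_prefix,
    )
```

With logstash_format enabled, the plugin writes to daily indices named after the prefix, for example app-logs-2024.01.15 for a prefix of app-logs.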

Kibana Deployment

Download and extract Kibana, then set the port, host and Elasticsearch hosts in kibana.yml. Running it through Docker Compose is optional.

Log Collection Strategies

Application Log Collection

File log collection example using tail source and nginx log parser.

Container log collection with Kubernetes metadata filter.

System Log Collection

Syslog source, record transformer, and system metrics collection via exec source.

Parsing and Formatting

JSON parser, regular‑expression parser for Apache logs, and custom parsing pipelines.
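
To make the regular-expression approach concrete, here is a small Python sketch that parses a line in the Apache Common Log Format. In Fluentd the same work would be done by a parser plugin; the pattern and the function name below are illustrative assumptions rather than the article's configuration.

```python
import re
from typing import Optional

# Apache Common Log Format: host ident user [time] "method path protocol" status size
APACHE_COMMON = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
)


def parse_apache_line(line: str) -> Optional[dict]:
    """Return the named fields of one access-log line, or None if it does not match."""
    match = APACHE_COMMON.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

Returning None for lines that do not match lets the caller route them to a separate tag instead of dropping them silently.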

Index Management and Optimization

Index Policies

Index template with 3 shards, 1 replica, a 30 s refresh interval and the best_compression codec.

ILM policy defining hot, warm, cold and delete phases.
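
The two policies above can be sketched as request bodies for the composable index template API (PUT _index_template/<name>) and the ILM API (PUT _ilm/policy/<name>). The shard, replica, refresh and codec values come from the text. The phase ages, the actions inside each phase and the builder function names are assumptions chosen for illustration.

```python
import json


def build_index_template(pattern: str, ilm_policy: str) -> dict:
    """Body for PUT _index_template/<name> with the settings listed above."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.number_of_shards": 3,
                "index.number_of_replicas": 1,
                "index.refresh_interval": "30s",
                "index.codec": "best_compression",
                # Attach the lifecycle policy to every index the template creates.
                "index.lifecycle.name": ilm_policy,
            }
        },
    }


def build_ilm_policy(warm_after: str = "7d", cold_after: str = "30d",
                     delete_after: str = "90d") -> dict:
    """Body for PUT _ilm/policy/<name>; the ages are placeholders, not recommendations."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"set_priority": {"priority": 100}}},
                "warm": {"min_age": warm_after,
                         "actions": {"forcemerge": {"max_num_segments": 1}}},
                "cold": {"min_age": cold_after,
                         "actions": {"set_priority": {"priority": 0}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def to_request_body(body: dict) -> str:
    """Serialize a body for use with curl or any HTTP client."""
    return json.dumps(body, indent=2)
```

The sketch avoids the rollover action because rollover needs a write alias or data stream, which daily date-named indices do not have. Without rollover, each min_age is measured from index creation.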

Performance Optimizations

Shard size 10‑50 GB, shard count = data nodes × 1‑3.

Query optimization with filtered range queries.

Compression settings, memory buffer sizes and cache tuning.
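
The shard-sizing rule and the filtered range query can be sketched in a few lines of Python. The 30 GB target shard size, the level and @timestamp field names, and both function names are assumptions. The shard count is estimated from index size and then capped at three shards per data node, which is one possible reading of the rule above.

```python
import math


def primary_shard_count(index_size_gb: float, data_nodes: int,
                        target_shard_gb: float = 30.0) -> int:
    """Size-based estimate of primary shards, capped at three per data node."""
    by_size = math.ceil(index_size_gb / target_shard_gb)
    return max(1, min(by_size, data_nodes * 3))


def filtered_range_query(level: str, start: str, end: str) -> dict:
    """Search body that matches one log level inside a time window.

    Both clauses sit in filter context, so they skip scoring and can be cached.
    Assumes `level` is mapped as a keyword field.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        }
    }
```

For a 200 GB index on three data nodes, the estimate is 7 primary shards of roughly 29 GB each. A 1000 GB index on the same nodes hits the cap of 9, which signals that the cluster needs more data nodes or smaller indices.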

Monitoring and Alerting

System Monitoring

Elasticsearch health, node stats and index stats via curl commands.

Fluentd monitoring agent and system log output.
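
The curl commands are not reproduced in this summary. As a hedged equivalent, the Python sketch below calls the same Elasticsearch endpoints (_cluster/health, _nodes/stats and _stats) with the standard library. The base URL is whatever the cluster exposes, and an unsecured HTTP endpoint is assumed. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Elasticsearch monitoring endpoints for cluster health, node stats and index stats.
MONITORING_PATHS = {
    "health": "/_cluster/health",
    "node_stats": "/_nodes/stats",
    "index_stats": "/_stats",
}


def fetch(base_url: str, name: str, timeout: float = 5.0) -> dict:
    """GET one monitoring endpoint and decode the JSON response."""
    url = base_url.rstrip("/") + MONITORING_PATHS[name]
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health: dict) -> str:
    """One-line summary of a _cluster/health response."""
    return (f'{health["cluster_name"]}: {health["status"]} '
            f'({health["unassigned_shards"]} unassigned shards)')
```

A typical call would be summarize_health(fetch("http://localhost:9200", "health")). A cluster with security enabled would also need credentials and TLS settings, which this sketch leaves out.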

Alert Configuration

Watcher example (Watcher is an Elasticsearch feature that can be managed from Kibana) that sends an email when the number of error logs exceeds a threshold.
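
A watch of this kind might look like the body below, built here as a Python dictionary for PUT _watcher/watch/<id>. The index pattern, the level field and its ERROR value, the five-minute window and the action name notify_ops are all assumptions. The email action also presupposes an email account configured in elasticsearch.yml.

```python
def build_error_watch(index_pattern: str, threshold: int, email_to: str,
                      window: str = "5m") -> dict:
    """Watch body: count ERROR logs in the last window, email when above threshold."""
    return {
        # Run the watch once per window.
        "trigger": {"schedule": {"interval": window}},
        "input": {
            "search": {
                "request": {
                    "indices": [index_pattern],
                    "body": {
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"level": "ERROR"}},
                                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                                ]
                            }
                        }
                    },
                }
            }
        },
        # Fire only when the hit count exceeds the threshold.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": threshold}}},
        "actions": {
            "notify_ops": {
                "email": {
                    "to": email_to,
                    "subject": "Error log threshold exceeded",
                    "body": "{{ctx.payload.hits.total}} error logs in the last " + window,
                }
            }
        },
    }
```

Using the same value for the schedule interval and the query window keeps consecutive runs from counting the same log lines twice.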

Performance Monitoring

Key metrics: indexing rate, query latency, heap usage, disk usage, network I/O. Sample Bash script for heap and disk warnings.
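
The original script is written in Bash and is not reproduced in this summary. The Python sketch below shows the same idea under stated assumptions: it takes a parsed _nodes/stats response and flags nodes whose heap or disk usage exceeds a limit. The 85% limits and the function name are illustrative, not values from the article.

```python
def node_warnings(node_stats: dict, heap_limit_pct: int = 85,
                  disk_limit_pct: int = 85) -> list:
    """Return warning strings for nodes above the heap or disk usage limits."""
    warnings = []
    for node in node_stats["nodes"].values():
        name = node["name"]

        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        if heap_pct > heap_limit_pct:
            warnings.append(f"{name}: heap usage {heap_pct}% exceeds {heap_limit_pct}%")

        # Disk usage is derived from the total and available bytes on the data paths.
        fs_total = node["fs"]["total"]
        disk_pct = 100 * (1 - fs_total["available_in_bytes"] / fs_total["total_in_bytes"])
        if disk_pct > disk_limit_pct:
            warnings.append(f"{name}: disk usage {disk_pct:.0f}% exceeds {disk_limit_pct}%")
    return warnings
```

An empty list means every node is within limits, so a cron job or monitoring agent can alert on any non-empty result.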

Security Configuration

Access Control

Elasticsearch security settings (xpack.security.enabled, TLS, user and role creation).

Network Security

Firewall rules for required ports and SSL/TLS configuration for Elasticsearch and Kibana.

Data Encryption

Transport‑layer TLS in Fluentd output and Elasticsearch encryption key.

Troubleshooting

Common Issues

Elasticsearch startup failures, Fluentd log collection problems and diagnostic commands.

Performance Issues

Enable the search and indexing slow logs, then adjust the bulk request size and the index buffer.

Data Recovery

Snapshot repository creation and restore procedures.
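
The procedure maps onto three calls to the Elasticsearch snapshot API: register a repository, take a snapshot, and restore it. The sketch below only builds the requests. The repository name, location, snapshot name and the choice of a shared-filesystem (fs) repository are assumptions, and the function name is hypothetical.

```python
def snapshot_requests(repo: str, location: str, snapshot: str, indices: str) -> list:
    """Return (method, path, body) tuples for register, snapshot and restore."""
    return [
        # 1. Register a shared-filesystem repository.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # 2. Take a snapshot of the selected indices and wait for it to finish.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": indices}),
        # 3. Restore the same indices from that snapshot.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         {"indices": indices}),
    ]
```

For an fs repository, the location must be listed under path.repo in elasticsearch.yml on every node. A restore fails if an open index with the same name already exists, so close or delete the target index first, or rename it during the restore.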

Best Practices

Architecture Design

Layered deployment: lightweight agents, central aggregator, dedicated storage cluster, visualization layer.

Capacity planning based on log volume, retention period, query concurrency and growth.
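
As a back-of-the-envelope illustration of capacity planning, the sketch below multiplies daily volume by retention, replica copies, an overhead factor and expected growth. The formula and its default factors are assumptions for illustration, not figures from the article, and it leaves query concurrency out entirely.

```python
def required_storage_gb(daily_gb: float, retention_days: int, replicas: int = 1,
                        overhead: float = 1.25, growth: float = 0.0) -> float:
    """Estimate the cluster storage needed to hold the retained logs.

    daily_gb       -- raw log volume indexed per day
    retention_days -- how long indices are kept before deletion
    replicas       -- replica copies per primary shard
    overhead       -- headroom for indexing structures and free-disk watermarks
    growth         -- expected fractional growth over the planning horizon
    """
    copies = 1 + replicas
    return daily_gb * retention_days * copies * overhead * (1 + growth)
```

For 50 GB per day kept for 30 days with one replica and the default overhead, the estimate is 3750 GB. Allowing for 20% growth raises it to 4500 GB.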

Configuration Optimization

Production‑grade Elasticsearch settings (memory lock, index buffer, thread pool, discovery) and Fluentd worker and file buffer tuning.

Operational Guidelines

Standardized log format (JSON with timestamp, level, service, message, etc.).

Naming conventions for indices (app‑logs‑YYYY.MM.DD, system‑logs‑YYYY.MM.DD, etc.).

Monitoring key indicators and alert thresholds.
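
The first two guidelines can be sketched with Python's standard logging module: a formatter that emits one JSON object per log line, and a helper that builds the daily index name. The field set follows the list above. The class name, the function name and the service value are hypothetical.

```python
import json
import logging
from datetime import date, datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object with a fixed set of fields."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            # ISO 8601 in UTC, so every service agrees on the time zone.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        })


def daily_index_name(prefix: str, day: date) -> str:
    """Build an index name of the form prefix-YYYY.MM.DD."""
    return f"{prefix}-{day:%Y.%m.%d}"
```

Attaching JsonFormatter("checkout") to a handler makes every log line a self-describing JSON document that a Fluentd JSON parser can read without a custom regex. daily_index_name("app-logs", date(2024, 1, 15)) returns app-logs-2024.01.15.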

Conclusion

EFK provides a complete log handling capability for modern enterprises. Proper architecture, configuration, monitoring and security enable a reliable, scalable and secure log management platform that supports business growth and digital transformation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
