Why ELK Is the Ultimate Solution for Log Management and Monitoring
This article introduces the ELK stack—Elasticsearch, Logstash, and Kibana—explaining its core components, architecture, comparison with databases and grep, typical use cases across security, networking, and application monitoring, deployment considerations, challenges, SaaS prospects, and recommended learning resources.
User Stories
Scenario 1
As an operations engineer, you may need to check logs across virtual machines, physical hosts, and cloud platforms without logging into each host individually; a centralized log system would simplify troubleshooting and allow alert subscriptions.
Scenario 2
Developers often need to trace API calls and database interactions; a tool that provides dashboards showing request counts and failures can avoid costly grep operations and I/O spikes.
Scenario 3
When a new version is released, it is useful to compare pre‑ and post‑deployment logs to determine whether incidents are related to the new release.
Scenario 4
Team leaders want visibility into product usage, feature access frequency, and error rates without manually running analysis scripts on distributed clusters.
All these problems can be solved with ELK.
What Is ELK?
In short, if logs are buried treasure, ELK is the excavator.
Overview
ELK is a solution composed of three products: Elasticsearch for storage and search, Logstash for collection, filtering, and formatting, and Kibana for visualization and dashboards. This article focuses on Elasticsearch (ES).
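The division of labor above can be illustrated with a small sketch of the "filter and format" step: a raw access-log line is parsed into a structured document and serialized in the newline-delimited JSON layout that the Elasticsearch bulk API accepts. The regular expression, the sample line, and the index name access-logs are assumptions made for illustration; in a real deployment Logstash would normally do this parsing with a grok pattern.

```python
import json
import re

# Hypothetical pattern for a common-log-format style access line.
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one raw access-log line into a structured document, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


def to_bulk_body(docs, index_name):
    """Serialize documents as newline-delimited JSON for the bulk API."""
    lines = []
    for doc in docs:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Made-up sample line, for illustration only.
sample = '10.0.0.7 - - [12/Sep/2016:10:15:32 +0800] "GET /api/users HTTP/1.1" 500 312'
doc = parse_line(sample)
print(to_bulk_body([doc], "access-logs"))
```

Once the status code and path are separate fields, Kibana can chart request counts and failures without anyone running grep on the hosts.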
Related Architecture Concepts
One node with 2 replicas and 3 shards.
A cluster consists of multiple nodes.
Data is indexed and stored in an index (similar to a DB in RDBMS).
An index can be split into multiple shards, and each shard can have multiple replicas.
Node types: master, data, client. One node is elected master to maintain cluster state.
Shards are evenly distributed across available data nodes.
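The relationship between shards, replicas, and nodes can be made concrete with a little arithmetic. The sketch below builds the settings body used when creating an index and counts how many shard copies exist and how many can actually be placed. The helper names are hypothetical, and the placement rule is simplified: it only models the fact that two copies of the same shard are never allocated to the same node, and ignores other allocation constraints such as disk usage.

```python
def index_settings(primary_shards, replicas_per_shard):
    """Build the settings body sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


def shard_copies(primary_shards, replicas_per_shard):
    """Total shard copies: every primary plus its replicas."""
    return primary_shards * (1 + replicas_per_shard)


def assignable_copies(primary_shards, replicas_per_shard, data_nodes):
    """Copies that can be placed when no node holds two copies of one shard."""
    copies_per_shard = min(1 + replicas_per_shard, data_nodes)
    return primary_shards * copies_per_shard


print(index_settings(3, 2))
print(shard_copies(3, 2), assignable_copies(3, 2, data_nodes=1))
```

With 3 shards and 2 replicas on a single node, only the 3 primaries can be placed; the 6 replica copies stay unassigned until more data nodes join the cluster.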
ES vs Relational Databases
Elasticsearch can be viewed as a database with a built‑in search engine. The following table compares key concepts with MySQL.
ELK vs Linux Grep
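One way to picture the difference is a toy inverted index. grep reads every line on every query, while a search engine such as Elasticsearch tokenizes the text once at index time and answers queries from a term-to-document map. The sketch below is a deliberately simplified illustration (whitespace tokenization, lowercase only, made-up log lines), not how Elasticsearch is implemented internally.

```python
from collections import defaultdict


def grep_scan(lines, term):
    """grep-style search: inspect every line on every query."""
    return [i for i, line in enumerate(lines) if term in line.lower().split()]


def build_inverted_index(lines):
    """Index once: map each token to the set of line numbers containing it."""
    index = defaultdict(set)
    for i, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(i)
    return index


def index_lookup(index, term):
    """Query time: one dictionary lookup instead of a full scan."""
    return sorted(index.get(term, set()))


logs = ["ERROR db timeout", "INFO request ok", "ERROR disk full"]
idx = build_inverted_index(logs)
print(grep_scan(logs, "error"), index_lookup(idx, "error"))
```

Both approaches return the same matches; the difference is that the indexed lookup does not grow with the total log volume the way a scan does, at the cost of building and storing the index up front.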
What Can ELK Do?
Application Scenarios
Security: Analyze system logs to detect attacks or illegal access, e.g., visualizing brute‑force attempts with FreeIPA.
Network: Complement SNMP‑based monitoring by analyzing syslog data, capturing events like port flapping or engine failures.
Application: Real‑time analysis of mobile traffic, API request volume, website visits, and performance metrics for capacity planning.
Other: User profiling for social engineering, stack trace analysis, network traffic analysis.
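As a rough illustration of the security scenario, the sketch below counts failed SSH logins per source address and flags addresses that exceed a threshold. It assumes sshd-style "Failed password" syslog lines; the sample lines, the threshold, and the function names are made up for the example. In an ELK deployment the same counting would typically be done with an aggregation over indexed documents and shown in a Kibana dashboard.

```python
import re
from collections import Counter

# Assumed sshd message format; other services log failures differently.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)


def failed_logins_by_ip(lines):
    """Count failed login attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def suspects(counts, threshold):
    """Return the addresses with at least `threshold` failures."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


sample_lines = [
    "Sep 12 10:15:31 web1 sshd[812]: Failed password for root from 203.0.113.5 port 51234 ssh2",
    "Sep 12 10:15:33 web1 sshd[812]: Failed password for invalid user admin from 203.0.113.5 port 51236 ssh2",
    "Sep 12 10:15:35 web1 sshd[812]: Failed password for root from 203.0.113.5 port 51240 ssh2",
    "Sep 12 10:16:02 web1 sshd[901]: Accepted password for deploy from 198.51.100.9 port 40022 ssh2",
    "Sep 12 10:17:10 web1 sshd[955]: Failed password for deploy from 198.51.100.9 port 40100 ssh2",
]
counts = failed_logins_by_ip(sample_lines)
print(suspects(counts, threshold=3))
```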
ELK Deployment Patterns
Architecture Selection
A common ELK architecture is shown below.
This design is simple and easy to maintain, but has drawbacks:
Shippers consume host resources; Logstash as an agent is heavy, so Beats are recommended.
Kibana's built-in access control is weak; consider combining the Search Guard plugin for Elasticsearch with LDAP and Nginx for security.
Cross‑network data transfer can saturate bandwidth; a solution is to deploy separate ELK clusters per data center and use tribe nodes for query routing.
To address these issues, the following architecture can be used:
If log volume grows further, replace Logstash with Hangout and Redis with Kafka for higher throughput.
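The reason a queue such as Redis or Kafka helps is that it decouples shippers from the indexing tier: shippers only append to the buffer, and the consumer drains it in batches at its own pace. The sketch below illustrates that idea with Python's in-process queue.Queue standing in for the message queue; it is an analogy under that assumption, not Redis or Kafka client code, and the function names are hypothetical.

```python
import queue


def ship(buffer, lines):
    """Shipper side: enqueue raw lines without waiting for the indexer."""
    for line in lines:
        buffer.put(line)


def drain_in_batches(buffer, batch_size):
    """Consumer side: pull whatever is buffered, grouped into batches."""
    batch = []
    while True:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


buffer = queue.Queue()
ship(buffer, ["line {}".format(i) for i in range(5)])
batches = list(drain_in_batches(buffer, batch_size=2))
print([len(b) for b in batches])
```

Batching on the consumer side is also what makes bulk indexing into Elasticsearch practical when traffic arrives in bursts.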
Monitoring and Alerting
Log Alerts
ElastAlert can be used, or custom applications can pull data from Elasticsearch or Kafka for analysis.
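For the custom-application route, a common rule shape is "alert when at least N matching events occur within a time window". The sketch below implements that check over a list of event timestamps, which a real application would first pull from Elasticsearch or Kafka. The parameter names num_events and timeframe are borrowed from ElastAlert's frequency rule for familiarity, but the function itself is a hypothetical stand-alone illustration, not ElastAlert code.

```python
from datetime import datetime, timedelta


def frequency_alert(timestamps, num_events, timeframe):
    """Return True if any window of length `timeframe` holds `num_events` events."""
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Slide the window start forward until it is within `timeframe` of ts.
        while ts - ordered[start] > timeframe:
            start += 1
        if end - start + 1 >= num_events:
            return True
    return False


# Made-up error timestamps: three within 40 seconds, one much later.
base = datetime(2016, 9, 12, 10, 0, 0)
events = [base + timedelta(seconds=s) for s in (0, 20, 40, 300)]
print(frequency_alert(events, num_events=3, timeframe=timedelta(minutes=1)))
```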
Self‑Monitoring
Use Zabbix templates for monitoring.
The official Marvel plugin (paid) provides cluster metrics.
OpenFalcon can monitor Elasticsearch clusters.
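Whichever tool is used, self-monitoring usually starts from the cluster health API, which reports a green, yellow, or red status along with node and shard counts. The sketch below maps such a response to an alert severity. The sample response is made up, and the severity names are assumptions chosen for the example; a Zabbix or OpenFalcon check would obtain the real response by querying the cluster's _cluster/health endpoint.

```python
def health_severity(health):
    """Map a cluster health response to an alert severity."""
    status = health.get("status")
    if status == "green":
        return "ok"
    if status == "yellow":
        # All primaries are allocated but some replicas are not.
        return "warning"
    # Red (or a missing status) means at least one primary is unavailable
    # or the cluster state could not be read.
    return "critical"


def summarize(health):
    """Produce a one-line summary suitable for an alert message."""
    return "status={} nodes={} unassigned_shards={}".format(
        health.get("status"),
        health.get("number_of_nodes"),
        health.get("unassigned_shards"),
    )


# Made-up response containing a subset of the fields the API returns.
sample_health = {
    "cluster_name": "logs",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 6,
}
print(health_severity(sample_health), summarize(sample_health))
```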
Challenges and Ideas
SaaS‑ification
Providing ELK as a SaaS service (e.g., on Sina Cloud, QingCloud, AWS) removes the need for users to build and maintain clusters, reducing cost and adding value for cloud providers.
Big Data Analytics
Storing massive log data in ELK enables downstream big‑data and machine‑learning analysis for intelligent operations.
Recommended Resources
"Elasticsearch: The Definitive Guide"
"ELK 中文指南" (ELK Chinese Guide)
"Mastering Elasticsearch"
"Elasticsearch in Action" (Manning)
Source: Zhihu article https://zhuanlan.zhihu.com/p/22400290
Efficient Ops
