
Unlock Real-Time Log Analysis with ELK: From Basics to Advanced Practices

This article explores how the ELK stack can transform large‑scale log processing into fast, flexible, and interactive analysis for troubleshooting, security auditing, and monitoring, sharing practical examples, common pitfalls, and best‑practice recommendations from real‑world deployments at Sina.

Efficient Ops

ELK Usage Scenarios

This article introduces the ELK stack (Elasticsearch, Logstash, and Kibana) for log handling, starting with an overview of its components and common use cases.

Why Logs Matter

Problem diagnosis – data‑driven operations.

Security auditing.

Monitoring.

Monitoring is the aggregation of health and performance data, events, and relationships delivered via an interface that provides a holistic view of a system’s state to better understand and address failure scenarios.

Effective log analysis must go beyond simply storing logs; it should enable rapid, interactive investigation.

Guest Introduction

Rao Chenlin is a system architect at Sina Tech Assurance, a Perl programmer, and the author of "Website Operations Technology and Practice". He previously worked at YunKuaiXian and Renren, focuses on CDN and automation, and has recently been researching log processing and monitoring.

Application Examples

A typical application log (image omitted).

The Logstash configuration used to parse it (image omitted).

The resulting Kibana 3 dashboard (screenshot omitted).

The same dashboard in Kibana 4, which offers better performance and slightly adjusted colors (screenshot omitted).

These examples show how a few dozen Logstash lines can power diverse visualizations such as time‑series histograms and top‑N term charts.
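To make the two chart types concrete, here is a small Python sketch of what a date-histogram panel and a top-N terms panel compute over events that Logstash has already parsed. The field names (@timestamp, host, status) and sample values are hypothetical; in a real deployment Elasticsearch computes these aggregations server-side and Kibana only renders them.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def date_histogram(events, interval_seconds):
    """Count events per fixed-width time bucket, like a Kibana date histogram."""
    buckets = Counter()
    for event in events:
        ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S")
        seconds = int((ts - EPOCH).total_seconds())
        # Round down to the start of the bucket this event falls into.
        start = EPOCH + timedelta(seconds=seconds - seconds % interval_seconds)
        buckets[start] += 1
    return sorted(buckets.items())

def top_terms(events, field, n, **filters):
    """Return the n most frequent values of `field` among events matching all filters."""
    matching = (e for e in events if all(e.get(k) == v for k, v in filters.items()))
    return Counter(e[field] for e in matching if field in e).most_common(n)

# Hypothetical events, shaped like Logstash output after parsing.
events = [
    {"@timestamp": "2015-03-01T10:00:05", "host": "web01", "status": "200"},
    {"@timestamp": "2015-03-01T10:00:40", "host": "web02", "status": "502"},
    {"@timestamp": "2015-03-01T10:01:10", "host": "web02", "status": "502"},
    {"@timestamp": "2015-03-01T10:01:50", "host": "web01", "status": "200"},
    {"@timestamp": "2015-03-01T10:02:30", "host": "web02", "status": "200"},
]

for start, count in date_histogram(events, 60):
    print(start.isoformat(), count)
print(top_terms(events, "host", 2, status="502"))
```

The status="502" filter plays the role of clicking a value in a Kibana panel: every other chart is then recomputed over the narrowed set of events.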

Using ELK for PHP slow-log analysis (configuration image omitted).

The resulting Kibana dashboard, in which each host name is a clickable filter (screenshot omitted).

Interactive filtering lets operators pinpoint problematic hosts and trace slow function calls.
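As a rough illustration of the parsing behind such a dashboard, the Python sketch below splits a PHP-FPM slow log into entries and counts the innermost (currently executing) function per entry, optionally for a single host. It assumes the usual PHP-FPM slow-log layout: a timestamp/pool/pid header, a script_filename line, then one "[0x...] function() file:line" frame per line, innermost first. The host name is attached by the caller, as a log shipper would do, and the sample logs are invented.

```python
import re
from collections import Counter

# Assumed PHP-FPM slow-log layout (see lead-in); adjust if your format differs.
HEADER_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+\[pool (?P<pool>[^\]]+)\] pid (?P<pid>\d+)$")
FRAME_RE = re.compile(r"^\[0x[0-9a-fA-F]+\] (?P<func>[^(]+)\(\) (?P<file>.+):(?P<line>\d+)$")

def parse_slowlog(text, host):
    """Split a slow log into entries; each entry keeps its stack frames in order."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        header = HEADER_RE.match(lines[0].strip())
        if not header:
            continue  # skip blocks that do not start with an entry header
        entry = {"host": host, "time": header["time"], "pool": header["pool"], "frames": []}
        for line in lines[1:]:
            line = line.strip()
            if line.startswith("script_filename = "):
                entry["script"] = line[len("script_filename = "):]
            else:
                frame = FRAME_RE.match(line)
                if frame:
                    entry["frames"].append((frame["func"], frame["file"], int(frame["line"])))
        entries.append(entry)
    return entries

def slowest_calls(entries, host=None):
    """Count the innermost frame of each entry, optionally for one host only."""
    return Counter(
        e["frames"][0][0]
        for e in entries
        if e["frames"] and (host is None or e["host"] == host)
    ).most_common()

SAMPLE_WEB01 = """\
[01-Mar-2015 10:00:01]  [pool www] pid 1201
script_filename = /var/www/api/index.php
[0x00007f1c2a0b1e20] curl_exec() /var/www/api/lib/Http.php:42
[0x00007f1c2a0b1c10] fetch() /var/www/api/index.php:7

[01-Mar-2015 10:00:09]  [pool www] pid 1202
script_filename = /var/www/api/index.php
[0x00007f1c2a0b2f00] mysql_query() /var/www/api/lib/Db.php:88
[0x00007f1c2a0b2d10] load() /var/www/api/index.php:12
"""
SAMPLE_WEB02 = """\
[01-Mar-2015 10:00:15]  [pool www] pid 3310
script_filename = /var/www/api/index.php
[0x00007f3d1b0a1e20] curl_exec() /var/www/api/lib/Http.php:42
"""

entries = parse_slowlog(SAMPLE_WEB01, "web01") + parse_slowlog(SAMPLE_WEB02, "web02")
print(slowest_calls(entries))
print(slowest_calls(entries, host="web02"))
```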

Multi-dimensional analysis of Nginx error logs (screenshot omitted).
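Multi-dimensional analysis starts with turning each error line into separate fields. The Python sketch below assumes the common Nginx error-log layout (timestamp, [level], pid#tid, optional *connection id, a free-text message, then context fields such as client, server, request, upstream, and host) and then counts records by any combination of fields. The sample lines and the count_by helper are illustrative, not part of the original setup.

```python
import re
from collections import Counter

# Timestamp, level, pid#tid, optional *connection id, then the free-text remainder.
HEAD_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] "
    r"(?P<pid>\d+)#(?P<tid>\d+): (?:\*(?P<conn>\d+) )?(?P<rest>.*)$"
)
# Trailing context fields: key: "quoted value" or key: bare value up to the next comma.
KV_RE = re.compile(r'(\w+): ("[^"]*"|[^,]*)')

def parse_error_line(line):
    """Split one error-log line into time, level, message and context fields."""
    m = HEAD_RE.match(line)
    if m is None:
        return None
    record = {"time": m["time"], "level": m["level"]}
    # The message ends where the context fields begin (", client: ...").
    message, sep, tail = m["rest"].partition(", client: ")
    record["message"] = message
    if sep:
        for key, value in KV_RE.findall("client: " + tail):
            record[key] = value.strip('"')
    return record

def count_by(records, *fields):
    """Count records by a combination of fields, e.g. (server, level)."""
    return Counter(tuple(r.get(f) for f in fields) for r in records)

LINES = [
    '2015/03/01 10:00:01 [error] 1234#0: *17 open() "/var/www/html/favicon.ico" failed '
    '(2: No such file or directory), client: 10.0.0.1, server: www.example.com, '
    'request: "GET /favicon.ico HTTP/1.1", host: "www.example.com"',
    '2015/03/01 10:00:02 [error] 1234#0: *18 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, client: 10.0.0.2, server: api.example.com, '
    'request: "GET /v1/items HTTP/1.1", upstream: "http://127.0.0.1:8080/v1/items", host: "api.example.com"',
    '2015/03/01 10:00:03 [warn] 1235#0: *19 an upstream response is buffered to a temporary file '
    '/var/cache/nginx/proxy_temp/1/00/0000000001 while reading upstream, client: 10.0.0.2, '
    'server: api.example.com, request: "GET /v1/export HTTP/1.1", '
    'upstream: "http://127.0.0.1:8080/v1/export", host: "api.example.com"',
]

records = [parse_error_line(line) for line in LINES]
print(count_by(records, "server", "level"))
print(count_by(records, "client"))
```

Any pair of fields (server and level, client and upstream, and so on) can be combined in count_by, which is the same slicing a Kibana dashboard offers interactively.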

ELK also supports crash‑log analysis, allowing developers to filter out system functions and focus on application‑specific stack traces.

Adding a filter on the application version narrows the top-N crash list to a specific new release.
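The two crash-log steps above, dropping system frames and then filtering by version, can be sketched in Python as follows. The "module!function" frame naming, the SYSTEM_PREFIXES list, and the sample reports are all hypothetical; which frames count as "system" depends on the platform being analyzed.

```python
from collections import Counter

# Hypothetical prefixes for system frames; a real list depends on the platform.
SYSTEM_PREFIXES = ("libc.so", "libsystem_")

def app_frames(frames, system_prefixes=SYSTEM_PREFIXES):
    """Drop system frames so only application code remains."""
    return [f for f in frames if not f.startswith(system_prefixes)]

def top_crash_sites(reports, n, version=None):
    """Top-N crash locations by first application frame, optionally for one version."""
    sites = Counter()
    for report in reports:
        if version is not None and report["version"] != version:
            continue
        frames = app_frames(report["frames"])
        if frames:
            sites[frames[0]] += 1
    return sites.most_common(n)

reports = [
    {"version": "5.1.0", "frames": ["libc.so!abort", "libapp.so!Player::decode", "libapp.so!Player::run"]},
    {"version": "5.2.0", "frames": ["libc.so!abort", "libapp.so!Feed::parse", "libapp.so!Feed::load"]},
    {"version": "5.2.0", "frames": ["libsystem_kernel!__pthread_kill", "libapp.so!Feed::parse"]},
    {"version": "5.1.0", "frames": ["libapp.so!Player::decode"]},
]

print(top_crash_sites(reports, 3))
print(top_crash_sites(reports, 3, version="5.2.0"))
```

Without the system-frame filter, libc.so!abort would top the list and hide the application code that actually triggered the crashes.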

Best Practices

At Sina, the ELK deployment handles 65 billion log entries over seven days on 26 data nodes, each with 42 GB of RAM, 2.4 TB of SAS disk, and an 8-core CPU. Key lessons include:

Enable doc_values so that fielddata is built at index time and kept on disk instead of being loaded into the JVM heap at query time, which avoids memory spikes.

Tune the recovery and relocation settings; with the conservative defaults, recovering after a single node restart can take days.

Disable multicast discovery in public clouds, where the discovery traffic can be mistaken for network scanning.

Limit how many shards of each newly created daily index may be allocated to one node; otherwise the new shards can pile up on a single node and overload its I/O.

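Be aware that Elasticsearch is schema-less, not “no-schema”: every field still gets a type, and if the same field is mapped with different types in different indices, searches and aggregations across those indices can break. Likewise, the default ignore_above: 256 limit means longer values, such as stack traces, may be silently left unindexed.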
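Both schema pitfalls in the last point can be checked before they cause trouble. The Python sketch below works on simplified, hypothetical mapping dictionaries (field name mapped to a dict with a type and an optional ignore_above, loosely modeled on Elasticsearch string mappings of that era) and does not call Elasticsearch. It reports fields whose type differs between daily indices, and fields in a document whose value exceeds ignore_above and would therefore not be indexed.

```python
def type_conflicts(mappings_by_index):
    """Return {field: {type: [indices]}} for fields mapped with more than one type."""
    seen = {}
    for index, properties in mappings_by_index.items():
        for field, spec in properties.items():
            seen.setdefault(field, {}).setdefault(spec["type"], []).append(index)
    return {field: types for field, types in seen.items() if len(types) > 1}

def unindexed_fields(properties, document):
    """List fields whose string value is longer than the field's ignore_above limit.

    Length is compared in characters here, which is an assumption of this sketch.
    """
    skipped = []
    for field, value in document.items():
        limit = properties.get(field, {}).get("ignore_above")
        if limit is not None and isinstance(value, str) and len(value) > limit:
            skipped.append(field)
    return skipped

# Hypothetical mappings for two daily indices.
mappings = {
    "logstash-2015.03.01": {
        "response_time": {"type": "long"},
        "stack": {"type": "string", "index": "not_analyzed", "ignore_above": 256},
    },
    "logstash-2015.03.02": {
        "response_time": {"type": "string", "index": "not_analyzed"},
        "stack": {"type": "string", "index": "not_analyzed", "ignore_above": 256},
    },
}
doc = {"response_time": "0.35", "stack": "#0 /var/www/lib/Db.php(88): query()\n" * 10}

print(type_conflicts(mappings))
print(unindexed_fields(mappings["logstash-2015.03.02"], doc))
```

Here response_time became a string on the second day, which is exactly the kind of silent drift that breaks range queries and numeric aggregations spanning both indices.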


Conclusion

Beyond log search, Elasticsearch’s scoring can be used to compare time series and spot anomalies, and the newer Watcher plugin adds alerting. Some users even replace storage systems such as GlusterFS with Elasticsearch, using it to store images and generate thumbnails automatically.

If a newbie has a bad time, it’s a bug. – Jordan Sissel, Logstash author
