Unlock Real-Time Log Analysis with ELK: From Basics to Advanced Practices
This article explores how the ELK stack turns large‑scale log processing into fast, flexible, and interactive analysis for troubleshooting, security auditing, and monitoring. It shares practical examples, common pitfalls, and best‑practice recommendations from real‑world deployments at Sina.
ELK Usage Scenarios
This article introduces the ELK suite for log handling, starting with an overview of its components and common use cases.
Why Logs Matter
Problem diagnosis, the foundation of data‑driven operations.
Security auditing.
Monitoring.
Monitoring aggregates health and performance data, events, and their relationships, and presents them through an interface that gives a holistic view of the system's state, making failures easier to understand and address.
Effective log analysis must go beyond simply storing logs; it should enable rapid, interactive investigation.
Guest Introduction
Rao Chenlin is a system architect at Sina's Technical Assurance department, a Perl programmer, and the author of "Website Operations Technology and Practice". Formerly at YunKuaiXian and Renren, he focuses on CDN operations and automation, and has recently been researching log processing and monitoring.
Application Examples
Typical application log (image omitted for brevity).
Logstash configuration example:
Kibana 3 dashboard screenshot:
Kibana 4 dashboard screenshot (improved performance, color tweaks):
These examples show how a few dozen Logstash lines can power diverse visualizations such as time‑series histograms and top‑N term charts.
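The original configuration is shown only as a screenshot, so as an illustration, a minimal Logstash pipeline of the kind described might look like the sketch below. The file path, index name, and field layout are hypothetical, and the exact option names (for example `hosts` vs. `host` in the elasticsearch output) vary by Logstash version:

```conf
# Hypothetical sketch: read an application access log, parse it, ship to Elasticsearch.
input {
  file {
    path => "/var/log/app/access.log"   # hypothetical path
    start_position => "beginning"
  }
}

filter {
  grok {
    # Parse a combined-format access log line into named fields.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the log's own timestamp as the event time.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"  # daily indices, as in the deployment described below
  }
}
```

Once the fields are parsed, Kibana's histograms and top‑N term panels work without further configuration, which is why a few dozen lines go such a long way.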
Using ELK for PHP slow‑log analysis:
Resulting Kibana dashboard (clickable host filter):
Interactive filtering lets operators pinpoint problematic hosts and trace slow function calls.
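PHP‑FPM slow logs are multi‑line records: a timestamped header followed by a stack trace. A sketch of how such records could be folded into single events and parsed is below; the path and field names are hypothetical, not the configuration used at Sina:

```conf
# Hypothetical sketch: join PHP-FPM slow-log records into one event per trace.
input {
  file {
    path => "/var/log/php-fpm/slow.log"        # hypothetical path
    codec => multiline {
      # Records start with a header like "[21-Nov-2013 20:22:55]";
      # any line not starting with one belongs to the previous event.
      pattern => "^\[\d{2}-%{MONTH}-\d{4}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    # Apply both patterns: pool/pid from the header line,
    # the slow script's path from the second line.
    break_on_match => false
    match => {
      "message" => [
        "\[pool %{WORD:pool}\] pid %{NUMBER:pid}",
        "script_filename = %{UNIXPATH:script}"
      ]
    }
  }
}
```

With `pool`, `pid`, and `script` as separate fields, a clickable host or script filter in Kibana narrows the dashboard to the offending traces.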
Multi‑dimensional analysis of Nginx error logs:
ELK also supports crash‑log analysis, allowing developers to filter out system functions and focus on application‑specific stack traces.
Adding a version filter refines top‑N results for new releases.
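A multi‑dimensional view of Nginx error logs depends on splitting each line into fields that Kibana can facet on. A hedged sketch of such a grok filter follows; the field names are illustrative, and trailing details like `client:` and `server:` would need additional patterns since Nginx only emits them for some error types:

```conf
# Hypothetical sketch: split an Nginx error-log line such as
#   2015/06/10 12:00:00 [error] 1234#0: *5678 open() "..." failed ...
# into timestamp, level, pid, and message fields.
filter {
  grok {
    match => {
      "message" => "(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{LOGLEVEL:level}\] %{NUMBER:pid}#%{NUMBER:tid}: (\*%{NUMBER:connection} )?%{GREEDYDATA:errmsg}"
    }
  }
  date {
    match => [ "timestamp", "yyyy/MM/dd HH:mm:ss" ]
  }
}
```

Each extracted field (level, pid, connection, message) then becomes its own dimension for filtering and top‑N charts, and a version field from the application side refines the same dashboards for new releases.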
Best Practices
At Sina, the ELK deployment handles 65 billion log entries over seven days across 26 data nodes (42 GB RAM, 2.4 TB SAS, 8‑core CPUs). Key lessons include:
Enable doc_values to materialize fielddata on disk and avoid memory spikes.
Adjust recovery and relocation settings; default conservative parameters can make a node restart take days.
Disable multicast discovery in public clouds to prevent false‑positive scans.
Control shard allocation per node for newly created daily indices to prevent I/O overload on a single node.
Be aware that Elasticsearch is schema‑less, not "no‑schema": mismatched field types across indices can break searches, and the default ignore_above: 256 may truncate long stack‑trace fields.
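The tuning points above can be sketched as configuration for an Elasticsearch 1.x‑era cluster with daily logstash-* indices. All values here are illustrative assumptions, not the settings used at Sina:

```conf
# ---- elasticsearch.yml (illustrative values) ----
discovery.zen.ping.multicast.enabled: false        # unicast only; avoids false-positive scans on public clouds
indices.recovery.max_bytes_per_sec: 100mb          # raise from the conservative default so restarts don't take days
cluster.routing.allocation.node_concurrent_recoveries: 4

# ---- index template fragment for logstash-* indices ----
# doc_values on not_analyzed fields keeps fielddata on disk;
# a larger ignore_above keeps long stack traces indexed;
# total_shards_per_node spreads each new daily index's write I/O.
{
  "template": "logstash-*",
  "settings": { "index.routing.allocation.total_shards_per_node": 2 },
  "mappings": {
    "_default_": {
      "properties": {
        "host":  { "type": "string", "index": "not_analyzed", "doc_values": true },
        "trace": { "type": "string", "index": "not_analyzed", "ignore_above": 8192 }
      }
    }
  }
}
```

Applying field types through a template also avoids the schema‑mismatch problem, since every daily index inherits the same mapping.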
Recommended Reading
"Elasticsearch Service Development (2nd Edition)"
"Zabbix Monitoring System Deep Dive"
"Log Management and Analysis Authority Guide"
"The Charm of Data: Open‑Source Data Analysis"
"Website Operations: Secrets to Real‑Time Data"
"The Art of Web Capacity Planning"
"Large‑Scale Web Service Development Techniques"
Code as Craft
PerfPlanet Calendar
Kibana Logstash Site
Conclusion
Elasticsearch’s scoring can be used to compare time‑series anomalies, and the newer Watcher plugin adds alerting capabilities. Some users even replace storage systems like GlusterFS with Elasticsearch for image storage and automatic thumbnail generation.
If a newbie has a bad time, it’s a bug. – Jordan Sissel, Logstash author
Efficient Ops
This public account is maintained by Xiaotianguo and friends, and regularly publishes widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together.