Design and Implementation of a Distributed Real-Time Log Collection and Analysis System Using the ELK/EFK Stack
This article describes the background, requirements, architecture choices, performance testing, and lessons learned from building a large‑scale, distributed log collection and analysis platform at Hujiang using Elasticsearch, Logstash, Kibana, Filebeat, and Kafka to handle billions of log entries daily.
Hujiang, the largest online education platform in China, processes about 1 TB of logs per day (≈10⁹ entries) from multiple products, requiring a centralized system for efficient fault diagnosis, service monitoring, and data analysis.
The solution adopts the widely used ELK stack (Elasticsearch, Logstash, Kibana) and extends it to an EFK stack by adding Filebeat as a lightweight shipper. The stack versions are:
Elasticsearch 5.2.2
Logstash 5.2.2
Kibana 5.2.2
Filebeat 5.2.2
Kafka 2.10

Logstash acts as the data collection and processing engine, supporting inputs, filters, and outputs. Kibana provides visualization, while Elasticsearch offers distributed search and analytics. Filebeat replaces Logstash‑forwarder and runs without a Java runtime.
Simple Architecture
Logstash instances connect directly to Elasticsearch. Logstash reads logs via Input plugins (e.g., file, TCP), filters them (Grok, mutate, etc.), and writes to Elasticsearch via Output plugins.
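The simple architecture can be sketched as a single Logstash pipeline configuration. Paths, hosts, and the index name below are placeholders for illustration, not the production values:

```conf
input {
  file {
    path           => "/var/log/app/*.log"   # hypothetical application log path
    start_position => "beginning"
  }
}

filter {
  mutate {
    strip => ["message"]   # trim surrounding whitespace before further parsing
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"   # one index per day
  }
}
```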
Example Grok filter:
grok {
match => ["message", "(?m)\[%{LOGLEVEL:level}\] \[%{TIMESTAMP_ISO8601:timestamp}\] \[%{DATA:logger}\] \[%{DATA:threadId}\] \[%{DATA:requestId}\] %{GREEDYDATA:msgRawData}"]
}

Cluster Architecture
Multiple Elasticsearch nodes form a cluster; several Logstash indexer instances run in parallel, and a Logstash Shipper Agent is deployed on each application server to forward logs.
Drawbacks include high resource consumption on Logstash agents and potential data loss under high concurrency.
Introducing a Message Queue
To buffer traffic spikes, logs are sent from Logstash Shipper Agents to a Kafka cluster before reaching Elasticsearch, greatly reducing the risk of data loss under high concurrency. Kafka is preferred over Redis because it persists messages to disk and offers far higher storage capacity.
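With Kafka in the middle, the shipper and indexer sides each run one half of the pipeline. A minimal sketch using Logstash 5.x plugin options, with hypothetical broker addresses and topic name:

```conf
# Shipper side: publish events to the Kafka buffer
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"   # hypothetical broker list
    topic_id          => "app-logs"                  # hypothetical topic
  }
}

# Indexer side: consume from Kafka and write to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics            => ["app-logs"]
  }
}
output {
  elasticsearch {
    hosts => ["es1:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

Because Kafka retains messages on disk, the indexers can fall behind during a spike and catch up later without the shippers dropping events.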
Multi‑Datacenter Deployment
Each datacenter runs its own independent Logstash, Elasticsearch, Kafka, and Kibana clusters, forming a closed loop that avoids cross‑datacenter traffic and latency.
Introducing Filebeat
Filebeat, written in Go, consumes far less CPU and memory than Logstash. Example Filebeat configuration:
# filebeat.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/access.log
  json.message_key:
output.elasticsearch:
  hosts: ["localhost"]
  index: "filebeat-nginx-%{+yyyy.MM.dd}"

Performance tests showed Filebeat using ~38% CPU versus Logstash's ~54% while processing logs roughly 7× faster.
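Since the cluster architecture buffers logs in Kafka, Filebeat can also ship directly to the brokers instead of Elasticsearch. A sketch of the alternative output section (broker addresses and topic name are assumptions, not the production values):

```yaml
# In filebeat.yml, replace output.elasticsearch with a Kafka output
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # hypothetical broker list
  topic: "nginx-access"                   # hypothetical topic name
```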
Lessons Learned
Indexer processes may crash; use a supervisor to keep them running.
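One common supervisor choice is supervisord; a minimal program entry, assuming default Logstash install paths (adjust to your layout), might look like:

```ini
; /etc/supervisor/conf.d/logstash-indexer.conf (hypothetical path)
[program:logstash-indexer]
command=/usr/share/logstash/bin/logstash -f /etc/logstash/indexer.conf
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/supervisor/logstash-indexer.err.log
```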
Java exception stack traces span multiple lines; merge them into a single event with Logstash's multiline codec:
input {
stdin {
codec => multiline {
pattern => "^\["
negate => true
what => "previous"
}
}
}

Time‑zone mismatches can cause an 8‑hour offset: Kibana renders timestamps in the browser's time zone, so timestamps must be stored in UTC (or parsed with the correct source zone) in Elasticsearch.
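One way to avoid the offset is to parse the original timestamp with an explicit time zone so that Elasticsearch stores UTC. A sketch using the field name from the Grok example above (the source zone is an assumption):

```conf
filter {
  date {
    match    => ["timestamp", "ISO8601"]
    timezone => "Asia/Shanghai"   # assumed source zone (UTC+8)
    target   => "@timestamp"      # Kibana converts UTC back to the browser zone
  }
}
```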
Grok parse failures often stem from inconsistent log formats; ensure uniform logging and use online Grok debuggers.
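Grok tags events it cannot parse with `_grokparsefailure`, which makes format drift easy to spot. One option (a sketch with a simplified pattern, not the production one) is to handle those events explicitly rather than indexing half‑parsed documents:

```conf
filter {
  grok {
    match => ["message", "\[%{LOGLEVEL:level}\] %{GREEDYDATA:msg}"]   # simplified illustrative pattern
  }
  if "_grokparsefailure" in [tags] {
    drop { }   # or route to a separate index for inspection
  }
}
```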
Summary
The ELK/EFK‑based log solution offers high scalability (TB‑level daily data), ease of use through Kibana's visual interface, near‑real‑time query response, and an attractive dashboard, making it suitable for large‑scale log management in modern backend operations.
Hujiang Technology
We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.