Master ELK Log Processing: Encoding, Multiline, Grok, and Performance Tuning
This article compiles practical ELK knowledge, covering character‑set conversion, removing unwanted log lines, Grok pattern handling for multi‑line logs, multiline plugin usage in Filebeat and Logstash, date filtering, log type classification, performance optimization, Redis buffering, and Elasticsearch node tuning.
1. ELK Practical Knowledge
1.1 Encoding Conversion
Problem: Chinese garbled characters (GB2312 to UTF‑8). Example codec configuration:
<code>codec => plain {
  charset => "GB2312"
}</code>
Filebeat can also perform the conversion:
<code>filebeat.prospectors:
- input_type: log
  paths:
    - C:/Users/Administrator/Desktop/performanceTrace.txt
  encoding: GB2312</code>
1.2 Deleting Unnecessary Log Lines
Use a Logstash drop filter to remove lines that match a pattern:
<code>if [message] =~ "^20.*- task request,.*,start time.*" {
  drop {}
}</code>
1.3 Grok Handling for Multiple Log Lines
Example log entry and corresponding Grok patterns for request and response sections:
<code>match => {"message" => "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}"}
match => {"message" => "^-- Request String : {\"UserName\":%{NUMBER:UserName:int},...}"}
match => {"message" => "^-- Response String : {\"ErrorCode\":%{NUMBER:ErrorCode:int},...}"}</code>
1.4 Multiline Log Merging (Key Point)
Filebeat multiline configuration (recommended):
<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '.*"WaitInterval":.*-- End'
  multiline.negate: true
  multiline.match: before</code>
Older Filebeat version (using after):
<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '^20.*'
  multiline.negate: true
  multiline.match: after</code>
Logstash input multiline (when Filebeat is not used):
<code>input {
  file {
    path => ["/root/logs/log2"]
    start_position => "beginning"
    codec => multiline {
      pattern => "^20.*"
      negate => true
      what => "previous"
    }
  }
}</code>
Logstash filter multiline (not recommended because it forces pipeline workers to 1):
<code>filter {
  multiline {
    pattern => "^20.*"
    negate => true
    what => "previous"
  }
}</code>
1.5 Date Filter Usage
Convert log timestamps to @timestamp:
<code>date {
  match => ["InsertTime", "YYYY-MM-dd HH:mm:ss"]
  remove_field => "InsertTime"
}</code>
2. Multi‑Type Log Classification
Define type fields in Filebeat to separate logs:
<code>filebeat.prospectors:
- paths: [/mnt/data_total/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_total
- paths: [/mnt/data_request/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_request
- paths: [/mnt/data_report/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_report</code>
Use Logstash if statements to apply different filters or outputs based on [fields][type]:
<code>filter {
  if [fields][type] == "WebApiDebugLog_request" {
    # request‑specific processing
    if [message] =~ "^20.*- task report,.*,start time.*" {
      drop {}
    }
    grok { match => {"message" => "..."} }
  }
}</code>
<code>output {
  if [fields][type] == "WebApiDebugLog_total" {
    elasticsearch {
      hosts => ["6.6.6.6:9200"]
      index => "logstash-WebApiDebugLog_total-%{+YYYY.MM.dd}"
      document_type => "WebApiDebugLog_total_logs"
    }
  }
}</code>
3. Overall ELK Performance Optimization
Key observations on a 1 CPU / 4 GB RAM server:
Logstash processes ~500 logs/s; removing Ruby scripts raises it to ~660 logs/s; removing Grok can reach ~1000 logs/s.
Filebeat can ingest 2500‑3500 logs/s, handling ~64 GB per day per node.
Logstash becomes the bottleneck when pulling from Redis; one instance handles ~6000 logs/s, two instances ~10000 logs/s (CPU saturated).
Recommendations:
Increase pipeline.workers to match the number of CPU cores.
Adjust pipeline.output.workers and pipeline.batch.size (e.g., 1000) for higher throughput.
Set an appropriate pipeline.batch.delay (e.g., 10).
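A minimal logstash.yml sketch of these settings (the worker count of 4 and the batch values are illustrative assumptions, not measured optima; tune against your own load tests):
<code># logstash.yml — illustrative values
pipeline.workers: 4         # usually set to the number of CPU cores
pipeline.output.workers: 4  # output parallelism (older Logstash versions)
pipeline.batch.size: 1000   # events collected per worker before flushing
pipeline.batch.delay: 10    # ms to wait for a batch to fill</code>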
4. Introducing Redis as a Buffer
Use Redis list or pub/sub to decouple Filebeat from Logstash, preventing data loss on Logstash failure. Recommended Redis settings for a pure queue:
<code>bind 0.0.0.0
requirepass ilinux.io
save ""
appendonly no
maxmemory 0</code>
5. Elasticsearch Node Tuning
System parameters (e.g., /etc/sysctl.conf):
<code>vm.swappiness = 1
net.core.somaxconn = 65535
vm.max_map_count = 262144
fs.file-max = 518144</code>
Limits (/etc/security/limits.conf) for the elasticsearch user:
<code>elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited</code>
JVM heap should be set equally for -Xms and -Xmx, not exceeding 50% of physical RAM and staying below 32 GB.
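For example, on a 16 GB host the heap lines in jvm.options would look like this (an illustrative sketch, not a universal value):
<code># jvm.options — equal min/max heap, 50% of a 16 GB host
-Xms8g
-Xmx8g</code>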
Elasticsearch elasticsearch.yml optimizations (memory lock, TCP compression, cache sizes, thread‑pool settings) improve stability and query performance.
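A sketch of what such an elasticsearch.yml might contain; the option names below follow Elasticsearch 5.x conventions and the percentages are illustrative, so verify them against the documentation for your release before applying:
<code>bootstrap.memory_lock: true        # lock the JVM heap in RAM (pair with the memlock limits above)
transport.tcp.compress: true       # compress node-to-node traffic
indices.queries.cache.size: 5%     # query result cache
indices.fielddata.cache.size: 40%  # cap fielddata to avoid heap pressure
thread_pool.bulk.queue_size: 1000  # larger bulk queue for write bursts</code>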
6. Monitoring and Health Checks
Regularly check CPU, memory, disk I/O, network I/O, and JVM heap usage. Use tools such as top, iostat, dstat, and iftop. Ensure Logstash workers are sized appropriately to avoid high CPU caused by garbage collection.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.