Master ELK Log Processing: Encoding, Multiline, Grok, and Performance Tuning
This article compiles practical ELK knowledge, covering character‑set conversion, removing unwanted log lines, Grok pattern handling for multi‑line logs, multiline plugin usage in Filebeat and Logstash, date filtering, log type classification, performance optimization, Redis buffering, and Elasticsearch node tuning.
1. ELK Practical Knowledge
1.1 Encoding Conversion
Problem: Chinese garbled characters (GB2312 to UTF‑8). Example codec configuration:
<code>codec => plain {
  charset => "GB2312"
}</code>
Filebeat can also perform the conversion:
<code>filebeat.prospectors:
- input_type: log
  paths:
    - C:/Users/Administrator/Desktop/performanceTrace.txt
  encoding: GB2312</code>
1.2 Deleting Unnecessary Log Lines
Use a Logstash drop filter to remove lines that match a pattern:
<code>if [message] =~ "^20.*- task request,.*,start time.*" {
  drop {}
}</code>
1.3 Grok Handling for Multiple Log Lines
Example log entry and corresponding Grok patterns for request and response sections:
<code>match => {"message" => "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}"}
match => {"message" => "^-- Request String : {\"UserName\":%{NUMBER:UserName:int},...}"}
match => {"message" => "^-- Response String : {\"ErrorCode\":%{NUMBER:ErrorCode:int},...}"}</code>
1.4 Multiline Log Merging (Key Point)
Filebeat multiline configuration (recommended):
<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '.*"WaitInterval":.*-- End'
  multiline.negate: true
  multiline.match: before</code>
Older Filebeat version (using after):
<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '^20.*'
  multiline.negate: true
  multiline.match: after</code>
Logstash input multiline (when Filebeat is not used):
<code>input {
  file {
    path => ["/root/logs/log2"]
    start_position => "beginning"
    codec => multiline {
      pattern => "^20.*"
      negate => true
      what => "previous"
    }
  }
}</code>
Logstash filter multiline (not recommended because it forces pipeline workers to 1):
<code>filter {
  multiline {
    pattern => "^20.*"
    negate => true
    what => "previous"
  }
}</code>
1.5 Date Filter Usage
Convert log timestamps to @timestamp:
<code>date {
  match => ["InsertTime", "YYYY-MM-dd HH:mm:ss"]
  remove_field => "InsertTime"
}</code>
2. Multi‑Type Log Classification
Define type fields in Filebeat to separate logs:
<code>filebeat.prospectors:
- paths: [/mnt/data_total/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_total
- paths: [/mnt/data_request/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_request
- paths: [/mnt/data_report/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_report</code>
Use Logstash if statements to apply different filters or outputs based on [fields][type]:
<code>filter {
  if [fields][type] == "WebApiDebugLog_request" {
    # request‑specific processing
    if [message] =~ "^20.*- task report,.*,start time.*" {
      drop {}
    }
    grok { match => {"message" => "..."} }
  }
}</code>
<code>output {
  if [fields][type] == "WebApiDebugLog_total" {
    elasticsearch {
      hosts => ["6.6.6.6:9200"]
      index => "logstash-WebApiDebugLog_total-%{+YYYY.MM.dd}"
      document_type => "WebApiDebugLog_total_logs"
    }
  }
}</code>
3. Overall ELK Performance Optimization
Key observations on a 1 CPU / 4 GB RAM server:
Logstash processes ~500 logs/s; removing Ruby scripts raises it to ~660 logs/s; removing Grok can reach ~1000 logs/s.
Filebeat can ingest 2500‑3500 logs/s, handling ~64 GB per day per node.
Logstash becomes the bottleneck when pulling from Redis; one instance handles ~6000 logs/s, two instances ~10000 logs/s (CPU saturated).
Recommendations:
Increase pipeline.workers to match the number of CPU cores.
Adjust pipeline.output.workers and pipeline.batch.size (e.g., 1000) for higher throughput.
Set an appropriate pipeline.batch.delay (e.g., 10).
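A minimal logstash.yml sketch of these settings (the worker count of 4 and the batch values are illustrative assumptions, not measured optima; tune against your own load tests):
<code># logstash.yml — illustrative values
pipeline.workers: 4         # usually set to the number of CPU cores
pipeline.output.workers: 4  # output parallelism (older Logstash versions)
pipeline.batch.size: 1000   # events collected per worker before flushing
pipeline.batch.delay: 10    # ms to wait for a batch to fill</code>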
4. Introducing Redis as a Buffer
Use Redis list or pub/sub to decouple Filebeat from Logstash, preventing data loss on Logstash failure. Recommended Redis settings for a pure queue:
<code>bind 0.0.0.0
requirepass ilinux.io
save ""
appendonly no
maxmemory 0</code>
5. Elasticsearch Node Tuning
System parameters (e.g., /etc/sysctl.conf):
<code>vm.swappiness = 1
net.core.somaxconn = 65535
vm.max_map_count = 262144
fs.file-max = 518144</code>
Limits (/etc/security/limits.conf) for the elasticsearch user:
<code>elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited</code>
JVM heap should be set equally for -Xms and -Xmx, not exceeding 50% of physical RAM and staying below 32 GB.
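For example, on a 16 GB host the heap lines in jvm.options would look like this (an illustrative sketch, not a universal value):
<code># jvm.options — equal min/max heap, 50% of a 16 GB host
-Xms8g
-Xmx8g</code>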
Elasticsearch elasticsearch.yml optimizations (memory lock, TCP compression, cache sizes, thread‑pool settings) improve stability and query performance.
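A sketch of what such an elasticsearch.yml might contain; the option names below follow Elasticsearch 5.x conventions and the percentages are illustrative, so verify them against the documentation for your release before applying:
<code>bootstrap.memory_lock: true        # lock the JVM heap in RAM (pair with the memlock limits above)
transport.tcp.compress: true       # compress node-to-node traffic
indices.queries.cache.size: 5%     # query result cache
indices.fielddata.cache.size: 40%  # cap fielddata to avoid heap pressure
thread_pool.bulk.queue_size: 1000  # larger bulk queue for write bursts</code>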
6. Monitoring and Health Checks
Regularly check CPU, memory, disk I/O, network I/O, and JVM heap usage. Use tools such as top, iostat, dstat, and iftop. Ensure Logstash workers are sized appropriately to avoid high CPU caused by garbage collection.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.