Master ELK Log Processing: Encoding, Multiline, Grok, and Performance Tuning
This article compiles practical ELK knowledge, covering character‑set conversion, removing unwanted log lines, Grok pattern handling for multi‑line logs, multiline plugin usage in Filebeat and Logstash, date filtering, log type classification, performance optimization, Redis buffering, and Elasticsearch node tuning.
1. ELK Practical Knowledge
1.1 Encoding Conversion
Problem: logs containing Chinese text encoded as GB2312 appear garbled unless they are converted to UTF‑8. Example codec configuration for a Logstash input:
codec => plain {
    charset => "GB2312"
}
Filebeat can also perform the conversion:
filebeat.prospectors:
- input_type: log
  paths:
    - C:/Users/Administrator/Desktop/performanceTrace.txt
  encoding: GB2312
1.2 Deleting Unnecessary Log Lines
Use a Logstash drop filter to remove lines that match a pattern:
if [message] =~ "^20.*- task request,.*,start time.*" {
    drop {}
}
1.3 Grok Handling for Multiple Log Lines
Each part of a multi-line log entry gets its own Grok pattern. The patterns below match the task header, the request section, and the response section:
match => {"message" => "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}"}
match => {"message" => "^-- Request String : {\"UserName\":%{NUMBER:UserName:int},...}"}
match => {"message" => "^-- Response String : {\"ErrorCode\":%{NUMBER:ErrorCode:int},...}"}
1.4 Multiline Log Merging (Key Point)
Filebeat multiline configuration (recommended):
filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '.*"WaitInterval":.*-- End'
  multiline.negate: true
  multiline.match: before
Older Filebeat version (using after):
filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '^20.*'
  multiline.negate: true
  multiline.match: after
Logstash input multiline (when Filebeat is not used):
input {
    file {
        path => ["/root/logs/log2"]
        start_position => "beginning"
        codec => multiline {
            pattern => "^20.*"
            negate => true
            what => "previous"
        }
    }
}
Logstash filter multiline (not recommended because it forces pipeline workers to 1):
filter {
    multiline {
        pattern => "^20.*"
        negate => true
        what => "previous"
    }
}
1.5 Date Filter Usage
Convert log timestamps to @timestamp:
date {
    # yyyy is the calendar-year token documented for the Logstash date filter
    match => ["InsertTime", "yyyy-MM-dd HH:mm:ss "]
    remove_field => "InsertTime"
}
2. Multi‑Type Log Classification
Define type fields in Filebeat to separate logs:
filebeat.prospectors:
- paths: [/mnt/data_total/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_total
- paths: [/mnt/data_request/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_request
- paths: [/mnt/data_report/WebApiDebugLog.txt*]
  fields:
    type: WebApiDebugLog_report
Use Logstash if statements to apply different filters or outputs based on [fields][type]:
filter {
    if [fields][type] == "WebApiDebugLog_request" {
        # request‑specific processing
        if [message] =~ "^20.*- task report,.*,start time.*" {
            drop {}
        }
        grok { match => {"message" => "..."} }
    }
}
output {
    if [fields][type] == "WebApiDebugLog_total" {
        elasticsearch {
            hosts => ["6.6.6.6:9200"]
            # Elasticsearch index names must be lowercase
            index => "logstashl-webapidebuglog_total-%{+YYYY.MM.dd}"
            document_type => "WebApiDebugLog_total_logs"
        }
    }
}
3. Overall ELK Performance Optimization
Key observations on a 1 CPU / 4 GB RAM server:
Logstash processes ~500 logs/s; removing Ruby scripts raises it to ~660 logs/s; removing Grok can reach ~1000 logs/s.
Filebeat can ingest 2500‑3500 logs/s, handling ~64 GB per day per node.
Logstash becomes the bottleneck when pulling from Redis; one instance handles ~6000 logs/s, two instances ~10000 logs/s (CPU saturated).
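To put the measurements above in relative terms, here is a small Python sketch of the arithmetic. The helper names are hypothetical, and the inputs are only the figures quoted above; nothing here is a new measurement.

```python
def speedup_percent(baseline_rate, improved_rate):
    """Relative throughput gain of improved_rate over baseline_rate, in percent."""
    return (improved_rate - baseline_rate) / baseline_rate * 100


def scaling_efficiency(single_rate, combined_rate, instances):
    """Fraction of ideal linear scaling achieved by several instances."""
    return combined_rate / (single_rate * instances)


# Removing Ruby scripts: 500 -> 660 logs/s
ruby_gain = speedup_percent(500, 660)
# Removing Grok as well: 500 -> 1000 logs/s
grok_gain = speedup_percent(500, 1000)
# Two Logstash instances reading from Redis: 10000 logs/s vs 2 x 6000 logs/s
redis_scaling = scaling_efficiency(6000, 10000, 2)

print(f"Ruby removed: +{ruby_gain:.0f}%")
print(f"Grok removed: +{grok_gain:.0f}%")
print(f"Two-instance scaling efficiency: {redis_scaling:.0%}")
```

The second instance adds less than a full instance's worth of throughput, which is consistent with the CPU saturation noted above.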
Recommendations:
Increase pipeline.workers to match CPU cores.
Adjust pipeline.output.workers and pipeline.batch.size (e.g., 1000) for higher throughput.
Set an appropriate pipeline.batch.delay (e.g., 10 ms).
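The recommendations above can be sketched as a small Python helper that prints candidate logstash.yml lines for the current machine. This is an illustration only: the function name is hypothetical, the batch values are just the examples quoted above, and pipeline.output.workers is left out because its availability depends on the Logstash version.

```python
import os


def suggest_pipeline_settings(cpu_cores=None, batch_size=1000, batch_delay=10):
    """Return logstash.yml-style lines following the recommendations above."""
    if cpu_cores is None:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        cpu_cores = os.cpu_count() or 1
    return [
        f"pipeline.workers: {cpu_cores}",          # one worker per CPU core
        f"pipeline.batch.size: {batch_size}",      # events per worker batch
        f"pipeline.batch.delay: {batch_delay}",    # milliseconds to wait for a full batch
    ]


print("\n".join(suggest_pipeline_settings()))
```

Treat the printed values as a starting point and re-measure throughput after each change, since larger batches also increase memory use.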
4. Introducing Redis as a Buffer
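Use a Redis list or pub/sub channel to decouple Filebeat from Logstash. A list holds events until Logstash reads them, which prevents data loss when Logstash fails; pub/sub messages, by contrast, are discarded if no subscriber is connected. Recommended Redis settings for a pure queue: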
bind 0.0.0.0
requirepass ilinux.io
save ""
appendonly no
maxmemory 0
5. Elasticsearch Node Tuning
System parameters (e.g., /etc/sysctl.conf):
vm.swappiness = 1
net.core.somaxconn = 65535
vm.max_map_count = 262144
fs.file-max = 518144
Limits (/etc/security/limits.conf) for the elasticsearch user:
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
JVM heap should be set equally for -Xms and -Xmx, not exceeding 50% of physical RAM and staying below 32 GB.
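The heap rule above can be turned into a small calculation. The following is a minimal Python sketch, not part of the original setup: the function names are hypothetical, and the 31 GB ceiling is an assumed safety margin under the 32 GB limit.

```python
def recommended_heap_mb(physical_ram_mb, ceiling_mb=31 * 1024):
    """Return a heap size in MB: at most half of RAM, capped below 32 GB.

    ceiling_mb defaults to 31 GB, an assumed margin under the 32 GB limit.
    """
    if physical_ram_mb <= 0:
        raise ValueError("physical_ram_mb must be positive")
    return min(physical_ram_mb // 2, ceiling_mb)


def jvm_heap_flags(physical_ram_mb):
    """Build matching -Xms/-Xmx flags so both are set to the same value."""
    heap_mb = recommended_heap_mb(physical_ram_mb)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


# A 4 GB server, like the one in the performance section, gets a 2 GB heap.
print(" ".join(jvm_heap_flags(4096)))
```

On a host with 128 GB of RAM the same function stops at the ceiling instead of returning 64 GB.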
Optimizations in elasticsearch.yml (memory lock, TCP compression, cache sizes, thread‑pool settings) improve stability and query performance.
6. Monitoring and Health Checks
Regularly check CPU, memory, disk I/O, network I/O, and JVM heap usage. Use tools such as top, iostat, dstat, and iftop. Ensure Logstash workers are sized appropriately to avoid high CPU caused by garbage collection.
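JVM heap usage can also be checked programmatically. Elasticsearch's node stats API (GET _nodes/stats/jvm) reports a heap_used_percent value for each node; the Python sketch below filters an already-parsed response for nodes above a threshold. The function name, the 75% threshold, and the sample data are assumptions for illustration, and fetching the response over HTTP is left out.

```python
def nodes_over_heap_threshold(node_stats, threshold_percent=75):
    """Return (node_name, heap_used_percent) pairs above the threshold, highest first."""
    flagged = []
    for node in node_stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used > threshold_percent:
            flagged.append((node["name"], used))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical, trimmed-down response body from GET _nodes/stats/jvm
sample_stats = {
    "nodes": {
        "node-id-a": {"name": "es-node-1", "jvm": {"mem": {"heap_used_percent": 62}}},
        "node-id-b": {"name": "es-node-2", "jvm": {"mem": {"heap_used_percent": 88}}},
        "node-id-c": {"name": "es-node-3", "jvm": {"mem": {"heap_used_percent": 79}}},
    }
}

for name, used in nodes_over_heap_threshold(sample_stats):
    print(f"{name}: heap at {used}%")
```

Running a check like this on a schedule complements the interactive tools listed above.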
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
