Top 10 Logstash Interview Questions & Answers
This article walks through the most common Logstash interview topics, covering its role in the ELK stack, key Input and Filter plugins, the difference between DATA and GREEDYDATA, Mutate gsub usage, parsing JSON/XML, CSV/KV handling, Grok patterns, the Date filter, and strategies to prevent duplicate documents.
1. Role and Architecture of Logstash in the ELK Stack
Logstash is the data‑collection, filtering and transformation component of the Elastic Stack. It reads data from sources such as files, databases or cloud storage, applies a series of Filter plugins to enrich or parse the data, and finally sends the processed events to a destination like Elasticsearch.
A typical Logstash pipeline consists of three stages:
Input: determines where data is collected from.
Filter (optional): processes, parses and transforms the data, extracting useful fields.
Output: delivers the processed data to a target such as Elasticsearch, a file or an HTTP endpoint.
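Here is a minimal sketch of the three stages working together. The pipeline receives events from Filebeat, parses them with Grok and indexes them into Elasticsearch. The port 5044, the http://localhost:9200 address, the index name and the assumption that the logs are in Apache/Nginx "combined" format are illustrative choices, not requirements.

```conf
# Hypothetical pipeline file, e.g. started with: bin/logstash -f pipeline.conf
input {
  # Stage 1: listen for events shipped by Filebeat (5044 is the conventional Beats port)
  beats {
    port => 5044
  }
}

filter {
  # Stage 2: assumes Apache/Nginx "combined" access-log lines
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  # Stage 3: index into a local Elasticsearch cluster, one index per day
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
  }
}
```

Removing the filter block still leaves a valid pipeline. The events would then be indexed with the message unparsed.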
Official documentation: https://www.elastic.co/guide/en/logstash/current/index.html
2. Common Input Plugins and Configuration Example
Typical Input plugins include:
File: reads log files from local or mounted file systems.
Beats: receives data from Filebeat, Metricbeat and other Beats.
S3: periodically scans an Amazon S3 bucket for log files.
JDBC: connects to relational databases (MySQL, PostgreSQL, etc.) and runs SQL queries on a schedule.
http_poller: polls HTTP endpoints at a configured interval and turns each response into an event.
Elastic Agent: receives data sent by Elastic Agent, Elastic's unified agent for collecting logs and metrics.
Example using the File input:
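input {
  file {
    path => "/var/log/application/*.log"
    # Read newly discovered files from the start (the default is "end")
    start_position => "beginning"
    # /dev/null discards read positions, so files are re-read in full on every restart;
    # handy for testing, but production setups should keep a real sincedb file
    sincedb_path => "/dev/null"
    # Custom label, usable later in conditionals such as: if [type] == "application_logs"
    type => "application_logs"
  }
}
For database ingestion, use the jdbc plugin; for HTTP data, use the http_poller plugin.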
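Here is a sketch of the two alternatives just mentioned. The database URL, driver path, table and column names, endpoint URL and schedules are all hypothetical placeholders. The tracking settings make the JDBC input fetch only rows newer than the previous run, whose last value is exposed as :sql_last_value.

```conf
input {
  jdbc {
    # Hypothetical driver jar location and database
    jdbc_driver_library    => "/opt/logstash/drivers/mysql-connector-j.jar"
    jdbc_driver_class      => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/shop"
    jdbc_user              => "logstash"
    jdbc_password          => "${JDBC_PASSWORD}"   # resolved from the environment or the keystore
    # Cron syntax: run every 5 minutes
    schedule  => "*/5 * * * *"
    # :sql_last_value holds the highest updated_at value seen in the previous run
    statement => "SELECT id, status, updated_at FROM orders WHERE updated_at > :sql_last_value ORDER BY updated_at"
    use_column_value     => true
    tracking_column      => "updated_at"
    tracking_column_type => "timestamp"
  }

  http_poller {
    # Endpoints to poll; the key ("health") is a label of your choice
    urls => {
      health => "http://localhost:8080/health"
    }
    schedule => { every => "1m" }
    codec    => "json"
  }
}
```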
Official documentation: https://www.elastic.co/guide/en/logstash/current/input-plugins.html
3. Common Filter Plugins and Their Usage
Filter plugins process events between input and output, parsing, enriching or transforming their fields. Frequently used filters are:
Grok: parses unstructured text with named, reusable regular-expression patterns.
Mutate: renames, removes, replaces or modifies fields and converts their types.
Date: parses date strings and uses them to set the event timestamp.
CSV: parses CSV-formatted data.
JSON: parses JSON strings into fields.
XML: parses XML into a JSON-like structure.
Split: turns an array field, or a string field containing a delimiter (newline by default), into one event per element.
KV: parses key=value strings into fields.
Translate: maps field values using a dictionary defined inline or loaded from an external file.
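Split and Translate are not covered later, so here is a short sketch of both. It assumes a hypothetical event with an items array and a string status_code field, and the dictionary entries are illustrative. The source/target option names follow recent versions of the Translate plugin; older releases used field/destination.

```conf
filter {
  # Hypothetical input event:
  # { "order_id": 7, "items": ["book", "pen"], "status_code": "404" }

  # Emit one event per array element; each copy keeps the other fields
  split {
    field => "items"
  }

  # Look up status_code in an inline dictionary and store the result in status_text
  translate {
    source     => "status_code"
    target     => "status_text"
    dictionary => {
      "200" => "OK"
      "404" => "Not Found"
      "500" => "Internal Server Error"
    }
    fallback   => "unknown"   # used when no dictionary key matches
  }
}
```

For the sample event, this yields two events with items set to "book" and "pen" respectively. Both carry status_text "Not Found".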
Official documentation: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
4. Difference Between DATA and GREEDYDATA (Grok)
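In Grok, %{DATA} and %{GREEDYDATA} are both "match anything" patterns, and the difference is greediness. DATA is defined as the lazy regex .*? and matches as few characters as possible, so it stops at the first position where the rest of the pattern can match (for example, the next delimiter written in your expression). GREEDYDATA is .* and matches as many characters as possible, backing off only as far as needed for the rest of the pattern to match. It therefore typically captures everything to the end of the line, including spaces and special characters. Use DATA for a field in the middle of a line that ends at a known delimiter, and GREEDYDATA for the trailing free-text part of a message.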
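The difference shows up when the text after a DATA field contains the same delimiter again. In this sketch (the sample line and field names are hypothetical), DATA stops at the first ": ", so service captures payment-service and details keeps the rest.

```conf
filter {
  grok {
    # Sample line: 2024-05-01 12:00:00 ERROR payment-service: card declined: insufficient funds
    # %{DATA:service}       -> "payment-service" (lazy: shortest match before ": ")
    # %{GREEDYDATA:details} -> "card declined: insufficient funds" (greedy: rest of the line)
    match => {
      "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{DATA:service}: %{GREEDYDATA:details}"
    }
  }
}
```

With %{GREEDYDATA:service} in place of %{DATA:service}, service would become "payment-service: card declined" and details only "insufficient funds", because the greedy field extends to the last ": " on the line.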
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Forum discussion: https://discuss.elastic.co/t/what-is-greedydata/122078/2
5. Using gsub in the Mutate Filter and Other Common Operations
The gsub option performs global substitution on a string field, useful for removing line breaks or unwanted characters.
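filter {
  mutate {
    # Each triple is: field name, regular expression, replacement string
    # Here: strip line feeds, then carriage returns, from "message"
    gsub => [
      "message", "\n", "",
      "message", "\r", ""
    ]
  }
}
Other common Mutate operations include:
rename: renames a field.
remove_field: deletes a field (strictly a common option available on every filter, but most often used inside mutate).
convert: changes a field's data type (e.g. integer, float, string, boolean).
update: overwrites the value of a field that already exists; if the field is missing, nothing happens.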
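Here is a sketch combining the operations listed above, with hypothetical field names. The Mutate documentation specifies a fixed execution order within a single mutate block (for example, rename runs before update and convert). remove_field, as a common option, is applied only after the filter succeeds. Splitting operations into separate mutate blocks makes the order explicit when it matters.

```conf
filter {
  mutate {
    # Hypothetical fields: usr, status, latency_ms, env, tmp_debug
    rename       => { "usr" => "user" }                             # usr -> user
    update       => { "env" => "production" }                       # only if env already exists
    convert      => { "status" => "integer" "latency_ms" => "float" }
    remove_field => ["tmp_debug"]
  }
}
```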
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-gsub
6. Parsing Structured Data such as JSON and XML
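To parse JSON logs, use the JSON filter and point its source option at the field holding the JSON string, optionally setting target to nest the result under one field. Nested JSON objects are expanded into nested fields automatically and can be referenced as [parent][child]. If a value is JSON that was itself serialized as a string inside another JSON document, apply the JSON filter a second time to that field. To flatten the hierarchy, copy or rename nested fields with the Mutate filter, or use the Ruby filter for generic flattening. For XML, the XML filter converts the document into a JSON-like structure for further processing and can also extract values with XPath expressions.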
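Here is a sketch of both filters under hypothetical assumptions. The message field holds a JSON string whose details key is itself a JSON-encoded string, and xml_payload holds an XML document. Nested values are addressed with Logstash's [parent][child] field-reference syntax.

```conf
filter {
  # Parse the JSON string in "message" and nest the result under [payload]
  json {
    source               => "message"
    target               => "payload"
    skip_on_invalid_json => true    # skip messages that are not valid JSON instead of warning about them
  }

  # [payload][details] was serialized as a string inside the JSON, so parse it a second time
  json {
    source => "[payload][details]"
    target => "[payload][details_parsed]"
  }

  # Flatten one nested value into a top-level field
  mutate {
    copy => { "[payload][user][id]" => "user_id" }
  }

  # Convert an XML string into a nested structure under [doc]
  xml {
    source      => "xml_payload"
    target      => "doc"
    store_xml   => true
    force_array => false    # keep single child elements as scalars instead of one-element arrays
  }
}
```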
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html
7. Parsing CSV and Key‑Value Formats
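The CSV filter lets you set a separator and a list of columns, mapping each column, in order, to a named field. Its convert option can cast columns to types such as integer or float. The KV filter parses key=value strings (common in Nginx or system logs) using field_split (the characters that separate one pair from the next) and value_split (the characters that separate a key from its value).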
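Here is a sketch of both filters. It assumes events are labelled through the type field, as the type option in the File input example of section 2 does. The type values, column names and sample lines are hypothetical.

```conf
filter {
  if [type] == "csv_logs" {
    # Sample message: 2024-05-01,alice,login,200
    csv {
      separator => ","
      columns   => ["date", "user", "action", "status"]
      convert   => { "status" => "integer" }
    }
  } else if [type] == "kv_logs" {
    # Sample message: user=alice&action=login&status=200
    kv {
      source      => "message"
      field_split => "&"    # separates one pair from the next
      value_split => "="    # separates a key from its value
      target      => "params"
    }
  }
}
```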
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html
8. Using Grok to Parse Unstructured Logs
Grok is the primary solution for parsing free‑form text. It provides many predefined patterns for common log formats, allowing you to write concise match rules. Example:
filter {
  grok {
    # Sample input: "User: alice, Action: login, Status: success"
    match => { "message" => "User: %{USERNAME:user}, Action: %{WORD:action}, Status: %{WORD:status}" }
  }
}
This extracts the user, action and status fields from the original message field.
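Beyond the built-in patterns, Grok can define ad hoc patterns inline, cast captures to numbers and tag events it cannot parse. In this sketch, the ORDER_ID pattern, the log format and the tag name are hypothetical.

```conf
filter {
  grok {
    # Hypothetical custom pattern for IDs such as ORD-12345
    pattern_definitions => { "ORDER_ID" => "ORD-[0-9]+" }
    # Sample line: 10.0.0.5 GET /api/orders?page=2 ORD-12345 184
    # The ":int" suffix stores duration_ms as an integer instead of a string
    match => {
      "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:path} %{ORDER_ID:order} %{NUMBER:duration_ms:int}"
    }
    # Events that do not match get this tag (the default is _grokparsefailure)
    tag_on_failure => ["_grokparsefailure_orders"]
  }
}
```

Events carrying the failure tag can then be routed elsewhere with a conditional such as if "_grokparsefailure_orders" in [tags].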
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
9. Purpose of the Date Filter
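The Date filter parses a timestamp string from a field and uses it to set the event's @timestamp (its default target). Time-based searches and aggregations in Elasticsearch then reflect when the event actually happened, not the time Logstash read or received it. Parsed values are stored in UTC. If the string carries no time-zone offset, set the timezone option; otherwise the Logstash host's default zone is assumed.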
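filter {
  date {
    # Field to parse, followed by one or more Joda-Time style formats
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    # @timestamp is already the default target; shown here for clarity
    target => "@timestamp"
  }
}
Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html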
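When one field can arrive in several formats, list them all, and the filter tries each in turn. This sketch assumes the timestamp field may hold ISO 8601 strings, the plain format above, Apache-style dates or epoch milliseconds. The Asia/Shanghai time zone is an assumption about where the logs are produced.

```conf
filter {
  date {
    # Formats are tried in order until one matches
    match => ["timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss Z", "UNIX_MS"]
    # Assumed zone for strings without an explicit offset
    timezone => "Asia/Shanghai"
    # English month abbreviations such as "May" in the Apache-style format
    locale => "en"
    # Drop the original string once it has been parsed successfully
    remove_field => ["timestamp"]
  }
}
```

If none of the formats match, the event keeps its existing @timestamp and is tagged _dateparsefailure.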
10. Preventing Duplicate Documents
To avoid indexing the same record multiple times, you can:
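Set document_id in the elasticsearch output to a field that uniquely identifies each record. Elasticsearch then overwrites any existing document with the same ID instead of adding a second copy, e.g.:
output {
  elasticsearch {
    index => "my_index"
    # Reuse the record's own unique key as the Elasticsearch _id
    document_id => "%{unique_field}"
  }
}
Use sincedb (for files) or tracking_column (for JDBC) so that only new data is read.
For incremental updates or upserts, combine document_id with action => "update" and doc_as_upsert => true. This updates the document if it exists and creates it otherwise.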
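When the source has no natural unique key, one option is the Fingerprint filter. It hashes the fields that identify a record, and the hash can then serve as document_id. This sketch assumes message plus @timestamp identify a record, where @timestamp was taken from the log line itself by the Date filter. The host URL and index name are placeholders. Storing the hash under [@metadata] keeps it out of the indexed document.

```conf
filter {
  fingerprint {
    # Hypothetical choice of identifying fields
    source              => ["message", "@timestamp"]
    concatenate_sources => true
    method              => "SHA256"
    target              => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts         => ["http://localhost:9200"]
    index         => "my_index"
    document_id   => "%{[@metadata][fingerprint]}"
    # Update the document if the ID exists, otherwise create it from the event
    action        => "update"
    doc_as_upsert => true
  }
}
```

Keep ingestion-time values out of source. If @timestamp were left at its default (the time the event was read), re-reading the same line would produce a new hash and therefore a new document. The trade-off is that genuinely identical lines logged at the same instant collapse into one document.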
Reference: https://stackoverflow.com/questions/42003462/how-to-set-document-id-in-elastic-using-logstash-config-file
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
