Introduction, Architecture, Deployment and Usage of Grafana Loki Log Aggregation System
This article introduces Grafana Loki, an open‑source, horizontally scalable, highly available log aggregation system optimized for Kubernetes and Prometheus, covering its core concepts, architecture, component roles, deployment steps, configuration examples, and practical usage within Grafana.
Preface
When designing a container‑cloud logging solution, the heavyweight nature of ELK/EFK and the limited need for complex Elasticsearch queries led to the selection of Grafana's open‑source Loki system.
The article also notes that familiarity with the mature EFK solution remains valuable.
Overview
Loki is a horizontally scalable, highly available, multi‑tenant log aggregation system from Grafana Labs.
It is cost‑effective and easy to operate because it indexes only log stream metadata (labels) rather than full log content, making it especially suited for Prometheus and Kubernetes users.
Inspired by Prometheus, Loki’s tagline is "Like Prometheus, but for logs."
Project address: https://github.com/grafana/loki/
Key features compared with other log systems include:
No full‑text indexing; stores compressed unstructured logs and indexes only metadata, reducing cost.
Uses the same label‑based indexing as Prometheus, enabling efficient grouping and alertmanager integration.
Optimized for Kubernetes pod logs; pod labels are automatically indexed.
Native Grafana support, avoiding the need to switch between Kibana and Grafana.
Architecture
Diagram omitted (refer to original images).
Component Description
Key components:
Promtail – the collector, analogous to Filebeat.
Loki – the server side, analogous to Elasticsearch.
Loki processes run in four roles:
Querier – query engine.
Ingesters – log storage.
Query‑frontend – front‑end query handler.
Distributor – write dispatcher.
The role can be set via the -target flag on the Loki binary.
Read Path
Querier receives HTTP/1 requests.
Querier forwards the query to all ingesters to retrieve in‑memory data.
Ingesters return matching data, if any.
If no ingester returns data, the querier loads data from the back‑end store.
The querier deduplicates and streams the final dataset back over HTTP/1.
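The read path can be exercised by hand over Loki's HTTP API. A minimal sketch, assuming Loki listens on localhost:3100 (an assumption; the block only builds and prints the request URL, so it runs without a live server — `/loki/api/v1/query_range` is Loki's standard range-query endpoint):

```shell
# Build a range query against the querier's HTTP API.
LOKI_ADDR="http://localhost:3100"        # assumed address
LOGQL='{job="message"} |= "kubelet"'

# URL-encode the LogQL expression so it is safe in a query string.
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$LOGQL")
URL="$LOKI_ADDR/loki/api/v1/query_range?query=$ENCODED&limit=10"

# Print the request; once Loki is up, execute it with:
#   curl -s "$URL"
echo "$URL"
```

Running the printed `curl` against a live instance returns matching log lines as JSON, whether they were served from ingester memory or the back-end store.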
Write Path
Diagram omitted.
Distributor receives an HTTP/1 request to store stream data.
Each stream is hashed using a ring hash.
Distributor forwards each stream to the appropriate ingester and its replicas.
Each instance creates or appends a block for the stream; blocks are unique per tenant and label set.
Distributor responds with a success code over HTTP/1.
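The write path can likewise be driven manually. A hedged sketch that builds a push payload for Loki's `/loki/api/v1/push` endpoint — the JSON shape (streams carrying a label map plus `[timestamp, line]` pairs) is Loki's documented push format; the address and label values are assumptions:

```shell
# Build a push payload for the distributor's HTTP endpoint.
LOKI_ADDR="http://localhost:3100"   # assumed address
TS="$(date +%s)000000000"           # Loki expects a nanosecond epoch, as a string

PAYLOAD=$(cat <<EOF
{"streams": [{"stream": {"job": "manual-test"}, "values": [["$TS", "hello from the write path"]]}]}
EOF
)

# Print the payload; send it for real with:
#   curl -s -H 'Content-Type: application/json' -X POST "$LOKI_ADDR/loki/api/v1/push" -d "$PAYLOAD"
echo "$PAYLOAD"
```

On success the distributor answers with an empty 204 response, after forwarding the stream to the ingesters on the hash ring.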
Deployment
Local Mode Installation
Download Promtail and Loki:
wget https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip
wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip
Install Promtail:
$ mkdir -pv /opt/app/{promtail,loki}
$ unzip promtail-linux-amd64.zip && mv promtail-linux-amd64 /opt/app/promtail/promtail
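The unit file below points at /opt/app/promtail/promtail.yaml, which this excerpt does not show. A minimal sketch of such a config, assuming default ports and a Loki instance on localhost (adjust clients.url to your server):

```yaml
# /opt/app/promtail/promtail.yaml — minimal sketch; addresses and paths are assumptions
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /opt/app/promtail/positions.yaml   # where read offsets are persisted

clients:
  - url: http://localhost:3100/loki/api/v1/push  # Loki's push endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: message
          __path__: /var/log/messages
```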
# promtail systemd unit
$ cat <<EOF > /etc/systemd/system/promtail.service
[Unit]
Description=promtail server
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=promtail
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl restart promtail
systemctl status promtail
Install Loki:
$ mkdir /opt/app/{promtail,loki} -pv
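The service below points at /opt/app/loki/loki.yaml, which this excerpt does not show. A minimal single-binary sketch for Loki 2.2 using local filesystem storage (paths and periods are assumptions):

```yaml
# /opt/app/loki/loki.yaml — minimal single-binary sketch; values are assumptions
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 5m

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/index
    cache_location: /opt/app/loki/cache
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks
```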
# Loki systemd unit
$ cat <<EOF > /etc/systemd/system/loki.service
[Unit]
Description=loki server
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=loki
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl restart loki
systemctl status loki
Usage
Configure Loki datasource in Grafana
In Grafana, add a new datasource of type Loki, set the URL to http://loki:3100, and save.
After saving, switch to the Explore section to access Loki logs.
Click “Log labels” to view collected log labels and filter queries accordingly.
Example query to view /var/log/messages logs (adjust time zone if needed).
Query in Grafana Explore
rate({job="message"} |= "kubelet" [1m])
This computes a per-second rate (a QPS-style metric) of log lines in job="message" containing "kubelet", over a 1-minute window.
Index‑only Labels
Loki indexes only labels, not full log content, which reduces index size dramatically compared to Elasticsearch.
Static Label Matching Example
scrape_configs:
- job_name: system
  pipeline_stages:
  static_configs:
  - targets:
    - localhost
    labels:
      job: message
      __path__: /var/log/messages
This creates a fixed label job="message" for logs under /var/log/messages.
Dynamic Labels and High Cardinality
Dynamic label values (e.g., IP address) can cause high cardinality, leading to many streams and potential performance issues.
Example regex stage to extract action and status_code from Apache access logs:
regex:
  expression: "^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) \[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] \"(?P<action>\S+)\s?(?P<path>\S+)?\s?(?P<protocol>\S+)?\" (?P<status_code>\d{3}|-) (?P<size>\d+|-)\s?\"?(?P<referer>[^\"]*)\"?\s?\"?(?P<useragent>[^\"]*)?\"?$"
labels:
  action:
  status_code:
Each unique combination of extracted labels creates a separate stream and chunk.
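Loki's regex stage uses Go's named-group syntax, which Python's re module also accepts for this pattern, so the extraction can be sanity-checked locally. A sketch using a trimmed version of the regex (referer and user-agent dropped) against a made-up log line:

```shell
# Check that the (trimmed) Apache regex extracts action and status_code.
LINE='11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932'

RESULT=$(python3 - "$LINE" <<'PY'
import re, sys
# Trimmed form of the pipeline regex above (referer/user-agent omitted).
pattern = (r'^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) '
           r'\[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] '
           r'"(?P<action>\S+)\s?(?P<path>\S+)?\s?(?P<protocol>\S+)?" '
           r'(?P<status_code>\d{3}|-) (?P<size>\d+|-)$')
m = re.match(pattern, sys.argv[1])
print(m.group('action'), m.group('status_code'))
PY
)
echo "$RESULT"   # GET 200
```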
High‑Cardinality Problem
Using a high‑cardinality label such as IP can generate thousands of streams, which may overwhelm Loki.
Full‑Text Index Issue
Full‑text indexes can be as large as the log data itself, requiring significant memory and making scaling difficult. Loki’s index is typically an order of magnitude smaller.
Query Acceleration without Labels
{job="apache"} |= "11.11.11.11"
Shard‑Based Query Execution
Loki splits queries into smaller shards, opens matching chunks per stream, and searches them in parallel.
Shard size and parallelism are configurable based on resources.
Deploying many query‑frontends can process large volumes of logs quickly.
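Conceptually, each shard worker applies the line filter to its chunks much like a distributed grep. A toy local sketch of what one worker does (the sample lines are made up):

```shell
# One shard worker's job in miniature: scan decompressed chunk lines
# and keep those matching the |= filter.
MATCHES=$(printf '%s\n' \
  '11.11.11.11 - "GET /index.html" 200' \
  '22.22.22.22 - "GET /login" 302' \
  '11.11.11.11 - "POST /api" 500' \
  | grep '11.11.11.11')
echo "$MATCHES"
```

Loki runs many such scans concurrently, one per shard, and merges the results.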
Index Mode Comparison
Elasticsearch maintains a large index constantly, consuming memory.
Loki builds temporary shards during query time, reducing constant overhead.
Best Practices
When log volume is low, add fewer labels to reduce chunk loading.
Add labels only when needed, e.g., when chunk_target_size=1MB and log volume justifies it.
Ensure logs are ingested in time‑order; Loki rejects old data for performance.