
Master Loki: Scalable Log Aggregation for Kubernetes and Prometheus

This guide introduces Loki, the open-source, horizontally scalable log aggregation system optimized for Prometheus and Kubernetes. It covers core concepts, architecture, components, deployment steps, Grafana integration, label-based indexing, and best practices for handling dynamic and high-cardinality labels.

Programmer DD

May 16, 2022


Preface

When designing the company's container‑cloud log solution, we found mainstream ELK/EFK stacks too heavy and many Elasticsearch search features unnecessary, so we chose Grafana's open‑source Loki log system.

Below we introduce the basic concepts and architecture of Loki. EFK remains a mature solution that is also worth knowing.

Overview

Loki is the latest open‑source project from Grafana Labs, a horizontally scalable, highly available, multi‑tenant log aggregation system.

It is economical and easy to operate because it does not index log contents; instead it indexes each log stream with a set of labels, optimized for Prometheus and Kubernetes users.

The project is inspired by Prometheus: “Like Prometheus, but for logs.”

Project address: https://github.com/grafana/loki/

Compared with other log aggregation systems, Loki has the following features:

Does not perform full‑text indexing; stores compressed unstructured logs and only indexes metadata, making operation simpler and cheaper.

Uses the same label‑based indexing and grouping as Prometheus, improving scalability and allowing integration with Alertmanager.

Especially suitable for storing Kubernetes pod logs; pod labels are automatically indexed.

Native support in Grafana, avoiding switching between Kibana and Grafana.

Architecture

Components

Explanation:

Promtail acts as the collector, similar to Filebeat.

Loki serves as the backend, similar to Elasticsearch.

Loki processes consist of four roles:

Querier

Ingester

Query‑frontend

Distributor

The role can be selected via the -target parameter of the Loki binary.

Read path

Querier receives HTTP/1 data requests.

Querier forwards the query to all ingesters to read in‑memory data.

Ingester returns matching data (if any).

If no ingester returns data, the querier lazily loads data from the backing store and runs the query against it.

Querier deduplicates and returns the final dataset over the HTTP/1 connection.
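The read path above can be sketched in a few lines of Python. This is an illustration of the steps as described here, not Loki's actual code: the function name `query_read_path`, the callables standing in for ingesters and the backing store, and the `(timestamp, line)` entry shape are all assumptions made for the example.

```python
from typing import Callable, Iterable

Entry = tuple[int, str]  # (timestamp in nanoseconds, log line)


def query_read_path(
    ingesters: Iterable[Callable[[], list[Entry]]],
    load_from_store: Callable[[], list[Entry]],
) -> list[Entry]:
    """Follow the read path described above for a single query."""
    results: list[Entry] = []
    # Ask every ingester for matching in-memory data.
    for fetch in ingesters:
        results.extend(fetch())
    # Only touch the backing store when no ingester returned anything.
    if not results:
        results.extend(load_from_store())
    # Replicas can hold copies of the same entry, so deduplicate on
    # (timestamp, line) and return the entries in time order.
    return sorted(set(results))
```

Deduplication matters because, with a replication factor above 1, the same entry is held by several ingesters and would otherwise appear more than once in the result.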

Write path

Write flow:

Distributor receives an HTTP/1 request to store stream data.

Each stream is hashed using the hash ring.

Distributor sends each stream to the appropriate ingester and its replicas (based on the configured replication factor).

Each ingester creates a chunk for the stream data or appends to an existing chunk; chunks are unique per tenant and label set.

Distributor responds with a success code over HTTP/1.
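To make the hashing and replication steps concrete, here is a minimal consistent-hash ring in Python. It is a sketch under stated assumptions: the `Ring` class, the SHA-256 based `token` function, and the number of tokens per ingester are hypothetical choices for illustration, and Loki's real ring differs in its details.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a 32-bit ring position (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64):
        # Each ingester owns several tokens so that load spreads more evenly.
        self._ring = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [t for t, _ in self._ring]
        self._size = len(set(ingesters))

    def owners(self, tenant: str, labels: dict[str, str], replication_factor: int) -> list[str]:
        # A stream is identified by its tenant plus its sorted label set.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        start = bisect.bisect_left(self._tokens, token(tenant + "{" + pairs + "}"))
        wanted = min(replication_factor, self._size)
        owners: list[str] = []
        # Walk clockwise and collect distinct ingesters until we have enough replicas.
        for step in range(len(self._ring)):
            name = self._ring[(start + step) % len(self._ring)][1]
            if name not in owners:
                owners.append(name)
            if len(owners) == wanted:
                break
        return owners
```

Because the key is built from the tenant and the sorted labels, the same stream always lands on the same ingesters, which is what lets each ingester append to an existing chunk for that stream.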

Deployment

Local mode installation

Download Promtail and Loki:

wget https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip
wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip

Install Promtail

# create directories
mkdir -p /opt/app/{promtail,loki}
# promtail configuration file
cat <<EOF > /opt/app/promtail/promtail.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml  # writable by promtail

client:
  url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: system
  pipeline_stages:
  static_configs:
  - targets:
    - localhost
    labels:
      job: varlogs
      host: yourhost
      __path__: /var/log/*.log
EOF

# unzip and install
unzip promtail-linux-amd64.zip
mv promtail-linux-amd64 /opt/app/promtail/promtail

# systemd service
cat <<EOF > /etc/systemd/system/promtail.service
[Unit]
Description=promtail server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=promtail

[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart promtail
systemctl status promtail

Install Loki

# create directories
mkdir -p /opt/app/{promtail,loki}
# Loki configuration file
cat <<EOF > /opt/app/loki/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /opt/app/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 1048576
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/boltdb-shipper-active
    cache_location: /opt/app/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks

compactor:
  working_directory: /opt/app/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /opt/app/loki/rules
  rule_path: /opt/app/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF

# unzip and install
unzip loki-linux-amd64.zip
mv loki-linux-amd64 /opt/app/loki/loki

# systemd service
cat <<EOF > /etc/systemd/system/loki.service
[Unit]
Description=loki server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=loki

[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart loki
systemctl status loki
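Once both services are running, a quick way to check the installation is to push a test line to the same push endpoint that Promtail uses (`/loki/api/v1/push`). The Python sketch below uses only the standard library; the helper names `build_push_payload` and `push` and the `smoke-test` label value are made up for this example, and the URL assumes the `http_listen_port: 3100` setting from the configuration above.

```python
import json
import time
import urllib.request
from typing import Optional

LOKI_URL = "http://localhost:3100"  # assumes http_listen_port from loki.yaml


def build_push_payload(labels: dict[str, str], lines: list[str],
                       now_ns: Optional[int] = None) -> dict:
    """Build the JSON body for POST /loki/api/v1/push."""
    ts = time.time_ns() if now_ns is None else now_ns
    # Each value is [<Unix time in nanoseconds, as a string>, <log line>].
    # Adding the index keeps the entries in ascending time order.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push(labels: dict[str, str], lines: list[str]) -> int:
    """Send the lines to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(labels, lines)).encode()
    req = urllib.request.Request(
        f"{LOKI_URL}/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status  # a 2xx status means Loki accepted the entries
```

Calling `push({"job": "smoke-test", "host": "yourhost"}, ["hello from the install check"])` should return a 2xx status if Loki is reachable; the line can then be found with the selector `{job="smoke-test"}`.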

Usage

Configure Loki datasource in Grafana

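In Grafana, add a new data source of type Loki, set the URL to Loki's HTTP address (http://loki:3100 here, or http://localhost:3100 if Grafana runs on the same host as the local installation above), then save.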

Open the Explore section to query logs, e.g.:

rate({job="message"} |= "kubelet" [1m])
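The same LogQL expression can also be sent to Loki's HTTP API directly, which is handy for scripting. The sketch below builds a request for the `/loki/api/v1/query_range` endpoint with the standard library; the function names and the default `step`, `limit`, and lookback values are assumptions chosen for illustration.

```python
import json
import time
import urllib.parse
import urllib.request


def query_range_url(base_url: str, logql: str, start_ns: int, end_ns: int,
                    step: str = "60s", limit: int = 100) -> str:
    """Build a query_range URL; start and end are Unix times in nanoseconds."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "step": step,
        "limit": limit,
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"


def run_query(base_url: str, logql: str, lookback_s: int = 3600) -> list:
    """Run the query over the last `lookback_s` seconds and return the result list."""
    end = time.time_ns()
    url = query_range_url(base_url, logql, end - lookback_s * 10**9, end)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]["result"]
```

For example, `run_query("http://loki:3100", 'rate({job="message"} |= "kubelet" [1m])')` would request the same per-second rate that the Explore view plots.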

Only label indexing

Loki indexes only labels, not log contents. Example static label matching with Promtail configuration:

scrape_configs:
- job_name: system
  static_configs:
  - targets:
    - localhost
    labels:
      job: message
      __path__: /var/log/messages

Query this stream with the label selector {job="message"}. Several jobs can be matched at once with a regex matcher, e.g. {job=~"apache|message"}.
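How a label selector picks streams can be illustrated with a small matcher in Python. This is a simplified sketch, not Loki's implementation: the `matches` function and the tuple form of a matcher are invented for the example, and Python's `re` module stands in for the RE2 syntax that Loki uses. Regex matchers are treated as fully anchored, as in Prometheus.

```python
import re


def matches(stream_labels: dict[str, str], matchers: list[tuple[str, str, str]]) -> bool:
    """Return True if the stream satisfies every (label, op, value) matcher.

    Supported ops: "=", "!=", "=~", "!~".
    """
    for name, op, value in matchers:
        actual = stream_labels.get(name, "")
        if op in ("=", "!="):
            hit = actual == value
        else:
            # Regex matchers must match the whole label value.
            hit = re.fullmatch(value, actual) is not None
        negated = op.startswith("!")
        if hit == negated:
            return False
    return True


streams = [{"job": "message"}, {"job": "apache"}, {"job": "apache2"}]
selected = [s["job"] for s in streams if matches(s, [("job", "=~", "apache|message")])]
```

Note that `apache2` is not selected: because the regex is anchored, `apache|message` does not match a value that merely starts with `apache`.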

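Dynamic labels and high cardinality

Dynamic labels are labels whose values are not fixed in the configuration but are derived from the log content. High-cardinality labels have many possible values; each distinct combination of label values creates its own stream, so they can produce a very large number of streams and degrade Loki's performance.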

Example of extracting action and status_code from Apache access logs with a regex stage in Promtail, then promoting them to labels with a labels stage:

pipeline_stages:
- regex:
    expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
- labels:
    action:
    status_code:

Each combination of action and status_code creates a separate stream.
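The effect on stream count can be reproduced offline. The sketch below is a Python rendering of the access-log expression (unescaped from YAML into a raw string, with the size group written as `\d+|-`); the sample log lines and the `stream_labels` helper are made up for illustration.

```python
import re
from collections import Counter

APACHE = re.compile(
    r'^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] '
    r'"(?P<action>\S+)\s?(?P<path>\S+)?\s?(?P<protocol>\S+)?" '
    r'(?P<status_code>\d{3}|-) (?P<size>\d+|-)'
    r'\s?"?(?P<referer>[^"]*)"?\s?"?(?P<useragent>[^"]*)?"?$'
)


def stream_labels(line: str, static_labels: dict[str, str]) -> dict[str, str]:
    """Return the label set a line would get if action and status_code become labels."""
    m = APACHE.match(line)
    if m is None:
        return dict(static_labels)  # unparsed lines keep only the static labels
    return {**static_labels, "action": m["action"], "status_code": m["status_code"]}


LINES = [
    '11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0"',
    '11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0"',
    '11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0"',
    '11.11.11.14 - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0"',
]

# Count log lines per distinct label set, i.e. per stream.
streams = Counter(
    tuple(sorted(stream_labels(line, {"job": "apache"}).items())) for line in LINES
)
```

Here the four sample lines end up in four different streams, one per combination of GET/POST and 200/400, even though they all come from the same file.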

High‑cardinality issue

Using a label such as ip can generate thousands of streams, which may overwhelm Loki.
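A back-of-the-envelope calculation shows why. The upper bound on the number of streams is the product of the number of distinct values of each label. The counts below (4 actions, 5 status codes, 10,000 client IPs) are hypothetical figures chosen only to show the scale of the effect.

```python
from math import prod


def max_streams(distinct_values: dict[str, int]) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(distinct_values.values())


base = {"job": 1, "action": 4, "status_code": 5}   # hypothetical counts
with_ip = {**base, "ip": 10_000}                   # add a high-cardinality label

print(max_streams(base))     # 20
print(max_streams(with_ip))  # 200000
```

Each of those streams needs its own chunks and index entries, so a single unbounded label multiplies the bookkeeping for every other label it is combined with.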

Full‑text indexing problem

Full-text indexes can be as large as the log data itself, and they must be kept in memory for fast queries, which makes scaling difficult and expensive. Loki's index is typically an order of magnitude smaller than the ingested logs.

Query acceleration without label fields

Example filter expression: {job="apache"} |= "11.11.11.11".

Sharding during query

Loki splits the query into smaller shards, opens each chunk of the streams that match the label selector, and searches the chunk contents for the IP.

Shard size and parallelism are configurable.

Scale out the queriers behind the query-frontend to search large volumes quickly.
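The idea of splitting a filter query into shards and scanning them in parallel can be sketched as follows. This is a toy model, not Loki's scheduler: chunks are plain lists of `(timestamp, line)` tuples, time is an abstract integer, and the function names and default `shard` and `parallelism` values are assumptions. Python threads are used only to show the structure; they do not give real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

Entry = tuple[int, str]  # (timestamp, log line)


def split_range(start: int, end: int, shard: int) -> list[tuple[int, int]]:
    """Cut [start, end) into consecutive sub-ranges of at most `shard` units."""
    return [(s, min(s + shard, end)) for s in range(start, end, shard)]


def grep_shard(chunks: list[list[Entry]], time_range: tuple[int, int], needle: str) -> list[Entry]:
    """Brute-force scan: open every chunk and keep matching lines in the range."""
    lo, hi = time_range
    return [(ts, line) for chunk in chunks for ts, line in chunk
            if lo <= ts < hi and needle in line]


def sharded_filter(chunks: list[list[Entry]], start: int, end: int, needle: str,
                   shard: int = 10, parallelism: int = 4) -> list[Entry]:
    """Run one grep per shard in parallel and merge the results in time order."""
    shards = split_range(start, end, shard)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        parts = list(pool.map(lambda r: grep_shard(chunks, r, needle), shards))
    return sorted(hit for part in parts for hit in part)
```

The chunks passed in are assumed to be only those of streams already selected by labels such as `{job="apache"}`, so the label index narrows the search before any content is scanned.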

Index mode comparison

Elasticsearch maintains a large index constantly.

Loki launches parallel shards at query time, reducing constant overhead.

Best practices

When log volume is low, add fewer labels to avoid extra chunk overhead.

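Add labels only when needed; e.g., with chunk_target_size=1MB, consider splitting a stream with an additional label only when that stream produces around 10 MB of logs within max_chunk_age, so that each resulting stream can still fill its chunks.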

Within each stream, logs should be ingested in ascending time order; for performance reasons, Loki rejects entries that are older than the newest entry already received for that stream.
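The ordering rule is applied per stream, not globally, which the following Python sketch tries to make visible. The `OrderedStreams` class is a hypothetical model written for this article; it mirrors the behaviour described above rather than Loki's internal code, and it accepts entries whose timestamp equals the newest one.

```python
class OrderedStreams:
    """Track the newest timestamp seen per stream and reject older entries."""

    def __init__(self) -> None:
        self._latest: dict[tuple, int] = {}

    def accept(self, labels: dict[str, str], ts_ns: int) -> bool:
        # The sorted label set identifies the stream.
        key = tuple(sorted(labels.items()))
        # For a new stream the default equals ts_ns, so the first entry passes.
        if ts_ns < self._latest.get(key, ts_ns):
            return False  # older than what this stream already holds
        self._latest[key] = ts_ns
        return True
```

An entry with timestamp 150 is rejected for a stream that has already seen 200, but the same timestamp is accepted for a different stream that has not yet received anything newer.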


Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
