
How to Build a Lightweight Log Platform with Grafana and Loki in 3 Simple Steps

This guide walks you through replacing a heavyweight ELK stack with a minimal Grafana‑Loki logging solution, covering environment requirements, installation of Loki and Promtail, configuration details, best‑practice tips, troubleshooting, and backup strategies for reliable log aggregation.

Raymond Ops

Overview

The team replaced a problematic ELK stack with Loki because Loki indexes only labels (no full‑text indexing), consumes far fewer resources, integrates natively with Grafana, and runs from a single binary.

Key Technical Features

Lightweight storage: only label indexes are stored, reducing storage cost by more than 10× compared to Elasticsearch.

Cloud-native design: supports multi-tenant mode, horizontal scaling, and object storage back-ends (S3/MinIO/GCS).

Grafana integration: uses LogQL for log queries and can be combined with Prometheus metrics.

Promtail collector: small, Filebeat-like agent with flexible pipeline stages for parsing and labeling.

Typical Use Cases

Small‑to‑medium log platforms (<100 GB/day) where full‑text search is not required.

Teams already using Grafana for monitoring.

Kubernetes clusters – Promtail can scrape pod logs via CRI.

Migrations from ELK where cost and operational complexity are concerns.

Environment Requirements

OS: CentOS 7+ or Ubuntu 18.04+ (Ubuntu 22.04 recommended).

Memory: ≥4 GB (8 GB+ recommended).

Disk: SSD preferred; size depends on retention policy.

Software: Loki 2.9+, Grafana 9.0+, Promtail version matching Loki.
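Disk sizing follows from daily volume, compression, and retention. As a back-of-envelope sketch (all numbers below are illustrative assumptions, not measurements):

```shell
# Rough disk estimate. Assumed inputs: 20 GB/day raw logs,
# ~10x chunk compression, 30-day retention, 30% headroom for index/WAL.
awk 'BEGIN {
  raw = 20; comp = 10; days = 30; head = 1.3
  printf "Estimated disk: %.1f GB\n", raw / comp * days * head
}'
# prints: Estimated disk: 78.0 GB
```

Plug in your own measured daily volume and compression ratio before provisioning.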

Step‑by‑Step Setup

1. Preparation

System check

# Check OS version
cat /etc/os-release

# Check memory and disk
free -h
df -h

# Verify time sync (important for log timestamps)
timedatectl status
# If out of sync, install chrony (use yum/dnf instead of apt on CentOS)
sudo apt install -y chrony
sudo systemctl enable --now chrony

Create directories and system user

# Create working directories
sudo mkdir -p /opt/loki/{data,config}
sudo mkdir -p /opt/promtail/config
sudo mkdir -p /opt/grafana

# Create Loki system user (no login shell)
sudo useradd --system --no-create-home --shell /bin/false loki
sudo chown -R loki:loki /opt/loki

2. Install Loki

Download and install binary

# Define version (check latest at https://github.com/grafana/loki/releases)
LOKI_VERSION="2.9.4"
cd /tmp
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
# Verify installation
loki --version

Loki configuration (example)

# /opt/loki/config/loki-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/data
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  storage:
    filesystem:
      chunks_directory: /opt/loki/data/chunks
      rules_directory: /opt/loki/data/rules
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
limits_config:
  retention_period: 720h   # 30 days (enforced by the compactor with retention_enabled: true)
  reject_old_samples: true
  reject_old_samples_max_age: 168h   # 7 days
  max_query_series: 5000
  ingestion_rate_mb: 16
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB

Parameter notes: auth_enabled is set to false for single-node deployments without multi-tenant auth. schema_config uses the current v13 schema with TSDB indexes. limits_config guards against ingestion spikes and runaway queries; retention is enforced by the compactor when retention_enabled is true.

3. Install Promtail

Promtail configuration (example)

# /opt/promtail/config/promtail-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /opt/promtail/positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
    tenant_id: default
scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          host: ${HOSTNAME}
          __path__: /var/log/syslog
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          host: ${HOSTNAME}
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<request_uri>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<body_bytes_sent>\d+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
      - labels:
          method:
          status:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          host: ${HOSTNAME}
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
            trace_id: trace_id
      - labels:
          level:
          service:

Pipeline stages let you extract fields (via regex , json ) and turn them into labels. Keep the number of labels low to avoid high cardinality.
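Before shipping logs, it helps to verify what the regex stage will extract. You can replay a sample access-log line through an equivalent expression locally (a sketch using sed; the sample line is made up, and POSIX classes stand in for Go's \S and \d):

```shell
# Sample nginx "combined" log line (illustrative)
line='192.168.1.10 - - [10/Oct/2024:13:55:36 +0000] "GET /api/v1/users HTTP/1.1" 200 612 "-" "curl/8.0"'

# Extract the same method/status fields the regex stage promotes to labels
echo "$line" | sed -E 's/^[^ ]+ - [^ ]+ \[[^]]+\] "([^ ]+) [^ ]+ [^"]+" ([0-9]+).*/method=\1 status=\2/'
# prints: method=GET status=200
```

If the expression fails to match, Promtail ships the line unlabeled rather than dropping it, so checking the pattern up front saves debugging later.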

4. Systemd Service Files

# /etc/systemd/system/loki.service
[Unit]
Description=Loki Log Aggregation System
After=network.target

[Service]
Type=simple
User=loki
Group=loki
ExecStart=/usr/local/bin/loki -config.file=/opt/loki/config/loki-config.yaml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Collector
After=network.target

[Service]
Type=simple
# -config.expand-env=true expands ${HOSTNAME} and other env vars used in the config
ExecStart=/usr/local/bin/promtail -config.file=/opt/promtail/config/promtail-config.yaml -config.expand-env=true
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

5. Start and Verify

# Download Promtail binary (same version as Loki)
cd /tmp
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail

# Reload systemd and enable services
sudo systemctl daemon-reload
sudo systemctl enable --now loki
sudo systemctl enable --now promtail

# Verify Loki readiness
curl -s http://localhost:3100/ready   # expected output: ready

# Check a few metrics
curl -s http://localhost:3100/metrics | head -n 20

# Verify Promtail readiness (open http://localhost:9080/targets in a browser
# to see the scraped files; the page is HTML, not JSON)
curl -s http://localhost:9080/ready

# Test a LogQL query via API
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="syslog"}' \
  --data-urlencode 'limit=5' | jq .
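You can also inject a test entry through Loki's HTTP push API. A minimal sketch (the job label and message are arbitrary); the values array takes [unix-nanosecond-timestamp, line] pairs:

```shell
# Build a push payload with a nanosecond timestamp
ts=$(date +%s%N)
payload=$(printf '{"streams":[{"stream":{"job":"curl-test"},"values":[["%s","hello from curl"]]}]}' "$ts")
echo "$payload"

# Send it to Loki (assumes Loki is listening on localhost:3100)
# curl -s -X POST -H 'Content-Type: application/json' \
#   --data "$payload" http://localhost:3100/loki/api/v1/push
```

After pushing, the entry should appear under `{job="curl-test"}` in a query_range call like the one above.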

Object Storage Example (MinIO)

# /opt/loki/config/loki-config-s3.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/data
storage_config:
  tsdb_shipper:
    active_index_directory: /opt/loki/data/tsdb-index
    cache_location: /opt/loki/data/tsdb-cache
  aws:
    s3: s3://minioadmin:minioadmin@localhost:9000/loki
    s3forcepathstyle: true
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
limits_config:
  retention_period: 720h   # 30 days, applied by the compactor below
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_query_series: 10000
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64
compactor:
  working_directory: /opt/loki/data/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

Grafana Data Source Configuration

# Quick start Grafana via Docker
docker run -d \
  --name grafana \
  -p 3000:3000 \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana:latest

In the Grafana UI, add a Loki data source with URL http://localhost:3100 and click Save & Test.
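Once the data source is connected, a few starter LogQL queries for Explore (these assume the job and level labels defined in the Promtail config earlier):

```logql
{job="syslog"}                                   # raw log stream
{job="nginx"} |= "error"                         # line filter: only lines containing "error"
sum by (status) (rate({job="nginx"}[5m]))        # request rate per status label
count_over_time({job="app", level="error"}[1h])  # error count over the last hour
```

Label matchers select streams cheaply; filters and aggregations then run over the matched chunks, which is why narrow label selectors keep queries fast.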

Kubernetes Deployment (Helm)

# loki-stack-values.yaml (excerpt)
loki:
  enabled: true
  persistence:
    enabled: true
    size: 50Gi
  config:
    limits_config:
      ingestion_rate_mb: 32
      per_stream_rate_limit: 5MB
promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}
        - json:
            expressions:
              level: level
              msg: msg
        - labels:
            level:
            service:

# Deploy with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create namespace loki
helm install loki-stack grafana/loki-stack -n loki -f loki-stack-values.yaml

Multi‑Tenant Log Isolation

# Enable multi‑tenant mode in Loki
auth_enabled: true

# Example Promtail client for dev environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: dev

# Example Promtail client for test environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: test

# Example Promtail client for prod environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: prod

In Grafana, create separate Loki data sources for each tenant and add the HTTP header X-Scope-OrgID: <tenant>.

Best Practices

Label Design

Use low‑cardinality labels only (e.g., level, service, env). Avoid high‑cardinality fields such as user_id or request IDs.

# Bad example (high cardinality)
labels:
  user_id: "12345"

# Good example (low cardinality)
labels:
  level: "error"
  service: "auth"
  env: "prod"

Chunk Configuration

ingester:
  chunk_idle_period: 1h
  chunk_block_size: 262144
  chunk_target_size: 1536000
  max_chunk_age: 2h

Recording Rules (pre‑compute frequent queries)

# /opt/loki/rules/rules.yaml
groups:
  - name: error_rate
    rules:
      - record: job:log_errors:rate5m
        expr: sum(rate({level="error"}[5m])) by (job)

Security Hardening

Restrict Loki port 3100 with firewall or iptables.

Enable auth_enabled: true for production multi‑tenant deployments or place an Nginx reverse proxy with Basic Auth.

Mask sensitive data in Promtail pipelines using replace stages.

pipeline_stages:
  - replace:
      expression: '(\d{3})\d{4}(\d{4})'
      replace: '${1}****${2}'   # phone number masking
  - replace:
      expression: 'password[=:]\s*\S+'
      replace: 'password=***'   # password masking
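The masking behavior can be previewed outside Promtail with an equivalent sed expression (a local sketch; the sample line is made up, and POSIX character classes stand in for Go's \d and \s):

```shell
# Sample log line containing a phone number and a password (illustrative)
echo 'user login ok 13812345678 password: hunter2' | \
  sed -E 's/([0-9]{3})[0-9]{4}([0-9]{4})/\1****\2/; s/password[=:][[:space:]]*[^[:space:]]+/password=***/'
# prints: user login ok 138****5678 password=***
```

Masking in the collector means sensitive values never reach Loki storage at all, which is stronger than masking at query time.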

High Availability

Deploy Loki in micro‑service mode (separate Ingester, Distributor, Querier, Compactor).

Use object storage (S3/MinIO/GCS) for durable storage.

Run at least three Ingester replicas and set replication_factor: 3.
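A config sketch for the ring and replication side of such a deployment (assumes three Loki nodes named loki-1 through loki-3 reachable on the default memberlist port; adapt the names to your environment):

```yaml
# Excerpt: cluster the ingester ring over memberlist with 3-way replication
common:
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist
memberlist:
  join_members:
    - loki-1:7946
    - loki-2:7946
    - loki-3:7946
```

With replication_factor 3, each stream is written to three ingesters, so a single node failure loses no data.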

Troubleshooting & Monitoring

Common Issues

Entry out of order : recent Loki versions accept out-of-order writes within a time window by default, but entries older than that window (or than reject_old_samples_max_age) are still rejected. Fix the source's clock or parse timestamps explicitly in the Promtail pipeline.

Rate limit exceeded : increase ingestion_rate_mb or per‑stream limits in limits_config.

Too many outstanding requests : raise max_outstanding_per_tenant or add more query nodes.
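For the out-of-order case, one common fix is to parse the application's own timestamp in Promtail instead of relying on scrape time. A sketch for JSON logs with an RFC3339 time field (the field name is an assumption):

```yaml
pipeline_stages:
  - json:
      expressions:
        time: time
  - timestamp:
      source: time
      format: RFC3339
```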

Key Metrics (Prometheus scrape)

# Write throughput
curl -s http://localhost:3100/metrics | grep loki_distributor_bytes_received_total

# Active streams (memory usage)
curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams

# Query latency (p99)
curl -s http://localhost:3100/metrics | grep loki_request_duration_seconds_bucket

# Chunk flush count
curl -s http://localhost:3100/metrics | grep loki_ingester_chunks_flushed_total

Alert Rules (Prometheus)

groups:
  - name: loki
    rules:
      - alert: LokiHighMemoryStreams
        expr: loki_ingester_memory_streams > 100000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Loki stream count too high"
          description: "Active streams: {{ $value }}, check label design."

      - alert: LokiQueryLatencyHigh
        expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki query latency high"
          description: "p99 latency exceeds 30 s."

Backup & Restore (local filesystem)

#!/bin/bash
# Backup script
BACKUP_DIR="/backup/loki"
LOKI_DATA="/opt/loki/data"
DATE=$(date +%Y%m%d)

mkdir -p ${BACKUP_DIR}
# Stop Loki for consistency
sudo systemctl stop loki
# Create tarball
tar -czf ${BACKUP_DIR}/loki_backup_${DATE}.tar.gz -C ${LOKI_DATA} .
# Restart Loki
sudo systemctl start loki
# Keep last 7 days
find ${BACKUP_DIR} -name "loki_backup_*.tar.gz" -mtime +7 -delete

Restore by stopping Loki, extracting the tarball into /opt/loki/data, and starting Loki again.
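To run the backup nightly, a cron.d entry works (this assumes the script above is saved at the hypothetical path /opt/scripts/loki_backup.sh and marked executable):

```
# /etc/cron.d/loki-backup — run the backup daily at 02:30
30 2 * * * root /opt/scripts/loki_backup.sh >> /var/log/loki_backup.log 2>&1
```

Note the script stops Loki during the tarball step, so schedule it in a low-traffic window; with object storage back-ends, prefer the bucket's own replication/versioning instead.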

Conclusion

Loki provides a cost‑effective, easy‑to‑deploy logging solution that integrates tightly with Grafana. By designing low‑cardinality labels, tuning chunk parameters, and leveraging Promtail pipelines, you can achieve reliable log collection, fast queries, and scalable storage while maintaining security and high availability.

Tags: observability, logging, Grafana, Loki, Promtail
Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.