How to Build a Lightweight Log Platform with Grafana and Loki in 3 Simple Steps
This guide walks you through replacing a heavyweight ELK stack with a minimal Grafana‑Loki logging solution, covering environment requirements, installation of Loki and Promtail, configuration details, best‑practice tips, troubleshooting, and backup strategies for reliable log aggregation.
Overview
The team replaced a problematic ELK stack with Loki because Loki indexes only labels (no full‑text indexing), consumes far fewer resources, integrates natively with Grafana, and runs from a single binary.
Key Technical Features
Lightweight storage: only label indexes are stored, reducing storage cost by more than 10× compared to Elasticsearch.
Cloud‑native design: supports multi‑tenant mode, horizontal scaling, and object‑storage back ends (S3/MinIO/GCS).
Grafana integration: uses LogQL for log queries and can be combined with Prometheus metrics.
Promtail collector: small, Filebeat‑like agent with flexible pipeline stages for parsing and labeling.
Typical Use Cases
Small‑to‑medium log platforms (<100 GB/day) where full‑text search is not required.
Teams already using Grafana for monitoring.
Kubernetes clusters – Promtail can scrape pod logs via CRI.
Migrations from ELK where cost and operational complexity are concerns.
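Because Loki indexes only labels, every query starts from a label selector and then filters or parses line contents on the fly. A few illustrative LogQL queries (the label names match the Promtail configuration shown later in this guide):

```logql
# Filter by labels, then grep within the matching streams:
{job="nginx", status="500"} |= "timeout"

# Parse JSON app logs at query time and filter on an extracted field:
{job="app"} | json | level="error" | line_format "{{.service}}: {{.msg}}"

# Metric query: error-line rate per service over the last 5 minutes:
sum by (service) (rate({job="app", level="error"}[5m]))
```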
Environment Requirements
OS: CentOS 7+ or Ubuntu 18.04+ (Ubuntu 22.04 recommended).
Memory: ≥4 GB (8 GB+ recommended).
Disk: SSD preferred; size depends on retention policy.
Software: Loki 2.9+, Grafana 9.0+, Promtail version matching Loki.
Step‑by‑Step Setup
1. Preparation
System check
# Check OS version
cat /etc/os-release
# Check memory and disk
free -h
df -h
# Verify time sync (important for log timestamps)
timedatectl status
# If out of sync, install chrony
sudo apt install -y chrony
sudo systemctl enable --now chrony
Create directories and system user
# Create working directories
sudo mkdir -p /opt/loki/{data,config}
sudo mkdir -p /opt/promtail/config
sudo mkdir -p /opt/grafana
# Create Loki system user (no login shell)
sudo useradd --system --no-create-home --shell /bin/false loki
sudo chown -R loki:loki /opt/loki
2. Install Loki
Download and install binary
# Define version (check latest at https://github.com/grafana/loki/releases)
LOKI_VERSION="2.9.4"
cd /tmp
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
# Verify installation
loki --version
Loki configuration (example)
# /opt/loki/config/loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info

common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/data
  storage:
    filesystem:
      chunks_directory: /opt/loki/data/chunks
      rules_directory: /opt/loki/data/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h # 30 days
  reject_old_samples: true
  reject_old_samples_max_age: 168h # 7 days
  max_query_series: 5000
  ingestion_rate_mb: 16
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB
Parameter notes
auth_enabled: set to false for single‑node deployments without multi‑tenant auth.
common: replication_factor: 1 with an in‑memory ring is the standard single‑binary setting.
schema_config: uses the v13 schema with TSDB storage.
limits_config: protects the cluster from ingestion spikes and defines retention. Note that retention_period is only enforced when a compactor runs with retention_enabled: true, as shown in the object‑storage example later in this guide.
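Once Loki is running, you can verify the configuration end to end by pushing a log line straight into the HTTP API. A sketch, assuming Loki listens on localhost:3100 as configured above (the job label smoke-test is invented for this example):

```shell
# Build a push payload; Loki expects nanosecond-precision timestamps.
TS_NS="$(date +%s%N)"
PAYLOAD=$(printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","hello from curl"]]}]}' "$TS_NS")

# Push the line (Loki answers HTTP 204 on success)...
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"

# ...and read it back a moment later.
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="smoke-test"}'
```

If the query returns your line, ingestion, storage, and the query path all work before you ever touch Promtail or Grafana.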
3. Install Promtail
Promtail configuration (example)
# /opt/promtail/config/promtail-config.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /opt/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push
    tenant_id: default

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          host: ${HOSTNAME}
          __path__: /var/log/syslog

  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          host: ${HOSTNAME}
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<request_uri>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<body_bytes_sent>\d+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
      - labels:
          method:
          status:

  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          host: ${HOSTNAME}
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
            trace_id: trace_id
      - labels:
          level:
          service:
Pipeline stages let you extract fields (via regex, json, and similar stages) and promote them to labels. Keep the number of labels low to avoid high cardinality. Note that ${HOSTNAME} is only substituted when Promtail runs with -config.expand-env=true.
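The json stage above only works if the application really emits one JSON object per line. A quick offline sanity check of the expected shape (the sample line and its field values are invented for illustration):

```shell
# A line shaped the way the `app` job expects it in /var/log/app/*.log:
SAMPLE='{"level":"error","service":"auth","trace_id":"abc123","msg":"login failed"}'

# Confirm the fields the pipeline extracts are present in the sample.
for field in level service trace_id; do
  echo "$SAMPLE" | grep -q "\"$field\"" && echo "$field: ok"
done

# For a full pipeline test against real files, Promtail can print what it
# would send instead of pushing it:
#   promtail -config.file=/opt/promtail/config/promtail-config.yaml --dry-run
```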
4. Systemd Service Files
# /etc/systemd/system/loki.service
[Unit]
Description=Loki Log Aggregation System
After=network.target
[Service]
Type=simple
User=loki
Group=loki
ExecStart=/usr/local/bin/loki -config.file=/opt/loki/config/loki-config.yaml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Collector
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.expand-env=true -config.file=/opt/promtail/config/promtail-config.yaml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
5. Start and Verify
# Download Promtail binary (same version as Loki)
cd /tmp
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
# Reload systemd and enable services
sudo systemctl daemon-reload
sudo systemctl enable --now loki
sudo systemctl enable --now promtail
# Verify Loki readiness
curl -s http://localhost:3100/ready # expected output: ready
# Check a few metrics
curl -s http://localhost:3100/metrics | head -n 20
# Verify Promtail targets
curl -s http://localhost:9080/targets | jq .
# Test a LogQL query via API
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={job="syslog"}' \
--data-urlencode 'limit=5' | jq .
Object Storage Example (MinIO)
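If you do not already have an object store, a throwaway MinIO instance is enough to try this out. A sketch using Docker (the minioadmin/minioadmin credentials match the config below; change them for anything beyond a local test):

```shell
# Start MinIO (S3 API on 9000, web console on 9001)
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  quay.io/minio/minio server /data --console-address ":9001"

# Create the `loki` bucket with the MinIO client image
docker run --rm --network host --entrypoint sh quay.io/minio/mc -c \
  "mc alias set local http://localhost:9000 minioadmin minioadmin && mc mb local/loki"
```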
# /opt/loki/config/loki-config-s3.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /opt/loki/data
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

storage_config:
  tsdb_shipper:
    active_index_directory: /opt/loki/data/tsdb-index
    cache_location: /opt/loki/data/tsdb-cache
  aws:
    s3: s3://minioadmin:minioadmin@localhost:9000/loki
    s3forcepathstyle: true
    insecure: true # MinIO over plain HTTP

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

limits_config:
  retention_period: 720h # 30 days
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_query_series: 10000
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64

compactor:
  working_directory: /opt/loki/data/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
Grafana Data Source Configuration
# Quick start Grafana via Docker
docker run -d \
--name grafana \
-p 3000:3000 \
-v grafana-storage:/var/lib/grafana \
grafana/grafana:latest
In Grafana UI, add a Loki data source with URL http://localhost:3100 and click Save & Test. If Grafana itself runs in a container, localhost points at the container rather than the host, so use the host's IP address (or run the container with --network=host) instead.
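Instead of clicking through the UI, the data source can also be provisioned as a file that Grafana loads at startup. A minimal sketch, assuming the standard provisioning path (mount it into the container if you use Docker):

```yaml
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
    isDefault: true
```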
Kubernetes Deployment (Helm)
# loki-stack-values.yaml (excerpt)
loki:
  enabled: true
  persistence:
    enabled: true
    size: 50Gi
  config:
    limits_config:
      ingestion_rate_mb: 32
      per_stream_rate_limit: 5MB

promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}
        - json:
            expressions:
              level: level
              service: service
              msg: msg
        - labels:
            level:
            service:
# Deploy with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create namespace loki
helm install loki-stack grafana/loki-stack -n loki -f loki-stack-values.yaml
Multi‑Tenant Log Isolation
# Enable multi‑tenant mode in Loki
auth_enabled: true

# Example Promtail client for the dev environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: dev

# Example Promtail client for the test environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: test

# Example Promtail client for the prod environment
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: prod
In Grafana, create separate Loki data sources for each tenant and add the HTTP header X-Scope-OrgID: <tenant>.
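With auth_enabled: true, every API request must identify a tenant. Querying one directly looks like this (assuming the dev tenant and the host/port from the snippets above):

```shell
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  -H "X-Scope-OrgID: dev" \
  --data-urlencode 'query={job="app"}' \
  --data-urlencode 'limit=5'
```

Requests without the header are rejected, which is exactly the isolation you want: a misconfigured agent cannot silently write into another environment's logs.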
Best Practices
Label Design
Use low‑cardinality labels only (e.g., level, service, env). Avoid high‑cardinality fields such as user_id or request IDs.
# Bad example (high cardinality)
labels:
  user_id: "12345"

# Good example (low cardinality)
labels:
  level: "error"
  service: "auth"
  env: "prod"
Chunk Configuration
ingester:
  chunk_idle_period: 1h
  chunk_block_size: 262144
  chunk_target_size: 1536000
  max_chunk_age: 2h
Recording Rules (pre‑compute frequent queries)
# /opt/loki/rules/rules.yaml
groups:
  - name: error_rate
    rules:
      - record: job:log_errors:rate5m
        expr: sum(rate({level="error"}[5m])) by (job)
Security Hardening
Restrict Loki port 3100 with firewall or iptables.
Enable auth_enabled: true for production multi‑tenant deployments or place an Nginx reverse proxy with Basic Auth.
Mask sensitive data in Promtail pipelines using replace stages.
pipeline_stages:
  - replace:
      expression: '(\d{3})\d{4}(\d{4})'
      replace: '${1}****${2}' # phone‑number masking
  - replace:
      expression: 'password[=:]\s*\S+'
      replace: 'password=***' # password masking
High Availability
Deploy Loki in micro‑service mode (separate Ingester, Distributor, Querier, Compactor).
Use object storage (S3/MinIO/GCS) for durable storage.
Run at least three Ingester replicas and set replication_factor: 3.
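The replication settings above translate into configuration roughly like the following sketch for a multi‑instance deployment (a memberlist ring is assumed; the loki-1/loki-2/loki-3 host names are placeholders for your actual instances):

```yaml
common:
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-1:7946
    - loki-2:7946
    - loki-3:7946
```

With replication_factor: 3, each log stream is written to three ingesters, so losing one instance neither drops data nor blocks writes.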
Troubleshooting & Monitoring
Common Issues
Entry out of order : timestamps in the same stream must be monotonic. Fix the source or adjust Promtail timestamp parsing.
Rate limit exceeded : increase ingestion_rate_mb or per‑stream limits in limits_config.
Too many outstanding requests : raise max_outstanding_per_tenant or add more query nodes.
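When chasing any of these errors, the service journal and two built‑in HTTP endpoints are usually the fastest first stops (Loki assumed on localhost:3100):

```shell
# Recent Loki service logs
journalctl -u loki --no-pager -n 50

# The effective (fully merged) configuration Loki is actually running with
curl -s http://localhost:3100/config | head -n 30

# Ingester ring status (an HTML status page)
curl -s http://localhost:3100/ring
```

The /config endpoint is especially useful for rate-limit issues, because it shows the limits in effect after defaults and overrides are merged, not just what is in your YAML file.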
Key Metrics (Prometheus scrape)
# Write throughput
curl -s http://localhost:3100/metrics | grep loki_distributor_bytes_received_total
# Active streams (memory usage)
curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams
# Query latency (p99)
curl -s http://localhost:3100/metrics | grep loki_request_duration_seconds_bucket
# Chunk flush count
curl -s http://localhost:3100/metrics | grep loki_ingester_chunks_flushed_total
Alert Rules (Prometheus)
groups:
  - name: loki
    rules:
      - alert: LokiHighMemoryStreams
        expr: loki_ingester_memory_streams > 100000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Loki stream count too high"
          description: "Active streams: {{ $value }}, check label design."
      - alert: LokiQueryLatencyHigh
        expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki query latency high"
          description: "p99 latency exceeds 30 s."
Backup & Restore (local filesystem)
#!/bin/bash
# Backup script
BACKUP_DIR="/backup/loki"
LOKI_DATA="/opt/loki/data"
DATE=$(date +%Y%m%d)
mkdir -p ${BACKUP_DIR}
# Stop Loki for consistency
sudo systemctl stop loki
# Create tarball
tar -czf ${BACKUP_DIR}/loki_backup_${DATE}.tar.gz -C ${LOKI_DATA} .
# Restart Loki
sudo systemctl start loki
# Keep last 7 days
find ${BACKUP_DIR} -name "loki_backup_*.tar.gz" -mtime +7 -delete
Restore by stopping Loki, extracting the tarball into /opt/loki/data, and starting Loki again.
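The restore procedure, as a sketch (the backup filename is an example; pick an actual file from /backup/loki):

```shell
#!/bin/bash
BACKUP_FILE="/backup/loki/loki_backup_20240101.tar.gz" # example name
sudo systemctl stop loki
sudo tar -xzf "$BACKUP_FILE" -C /opt/loki/data
sudo chown -R loki:loki /opt/loki/data
sudo systemctl start loki
```

Restoring ownership to the loki user matters: tar run via sudo extracts files as root, and Loki started under its system user cannot write to root-owned chunk directories.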
Conclusion
Loki provides a cost‑effective, easy‑to‑deploy logging solution that integrates tightly with Grafana. By designing low‑cardinality labels, tuning chunk parameters, and leveraging Promtail pipelines, you can achieve reliable log collection, fast queries, and scalable storage while maintaining security and high availability.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.