ELK vs Loki: Which Kubernetes Log Solution Saves Cost and Boosts Performance?
This article compares ELK and Loki for Kubernetes log collection, covering scenarios, prerequisites, architectural differences, storage costs, query performance, deployment steps with Helm, best‑practice optimizations, and troubleshooting tips to help you choose the most efficient solution.
Applicable Scenarios & Prerequisites
Scenario: Kubernetes clusters that need centralized log aggregation, full‑text search, alerting, and troubleshooting.
Prerequisites: Kubernetes 1.20+, Helm 3.x, a StorageClass for persistent volumes, and sufficient node resources (ELK: at least 8 CPU cores / 16 GiB memory; Loki: at least 2 CPU cores / 4 GiB memory).
Solution Comparison

| Dimension | ELK Stack | Loki |
| --- | --- | --- |
| Architecture | Elasticsearch + Logstash/Filebeat + Kibana | Loki + Promtail + Grafana |
| Index Strategy | Full‑text index (all fields) | Label index (metadata only) |
| Storage Cost | High (10‑50 GB/day per 100 Pods) | Low (2‑10 GB/day per 100 Pods) |
| Query Performance | Fast full‑text search | Fast label filtering, slower full‑text |
| Resource Consumption | High CPU/memory | Low CPU/memory |
| Learning Curve | Steep (ES DSL) | Gentle (LogQL similar to PromQL) |
| Best Use Cases | Complex queries, audit, security analysis | Prometheus users, cost‑sensitive, simple queries |
Selection Advice
ELK: suited to finance, security, and audit scenarios that require complex queries.
Loki: ideal when Prometheus is already in use, the budget is limited, and queries are simple.
Quick Checklist
ELK Deployment
Deploy Elasticsearch cluster.
Deploy Kibana.
Deploy Filebeat DaemonSet.
Configure index lifecycle management.
Loki Deployment
Deploy Loki (single‑binary or micro‑service mode).
Deploy Promtail DaemonSet.
Integrate Grafana.
Configure retention policy.
ELK Stack Deployment
1. Elasticsearch (Helm)
```shell
# Add Elastic Helm repo
helm repo add elastic https://helm.elastic.co
helm repo update

# Deploy Elasticsearch (3-node cluster)
cat > es-values.yaml <<'EOF'
replicas: 3
minimumMasterNodes: 2
resources:
  requests:
    cpu: "2000m"
    memory: "4Gi"
  limits:
    cpu: "4000m"
    memory: "8Gi"
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  storageClassName: fast-ssd
esJavaOpts: "-Xmx4g -Xms4g"
EOF
helm install elasticsearch elastic/elasticsearch -n logging --create-namespace -f es-values.yaml

# Verify
kubectl -n logging get pod -l app=elasticsearch-master
kubectl -n logging exec -it elasticsearch-master-0 -- curl localhost:9200/_cluster/health
```

2. Kibana
```shell
cat > kibana-values.yaml <<'EOF'
elasticsearchHosts: "http://elasticsearch-master:9200"
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
service:
  type: ClusterIP
ingress:
  enabled: true
  hosts:
    - host: kibana.example.com
      paths:
        - path: /
EOF
helm install kibana elastic/kibana -n logging -f kibana-values.yaml

# Access
kubectl -n logging port-forward svc/kibana-kibana 5601:5601
# Open http://localhost:5601
```

3. Filebeat
```shell
cat > filebeat-values.yaml <<'EOF'
daemonset:
  enabled: true
  resources:
    requests:
      cpu: "100m"
      memory: "200Mi"
    limits:
      cpu: "500m"
      memory: "500Mi"
  filebeatConfig:
    filebeat.yml: |
      filebeat.autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            hints.enabled: true
            hints.default_config:
              type: container
              paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
      processors:
        - add_cloud_metadata: ~
        - add_kubernetes_metadata: ~
      output.elasticsearch:
        hosts: ['${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}']
        index: "k8s-logs-%{+yyyy.MM.dd}"
      setup.ilm.enabled: false
      setup.template.name: "k8s-logs"
      setup.template.pattern: "k8s-logs-*"
EOF
helm install filebeat elastic/filebeat -n logging -f filebeat-values.yaml
```

4. Index Lifecycle Management (ILM)
```
# Create ILM policy
PUT /_ilm/policy/k8s-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "50GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Apply to index template
PUT /_index_template/k8s-logs
{
  "index_patterns": ["k8s-logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "k8s-logs-policy",
      "index.lifecycle.rollover_alias": "k8s-logs"
    }
  }
}
```

Loki Stack Deployment
1. Loki (Helm)
```shell
# Add Grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Single-binary mode (small cluster)
cat > loki-values.yaml <<'EOF'
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: 'filesystem'
singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
  persistence:
    enabled: true
    size: 50Gi
    storageClass: fast-ssd
gateway:
  enabled: false
EOF
helm install loki grafana/loki -n logging --create-namespace -f loki-values.yaml

# Verify
kubectl -n logging get pod -l app.kubernetes.io/name=loki
```

2. Promtail
```shell
cat > promtail-values.yaml <<'EOF'
config:
  serverPort: 3101
  clients:
    - url: http://loki:3100/loki/api/v1/push
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
tolerations:
  - operator: Exists
EOF
helm install promtail grafana/promtail -n logging -f promtail-values.yaml
```

3. Grafana Integration
```shell
# Deploy Grafana if not present
helm install grafana grafana/grafana -n logging --set persistence.enabled=true --set persistence.size=10Gi

# Get admin password
kubectl -n logging get secret grafana -o jsonpath="{.data.admin-password}" | base64 --decode

# Add Loki data source in Grafana UI (URL: http://loki:3100)
```

4. LogQL Query Examples
```
# All logs in namespace default
{namespace="default"}

# Logs containing "error"
{namespace="default"} |= "error"

# Logs of a specific Pod
{pod="nginx-7c6f8d9b7-abcde"}

# Regex filter
{namespace="prod"} |~ "ERROR|WARN"

# Per-second rate of error lines, averaged over a 5-minute window
rate({namespace="prod"} |= "error" [5m])

# Aggregate log volume per namespace
sum by (namespace) (rate({job="promtail"}[5m]))
```

Performance Test
ELK Full‑Text Search
```
GET /k8s-logs-*/_search
{
  "query": {
    "match": {
      "message": "database connection failed"
    }
  }
}
```

Latency: 50‑500 ms (depends on data volume).
Advantage: Precise full‑text matching.
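Latency also scales with how much data the query touches, so bounding the time range usually helps. A hedged sketch of the same search restricted to the last hour (the `@timestamp` field name is assumed from Filebeat's default mapping):

```
GET /k8s-logs-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "database connection failed" } }
      ],
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```

The `filter` clause is cached and skips scoring, which is why range conditions belong there rather than in `must`.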
Loki Label Filtering
```
{app="myapp"} |= "database connection failed"
```

Latency: 100‑1000 ms (full‑text scan).
Advantage: label filtering itself is fast (10‑50 ms); only the full‑text scan over the matched streams is slower.
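The gap narrows when the label selector is as specific as possible, because Loki then scans fewer chunks before applying the line filter. A sketch (label values are hypothetical):

```
{namespace="prod", app="myapp"} |= "database connection failed"
```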
Storage Cost Comparison
Test scenario: 100 Pods, each generating 100 MB/day.
| Solution | Raw Data | Index Size | Total Storage | Total ÷ Raw |
| --- | --- | --- | --- | --- |
| Elasticsearch | 10 GB/day | 20‑40 GB/day | 30‑50 GB/day | 3‑5× |
| Loki | 10 GB/day | 0.5‑1 GB/day | 2‑5 GB/day | 0.2‑0.5× |
Conclusion: Loki reduces storage cost by 80‑90 %.
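The 80‑90 % figure can be sanity‑checked with back‑of‑envelope arithmetic. This sketch hard‑codes mid‑range multipliers from the table above (4× raw for Elasticsearch, 0.3× for Loki); they are assumptions, not measurements:

```shell
# Estimate 30-day storage from the table's figures (assumed mid-range multipliers)
pods=100
mb_per_pod_day=100
raw_gb_day=$(( pods * mb_per_pod_day / 1000 ))   # 10 GB/day raw
es_gb_30d=$(( raw_gb_day * 4 * 30 ))             # Elasticsearch at ~4x raw
loki_gb_30d=$(( raw_gb_day * 3 * 30 / 10 ))      # Loki at ~0.3x raw
savings=$(( (es_gb_30d - loki_gb_30d) * 100 / es_gb_30d ))
echo "raw: ${raw_gb_day} GB/day, ES 30d: ${es_gb_30d} GB, Loki 30d: ${loki_gb_30d} GB"
echo "savings: ${savings}%"
```

With these mid‑range inputs the script reports roughly 92 % savings, consistent with the 80‑90 % conclusion.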
Best Practices
ELK Optimization
Index tuning (e.g., number_of_shards=3, number_of_replicas=1, refresh_interval=30s).
ILM policy: keep hot data 30 days, warm 30‑90 days, delete after 90 days.
Resource limits: ES heap ≤ 50 % of physical memory.
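The shard, replica, and refresh settings above can be baked into the same kind of index template shown earlier, using the Dev Tools syntax this article already uses; the values are the example figures, to be tuned per cluster, and in practice you would merge them with any lifecycle settings rather than overwrite the template:

```
PUT /_index_template/k8s-logs
{
  "index_patterns": ["k8s-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s"
    }
  }
}
```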
Loki Optimization
Use low‑cardinality labels (e.g., namespace, app, pod) and avoid high‑cardinality labels such as request IDs, trace IDs, or user IDs, which explode the index Loki must maintain.
Retention policy: 30 days.
Query optimization: filter by labels first, then apply full‑text search.
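The 30‑day retention can be expressed in the Helm values. A hedged sketch for the grafana/loki chart, assuming Loki's compactor‑based retention (key names may differ between chart versions):

```yaml
loki:
  limits_config:
    retention_period: 720h   # 30 days
  compactor:
    retention_enabled: true  # the compactor is what actually deletes expired chunks
```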
Troubleshooting
ELK
Yellow/red cluster status:
```shell
kubectl -n logging exec -it elasticsearch-master-0 -- curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
```

Filebeat not collecting:

```shell
kubectl -n logging logs -l app=filebeat | grep ERROR
```

Loki
Promtail not pushing logs:

```shell
kubectl -n logging logs -l app.kubernetes.io/name=promtail | grep error
```

Slow Loki queries:

```shell
kubectl -n logging logs -l app.kubernetes.io/name=loki | grep "query stats"
```

Test environment: Kubernetes 1.30, ELK 8.10, Loki 2.9 (October 2025).
Ops Community