ELK vs EFK vs Loki: Which Log Solution Saves Money and Boosts Performance?
This in‑depth technical guide compares ELK, EFK, and Loki across cost, performance, deployment complexity, feature completeness, and suitability for small‑to‑large teams, providing real‑world case studies, decision trees, migration steps, and cost‑optimization tips to help you choose the most efficient logging stack for your organization.
Introduction
"Our log system burns 100,000 CNY per month – is this normal?" This question sparked a deep dive into log‑collection solutions. Drawing on eight years of experience with logging, this article compares ELK, EFK, and Loki on cost, performance, ease of use, and suitability for different team sizes and budgets.
Technical Background: Evolution and Core Requirements of Log Systems
Why Logs Are Critical
In the micro‑service and cloud‑native era, logs are the lifeline for troubleshooting, performance optimization, and security auditing.
Core Problems Solved by Log Systems
Centralized management – quickly locate the full request trace across 100 services on 50 servers.
Real‑time search – find a specific error among millions of logs per minute.
Visualization – see error trends, request volume, and slow‑query ratios.
Alerting – notify when error rates exceed thresholds.
Storage optimization – store up to 1 TB per day cost‑effectively.
Security compliance – retain audit logs for 180 days.
Three Main Solutions Overview
ELK Stack (Elasticsearch + Logstash + Kibana)
Born in 2010, ELK is the de‑facto standard. Core components:
Elasticsearch: distributed search engine for storing and retrieving logs.
Logstash: powerful ingestion pipeline with many plugins.
Kibana: web UI for querying and visualizing logs.
Typical architecture:
Application → Logstash → Elasticsearch → Kibana
                 ↑
                 │
      Filebeat/Beats (lightweight collectors)
Market share: ~50 % of large‑scale deployments.
EFK Stack (Elasticsearch + Fluentd + Kibana)
EFK replaces Logstash with Fluentd (a CNCF project written in Ruby). It offers lower memory usage while keeping the same storage and visualization layers.
Typical architecture:
Application → Fluentd → Elasticsearch → Kibana
Market share: ~20 % – popular in Kubernetes ecosystems.
Grafana Loki
Launched in 2018 and described as "like Prometheus, but for logs", Loki indexes only log labels (metadata) and stores the log lines themselves as compressed chunks, dramatically reducing storage costs.
Typical architecture:
Application → Promtail → Loki → Grafana
Market share: ~15 % – fast‑growing, especially for small‑to‑mid‑size teams.
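To make the "index only the labels" idea concrete, here is a toy in‑memory sketch. It is not Loki's actual implementation; the class and method names are hypothetical and chosen only for illustration. Streams are looked up by their label set, and the log text is found by scanning the selected streams rather than through a full‑text index.

```python
from collections import defaultdict
import zlib


class ToyLabelIndexedStore:
    """Toy model: only label sets are indexed; log lines are never indexed."""

    def __init__(self):
        # label set (sorted tuple of key/value pairs) -> list of raw log lines
        self._streams = defaultdict(list)

    def push(self, labels, line):
        key = tuple(sorted(labels.items()))
        self._streams[key].append(line)

    def compressed_size(self):
        # Size if each stream were written out as one compressed chunk.
        return sum(
            len(zlib.compress("\n".join(lines).encode()))
            for lines in self._streams.values()
        )

    def query(self, selector, contains=""):
        """Pick streams by exact label match, then scan their lines for a substring."""
        results = []
        for key, lines in self._streams.items():
            stream_labels = dict(key)
            if all(stream_labels.get(k) == v for k, v in selector.items()):
                results.extend(line for line in lines if contains in line)
        return results


store = ToyLabelIndexedStore()
store.push({"app": "checkout", "level": "error"}, "payment timeout for order 42")
store.push({"app": "checkout", "level": "info"}, "order 42 created")
store.push({"app": "search", "level": "error"}, "index shard unavailable")

hits = store.query({"app": "checkout", "level": "error"}, contains="timeout")
print(hits)
```

The trade‑off this illustrates: narrowing by labels is cheap, while searching inside the log text costs a scan over every selected stream.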
Log Requirements for Small Teams (10‑200 people)
Cost‑sensitive: logging cost < 20 % of server cost.
Limited ops expertise: no dedicated log engineer.
Moderate log volume: 10 GB‑1 TB per day.
Clear query needs: mainly fault‑diagnosis, not deep analytics.
Quick onboarding: low learning curve.
Thus the selection logic is "good enough, cheap, easy to maintain" rather than "feature‑rich, highest performance".
Core Comparison Across Multiple Dimensions
1. Architecture Complexity & Deployment Difficulty
| Dimension | ELK | EFK | Loki |
| Minimum HA nodes | 5 (3 ES + Logstash + Kibana) | 5 (3 ES + Fluentd + Kibana) | 3 (Loki + Promtail + Grafana) or 1 (single‑node) |
| Minimum resources | 14 CPU, 36 GB RAM | 14 CPU, 36 GB RAM | 3 CPU, 6 GB RAM |
| Deployment time | 2‑3 days | 2 days | 30 min‑1 hour |
| Configuration complexity | ⭐⭐⭐⭐⭐ (DSL, many plugins) | ⭐⭐⭐⭐ (Ruby DSL) | ⭐⭐ (simple YAML) |
| Learning curve | ⭐⭐ (steep) | ⭐⭐⭐ (moderate) | ⭐⭐⭐⭐ (flat) |
2. Storage Cost & Query Performance
ELK/EFK store full‑text indexed logs. Typical compression 3:1, but replicas double storage. Example: 100 GB/day for 30 days → ~1.8 TB SSD, costing ~18 000 CNY/month.
Loki indexes only labels; compression ~10:1. Same workload → ~300 GB storage, costing ~3 000 CNY/month (≈ 17 % of ELK cost).
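The storage figures above can be reproduced with a short back‑of‑the‑envelope calculation. This is a sketch under the ratios stated in the text (3:1 compression with one replica for ELK/EFK, 10:1 for Loki); the function name is made up for illustration. Straight arithmetic gives about 2 TB for ELK/EFK, slightly above the ~1.8 TB quoted, so the quoted figure presumably assumes somewhat better compression. Treat both as rough estimates.

```python
def stored_gb(daily_gb, retention_days, compression_ratio, copies=1):
    """Estimate on-disk size: raw volume / compression ratio * number of copies."""
    raw_gb = daily_gb * retention_days
    return raw_gb / compression_ratio * copies


# Figures from the text: 100 GB/day retained for 30 days.
elk_gb = stored_gb(100, 30, compression_ratio=3, copies=2)  # primary + 1 replica
loki_gb = stored_gb(100, 30, compression_ratio=10, copies=1)

print(f"ELK/EFK: {elk_gb:.0f} GB, Loki: {loki_gb:.0f} GB")
print(f"Loki / ELK storage ratio: {loki_gb / elk_gb:.0%}")
```

Plugging in your own daily volume, retention, and measured compression ratio is usually more informative than any published ratio, since compression depends heavily on log content.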
Query latency:
Hot data (≤ 7 days): sub‑second for both.
Warm data (7‑30 days): ELK 1‑3 s, Loki 2‑5 s.
Cold data (>30 days): ELK 3‑10 s, Loki 2‑5 s when using object storage.
3. Feature Completeness
| Feature | ELK/EFK | Loki |
| Full‑text search | ⭐⭐⭐⭐⭐ | ⭐⭐ (label selectors plus unindexed line filters) |
| Complex aggregations | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Visualization | Kibana (rich) | Grafana (good) |
| Alerting | Watcher / ElastAlert | Prometheus Alertmanager (external) |
| Multi‑tenant | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Machine‑learning anomaly detection | ✅ (X‑Pack) | ❌ |
| Kubernetes native | ⭐⭐ | ⭐⭐⭐⭐⭐ |
4. Log Collectors Comparison
| Collector | Language | Memory | CPU | Config complexity | Plugin ecosystem / K8s support |
| Filebeat | Go | 50‑100 MB | Low | Simple YAML | Good |
| Fluentd | Ruby + C | 200‑500 MB | Medium | Ruby DSL | Excellent |
| Fluent Bit | C | 20‑50 MB | Low | INI | Excellent |
| Promtail | Go | 30‑80 MB | Low | Simple YAML | Native |
Real‑World Case Studies
Case 1 – Startup (35 people) chooses Loki
Budget < 20 k CNY/year.
Log volume 50 GB/day.
Already using Prometheus + Grafana.
Cost breakdown: 1 × 4‑core server + OSS storage ≈ 12 k CNY/year vs ELK ≈ 110 k CNY/year.
Result: 99.9 % uptime, MTTR reduced from 30 min to 5 min, saved ≈ 200 k CNY over two years.
Case 2 – Mid‑size company (120 people) migrates from ELK to Loki
Log volume 200 GB/day, ELK cost ≈ 300 k CNY/year.
After migration: Loki + OSS ≈ 60 k CNY/year (80 % reduction).
Maintenance effort dropped from half‑time to almost zero.
Case 3 – Large fintech (800 people) stays with ELK
Log volume 5 TB/day, needs deep analytics, compliance, machine‑learning alerts.
ELK provides full‑text search, X‑Pack security, and SIEM capabilities.
Annual cost ≈ 4.5 M CNY, justified by business value.
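The savings quoted in Cases 1 and 2 can be checked with a few lines of arithmetic. The helper names below are hypothetical; the inputs are the annual costs given in the case studies.

```python
def savings(old_annual, new_annual, years=1):
    """Absolute saving over a number of years, in the same currency unit."""
    return (old_annual - new_annual) * years


def reduction(old_annual, new_annual):
    """Relative cost reduction as a fraction of the old cost."""
    return (old_annual - new_annual) / old_annual


# Case 1: ELK ≈ 110 k CNY/year vs Loki ≈ 12 k CNY/year, over two years.
case1_saving = savings(110_000, 12_000, years=2)

# Case 2: ELK ≈ 300 k CNY/year vs Loki + OSS ≈ 60 k CNY/year.
case2_reduction = reduction(300_000, 60_000)

print(case1_saving)              # 196000, i.e. roughly 200 k CNY
print(f"{case2_reduction:.0%}")  # 80%
```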
Best‑Practice Decision Tree
Primary purpose? – Fault‑diagnosis → Loki; Data analysis → ELK/EFK.
Log volume? – < 100 GB/day → Loki; 100 GB‑1 TB/day → Loki in microservices mode or ELK with optimization; > 1 TB/day → ELK.
Budget? – < 50 k CNY → Loki; 50‑200 k CNY → Loki or EFK; > 200 k CNY → ELK.
Existing stack? – Prometheus/Grafana → Loki; Elasticsearch/Kibana → ELK/EFK.
Team expertise? – < 20 people, no log‑engineer → Loki; 20‑100 people with ops → Loki or EFK; > 100 people with ES experts → ELK.
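The decision tree above can be sketched as a small function that records which stacks each question points to and then tallies the votes. This is an illustrative simplification, not a definitive selector: the function names and input strings are hypothetical, the budget thresholds are taken as 50 k and 200 k CNY, and the team‑expertise question is left out to keep the sketch short.

```python
from collections import Counter


def suggest(purpose, daily_gb, budget_cny, existing_stack):
    """Return the candidate stacks each decision-tree question points to."""
    votes = {}

    votes["purpose"] = {"Loki"} if purpose == "fault-diagnosis" else {"ELK", "EFK"}

    if daily_gb < 100:
        votes["volume"] = {"Loki"}
    elif daily_gb <= 1000:
        votes["volume"] = {"Loki", "ELK"}  # Loki microservices mode or tuned ELK
    else:
        votes["volume"] = {"ELK"}

    if budget_cny < 50_000:
        votes["budget"] = {"Loki"}
    elif budget_cny <= 200_000:
        votes["budget"] = {"Loki", "EFK"}
    else:
        votes["budget"] = {"ELK"}

    if existing_stack == "prometheus-grafana":
        votes["existing"] = {"Loki"}
    elif existing_stack == "elasticsearch-kibana":
        votes["existing"] = {"ELK", "EFK"}

    return votes


def tally(votes):
    """Count how many questions point at each stack."""
    counts = Counter()
    for candidates in votes.values():
        counts.update(candidates)
    return counts


# Inputs loosely modelled on Case 1 (50 GB/day, budget under 20 k CNY).
votes = suggest("fault-diagnosis", daily_gb=50, budget_cny=20_000,
                existing_stack="prometheus-grafana")
print(tally(votes).most_common())
```

A tally is only a starting point: when the questions disagree, the hard constraints (compliance, budget ceiling) should usually outweigh a simple majority.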
Common Pitfalls & How to Avoid Them
"ELK is always the best" – leads to unnecessary cost if you only need simple search.
"Loki is too simple" – for 90 % of teams it is sufficient; upgrade only when needed.
"ELK is unaffordable" – can be optimized with ILM, hot‑cold tiering, and disabling indexing on fields that are never queried.
"Loki has no cost" – mis‑configured retention can cause unlimited storage growth.
"Fluentd always beats Logstash" – choose based on processing needs, not hype.
Migration Strategies
From ELK to Loki
Risk assessment (1 week).
Deploy Loki cluster (1 week) – configure object storage and retention.
Dual‑write phase (2 weeks) – send logs to both stacks, compare results.
Team training on LogQL (1 week).
Gradual cut‑over (1 week).
Decommission ELK after a transition period (1 week).
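Key: set limits_config.retention_period and enable compactor retention (compactor.retention_enabled: true) so that the period is actually enforced. Older deployments that still rely on the table manager use table_manager.retention_deletes_enabled together with table_manager.retention_period instead. Without one of these, storage grows without bound.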
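During the dual‑write phase, one simple sanity check is to compare per‑service log counts from both stacks over the same time window. The sketch below assumes you have already obtained those counts (for example from a count query in each system); the function name, the 1 % tolerance, and the sample numbers are all hypothetical.

```python
def compare_counts(elk_counts, loki_counts, tolerance=0.01):
    """Return services whose counts differ by more than `tolerance` (relative)."""
    mismatches = {}
    for service in sorted(set(elk_counts) | set(loki_counts)):
        a = elk_counts.get(service, 0)
        b = loki_counts.get(service, 0)
        baseline = max(a, b)
        if baseline and abs(a - b) / baseline > tolerance:
            mismatches[service] = (a, b)
    return mismatches


# Hypothetical per-service line counts for the same one-hour window.
elk = {"checkout": 120_000, "search": 80_000, "auth": 15_000}
loki = {"checkout": 119_800, "search": 71_000, "auth": 15_000}

print(compare_counts(elk, loki))
```

A service that shows up in only one stack is reported as a mismatch, which helps catch collectors that were never pointed at the new pipeline.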
From Loki to ELK
Usually driven by growing analytics needs or compliance. Follow the reverse steps, but expect higher effort due to ES cluster sizing and index tuning.
Cost‑Optimization Tips
ELK/EFK
Use ILM policies with hot, warm, cold, and frozen tiers to move older data to cheaper storage.
Disable indexing on fields that are not queried.
Sample non‑critical logs (e.g., 50 % drop for INFO level).
# Logstash sampling example: drop roughly 50 % of non-error events
filter {
  if [level] != "error" {
    drop { percentage => 50 }
  }
}
Loki
Use object storage (S3/OSS) for chunks.
Set appropriate limits_config.retention_period (e.g., 720h for 30 days).
Limit label cardinality – keep only essential labels (app, env, level).
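To see why label cardinality matters, note that every distinct combination of label values becomes its own stream. A rough upper bound is the product of the number of distinct values per label. The counts below are hypothetical, and `user_id` stands in for any high‑cardinality label you might be tempted to add.

```python
from math import prod


def max_streams(label_values):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_values.values())


# Hypothetical number of distinct values for each label.
essential = {"app": 30, "env": 3, "level": 4}
with_user_id = {**essential, "user_id": 10_000}

print(max_streams(essential))     # 360
print(max_streams(with_user_id))  # 3600000
```

High‑cardinality values such as user or request IDs are usually better left in the log line and matched with a line filter at query time.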
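# Retention example (retention_period is enforced only when compactor retention is enabled)
compactor:
  retention_enabled: true
limits_config:
  retention_period: 720h # 30 days
Conclusion & Outlook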
The decisive rule is simple: small teams with fault‑diagnosis focus → Loki; larger teams needing deep analytics, security, or compliance → ELK/EFK. Loki covers about 80 % of use‑cases with 20‑30 % of the total cost of ownership.
Trends for 2025 include rapid Loki adoption, tighter cloud‑native integration, AI‑driven log anomaly detection, eBPF‑based log collection, and unified observability platforms merging metrics, traces, and logs.
Final advice for decision‑makers:
Pick the solution that matches real needs, not hype.
Calculate TCO (servers + storage + bandwidth + personnel).
Start simple – Loki is often enough – and evolve only when justified.
Measure ROI: if logs generate business insights, ELK investment pays off; otherwise, prioritize cost‑effective Loki.
Continuously revisit retention, indexing, and label strategies to keep costs in check.
Remember, the best logging system is the one that lets you find problems quickly, stays affordable, and feels comfortable for your team.
MaGe Linux Operations