Deploy a Production-Ready Loki Cluster with S3 Storage and Redis Cache
This guide walks you through setting up a Loki logging cluster for production, covering the native architecture, key configuration differences, storage with boltdb‑shipper on S3, Redis caching, ruler setup, and adapting the Docker‑Compose deployment to Kubernetes.
Many newcomers to Loki are confused by components such as the distributor, ingester, and querier, and by the various third‑party storage dependencies. The official documentation’s brief description of cluster deployment makes it seem even harder. Besides the official Helm chart, the production directory of the Loki repository contains a production‑grade cluster deployment.
The original community approach uses Docker‑Compose to spin up a Loki cluster quickly. We would not run Docker‑Compose on a single node in a real production environment (Docker Swarm is ruled out as well), but its architecture and configuration files are worth studying.
Compared with a fully distributed Loki cluster, this solution has three notable differences:
The core services (distributor, ingester, and querier) are not split into separate deployments; each Loki instance runs all of them in a single process.
External KV stores such as Consul and etcd are dropped; cluster state is kept in memory and shared between instances via memberlist.
boltdb‑shipper replaces other log indexing solutions.
Thus the overall architecture becomes clearer with fewer external dependencies. Apart from S3 for storing chunks and indexes, a cache service is needed to accelerate log queries and writes.
Since Loki 2.0, the boltdb index storage has been refactored to use the new boltdb‑shipper mode, allowing indexes to be stored in S3 and eliminating the need for Cassandra or Google BigTable. This makes horizontal scaling easier. See https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/ for details.
Native Part
memberlist
memberlist:
  join_members: ["loki-1", "loki-2", "loki-3"]
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946

Loki’s memberlist uses a gossip protocol to keep cluster state eventually consistent across all nodes. join_members lists the peers to contact at startup (here the three Loki instances), and bind_addr/bind_port set the address the gossip listener uses. The remaining settings tune timeouts for dead and departed nodes; the defaults are usually sufficient.
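To make "eventually consistent via gossip" concrete, here is a toy simulation, assuming a simplified model: every node keeps a view of the ring, periodically pushes it to a random peer, and the receiver merges it by keeping the entry with the newer heartbeat timestamp (last writer wins). This is not memberlist's actual implementation (which also does failure detection and tombstones for departed nodes); `Entry`, `merge`, and `gossip_round` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    state: str      # e.g. "ACTIVE" or "LEAVING"
    timestamp: int  # heartbeat time; the newer entry wins


View = Dict[str, Entry]  # instance name -> latest known entry


def merge(local: View, remote: View) -> View:
    """Last-writer-wins merge: keep the entry with the newer timestamp."""
    merged = dict(local)
    for name, entry in remote.items():
        if name not in merged or entry.timestamp > merged[name].timestamp:
            merged[name] = entry
    return merged


def gossip_round(views: Dict[str, View], fanout: int, rng: random.Random) -> None:
    """Each node pushes the view it held at the start of the round to `fanout` random peers."""
    snapshot = {node: dict(view) for node, view in views.items()}
    for node in snapshot:
        peers = [p for p in snapshot if p != node]
        for peer in rng.sample(peers, k=min(fanout, len(peers))):
            views[peer] = merge(views[peer], snapshot[node])


def converged(views: Dict[str, View]) -> bool:
    """True once every node holds an identical view."""
    first = next(iter(views.values()))
    return all(view == first for view in views.values())


if __name__ == "__main__":
    # Each node initially knows only about itself.
    nodes = ["loki-1", "loki-2", "loki-3"]
    views = {n: {n: Entry("ACTIVE", 1)} for n in nodes}
    rng = random.Random(7)
    rounds = 0
    while not converged(views) and rounds < 100:
        gossip_round(views, fanout=1, rng=rng)
        rounds += 1
    print(f"converged={converged(views)} after {rounds} round(s)")
```

Because the merge keeps the newer entry regardless of the order in which messages arrive, all nodes end up with the same view without a central KV store, which is the property that lets this setup drop Consul and etcd.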
ingester
ingester:
  lifecycler:
    join_after: 60s
    observe_period: 5s
    ring:
      replication_factor: 2
      kvstore:
        store: memberlist
    final_sleep: 0s

The ingester’s ring state is synchronized to all members via gossip. The replication factor is set to 2, meaning each log stream is written to two ingester instances for redundancy.
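Replication works through a consistent‑hash ring: a stream's hash picks a position on the ring, and the next distinct ingesters clockwise hold its replicas. The sketch below is a simplified model under stated assumptions: SHA‑256 stands in for Loki's actual hash function, tokens are derived from instance names instead of the random tokens real ingesters generate, and the hypothetical `Ring` class ignores instance state and zone awareness. `min_success` encodes the majority quorum (replication_factor // 2 + 1) used by the ring's default replication strategy that Loki inherits from Cortex.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Map a string to a 32-bit ring position (SHA-256 is an illustrative stand-in)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters: List[str], tokens_per_ingester: int = 128):
        # Each ingester claims several positions on the ring.
        self.entries: List[Tuple[int, str]] = sorted(
            (token(f"{ing}/{i}"), ing)
            for ing in ingesters
            for i in range(tokens_per_ingester)
        )
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, stream: str, replication_factor: int) -> List[str]:
        """Walk clockwise from the stream's token, collecting distinct ingesters."""
        start = bisect.bisect_right(self.tokens, token(stream))
        chosen: List[str] = []
        for step in range(len(self.entries)):
            _, ing = self.entries[(start + step) % len(self.entries)]
            if ing not in chosen:
                chosen.append(ing)
                if len(chosen) == replication_factor:
                    break
        return chosen


def min_success(replication_factor: int) -> int:
    """Write quorum: a majority of the replicas must acknowledge the push."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    ring = Ring(["loki-1", "loki-2", "loki-3"])
    stream = '{app="nginx", env="prod"}'
    print(ring.replicas(stream, replication_factor=2))
    for rf in (1, 2, 3):
        print(f"replication_factor={rf}: quorum={min_success(rf)}")
```

Under the majority rule, `replication_factor: 2` needs both replicas to acknowledge each write, whereas a factor of 3 still reaches quorum with one replica unavailable.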
Extended Part
The community’s native configuration is insufficient for production needs, so we add extensions such as unified S3 storage for indexes and chunks, Redis caching, and ruler configuration.
storage
Both index and chunk storage are unified under S3 object storage, removing third‑party dependencies.
schema_config:
  configs:
    - from: 2021-04-25
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    shared_store: aws
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
  aws:
    s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_BUCKET>
    s3forcepathstyle: true
    insecure: true

Here active_index_directory is a local directory where each ingester writes the BoltDB index files it is currently building; boltdb‑shipper periodically uploads them to the shared store (S3). cache_location is a local directory where queriers cache index files downloaded from S3.
The ingester uploads indexes to S3 under the path <S3_BUCKET>/index/.
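With `prefix: index_` and `period: 24h`, each day of index data goes into its own periodic table, and boltdb‑shipper expects this 24h period. The sketch reproduces the periodic‑table naming rule (prefix plus Unix time divided by the period length in seconds) to show which table a given day maps to. The `object_prefix` helper is an assumption: it presumes the default `shared_store_key_prefix` of `index/` with one directory per table beneath it, so check your bucket to confirm the exact layout.

```python
from datetime import datetime, timedelta, timezone


def table_for(ts: datetime, prefix: str = "index_",
              period: timedelta = timedelta(hours=24)) -> str:
    """Periodic table name: prefix + (Unix seconds // period seconds)."""
    period_secs = int(period.total_seconds())
    return f"{prefix}{int(ts.timestamp()) // period_secs}"


def object_prefix(ts: datetime, key_prefix: str = "index/") -> str:
    """Assumed S3 key prefix under which that day's index files are uploaded."""
    return f"{key_prefix}{table_for(ts)}/"


if __name__ == "__main__":
    day = datetime(2021, 4, 25, tzinfo=timezone.utc)  # the schema's `from` date
    print(table_for(day))      # index_18742
    print(object_prefix(day))  # index/index_18742/
```

Because the table number only changes at UTC midnight, all streams written on the same day share one table, which keeps the number of index files the queriers must download per query day small.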
redis
The native solution lacks a cache; we introduce Redis for query and write caching. A single Redis instance is sufficient for modest cluster sizes.
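Every cache configured in this section follows the same idea: look the key up in Redis, fall back to the expensive path on a miss, and store the result with an expiry (`expiration: 1h` in the configuration). The sketch models that cache‑aside pattern in memory so it runs without a Redis server. `TTLCache` and `cached_query` are hypothetical names, and Loki's real results cache does more (for example splitting queries by time interval and merging cached extents), so treat this only as an illustration of what the expiry setting controls.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis keys written with an expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired, like a Redis key whose TTL ran out
            return None
        return value

    def set(self, key: Hashable, value: object) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def cached_query(cache: TTLCache, query: str, run_query: Callable[[str], object]) -> object:
    """Cache-aside: serve from the cache on a hit, otherwise run the query and store it."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = run_query(query)
    cache.set(query, result)
    return result


if __name__ == "__main__":
    now = [0.0]
    cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])  # expiration: 1h
    calls = []

    def run_query(q: str) -> str:
        calls.append(q)
        return f"result of {q}"

    q = '{app="nginx"} |= "error"'
    cached_query(cache, q, run_query)
    cached_query(cache, q, run_query)  # served from the cache
    now[0] = 3601
    cached_query(cache, q, run_query)  # expired, so the query runs again
    print(len(calls))  # 2
```

A longer expiry raises the hit rate for repeated dashboard queries at the cost of more Redis memory; one hour is a moderate starting point.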
query_range:
  results_cache:
    cache:
      redis:
        endpoint: redis:6379
        expiration: 1h
  cache_results: true

storage_config:  # same top-level key as in the storage section; merge the two blocks
  index_queries_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h

chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
  write_dedupe_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
ruler
Since Loki is now clustered, the ruler service must also be distributed. The community’s configuration omits this, so we add a complete ruler setup that stores rules in S3 and uses a consistent‑hash ring for distribution.
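Distributing the ruler over a ring means each rule group is hashed to a position on the ring and evaluated only by the ruler that owns that position, so an alert is not fired once per Loki instance. The sketch illustrates that ownership rule with 32‑bit FNV‑1a hashing, roughly the way the Cortex‑derived ruler tokenizes rule groups. The `tenant/namespace/group` key format, the token count, and all function names are assumptions for illustration.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def group_token(tenant: str, namespace: str, group: str) -> int:
    # Hypothetical key format; the real ruler uses its own separator.
    return fnv1a_32(f"{tenant}/{namespace}/{group}".encode())


def build_ring(rulers: List[str], tokens_per_ruler: int = 128) -> List[Tuple[int, str]]:
    return sorted((fnv1a_32(f"{r}-{i}".encode()), r)
                  for r in rulers for i in range(tokens_per_ruler))


def owner(ring: List[Tuple[int, str]], tok: int) -> str:
    """The first ruler token clockwise from the group's token owns the group."""
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_right(tokens, tok) % len(ring)][1]


def assign(rulers: List[str], groups: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each ruler to the rule groups it alone evaluates."""
    ring = build_ring(rulers)
    plan: Dict[str, List[str]] = {r: [] for r in rulers}
    for tenant, namespace, group in groups:
        plan[owner(ring, group_token(tenant, namespace, group))].append(f"{namespace}/{group}")
    return plan


if __name__ == "__main__":
    groups = [("fake", "nginx", "errors"), ("fake", "nginx", "latency"),
              ("fake", "mysql", "slow-queries")]
    for ruler, owned in assign(["loki-1", "loki-2", "loki-3"], groups).items():
        print(ruler, owned)
```

Every group lands on exactly one ruler, and when a ruler joins or leaves only the groups near its tokens move, which is why a ring is used instead of a static split.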
ruler:
  storage:
    type: s3
    s3:
      s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_RULES_BUCKET>
      s3forcepathstyle: true
      insecure: true
      http_config:
        insecure_skip_verify: true
  enable_api: true
  enable_alertmanager_v2: true
  alertmanager_url: "http://<alertmanager>"
  enable_sharding: true  # without this, every ruler evaluates every rule group
  ring:
    kvstore:
      store: memberlist
Kubernetes Support
The most important step is to make the official Loki cluster solution runnable on Kubernetes. The full manifest is available on GitHub; clone it locally and apply.
This manifest depends only on an S3 object store, so ensure you have the AccessKey and SecretKey ready. Configure them in installation.sh and run the script to start installation.
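The keys end up inside the `s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_BUCKET>` URLs used in the storage and ruler configuration. Because Loki parses that value as a URL, a secret key containing characters such as `/` should be percent‑encoded before it is substituted. The helper below is a hypothetical illustration of that step, not part of the published installation.sh; the endpoint, bucket, and key values are placeholders.

```python
from urllib.parse import quote, unquote, urlsplit


def build_s3_url(access_key: str, secret_key: str, endpoint: str, bucket: str) -> str:
    """Render s3://<ACCESS>:<SECRET>@<URL>/<BUCKET>, percent-encoding the credentials."""
    user = quote(access_key, safe="")      # safe="" also encodes "/"
    password = quote(secret_key, safe="")
    return f"s3://{user}:{password}@{endpoint}/{bucket}"


if __name__ == "__main__":
    url = build_s3_url("AKIAEXAMPLE", "abc/def+ghi", "minio.example.com:9000", "loki")
    print(url)  # s3://AKIAEXAMPLE:abc%2Fdef%2Bghi@minio.example.com:9000/loki
    parts = urlsplit(url)
    print(unquote(parts.password))  # abc/def+ghi
```

Decoding the rendered URL, as the last two lines do, is a quick way to confirm the credentials survive the round trip before they are written into the manifest.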
The ServiceMonitor in the manifests lets the Prometheus Operator discover Loki’s metrics endpoints; deploying it is optional.
Summary
This article presents the official Loki production‑grade cluster deployment, adds extensions such as Redis caching and S3 storage, and adapts the Docker‑Compose approach to Kubernetes. The solution simplifies Loki’s distributed architecture by reducing external dependencies, making it a valuable reference for production deployments.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.