
Deploy a Production-Ready Loki Cluster with S3 Storage and Redis Cache

This guide walks you through setting up a Loki logging cluster for production, covering the native architecture, key configuration differences, storage with boltdb‑shipper on S3, Redis caching, ruler setup, and adapting the Docker‑Compose deployment to Kubernetes.

Ops Development Stories

Sep 23, 2021


Many newcomers to Loki are put off by its many components (distributor, ingester, querier, and others) and its third‑party storage dependencies, and the brief cluster deployment section in the official documentation makes a cluster look harder to deploy than it is. Besides the official Helm chart, the production directory of the Loki repository contains a production‑grade cluster deployment.

The community setup there uses Docker‑Compose to spin up a Loki cluster quickly. We would not run Docker‑Compose on a single node in a real production environment (nor Docker Swarm), but the architecture and configuration files are worth studying.

Compared with a fully distributed Loki cluster, this solution has three notable differences:

The core services distributor, ingester, and querier are not split into separate deployments; every Loki instance runs all of them in a single process.

External KV stores such as Consul and etcd are not used; the ring state is kept in memory on each node and shared via memberlist.

boltdb‑shipper is used as the index store, replacing the other index backends.

As a result, the architecture is simpler and has fewer external dependencies: apart from S3 for chunks and indexes, the only additional service is a cache that speeds up log queries and writes.

Since Loki 2.0, boltdb index storage can use the boltdb‑shipper mode, which uploads the index files to the same object store as the chunks (S3 here). This removes the need for Cassandra or Google Bigtable and makes horizontal scaling easier. See https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/ for details.

Native Part

memberlist

memberlist:
  join_members: ["loki-1", "loki-2", "loki-3"]
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946

Loki’s memberlist uses the gossip protocol to achieve eventual consistency across all nodes. The configuration mainly controls protocol frequencies and timeouts; the defaults are usually sufficient.
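To get a feel for why gossip converges without a central KV store, here is a toy simulation of push gossip. It is a sketch under simplifying assumptions, not memberlist's actual protocol (HashiCorp memberlist, which Loki builds on, also performs failure detection and periodic full state sync): each round, every node that already knows an update forwards it to `fanout` random peers. The function `gossip_rounds` and its parameters are hypothetical.

```python
import random


def gossip_rounds(nodes, seed_node, fanout=1, rng=None):
    """Rounds of push gossip until every node has heard an update from seed_node."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are reproducible
    informed = {seed_node}
    rounds = 0
    while len(informed) < len(nodes):
        rounds += 1
        newly_informed = set()
        # Sorted so the iteration order (and thus the random choices) is deterministic.
        for node in sorted(informed):
            peers = [n for n in nodes if n != node]
            for peer in rng.sample(peers, min(fanout, len(peers))):
                newly_informed.add(peer)
        informed |= newly_informed
    return rounds


if __name__ == "__main__":
    print(gossip_rounds(["loki-1", "loki-2", "loki-3"], "loki-1"))
    print(gossip_rounds([f"node-{i}" for i in range(64)], "node-0"))
```

The number of rounds grows roughly logarithmically with cluster size, which is why a handful of Loki nodes agree on ring state quickly even with the default gossip intervals.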

ingester

ingester:
  lifecycler:
    join_after: 60s
    observe_period: 5s
    ring:
      replication_factor: 2
      kvstore:
        store: memberlist
    final_sleep: 0s

The ingester’s state is synchronized to all members via gossip, and the replication factor is set to 2, meaning each log stream is written to two ingester instances for redundancy.
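The sketch below models how a consistent hash ring with `replication_factor: 2` can pick two distinct ingesters for each stream. It is an illustrative model, not Loki's code: the MD5‑based `token` function, the `Ring` class, and `tokens_per_instance=128` are assumptions chosen for the example. The stream key is hashed to a ring position, and a clockwise walk collects the first two distinct instances.

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    """Map a string to a 32-bit ring position (MD5 is used only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, instances, tokens_per_instance=128):
        # Each instance claims many pseudo-random positions on the ring.
        self._entries = sorted(
            (token(f"{inst}/{i}"), inst)
            for inst in instances
            for i in range(tokens_per_instance)
        )
        self._tokens = [t for t, _ in self._entries]
        self._instances = {inst for _, inst in self._entries}

    def replicas(self, stream_key: str, replication_factor: int = 2):
        """Walk clockwise from the stream's token and collect distinct instances."""
        wanted = min(replication_factor, len(self._instances))
        start = bisect.bisect_right(self._tokens, token(stream_key))
        owners = []
        for step in range(len(self._entries)):
            _, inst = self._entries[(start + step) % len(self._entries)]
            if inst not in owners:
                owners.append(inst)
                if len(owners) == wanted:
                    break
        return owners


if __name__ == "__main__":
    ring = Ring(["loki-1", "loki-2", "loki-3"])
    print(ring.replicas('{app="nginx", env="prod"}'))
    # Count how many of 300 streams land on each instance (2 copies per stream).
    print(Counter(i for n in range(300) for i in ring.replicas(f'{{job="app-{n}"}}')))
```

Because each instance owns many tokens, streams spread roughly evenly across the three nodes, and each stream is held by two of them.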

Extended Part

The community’s native configuration is insufficient for production needs, so we add extensions such as unified S3 storage for indexes and chunks, Redis caching, and ruler configuration.

storage

Both index and chunk storage are unified under S3 object storage, removing third‑party dependencies.

schema_config:
  configs:
  - from: 2021-04-25
    store: boltdb-shipper
    object_store: aws
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  boltdb_shipper:
    shared_store: aws
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
  aws:
    s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_BUCKET>
    s3forcepathstyle: true
    insecure: true

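Here active_index_directory is a local directory where each ingester writes the index files it is currently building before uploading them to S3, while cache_location is a local directory where queriers cache index files downloaded from S3.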

The ingester uploads indexes to S3 under the path <S3_BUCKET>/index/ .
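With `index.prefix: index_` and `index.period: 24h` from `schema_config`, each day's index lives in its own table. The sketch below assumes, as we understand Loki's periodic table naming, that the table name is the prefix followed by the Unix time divided by the period, and that the ingester uploads the table under the `index/` prefix mentioned above. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# Values taken from the schema_config above.
INDEX_PREFIX = "index_"          # index.prefix
PERIOD_SECONDS = 24 * 60 * 60    # index.period: 24h
SHARED_STORE_PREFIX = "index/"   # key prefix inside the bucket, per the text


def table_for(ts: datetime) -> str:
    """Name of the periodic index table covering `ts` (must be timezone-aware)."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return f"{INDEX_PREFIX}{int(ts.timestamp()) // PERIOD_SECONDS}"


def shared_store_dir(ts: datetime) -> str:
    """Directory in the bucket where that day's index files are uploaded."""
    return f"{SHARED_STORE_PREFIX}{table_for(ts)}/"


if __name__ == "__main__":
    # 2021-04-25 is the `from` date in schema_config.
    print(shared_store_dir(datetime(2021, 4, 25, tzinfo=timezone.utc)))
```

Every log line written on the same UTC day maps to the same table, which is why boltdb‑shipper requires a 24h index period.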

redis

The native setup has no cache, so we add Redis to cache query results, index lookups, and chunks, and to deduplicate chunk writes. A single Redis instance is sufficient for a modest cluster.

query_range:
  results_cache:
    cache:
      redis:
        endpoint: redis:6379
        expiration: 1h
  cache_results: true
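The results cache follows a cache‑aside pattern: look up the query, run it on a miss, and store the result with the one‑hour `expiration`. Below is a deliberately simplified model with an in‑memory dict standing in for Redis and an injectable clock so expiry can be tested. Loki's real results cache also splits and merges time ranges, which this sketch does not attempt; `TTLCache` and `cached_query` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for Redis where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def cached_query(cache: TTLCache, query: str, start: int, end: int, run_query):
    """Cache-aside lookup: return (result, was_hit), filling the cache on a miss."""
    key = f"{query}|{start}|{end}"
    hit = cache.get(key)
    if hit is not None:
        return hit, True
    result = run_query(query, start, end)
    cache.set(key, result)
    return result, False
```

Repeating the same dashboard query within the hour is served from the cache; after `expiration` elapses the query runs against the store again.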

storage_config:
  # belongs under storage_config; merge with the storage_config block above
  index_queries_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h

chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
  write_dedupe_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
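The `write_dedupe_cache_config` matters because, with `replication_factor: 2`, two ingesters may flush the same chunk. Here is a minimal sketch of the idea, assuming both replicas produce byte‑identical chunks and using a SHA‑256 content hash as a stand‑in for Loki's real chunk key: the first writer records the key in the shared cache, and the second skips the upload. The `Ingester` class is hypothetical.

```python
import hashlib


class Ingester:
    """Hypothetical replica that flushes chunks to a shared store via a dedupe cache."""

    def __init__(self, name: str, object_store: dict, dedupe_cache: set):
        self.name = name
        self.store = object_store
        self.dedupe = dedupe_cache

    def flush(self, chunk: bytes) -> bool:
        key = hashlib.sha256(chunk).hexdigest()  # content-derived key (illustrative)
        if key in self.dedupe:
            return False  # a replica already wrote this exact chunk; skip the upload
        self.store[key] = chunk
        self.dedupe.add(key)
        # Check-then-set is not atomic: two replicas racing can still both write.
        return True
```

Replicas may cut chunks at slightly different moments, so the dedupe cache reduces redundant writes rather than guaranteeing none.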

ruler

Since Loki is now clustered, the ruler service must also be distributed. The community’s configuration omits this, so we add a complete ruler setup that stores rules in S3 and uses a consistent‑hash ring for distribution.

ruler:
  storage:
    type: s3
    s3:
      s3: s3://<S3_ACCESS_KEY>:<S3_SECRET_KEY>@<S3_URL>/<S3_RULES_BUCKET>
      s3forcepathstyle: true
      insecure: true
      http_config:
        insecure_skip_verify: true
  enable_api: true
  enable_alertmanager_v2: true
  alertmanager_url: "http://<alertmanager>"
  ring:
    kvstore:
      store: memberlist
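To illustrate how a ring spreads rule evaluation, the sketch below assigns each rule group to the first ruler clockwise from the group's hash and shows that removing a ruler only moves the groups it owned. The key format `tenant/file/group`, the MD5 hash, and 64 tokens per ruler are assumptions for the example. Note also that in Loki 2.x ring‑based rule sharding is switched on by the ruler's `enable_sharding` option, which the fragment above does not set; check the ruler reference for your version.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """32-bit ring position derived from MD5 (illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def build_ring(rulers, tokens_per_ruler=64):
    """Sorted (token, ruler) pairs; each ruler claims several positions."""
    return sorted((ring_hash(f"{r}/{i}"), r) for r in rulers for i in range(tokens_per_ruler))


def owner(ring, group_key: str) -> str:
    """The first ruler clockwise from the group's hash evaluates that group."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_right(tokens, ring_hash(group_key)) % len(ring)
    return ring[idx][1]


def assign(rulers, group_keys):
    ring = build_ring(rulers)
    return {g: owner(ring, g) for g in group_keys}
```

When a ruler leaves, its tokens disappear and only its groups fall through to the next ruler clockwise; every other assignment stays put, so rule evaluation is not reshuffled across the whole cluster.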

Kubernetes Support

The most important step is making the official Loki cluster setup run on Kubernetes. The full manifests are available on GitHub; clone the repository locally and apply them.

This manifest depends only on an S3 object store, so ensure you have the AccessKey and SecretKey ready. Configure them in installation.sh and run the script to start installation.
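The storage and ruler sections embed the credentials in an `s3://` URL, so a secret key containing `/`, `+`, or `@` can break parsing. We do not reproduce installation.sh here; the sketch below is a hypothetical helper pair (`s3_url`, `parse_s3_url`) that percent‑encodes the keys when building the URL and decodes them again. It assumes Loki decodes percent‑escapes in the user‑info part of the URL, as Go's `net/url` does; the endpoint and bucket values are placeholders.

```python
from urllib.parse import quote, unquote, urlparse


def s3_url(access_key: str, secret_key: str, endpoint: str, bucket: str) -> str:
    """Build s3://KEY:SECRET@ENDPOINT/BUCKET with the credentials percent-encoded."""
    return f"s3://{quote(access_key, safe='')}:{quote(secret_key, safe='')}@{endpoint}/{bucket}"


def parse_s3_url(url: str) -> dict:
    """Split the URL back into its parts, decoding the credentials."""
    u = urlparse(url)
    return {
        "access_key": unquote(u.username or ""),
        "secret_key": unquote(u.password or ""),
        "endpoint": u.netloc.rsplit("@", 1)[-1],  # host[:port] after the user info
        "bucket": u.path.lstrip("/"),
    }


if __name__ == "__main__":
    url = s3_url("AKIDEXAMPLE", "wJal/rXUt+nFEMI@x", "minio.loki.svc:9000", "loki-data")
    print(url)
    print(parse_s3_url(url))
```

Round‑tripping the URL before writing it into the config is a cheap way to catch credentials that would otherwise be split at the wrong `/` or `@`.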

The ServiceMonitor in the manifests lets the Prometheus Operator discover and scrape Loki's metrics; deploying it is optional.

Summary

This article presents the official Loki production‑grade cluster deployment, adds extensions such as Redis caching and S3 storage, and adapts the Docker‑Compose approach to Kubernetes. The solution simplifies Loki’s distributed architecture by reducing external dependencies, making it a valuable reference for production deployments.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
