
How We Re‑engineered Our Log Platform: From ELK to ClickHouse with Vector and Log‑Pilot

Facing data growth, reliability demands, and high maintenance costs, a company redesigned its logging stack by replacing ELK with a Kubernetes‑native pipeline built on Log‑Pilot, Vector, and ClickHouse, achieving lower cost, higher performance, and seamless migration while preserving familiar query interfaces.

dbaplus Community

Background and Pain Points

Log data is essential for online troubleshooting and observability, so the platform that handles it must be stable, fast, cost-effective, easy to use, and scalable. The existing ELK-based system, which served cloud VM logs, container logs, and special-category logs, struggled with growing data volume, slower processing, storage shortages, and rising reliability requirements, and its hardware and maintenance costs were high.

Comparison of Mainstream Log Platforms

[Figure: log platform comparison chart]

Key Requirements for the New Log Platform

Functionality: efficient aggregation queries, with multi-region and cross-tenant support.

Efficiency & Maintenance: reduce cost while handling a tenfold increase in scale, improve reliability, and simplify operations.

Transparent Migration from ELK: maintain compatibility so that users can keep working in a Kibana-like UI.

Enhanced System Integrity: high-performance collectors and parallel processing.

New Architecture Overview

The redesigned stack consists of:

Log Collection – Log-Pilot for Filebeat: a Kubernetes-native log collector that is easy to deploy and supports multiple sources, real-time viewing, and various outputs. It is configured declaratively, but the project currently lacks active maintainers.

Log Parsing – Vector: a high-performance observability pipeline written in Rust that ingests, transforms, and routes logs, metrics, and traces to chosen sinks. It offers a DSL (Vector Remap Language, VRL) for safe, fast data transformation and supports custom plugins.

Log Storage – ClickHouse: chosen for its high write throughput, strong single-query performance, lower server and operational costs, and simple SQL syntax.

Vector Configuration Example

# Sources
[sources.my_source_id]
  type = "kafka"
  bootstrap_servers = "10.x.x.1:9092,10.x.x.2:9092,10.x.x.3:9092"
  group_id = "consumer-group-name"
  topics = ["^(prefix1|prefix2)-.+"]

# Transforms
[transforms.my_transform_id]
  type = "remap"
  inputs = ["my_source_id"]
  source = ". = parse_key_value!(.message)"

# Sinks – console (for debugging)
[sinks.print]
  type = "console"
  inputs = ["my_transform_id"]
  encoding.codec = "json"

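# Sinks – ClickHouse
[sinks.my_sink_id]
  type = "clickhouse"
  inputs = ["my_transform_id"]
  endpoint = "http://127.0.0.1:8123"   # ClickHouse HTTP interface
  database = "default"
  table = "table"
  auth.strategy = "basic"
  auth.user = "user"
  auth.password = "password"
  compression = "gzip"
  skip_unknown_fields = true           # let ClickHouse ignore fields with no matching column
  # Illustrative batch limits taken from the tuning notes that follow; adjust to your workload
  batch.max_events = 100000
  batch.timeout_secs = 10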

Important Vector‑to‑ClickHouse Tuning

Enable Vector’s automatic topic balancing to distribute data evenly.

Set appropriate batch size and write frequency (e.g., 100 000 records or 10 seconds) to avoid “Too many parts” errors.

Use distributed tables to split data across nodes, improving write speed and reliability.

Define sensible partitions based on business needs to prevent excessive partition counts.

Choose proper primary keys and indexes to keep data ordered and queryable.
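
One way to check whether the batching and partitioning advice above is working is to watch the number of active parts per partition. The query below is a minimal sketch against ClickHouse's `system.parts` table; the database name `logs` is a hypothetical placeholder, not something the original setup specifies.

```sql
-- Count active parts per partition. A count that keeps growing usually means
-- inserts are too small or too frequent for background merges to keep up.
SELECT
    database,
    table,
    partition,
    count()   AS active_parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE active
  AND database = 'logs'          -- hypothetical database name
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 20;
```

If a few partitions hold hundreds of active parts, larger or less frequent batches on the Vector side are the first thing to try.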

ClickHouse Storage Design

Why ClickHouse: higher write throughput than Elasticsearch, fast execution of single large queries, lower server and operational costs, and a gentler, SQL-based learning curve.

Cluster Planning: consider data volume, real-time requirements, query load, and maintenance overhead.

Table Schema Guidelines: create indexes on frequently queried fields, select appropriate partition keys, use MergeTree engines with sorting keys that match query patterns, and pick compression algorithms (LZ4 for speed, ZSTD for space) according to the performance-versus-storage trade-off.
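
The schema guidelines above might translate into a local table along these lines. This is a sketch only: the database, table, cluster, and column names (`logs`, `app_logs_local`, `logs_cluster`, `service`, `level`, and so on) are hypothetical, and the daily partition key, sorting key, and codec levels are assumptions to be tuned against real query patterns and retention.

```sql
-- Hypothetical local table; one copy lives on every shard of the cluster.
CREATE TABLE logs.app_logs_local ON CLUSTER logs_cluster
(
    timestamp DateTime64(3)          CODEC(Delta, ZSTD(1)),
    service   LowCardinality(String),
    level     LowCardinality(String),
    host      LowCardinality(String),
    message   String                 CODEC(ZSTD(3)),   -- ZSTD trades CPU for space
    -- Token bloom-filter skipping index to narrow keyword searches on message
    INDEX idx_message message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)      -- daily partitions; coarser if retention is long
ORDER BY (service, level, timestamp)    -- sorting key mirrors the most common filters
SETTINGS index_granularity = 8192;
```

Columns left without an explicit codec fall back to the server default, which is LZ4 unless configured otherwise.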

Distributed Table Creation (example):

CREATE TABLE db.d_table_name ON CLUSTER cluster
AS db.local_table_name
ENGINE = Distributed(cluster, db, local_table_name, sharding_key);
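
As a concrete instance of that template, the sketch below assumes a cluster named `logs_cluster` and a local table `logs.app_logs_local` that already exists on each shard with `service`, `level`, and `timestamp` columns. All of these names are hypothetical, and `rand()` is just one possible sharding key that spreads rows evenly.

```sql
-- Distributed table that fans reads and writes out to the local tables.
CREATE TABLE logs.app_logs ON CLUSTER logs_cluster
AS logs.app_logs_local
ENGINE = Distributed(logs_cluster, logs, app_logs_local, rand());

-- Queries go to the distributed table and are executed on every shard.
SELECT service, count() AS errors
FROM logs.app_logs
WHERE level = 'ERROR'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service
ORDER BY errors DESC;
```

A deterministic sharding key (for example a hash of a tenant or service field) keeps related rows on one shard, at the cost of less even distribution than `rand()`.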

Visualization and Analysis Platform

A custom UI mimicking Kibana/Alibaba Cloud SLS was built to reduce migration friction, integrate with monitoring, tracing, and alerting components, and provide features such as query highlighting, time‑distribution preview, and log snippet display.
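
The article does not show the queries behind the UI, but a time-distribution preview and a log snippet list can be approximated with two statements like the following. This is a sketch under assumptions: the table `logs.app_logs` and its `timestamp`, `service`, and `message` columns are hypothetical, as is the search term.

```sql
-- Histogram of matching log lines per minute over the last hour.
SELECT
    toStartOfMinute(timestamp) AS bucket,
    count()                    AS hits
FROM logs.app_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND positionCaseInsensitive(message, 'timeout') > 0
GROUP BY bucket
ORDER BY bucket;

-- Most recent matching lines, for the snippet view.
SELECT timestamp, service, message
FROM logs.app_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND positionCaseInsensitive(message, 'timeout') > 0
ORDER BY timestamp DESC
LIMIT 50;
```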

Monitoring & Alerting

ClickHouse exposes performance metrics (query time, memory, disk usage, connections) which are scraped by Prometheus and visualized in Grafana.
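
The same kinds of figures can also be inspected ad hoc from ClickHouse's system tables, which is handy when validating what a dashboard shows. The queries below are a sketch; they assume query logging is enabled so that `system.query_log` is populated.

```sql
-- Point-in-time gauges: running queries, open connections, tracked memory.
SELECT metric, value
FROM system.metrics
WHERE metric IN ('Query', 'TCPConnection', 'HTTPConnection', 'MemoryTracking');

-- Free and total space per configured disk.
SELECT
    name,
    formatReadableSize(free_space)  AS free,
    formatReadableSize(total_space) AS total
FROM system.disks;

-- Slowest finished queries in the last hour.
SELECT query_duration_ms, formatReadableSize(memory_usage) AS memory, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```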

Results and Future Plans

After server logs, Nginx logs, and other sources were onboarded, total logging cost fell by about 60% while stored log volume grew by about 30% compared with the previous ELK setup.

Planned enhancements include further query optimizations (PREWHERE/WHERE tuning), hot-cold tiered storage, and continued work on migration tooling.
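
To make the two planned optimizations more concrete, here is a hedged sketch of what they could look like in ClickHouse. The table and column names are hypothetical, and the TTL statement assumes the table uses a storage policy that defines a volume named `cold`; the article does not describe the team's actual design.

```sql
-- PREWHERE reads the cheap, selective column first and only then fetches
-- the wide message column for the rows that passed the filter.
SELECT timestamp, message
FROM logs.app_logs
PREWHERE service = 'checkout'
WHERE message LIKE '%timeout%'
ORDER BY timestamp DESC
LIMIT 100;

-- Tiered storage: move parts older than 7 days to the 'cold' volume.
ALTER TABLE logs.app_logs_local
    MODIFY TTL toDateTime(timestamp) + INTERVAL 7 DAY TO VOLUME 'cold';
```

ClickHouse often moves suitable conditions to PREWHERE automatically, so explicit tuning mainly pays off when the optimizer's choice is not the most selective column.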

Conclusion

Migrating logs from Elasticsearch to ClickHouse dramatically reduces resource consumption and operational complexity while delivering superior query performance; however, Elasticsearch remains indispensable for certain business scenarios.
