Why ClickHouse Beats Elasticsearch: Performance, Cost, and Deployment Guide

This article compares ClickHouse and Elasticsearch, analyzes cost savings, and provides step‑by‑step deployment instructions for Zookeeper, Kafka, Filebeat, and ClickHouse clusters, including configuration details, troubleshooting tips, and practical code snippets for building a scalable analytics pipeline.

IT Architects Alliance

Aug 13, 2022

Background

To support SaaS services facing data security and compliance challenges, the team needed a private‑deployment analytics capability that would improve operational insight without incurring excessive server costs.

Elasticsearch vs ClickHouse

ClickHouse, a high‑performance columnar DBMS, demonstrated superior write throughput (50‑200 MB/s per server, more than 600,000 records/s, over 5× Elasticsearch) and query speed (2‑30 GB/s when data is served from the page cache, 5‑30× faster than Elasticsearch). It also needed only 1/3 to 1/30 of the storage and less CPU and memory, cutting server costs roughly in half.

Cost Analysis

Based on Alibaba Cloud pricing without discounts, ClickHouse’s higher compression and lower resource consumption translate into significantly lower operational expenses compared with Elasticsearch.

Environment Deployment

Zookeeper Cluster Deployment

Install Java and NTP, create directories, download and extract Zookeeper 3.7.1, set ZOOKEEPER_HOME, configure zoo.cfg (tickTime=2000, initLimit=10, syncLimit=5, dataDir, dataLogDir, clientPort=2181, server.1, server.2, server.3), create myid files on each node, and start the service with sh zkServer.sh start.
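The zoo.cfg and myid steps can be sketched as a small generator script. This is a minimal sketch, not the team's script: the host names zk1/zk2/zk3, the /usr/zookeeper/data and /usr/zookeeper/logs paths, and the ./generated output directory are hypothetical, while 2888 and 3888 are the ports ZooKeeper conventionally uses for quorum traffic and leader election.

```shell
#!/usr/bin/env bash
# Generate zoo.cfg plus one myid file per node for a three-node ensemble.
# Hypothetical values: host names, data/log paths, output directory.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/zookeeper}
DATA_DIR=${DATA_DIR:-/usr/zookeeper/data}
LOG_DIR=${LOG_DIR:-/usr/zookeeper/logs}
HOSTS=(zk1 zk2 zk3)   # server.N maps to HOSTS[N-1]

mkdir -p "$OUT_DIR"

{
  echo "tickTime=2000"
  echo "initLimit=10"
  echo "syncLimit=5"
  echo "dataDir=$DATA_DIR"
  echo "dataLogDir=$LOG_DIR"
  echo "clientPort=2181"
  for i in "${!HOSTS[@]}"; do
    # Format: host:quorumPort:electionPort
    echo "server.$((i + 1))=${HOSTS[$i]}:2888:3888"
  done
} > "$OUT_DIR/zoo.cfg"

# Each node needs $DATA_DIR/myid containing its own server number.
for i in "${!HOSTS[@]}"; do
  echo "$((i + 1))" > "$OUT_DIR/myid.${HOSTS[$i]}"
done

echo "Copy zoo.cfg to \$ZOOKEEPER_HOME/conf/ on every node and myid.<host> to $DATA_DIR/myid on that host."
```

On each node, copy the files into place and run zkServer.sh start; once all three nodes are up, zkServer.sh status should report one leader and two followers.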

Kafka Cluster Deployment

Create /usr/kafka, install Kafka 3.2.0, configure broker.id, listeners, buffer sizes, log directory, partitions, replication factor, and other settings. Start Kafka as a background process and use kafka-topics.sh and kafka-console-consumer.sh for management.
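A similar sketch for the broker configuration, assuming the brokers run in ZooKeeper mode against a three-node ensemble. The host names (kafka1-3, zk1-3), the /usr/kafka/logs directory, and the partition and replication defaults are hypothetical; the socket buffer values mirror those in Kafka's stock server.properties.

```shell
#!/usr/bin/env bash
# Generate one server.properties per broker for a three-broker Kafka 3.2.0 cluster.
# Hypothetical: host names, log directory, partition/replication defaults.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/kafka}
KAFKA_HOSTS=(kafka1 kafka2 kafka3)
ZK_CONNECT=${ZK_CONNECT:-zk1:2181,zk2:2181,zk3:2181}

mkdir -p "$OUT_DIR"

for i in "${!KAFKA_HOSTS[@]}"; do
  host=${KAFKA_HOSTS[$i]}
  cat > "$OUT_DIR/server-$host.properties" <<EOF
# broker.id must be unique within the cluster
broker.id=$i
listeners=PLAINTEXT://$host:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/kafka/logs
num.partitions=3
default.replication.factor=2
offsets.topic.replication.factor=3
log.retention.hours=168
zookeeper.connect=$ZK_CONNECT
EOF
done
```

On each broker, copy the matching file to config/server.properties and start it with bin/kafka-server-start.sh -daemon config/server.properties. The topic can then be created with bin/kafka-topics.sh --bootstrap-server kafka1:9092 --create --topic data_clickhouse --partitions 3 --replication-factor 2 and inspected with bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic data_clickhouse --from-beginning.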

Filebeat Deployment

Import the Elastic GPG key, add an elastic.repo file for the 8.x packages, install Filebeat, and enable it to start at boot. Then configure /etc/filebeat/filebeat.yml with json.keys_under_root: true (so decoded JSON fields sit at the top level of each event), the Kafka output's hosts, topic, and compression, and a processor that drops unneeded fields. Run Filebeat in the background with

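nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &

The 2>&1 keeps the log in filebeat.log even when the command is launched from a script: -e writes Filebeat's log to stderr, and nohup only redirects stderr automatically when it is a terminal.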
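The repository file and filebeat.yml described above can be sketched as follows. The repo definition follows Elastic's published 8.x yum instructions; the log path /var/log/app/*.log, the use of the log input (deprecated in 8.x in favor of filestream, but still supported), and the list of dropped fields are hypothetical choices.

```shell
#!/usr/bin/env bash
# Write the Elastic 8.x yum repo definition and a Filebeat config that ships
# JSON log lines to Kafka. Hypothetical: log path, dropped field list.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/filebeat}
mkdir -p "$OUT_DIR"

# Destination: /etc/yum.repos.d/elastic.repo. Import the key first with:
#   sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
cat > "$OUT_DIR/elastic.repo" <<'EOF'
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

# Destination: /etc/filebeat/filebeat.yml
cat > "$OUT_DIR/filebeat.yml" <<'EOF'
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log        # hypothetical application log path
    json.keys_under_root: true    # lift decoded JSON keys to the event root
    json.add_error_key: true

processors:
  - drop_fields:                  # strip Beats metadata the pipeline does not need
      fields: ["agent", "ecs", "host", "input", "log"]
      ignore_missing: true

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: "data_clickhouse"
  compression: gzip
  required_acks: 1
EOF
```

Before starting Filebeat, filebeat test config -c /etc/filebeat/filebeat.yml checks the syntax and filebeat test output checks that the Kafka brokers are reachable.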

ClickHouse Deployment

Verify that the CPU supports SSE 4.2, create the data directories, add the ClickHouse nodes to /etc/hosts, set the CPU scaling governor to performance, keep vm.overcommit_memory at 0 (ClickHouse's usage recommendations advise against disabling overcommit), and turn off transparent huge pages. Install the ClickHouse server and client with yum, adjust the log level in config.xml, and check the logs in /var/log/clickhouse-server/. Start and stop the server with sudo clickhouse start and sudo clickhouse stop.
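The host checks above can be collected into a read-only preflight script. This is a sketch: it reports each problem and prints a fix instead of changing kernel settings, and the environment variables CPUINFO, OVERCOMMIT, THP, and GOVERNOR are hypothetical hooks for pointing it at other files. The thresholds follow ClickHouse's usage recommendations.

```shell
#!/usr/bin/env bash
# Read-only preflight checks for a ClickHouse node. Paths can be overridden
# through environment variables so the checks can run against test files.
set -uo pipefail

report() { printf '%-4s %s\n' "$1" "$2"; }

preflight() {
  local cpuinfo=${CPUINFO:-/proc/cpuinfo}
  local overcommit=${OVERCOMMIT:-/proc/sys/vm/overcommit_memory}
  local thp=${THP:-/sys/kernel/mm/transparent_hugepage/enabled}
  local governor=${GOVERNOR:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}
  local mode

  # Prebuilt x86_64 ClickHouse binaries require SSE 4.2.
  if grep -q sse4_2 "$cpuinfo" 2>/dev/null; then
    report OK "CPU supports SSE 4.2"
  else
    report FAIL "sse4_2 flag not found in $cpuinfo"
  fi

  # ClickHouse recommends the "performance" frequency governor.
  if [[ $(cat "$governor" 2>/dev/null) == performance ]]; then
    report OK "scaling governor is performance"
  else
    report WARN "governor not performance (fix: echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor)"
  fi

  # 0 (heuristic) or 1 (always) keeps overcommit on; 2 turns it off.
  mode=$(cat "$overcommit" 2>/dev/null || echo unknown)
  if [[ $mode == 0 || $mode == 1 ]]; then
    report OK "vm.overcommit_memory=$mode"
  else
    report WARN "vm.overcommit_memory=$mode (fix: echo 0 | sudo tee $overcommit)"
  fi

  # The active THP mode is the bracketed word, e.g. "always madvise [never]".
  if grep -q '\[never\]' "$thp" 2>/dev/null; then
    report OK "transparent huge pages disabled"
  else
    report WARN "transparent huge pages enabled (fix: echo never | sudo tee $thp)"
  fi
}

preflight
```

Once the checks pass, the packages can come from ClickHouse's RPM repository: sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo, then sudo yum install -y clickhouse-server clickhouse-client.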

Creating Kafka Engine Table

Create the Kafka engine table on the cluster with:

CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster
( ... )
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
    kafka_topic_list = 'data_clickhouse',
    kafka_group_name = 'clickhouse_xxx',
    kafka_format = 'JSONEachRow',
    kafka_row_delimiter = '\n',
    kafka_num_consumers = 1;
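A complete version of that statement needs a concrete column list, which the article elides. The sketch below writes one to a file for clickhouse-client; apart from log_uuid and date_partition, which later steps reference, the columns, their types, and the output path are hypothetical.

```shell
#!/usr/bin/env bash
# Write the Kafka engine table DDL to a file for clickhouse-client.
# Hypothetical: every column except log_uuid and date_partition, and the output path.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/clickhouse}
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/01_kafka_table.sql" <<'SQL'
CREATE TABLE IF NOT EXISTS default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster
(
    log_uuid       String,    -- later used as the sharding key
    date_partition UInt32,    -- e.g. 20220813; later used as the partition key
    event_name     String,    -- hypothetical payload column
    event_time     DateTime   -- hypothetical payload column
)
ENGINE = Kafka
SETTINGS kafka_broker_list   = 'kafka1:9092,kafka2:9092,kafka3:9092',
         kafka_topic_list    = 'data_clickhouse',
         kafka_group_name    = 'clickhouse_xxx',
         kafka_format        = 'JSONEachRow',
         kafka_row_delimiter = '\n',
         kafka_num_consumers = 1;
SQL

echo "Apply with: clickhouse-client --multiquery < $OUT_DIR/01_kafka_table.sql"
```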

Troubleshooting Kafka Engine Table

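By default, ClickHouse rejects a direct SELECT from a Kafka engine table. For debugging, launch clickhouse-client with --stream_like_engine_allow_direct_select 1 to allow it. Such a SELECT consumes the messages it reads, so those rows will not reach the materialized view afterward.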
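A small wrapper keeps the debug query in one place. peek_kafka_table and the DRY_RUN variable are hypothetical, not part of ClickHouse; the helper defaults to a dry run that only prints the command, because a real run reads messages away from the consumer group.

```shell
#!/usr/bin/env bash
# Hypothetical debug helper: peek at a few rows of a Kafka engine table.
# Defaults to a dry run (print only); set DRY_RUN=0 to execute.
set -euo pipefail

peek_kafka_table() {
  local table=${1:-default.kafka_clickhouse_inner_log}
  local limit=${2:-5}
  local -a cmd
  # Build the argument list once so it can be printed or executed.
  cmd=(clickhouse-client
       --stream_like_engine_allow_direct_select 1
       --query "SELECT * FROM ${table} LIMIT ${limit}")
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    printf '%q ' "${cmd[@]}"; echo
  else
    "${cmd[@]}"
  fi
}

peek_kafka_table "$@"
```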

Creating Local Replicated Table

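Define a ReplicatedReplacingMergeTree table whose ZooKeeper path and replica name use the {shard} and {replica} macros, partition it by date_partition, and set index_granularity=8192 (also the default). Every node must define both macros in its configuration: nodes in different shards need different shard values, and replicas of the same shard need different replica values. In a layout with one replica per shard, each node therefore gets a unique shard value. Missing macros cause errors when the ON CLUSTER statement runs.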
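A sketch of the local table and the per-node macros, assuming three hypothetical nodes ck1, ck2, and ck3 with one replica per shard. The column list, the ZooKeeper path layout, and ORDER BY log_uuid are assumptions; the sorting key matters because ReplacingMergeTree collapses rows with equal sorting keys during merges.

```shell
#!/usr/bin/env bash
# Write the replicated local table DDL plus one macros file per node.
# Hypothetical: node names ck1..ck3, one replica per shard, columns,
# sorting key, ZooKeeper path layout, output path.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/clickhouse}
NODES=(ck1 ck2 ck3)
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/02_local_table.sql" <<'SQL'
CREATE TABLE IF NOT EXISTS default.bi_inner_log_local ON CLUSTER clickhouse_cluster
(
    log_uuid       String,
    date_partition UInt32,
    event_name     String,
    event_time     DateTime
)
ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/{shard}/default/bi_inner_log_local',  -- one ZooKeeper path per shard
    '{replica}')                                             -- one replica name per node
PARTITION BY date_partition
ORDER BY log_uuid          -- rows with the same log_uuid collapse on merge
SETTINGS index_granularity = 8192;
SQL

# Destination on each node: /etc/clickhouse-server/config.d/macros.xml
for i in "${!NODES[@]}"; do
  node=${NODES[$i]}
  printf -v shard '%02d' $((i + 1))
  cat > "$OUT_DIR/macros-$node.xml" <<EOF
<clickhouse>
    <macros>
        <shard>$shard</shard>
        <replica>$node</replica>
    </macros>
</clickhouse>
EOF
done
```

After the macros are in place, SELECT * FROM system.macros on each node shows the values ClickHouse will substitute.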

Creating Distributed Table

Create a distributed table that maps to the local table using

ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid))
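Written out in full, the distributed table can reuse the local table's columns through CREATE TABLE ... AS, so the column list is declared only once. The output path is hypothetical.

```shell
#!/usr/bin/env bash
# Write the Distributed table DDL that routes reads and writes to every shard's local table.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/clickhouse}
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/03_distributed_table.sql" <<'SQL'
-- Copies the columns of the local table; Distributed(cluster, database, table, sharding_key).
CREATE TABLE IF NOT EXISTS default.bi_inner_log_all ON CLUSTER clickhouse_cluster
AS default.bi_inner_log_local
ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid));
SQL
```

Sharding by xxHash32(log_uuid) sends every copy of a given log_uuid to the same shard. That matters for a ReplacingMergeTree local table, because merges only collapse duplicates that live on the same shard.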

Resolving Authentication Errors

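If queries or inserts through the distributed table fail with authentication errors, add the correct user and password to every replica entry under remote_servers. Without them, the Distributed engine connects to the other nodes as the default user with an empty password.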
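A sketch of the remote_servers section with credentials on every replica. The node names, the one-replica-per-shard layout, the default user, and reading the password from a CH_PASSWORD variable are assumptions; internal_replication is set to true because replicated local tables copy data between replicas themselves.

```shell
#!/usr/bin/env bash
# Write a remote_servers block that gives the Distributed engine credentials
# for every replica. Hypothetical: node names, user, password source.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/clickhouse}
NODES=(ck1 ck2 ck3)              # hypothetical layout: one replica per shard
CH_USER=${CH_USER:-default}
CH_PASSWORD=${CH_PASSWORD:-change_me}   # escape &, < and > for XML
mkdir -p "$OUT_DIR"

{
  echo "<clickhouse>"
  echo "    <remote_servers>"
  echo "        <clickhouse_cluster>"
  for node in "${NODES[@]}"; do
    cat <<EOF
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>$node</host>
                    <port>9000</port>
                    <user>$CH_USER</user>
                    <password>$CH_PASSWORD</password>
                </replica>
            </shard>
EOF
  done
  echo "        </clickhouse_cluster>"
  echo "    </remote_servers>"
  echo "</clickhouse>"
} > "$OUT_DIR/remote_servers.xml"
```

Merge the generated block into the remote_servers section on every node; SELECT cluster, shard_num, host_name FROM system.clusters WHERE cluster = 'clickhouse_cluster' then shows the layout ClickHouse actually loaded.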

Creating Materialized View

Synchronize Kafka data to the distributed table with

CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS SELECT ... FROM default.kafka_clickhouse_inner_log;
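The materialized view with an explicit column list, as a sketch. The columns are hypothetical and must exist in both the Kafka table and the distributed table.

```shell
#!/usr/bin/env bash
# Write the materialized view that moves rows from the Kafka engine table
# into the Distributed table. Hypothetical: column list, output path.
set -euo pipefail

OUT_DIR=${OUT_DIR:-./generated/clickhouse}
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/04_materialized_view.sql" <<'SQL'
-- Create this last: once the view is attached, each node's Kafka table starts
-- consuming, and every consumed block is inserted into default.bi_inner_log_all.
CREATE MATERIALIZED VIEW IF NOT EXISTS default.view_bi_inner_log ON CLUSTER clickhouse_cluster
TO default.bi_inner_log_all
AS SELECT log_uuid, date_partition, event_name, event_time
FROM default.kafka_clickhouse_inner_log;
SQL

echo "Apply with: clickhouse-client --multiquery < $OUT_DIR/04_materialized_view.sql"
```

Every node's Kafka table joins the same consumer group, so the topic needs at least as many partitions as there are consumers across the cluster; consumers beyond the partition count sit idle. Once data flows, SELECT count() FROM default.bi_inner_log_all is a quick end-to-end check.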

Conclusion

By following official documentation and command‑line help, the team resolved all deployment issues, achieving a functional data pipeline from Filebeat through Kafka into ClickHouse with significant performance and cost benefits.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

deployment Elasticsearch Zookeeper kafka ClickHouse cost analysis filebeat

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and the architectural changes that internet technologies bring. Includes real‑world case studies of large‑scale architectures. Open to architects who have ideas and enjoy sharing.
