Why ClickHouse Beats Elasticsearch: Performance, Cost, and Deployment Guide
This article compares ClickHouse and Elasticsearch, analyzes cost savings, and provides step‑by‑step deployment instructions for Zookeeper, Kafka, Filebeat, and ClickHouse clusters, including configuration details, troubleshooting tips, and practical code snippets for building a scalable analytics pipeline.
Background
To support SaaS services facing data security and compliance challenges, the team needed a private‑deployment analytics capability that would improve operational insight without incurring excessive server costs.
Elasticsearch vs ClickHouse
ClickHouse, a high-performance columnar DBMS, demonstrated superior write throughput (50 to 200 MB/s per server, more than 600 k records/s, over 5× that of Elasticsearch) and query speed (2 to 30 GB/s when the data is in the page cache, 5 to 30× faster than Elasticsearch). It also needed only 1/3 to 1/30 of the storage and used less CPU and memory, cutting server costs roughly in half.
Cost Analysis
Based on Alibaba Cloud pricing without discounts, ClickHouse’s higher compression and lower resource consumption translate into significantly lower operational expenses compared with Elasticsearch.
Environment Deployment
Zookeeper Cluster Deployment
Install Java and NTP, create directories, download and extract Zookeeper 3.7.1, set ZOOKEEPER_HOME, configure zoo.cfg (tickTime=2000, initLimit=10, syncLimit=5, dataDir, dataLogDir, clientPort=2181, server.1, server.2, server.3), create myid files on each node, and start the service with sh zkServer.sh start.
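As a rough illustration of the zoo.cfg described above, the following sketch stages the file and a myid file in a scratch directory. The hostnames zk1, zk2, and zk3, the /data/zookeeper paths, the 2888:3888 peer ports, and the OUT_DIR and MYID variables are assumptions, not values from the original deployment.

```shell
#!/bin/sh
# Sketch: stage zoo.cfg and myid for one node of a three-node ensemble.
# zk1..zk3, the /data/zookeeper paths, and the 2888:3888 ports are assumed values.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # scratch directory; copy the results into place afterwards
MYID="${MYID:-1}"                    # must equal N in this host's server.N line

cat > "$OUT_DIR/zoo.cfg" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF

# On the real host, myid belongs inside dataDir; it is only staged here.
echo "$MYID" > "$OUT_DIR/myid"

echo "staged $OUT_DIR/zoo.cfg and $OUT_DIR/myid (id=$MYID)"
```

Running it once per node with MYID=1, MYID=2, and MYID=3 gives each node the id that matches its server.N entry; zoo.cfg itself is identical on all three nodes.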
Kafka Cluster Deployment
Create /usr/kafka, install Kafka 3.2.0, configure broker.id, listeners, buffer sizes, log directory, partitions, replication factor, and other settings. Start Kafka as a background process and use kafka-topics.sh and kafka-console-consumer.sh for management.
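The broker settings listed above might look roughly like the server.properties staged by this sketch. The property names are standard Kafka broker settings, but every value (buffer sizes, partition count, replication factors, the /usr/kafka/logs path, the zk1..zk3 hosts) and the BROKER_ID and OUT_DIR variables are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: stage a server.properties for one broker. All values are assumptions.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # scratch directory for the staged file
BROKER_ID="${BROKER_ID:-1}"          # must be unique per broker

cat > "$OUT_DIR/server.properties" <<EOF
broker.id=${BROKER_ID}
listeners=PLAINTEXT://kafka${BROKER_ID}:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/kafka/logs
num.partitions=3
default.replication.factor=2
offsets.topic.replication.factor=3
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
EOF

echo "staged $OUT_DIR/server.properties for broker ${BROKER_ID}"

# Typical follow-up commands, run from the Kafka install directory (not executed here):
#   bin/kafka-server-start.sh -daemon config/server.properties
#   bin/kafka-topics.sh --bootstrap-server kafka1:9092 --create \
#       --topic data_clickhouse --partitions 3 --replication-factor 2
#   bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 \
#       --topic data_clickhouse --from-beginning
```

The -daemon flag is one way to run the broker in the background; the topic name data_clickhouse matches the one the Kafka engine table reads from later in the article.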
Filebeat Deployment
Import the Elastic GPG key, add an elastic.repo for the 8.x packages, install filebeat, and enable it at boot. Then configure /etc/filebeat/filebeat.yml with keys_under_root: true, Kafka output hosts, topic, compression, and field-dropping processors. Because -e sends Filebeat's log output to stderr, redirect both streams when running it in the background:
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &
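A filebeat.yml along the lines described above could be staged as in this sketch. The input path /var/log/app/*.log, the list of dropped fields, the gzip codec, the broker hostnames, and the OUT_DIR variable are assumptions. The deprecated but still available log input is used here because it takes json.keys_under_root directly.

```shell
#!/bin/sh
# Sketch: stage a filebeat.yml that ships JSON log lines to Kafka.
# The input path, dropped fields, and broker hosts are assumed values.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # scratch directory for the staged file

cat > "$OUT_DIR/filebeat.yml" <<'EOF'
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log
    # Decode each line as JSON and lift its keys to the top level of the event.
    json.keys_under_root: true
    json.overwrite_keys: true

processors:
  # Drop Filebeat metadata that the ClickHouse table does not need.
  - drop_fields:
      fields: ["agent", "ecs", "host", "input", "log"]
      ignore_missing: true

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: "data_clickhouse"
  compression: gzip
  required_acks: 1
EOF

echo "staged $OUT_DIR/filebeat.yml"
```

After copying the file to /etc/filebeat/filebeat.yml, filebeat test config -c /etc/filebeat/filebeat.yml is a quick way to catch YAML mistakes before starting the service.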
ClickHouse Deployment
Verify that the CPU supports SSE4.2, create the data directories, add the ClickHouse nodes to /etc/hosts, set the CPU scaling governor to performance, set vm.overcommit_memory to 0 (the ClickHouse documentation advises against disabling overcommit), and turn off transparent huge pages. Install the ClickHouse server and client via yum, adjust the log level in config.xml, and check the logs in /var/log/clickhouse-server/. Start and stop the service with sudo clickhouse start, sudo clickhouse stop, and so on.
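The host checks above can be done read-only before changing anything. This is a minimal sketch for Linux; the show_setting helper is a hypothetical name introduced here, and the commented commands at the end are the kind of changes usually applied as root, not a record of what the team ran.

```shell
#!/bin/sh
# Sketch: read-only preflight checks before installing ClickHouse (Linux paths assumed).

# 1. SSE 4.2 support: Linux lists CPU flags in /proc/cpuinfo.
if grep -q sse4_2 /proc/cpuinfo 2>/dev/null; then
  SSE42="supported"
else
  SSE42="not detected"
fi
echo "SSE 4.2: $SSE42"

# 2. Print a kernel setting if its file is readable, otherwise say so.
show_setting() {
  if [ -r "$1" ]; then
    echo "$1 = $(cat "$1")"
  else
    echo "$1 = (not available on this host)"
  fi
}

show_setting /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
show_setting /proc/sys/vm/overcommit_memory
show_setting /sys/kernel/mm/transparent_hugepage/enabled

# Changes to apply as root once the values above have been reviewed (left commented out):
#   echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#   echo 0 | tee /proc/sys/vm/overcommit_memory
#   echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
```

Virtual machines often do not expose the cpufreq files at all, in which case the governor step does not apply.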
Creating Kafka Engine Table
Use
CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster (... ) ENGINE = Kafka SETTINGS kafka_broker_list='kafka1:9092,kafka2:9092,kafka3:9092', kafka_topic_list='data_clickhouse', kafka_group_name='clickhouse_xxx', kafka_format='JSONEachRow', kafka_row_delimiter='\n', kafka_num_consumers=1;
Troubleshooting Kafka Engine Table
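Recent ClickHouse versions reject a direct SELECT from a Kafka engine table by default. To inspect the stream while debugging, enable it by launching clickhouse-client with --stream_like_engine_allow_direct_select 1.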
Creating Local Replicated Table
Define a ReplicatedReplacingMergeTree table with appropriate shard and replica macros, partition by date_partition, and set index_granularity=8192. Ensure each node has a unique shard value to avoid macro errors.
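To make the macro requirement concrete, the sketch below stages a per-node macros file and a possible DDL for the local table. The column list (other than log_uuid and date_partition, which the article names), the column types, the ZooKeeper path, the ORDER BY key, the macro values, and the OUT_DIR variable are all assumptions.

```shell
#!/bin/sh
# Sketch: stage per-node macros and a DDL file for the local replicated table.
# Column types, log_time/message, the ZooKeeper path, and macro values are assumed.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # scratch directory for the staged files

# Each node needs its own shard/replica combination in its macros file.
cat > "$OUT_DIR/macros.xml" <<'EOF'
<clickhouse>
  <macros>
    <shard>01</shard>
    <replica>clickhouse1</replica>
  </macros>
</clickhouse>
EOF

cat > "$OUT_DIR/create_local.sql" <<'EOF'
CREATE TABLE default.bi_inner_log_local ON CLUSTER clickhouse_cluster
(
    log_uuid       String,
    date_partition Date,
    log_time       DateTime,  -- hypothetical column
    message        String     -- hypothetical column
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/bi_inner_log_local', '{replica}')
PARTITION BY date_partition
ORDER BY log_uuid
SETTINGS index_granularity = 8192;
EOF

echo "staged $OUT_DIR/macros.xml and $OUT_DIR/create_local.sql"
```

The {shard} and {replica} placeholders are expanded from each node's macros at table creation time, so two nodes that resolve to the same ZooKeeper path and replica name will collide. Older ClickHouse releases use yandex rather than clickhouse as the root element of configuration files.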
Creating Distributed Table
Create a distributed table that maps to the local table using
ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid)).
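A full statement built around that engine clause might look like the sketch below. The table name bi_inner_log_all is taken from the materialized view target used later in the article. Copying the column structure with AS default.bi_inner_log_local and the OUT_DIR variable are assumptions.

```shell
#!/bin/sh
# Sketch: stage the DDL for the distributed table that fronts the local table.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # scratch directory for the staged file

cat > "$OUT_DIR/create_distributed.sql" <<'EOF'
-- Reuse the local table's column list instead of repeating it.
CREATE TABLE default.bi_inner_log_all ON CLUSTER clickhouse_cluster
AS default.bi_inner_log_local
ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid));
EOF

echo "staged $OUT_DIR/create_distributed.sql"

# To apply it on a node that can reach the cluster (not executed here):
#   clickhouse-client --multiquery < "$OUT_DIR/create_distributed.sql"
```

Sharding on xxHash32(log_uuid) sends all rows with the same log_uuid to the same shard, which is what lets ReplacingMergeTree deduplicate them there.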
Resolving Authentication Errors
Update remote_servers configuration with correct user and password entries for each shard.
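One way to keep the credentials consistent across shards is to generate the remote_servers block instead of editing it by hand, as in this sketch. The hostnames clickhouse1..clickhouse3, one replica per shard, port 9000, and the HOSTS, CH_USER, CH_PASSWORD, and OUT_DIR variables are assumptions.

```shell
#!/bin/sh
# Sketch: generate a remote_servers config with the same user/password on every shard.
# Hostnames, the single-replica layout, and the credentials are placeholders.
set -e

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"                  # scratch directory for the staged file
HOSTS="${HOSTS:-clickhouse1 clickhouse2 clickhouse3}"
CH_USER="${CH_USER:-default}"
CH_PASSWORD="${CH_PASSWORD:-CHANGE_ME}"             # XML-escape special characters if present

{
  echo '<clickhouse>'
  echo '  <remote_servers>'
  echo '    <clickhouse_cluster>'
  for host in $HOSTS; do
    cat <<EOF
      <shard>
        <internal_replication>true</internal_replication>
        <replica>
          <host>$host</host>
          <port>9000</port>
          <user>$CH_USER</user>
          <password>$CH_PASSWORD</password>
        </replica>
      </shard>
EOF
  done
  echo '    </clickhouse_cluster>'
  echo '  </remote_servers>'
  echo '</clickhouse>'
} > "$OUT_DIR/remote_servers.xml"

echo "staged $OUT_DIR/remote_servers.xml"
```

The user and password here are what a node uses when it forwards a distributed query to another shard, so they must match an account that exists on the target node.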
Creating Materialized View
Synchronize Kafka data to the distributed table with
CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS SELECT ... FROM default.kafka_clickhouse_inner_log;
Conclusion
By following official documentation and command‑line help, the team resolved all deployment issues, achieving a functional data pipeline from Filebeat through Kafka into ClickHouse with significant performance and cost benefits.
IT Architects Alliance
